A pinboard by
Jian Zhao

Ph.D. student, National University of Singapore


Face Recognition and Human Parsing

I am a full-time Ph.D. student in the Learning and Vision Group, Department of Electrical and Computer Engineering (ECE), Faculty of Engineering, National University of Singapore (NUS). My main supervisor is Dr. FENG Jiashi and my co-supervisor is Dr. YAN Shuicheng. I am generously supported by the China Scholarship Council (CSC) and the School of Computer, National University of Defense Technology (NUDT), China. My domestic supervisor at NUDT is Dr. LIU Hengzhu. Currently, I am developing Deep Neural Network models for fine-grained image understanding, applied to Face Recognition, Image Generation and Human Parsing. I have an M.Eng. degree in Computer Science for Signal Processing, with a thesis titled "Research on the Equalization Technologies for the Wireless Image Transmission Data Link System Based on the UAV Platform". Research interests: Artificial Intelligence, Deep Learning and Computer Vision, Unconstrained Face Recognition, Image Generation with Adversarial Learning, and Human Parsing.


Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition

Abstract: Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities, with mission-critical applications in the forensics, security and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of the large intra-class variations of heterogeneous face images and the limited training samples of cross-modality face image pairs. This paper proposes a novel approach, namely Wasserstein CNN (convolutional neural networks, or WCNN for short), to learn invariant features between near-infrared and visual face images (i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in the visual spectrum. The high-level layer is divided into three parts, i.e., the NIR layer, the VIS layer and the NIR-VIS shared layer. The first two layers aim to learn modality-specific features, and the NIR-VIS shared layer is designed to learn a modality-invariant feature subspace. Wasserstein distance is introduced into the NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. WCNN learning thus aims to minimize the Wasserstein distance between the NIR distribution and the VIS distribution, yielding an invariant deep feature representation of heterogeneous face images. To avoid over-fitting on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of the WCNN network to reduce the parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at the training stage and an efficient computation for heterogeneous data at the testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods.
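To make the role of the Wasserstein distance more concrete, here is a minimal sketch of how the dissimilarity between two batches of features could be measured. The abstract does not state how the distance is computed, so this sketch assumes each feature distribution is approximated by a Gaussian with diagonal covariance, in which case the squared 2-Wasserstein distance has a simple closed form. The function name and the synthetic "NIR" and "VIS" feature arrays are hypothetical and used only for illustration.

```python
import numpy as np


def gaussian_w2_squared(feats_a, feats_b):
    """Squared 2-Wasserstein distance between two feature batches.

    Assumption (not taken from the abstract): each batch is modelled as a
    Gaussian with diagonal covariance, so the distance reduces to
    ||mean_a - mean_b||^2 + ||std_a - std_b||^2.
    """
    mean_a, mean_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    std_a, std_b = feats_a.std(axis=0), feats_b.std(axis=0)
    return float(np.sum((mean_a - mean_b) ** 2) + np.sum((std_a - std_b) ** 2))


# Synthetic stand-ins for shared-layer features of the two modalities.
rng = np.random.default_rng(0)
nir_features = rng.normal(loc=0.5, scale=1.0, size=(256, 8))
vis_features = rng.normal(loc=0.0, scale=1.0, size=(256, 8))

print(gaussian_w2_squared(nir_features, vis_features))
```

In a training loop, a quantity of this kind would be added to the recognition loss so that minimizing it pulls the two modality distributions together in the shared feature subspace.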

Pub.: 08 Aug '17, Pinned: 28 Aug '17

Generative Adversarial Network-based Synthesis of Visible Faces from Polarimetric Thermal Faces

Abstract: The large domain discrepancy between faces captured in the polarimetric (or conventional) thermal and visible domains makes cross-domain face recognition quite a challenging problem for both human examiners and computer vision algorithms. Previous approaches utilize a two-step procedure (visible feature estimation and visible image reconstruction) to synthesize the visible image given the corresponding polarimetric thermal image. However, these are regarded as two disjoint steps and hence may hinder the performance of visible face reconstruction. We argue that joint optimization would be a better way to reconstruct more photo-realistic images for both computer vision algorithms and human examiners to examine. To this end, this paper proposes a Generative Adversarial Network-based Visible Face Synthesis (GAN-VFS) method to synthesize more photo-realistic visible face images from their corresponding polarimetric images. To ensure that the encoded visible features contain more semantically meaningful information for reconstructing the visible face image, a guidance sub-network is incorporated into the training procedure. To achieve photo-realism while preserving discriminative characteristics in the reconstructed outputs, an identity loss combined with a perceptual loss is optimized in the framework. Multiple experiments evaluated on different experimental protocols demonstrate that the proposed method achieves state-of-the-art performance.
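As a rough illustration of how an identity loss and a perceptual loss might be combined in a generator objective, the sketch below forms a weighted sum of three terms. The abstract does not give the exact loss definitions or weights, so everything here is an assumption: both feature-space losses are taken as mean squared errors, the adversarial term is passed in as a precomputed scalar, and the names `generator_objective`, `w_perc` and `w_id` are hypothetical.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def generator_objective(adv_loss, perc_fake, perc_real, id_fake, id_real,
                        w_perc=1.0, w_id=1.0):
    """Weighted sum of adversarial, perceptual and identity terms.

    Assumptions (not taken from the abstract):
    - perc_* are features of the synthesized and real visible images from
      some fixed perceptual network;
    - id_* are embeddings of the same images from some face recognition model;
    - w_perc and w_id are illustrative weights.
    """
    perceptual = mse(perc_fake, perc_real)
    identity = mse(id_fake, id_real)
    return adv_loss + w_perc * perceptual + w_id * identity


# Toy inputs: the perceptual features differ, the identity embeddings match.
total = generator_objective(
    adv_loss=0.5,
    perc_fake=np.zeros(4), perc_real=np.ones(4),
    id_fake=np.zeros(2), id_real=np.zeros(2),
)
print(total)
```

The point of the joint objective is that one set of generator parameters is updated against all terms at once, rather than estimating features and reconstructing the image in two disjoint steps.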

Pub.: 08 Aug '17, Pinned: 28 Aug '17