A pinboard by
Uzair Nadeem

Ph.D. Student, The University of Western Australia


To develop robust, efficient and practical algorithms for scene understanding.

The main aim of my research is to develop robust, efficient, and practical algorithms for scene understanding. Scene understanding seeks to equip computers with human-like vision capabilities. The performance of machine learning and computer vision techniques has increased considerably for the individual components of scene understanding, e.g., scene classification, segmentation, object recognition, text detection, and depth map generation. The next essential step towards human-like perception of scenes, however, is to combine these individual tasks so that they can support one another. Achieving this requires generalizable algorithms that produce state-of-the-art results across the different components of scene understanding. Deep learning has emerged as the main machine learning technique for feature extraction and classification in scene understanding tasks. My first research goal is to apply deep learning and machine learning techniques to develop robust algorithms, particularly for the tasks of object recognition, face recognition, and surveillance.

Another important, yet relatively unexplored, cue in scene understanding is text, the most important form of human communication. Text occurring in natural scenes can provide information about the context of the scene, its category, the types of objects likely to appear in it, and the possible interactions between them. My second goal is to develop text detection and recognition methods that can be used in real-world scenarios; to this end, I am investigating novel methods and architectures of deep neural networks to improve the efficiency and robustness of text localization techniques. My final goal is to integrate these recognition capabilities into a robot owned by our research group.
My research will be useful in the fields of surveillance, robotic scene understanding, image and video retrieval from large databases (or the internet), and autonomous vehicles. It will also be helpful in the development of personal assistance devices for visually impaired, blind, and elderly people.


Dynamic ensembles of exemplar-SVMs for still-to-video face recognition

Abstract: Face recognition (FR) plays an important role in video surveillance by allowing individuals of interest to be accurately recognized over a distributed network of cameras. Systems for still-to-video FR are exposed to challenging operational environments. The appearance of faces changes when captured under unconstrained conditions due to variations in pose, scale, illumination, occlusion, blur, etc. Moreover, the facial models used for matching may not be robust to intra-class variations because they are typically designed a priori with one reference facial still per person. Indeed, faces captured during enrollment (using still cameras) may differ considerably from those captured during operations (using surveillance cameras). In this paper, an efficient multi-classifier system (MCS) is proposed for accurate still-to-video FR based on multiple face representations and domain adaptation (DA). An individual-specific ensemble of exemplar-SVM (e-SVM) classifiers is designed to improve robustness to intra-class variations. During enrollment of a target individual, an ensemble is used to model the single reference still, where multiple face descriptors and random feature subspaces generate a diverse pool of patch-wise classifiers. To adapt these ensembles to the operational domain, e-SVMs are trained using labeled face patches extracted from the reference still versus patches extracted from cohort and other non-target stills, mixed with unlabeled patches extracted from the corresponding face trajectories captured with surveillance cameras. During operations, the most competent classifiers for each given probe face are dynamically selected and weighted based on internal criteria determined in the feature space of the e-SVMs. This paper also investigates the impact of using different training schemes for DA, as well as the use of a validation set of non-target faces extracted from stills and video trajectories of unknown individuals in the operational domain.
The performance of the proposed system was validated using videos from the COX-S2V and Chokepoint datasets. Results indicate that the proposed system can surpass state-of-the-art accuracy with significantly lower computational complexity. Indeed, dynamic selection and weighting combine only the most relevant classifiers for each input probe.
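The core idea of the abstract — one linear exemplar-SVM per reference patch (a single positive against many non-target negatives), then dynamic per-probe selection and weighting of the resulting ensemble — can be sketched roughly as follows. This is a toy numpy illustration on synthetic feature vectors under assumed dimensions and hyperparameters, not the authors' implementation:

```python
# Toy sketch of an individual-specific exemplar-SVM ensemble with dynamic
# per-probe weighting. All data, dimensions, and the SGD hinge-loss trainer
# are illustrative assumptions; the paper's actual pipeline uses real face
# descriptors, random subspaces, and domain adaptation.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # assumed feature dimension

def train_esvm(positive, negatives, epochs=200, lr=0.01, lam=0.01, pos_weight=20.0):
    """Train one linear exemplar-SVM (a single positive patch vs many
    negatives) by SGD on the weighted hinge loss; returns (w, b)."""
    X = np.vstack([positive[None, :], negatives])
    y = np.array([1.0] + [-1.0] * len(negatives))
    cost = np.array([pos_weight] + [1.0] * len(negatives))  # up-weight lone positive
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            w *= (1.0 - lr * lam)                 # L2 regularization decay
            if y[i] * (X[i] @ w + b) < 1.0:       # hinge-loss subgradient step
                w += lr * cost[i] * y[i] * X[i]
                b += lr * cost[i] * y[i]
    return w, b

# Synthetic stand-ins: patches of the single reference still (positives)
# and patches from cohort / non-target stills (negatives).
ref_patches = rng.normal(loc=1.0, size=(5, DIM))
cohort_patches = rng.normal(loc=-1.0, size=(100, DIM))

# One e-SVM per reference patch -> the individual-specific ensemble.
ensemble = [train_esvm(p, cohort_patches) for p in ref_patches]

def score_probe(probe):
    """Dynamically select and weight e-SVMs by their decision margin on
    this probe: classifiers with negative margins are dropped."""
    margins = np.array([probe @ w + b for w, b in ensemble])
    weights = np.clip(margins, 0.0, None)
    return float(np.average(margins, weights=weights)) if weights.sum() else 0.0
```

A target-like probe (features near the reference distribution) should then score higher than a non-target probe, since only the classifiers competent for that probe contribute to the combined score.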

Pub.: 13 Apr '17, Pinned: 30 Jul '17