3 research outputs found

    Overcomplete Dictionary and Deep Learning Approaches to Image and Video Analysis

    Extracting useful information while ignoring nuisance factors (e.g. noise, occlusion, lighting) is an essential and challenging data analysis step for many computer vision tasks such as facial recognition, scene reconstruction, event detection and image restoration. Data analysis for these tasks can be formulated as a matrix decomposition or factorization that separates useful information and/or fills in missing information based on the sparsity and/or low-rankness of the data. There has been an increasing number of non-convex approaches, including conventional matrix norm optimization and emerging deep learning models. However, it is hard to optimize the ideal l0-norm or to learn the deep models directly and efficiently. Motivated by this challenge, this thesis proposes two sets of approaches: conventional and deep learning based. For conventional approaches, this thesis proposes a novel online non-convex lp-norm based Robust PCA (OLP-RPCA) approach for matrix decomposition, where 0 < p < 1. OLP-RPCA is developed from the offline version, LP-RPCA. A robust face recognition framework is also developed from Robust PCA and sparse coding approaches. More importantly, the OLP-RPCA method can achieve real-time performance on large-scale data without parallelization or implementation on a graphics processing unit. We show mathematically and empirically that the OLP-RPCA algorithm scales linearly in both the sample dimension and the number of samples. The proposed OLP-RPCA and LP-RPCA approaches are evaluated in various applications, including Gaussian/non-Gaussian image denoising, face modeling, real-time background subtraction and video inpainting, and are compared against numerous state-of-the-art methods to demonstrate the robustness of the algorithms. In addition, this thesis proposes a novel Robust lp-norm Singular Value Decomposition (RP-SVD) method for analyzing two-way functional data. The proposed RP-SVD is formulated as an lp-norm based penalized loss minimization problem.
The proposed RP-SVD method is evaluated in four applications, i.e. noise and outlier removal, estimation of missing values, structure from motion reconstruction and facial image reconstruction. For deep learning based approaches, this thesis explores the idea of matrix decomposition via Robust Deep Boltzmann Machines (RDBM), an alternative form of Robust Boltzmann Machines, which aims to deal with noise and occlusion, particularly for face-related applications. This thesis proposes an extension to texture modeling in the Deep Appearance Models (DAMs) that uses RDBM to enhance robustness against noise and occlusion. The extended model can cope with occlusion and extreme poses when modeling human faces in 2D image reconstruction. This thesis also introduces new fitting algorithms with occlusion awareness through the mask obtained from the RDBM reconstruction. The proposed approach is evaluated in various applications using challenging face datasets, i.e. the Labeled Face Parts in the Wild (LFPW), Helen, EURECOM and AR databases, to demonstrate its robustness and capabilities.
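
The decomposition described in the abstract above, splitting a data matrix into a low-rank part and a sparse part, can be illustrated with a minimal sketch. This is not the thesis's OLP-RPCA or LP-RPCA method: it uses the standard convex relaxation (nuclear norm plus l1-norm, solved with a fixed-penalty ADMM loop) instead of the non-convex lp-norm, and the function names `soft_threshold`, `svd_threshold` and `robust_pca`, as well as the default values of `lam` and `mu`, are illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: the proximal operator of tau * l1-norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Shrink singular values: the proximal operator of tau * nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def robust_pca(m, lam=None, mu=None, n_iter=1000, tol=1e-7):
    """Split m into low-rank + sparse parts (convex baseline, not lp-norm).

    Minimizes ||low||_* + lam * ||sparse||_1 subject to low + sparse = m.
    The defaults for lam and mu are common heuristics, assumed here.
    """
    n1, n2 = m.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(m).sum())
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)  # Lagrange multiplier for low + sparse = m
    for _ in range(n_iter):
        low = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        residual = m - low - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(m):
            break
    return low, sparse


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: rank-2 matrix plus roughly 5% large sparse outliers.
    true_low = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
    outliers = np.zeros((50, 50))
    mask = rng.random((50, 50)) < 0.05
    outliers[mask] = rng.uniform(-10, 10, size=mask.sum())
    low, sparse = robust_pca(true_low + outliers)
    rel_err = np.linalg.norm(low - true_low) / np.linalg.norm(true_low)
    print("relative error of the low-rank estimate:", rel_err)
```

Replacing the two shrinkage steps with lp-norm counterparts (0 < p < 1) would move this sketch toward the non-convex setting the abstract targets, but those steps have no simple closed form in general and are not attempted here.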

    Beyond PCA: Deep Learning Approaches for Face Modeling and Aging

    Modeling faces with large variations has been a challenging task in computer vision. These variations, such as expressions, poses and occlusions, are usually complex and non-linear. Moreover, new facial images come with their own highly diverse characteristic artifacts. Therefore, a good face modeling approach needs to be carefully designed to adapt flexibly to these challenges. Recently, deep learning has gained significant attention as an emerging research topic for both learning higher-level representations of data and modeling the distribution of observations. Thanks to the nonlinear structure of deep learning models and the strength of latent variables organized in hidden layers, they can efficiently capture variations and structures in complex data. Motivated by this, we present two novel approaches, i.e. Deep Appearance Models (DAM) and Robust Deep Appearance Models (RDAM), to accurately capture both the shape and texture of face images under large variations. In DAM, three crucial components represented in hierarchical layers are modeled using Deep Boltzmann Machines (DBM) to robustly capture the variations of facial shapes and appearances. DAM has shown its potential in inferring a representation for new face images under various challenging conditions. An improved version of DAM, named Robust DAM (RDAM), is also introduced to better handle occluded face areas and therefore produce more plausible reconstruction results. These proposed approaches are evaluated in various applications to demonstrate their robustness and capabilities, e.g. facial super-resolution reconstruction, facial off-angle reconstruction, facial occlusion removal and age estimation, using challenging face databases: Labeled Face Parts in the Wild (LFPW), Helen and FG-NET.
Compared to classical and other deep learning based approaches, the proposed DAM and RDAM achieve competitive results in those applications, demonstrating their advantages in handling occlusions, facial representation and reconstruction. In addition to DAM and RDAM, which are mainly used for modeling single facial images, the second part of the thesis focuses on novel deep models, i.e. Temporal Restricted Boltzmann Machines (TRBM) and tractable Temporal Non-volume Preserving (TNVP) approaches, to further model face sequences. By exploiting the additional temporal relationships present in sequence data, the proposed models have advantages in predicting the future of a sequence from its past. In the applications of face age progression, age regression and age-invariant face recognition, these models have shown their potential not only in efficiently capturing the non-linear age-related variance but also in producing a smooth synthesis in age progression across faces. Moreover, the structure of TNVP can be transformed into a deep convolutional network while keeping the advantages of probabilistic models with tractable log-likelihood density estimation. The proposed approach is evaluated in terms of synthesizing age-progressed faces and cross-age face verification. It consistently achieves state-of-the-art results on various face aging databases, i.e. FG-NET, MORPH, our collected large-scale aging database named AginG Faces in the Wild (AGFW), and the Cross-Age Celebrity Dataset (CACD). A large-scale face verification experiment on MegaFace Challenge 1 is also performed to further show the advantages of our proposed approach.