124 research outputs found

    3D Face Tracking and Texture Fusion in the Wild

    We present a fully automatic approach to real-time 3D face reconstruction from monocular in-the-wild videos. Using cascaded-regressor-based face tracking and 3D Morphable Face Model shape fitting, we obtain a semi-dense 3D face shape. We further use the texture information from multiple frames to build a holistic 3D face representation from the video frames. Our system is able to capture facial expressions and does not require any person-specific training. We demonstrate the robustness of our approach on the challenging 300 Videos in the Wild (300-VW) dataset. Our real-time fitting framework is available as an open source library at http://4dface.org

    A Subspace Projection Methodology for Nonlinear Manifold Based Face Recognition

    A novel feature extraction method that utilizes nonlinear mapping from the original data space to the feature space is presented in this dissertation. Feature extraction methods aim to find compact representations of data that are easy to classify. Measurements with similar values are grouped into the same category, while those with differing values are deemed to be of separate categories. For most practical systems, the meaningful features of a pattern class lie in a low dimensional nonlinear constraint region (manifold) within the high dimensional data space. A learning algorithm to model this nonlinear region and to project patterns to this feature space is developed. A least squares estimation approach that utilizes interdependency between points in training patterns is used to form the nonlinear region. The proposed feature extraction strategy is employed to improve face recognition accuracy under varying illumination conditions and facial expressions. Though the face features show variations under these conditions, the features of one individual tend to cluster together and can be considered as a neighborhood. Low dimensional representations of face patterns in the feature space may lie in a nonlinear constraint region, which when modeled leads to efficient pattern classification. A feature space encompassing multiple pattern classes can be trained by modeling a separate constraint region for each pattern class and obtaining a mean constraint region by averaging all the individual regions. Unlike most other nonlinear techniques, the proposed method provides an easy, intuitive way to place new points onto a nonlinear region in the feature space. The proposed feature extraction and classification method results in improved accuracy when compared to the classical linear representations. Face recognition accuracy is further improved by introducing the concepts of modularity, discriminant analysis and phase congruency into the proposed method.
In the modular approach, feature components are extracted from different sub-modules of the images and concatenated to make a single vector to represent a face region. By doing this we are able to extract features that are more representative of the local features of the face. When projected onto an arbitrary line, samples from well-formed clusters could produce a confused mixture of samples from all the classes, leading to poor recognition. Discriminant analysis aims to find an optimal line orientation for which the data classes are well separated. Experiments performed on various databases to evaluate the performance of the proposed face recognition technique have shown improvement in recognition accuracy, especially under varying illumination conditions and facial expressions. This shows that the integration of multiple subspaces, each representing a part of a higher order nonlinear function, could represent a pattern with variability. Research work is progressing to investigate the effectiveness of subspace projection methodology for building manifolds with other nonlinear functions and to identify the optimum nonlinear function from an object classification perspective.
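
The remark above about arbitrary versus optimal projection lines can be illustrated with a minimal two-class Fisher discriminant sketch. This is not the dissertation's method: the synthetic Gaussian data, the function names `fisher_direction` and `separation`, and the choice of the "arbitrary" line are all assumptions made for illustration.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Unit vector along Sw^{-1} (m1 - m0), the two-class Fisher direction."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def separation(X0, X1, w):
    """Squared gap between projected class means over summed projected variances."""
    p0, p1 = X0 @ w, X1 @ w
    return (p1.mean() - p0.mean()) ** 2 / (p0.var() + p1.var())

# Synthetic data (assumption): two elongated clusters whose means differ
# across the direction of elongation, so a poorly chosen line mixes them.
rng = np.random.default_rng(0)
cov = np.array([[4.0, 3.6], [3.6, 4.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([1.5, -1.5], cov, size=200)

w_fisher = fisher_direction(X0, X1)
w_arbitrary = np.array([1.0, 1.0]) / np.sqrt(2.0)  # along the elongation

score_fisher = separation(X0, X1, w_fisher)
score_arbitrary = separation(X0, X1, w_arbitrary)
print(score_fisher, score_arbitrary)
```

With equal class sizes, the Fisher direction maximizes exactly the ratio computed by `separation`, so its score cannot be lower than that of any other line.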

    High Dimensional Data Set Analysis Using a Large-Scale Manifold Learning Approach

    Because of technological advances, data sets are growing in size and dimensionality. Processing these large scale data sets is challenging for conventional computers due to computational limitations. A framework for nonlinear dimensionality reduction on large databases is presented that alleviates the issue of large data sets through sampling, graph construction, manifold learning, and embedding. Neighborhood selection is a key step in this framework and a potential area of improvement. The standard approach to neighborhood selection is setting a fixed neighborhood. This could be a fixed number of neighbors or a fixed neighborhood size. Each of these has its limitations due to variations in data density. A novel adaptive neighbor-selection algorithm is presented to enhance performance by incorporating sparse ℓ1-norm based optimization. These enhancements are applied to the graph construction and embedding modules of the original framework. As validation of the proposed ℓ1-based enhancement, experiments are conducted on these modules using publicly available benchmark data sets. The two approaches are then applied to a large scale magnetic resonance imaging (MRI) data set for brain tumor progression prediction. Results showed that the proposed approach outperformed linear methods and other traditional manifold learning algorithms.
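
The two fixed neighborhood rules mentioned in the abstract above, and why varying density hurts both, can be sketched as follows. This covers only the fixed baselines, not the adaptive ℓ1-based algorithm; the toy data, the values of `k` and `eps`, and all function names are assumptions.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix with infinity on the diagonal (no self-neighbors)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    return D

def knn_neighbors(D, k):
    """Fixed number of neighbors: indices of the k closest points per row."""
    return np.argsort(D, axis=1)[:, :k]

def radius_neighbors(D, eps):
    """Fixed neighborhood size: indices of all points within distance eps per row."""
    return [np.flatnonzero(row <= eps) for row in D]

# Toy data (assumption): a dense cluster with spacing 0.1 and a sparse one with spacing 2.
dense = np.arange(10) * 0.1
sparse = 10.0 + np.arange(5) * 2.0
X = np.concatenate([dense, sparse])[:, None]

D = pairwise_distances(X)
knn = knn_neighbors(D, k=3)
ball = radius_neighbors(D, eps=0.55)

# Distance to the farthest of the k neighbors, and neighbor counts inside the ball.
knn_reach = np.take_along_axis(D, knn, axis=1).max(axis=1)
ball_sizes = np.array([len(idx) for idx in ball])
print(knn_reach)
print(ball_sizes)
```

With a fixed `k`, points in the sparse cluster reach much farther for their neighbors than points in the dense cluster; with a fixed `eps`, the sparse points end up with no neighbors at all.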

    Single View Reconstruction for Human Face and Motion with Priors

    Single view reconstruction is fundamentally an under-constrained problem. We aim to develop new approaches to model human face and motion with model priors that restrict the space of possible solutions. First, we develop a novel approach to recover the 3D shape from a single view image under challenging conditions, such as large variations in illumination and pose. The problem is addressed by employing the techniques of non-linear manifold embedding and alignment. Specifically, the local image models for each patch of facial images and the local surface models for each patch of 3D shape are learned using a non-linear dimensionality reduction technique, and the correspondences between these local models are then learned by a manifold alignment method. Local models successfully remove the dependency on large training databases for human face modeling. By combining the local shapes, the global shape of a face can be reconstructed directly from a single linear system of equations via least squares. Unfortunately, this learning-based approach cannot be successfully applied to the problem of human motion modeling due to the internal and external variations in single view video-based marker-less motion capture. Therefore, we introduce a new model-based approach for capturing human motion using a stream of depth images from a single depth sensor. While a depth sensor provides metric 3D information, using a single sensor, instead of a camera array, results in a view-dependent and incomplete measurement of object motion. We develop a novel two-stage template fitting algorithm that is invariant to subject size and view-point variations, and robust to occlusions. Starting from a known pose, our algorithm first estimates a body configuration through temporal registration, which is used to search the template motion database for a best match.
The best match body configuration as well as its corresponding surface mesh model are deformed to fit the input depth map, filling in the part that is occluded from the input and compensating for differences in pose and body-size between the input image and the template. Our approach does not require any markers, user-interaction, or appearance-based tracking. Experiments show that our approaches can achieve good modeling results for human face and motion, and are capable of dealing with a variety of challenges in single view reconstruction, e.g., occlusion.

    Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction

    It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least squares problem and thus the least angle regression (LARS) (Efron et al.), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect ways or have strict settings, which can be inconvenient for applications. In this paper, we propose the manifold elastic net, or MEN for short. MEN incorporates the merits of both the manifold learning based dimensionality reduction and the sparse learning based dimensionality reduction. By using a series of equivalent transformations, we show MEN is equivalent to the lasso penalized least squares problem and thus LARS is adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: 1) the local geometry of samples is well preserved for low dimensional data representation, 2) both the margin maximization and the classification error minimization are considered for sparse projection calculation, 3) the projection matrix of MEN improves the parsimony in computation, 4) the elastic net penalty reduces the over-fitting problem, and 5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top level dimensionality reduction algorithms. Comment: 33 pages, 12 figures
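
As a rough illustration of how an elastic net penalized least squares problem can be rewritten as a lasso penalized one, the sketch below checks the well-known data-augmentation identity for the plain (unscaled) elastic net. It is not the series of transformations derived for MEN in the paper; the random data, the penalty weights `lam1` and `lam2`, and the function names are assumptions.

```python
import numpy as np

def augment(X, y, lam2):
    """Stack sqrt(lam2) * I under X and zeros under y to absorb the ridge term."""
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug

def elastic_net_objective(X, y, b, lam1, lam2):
    """Squared residual plus ridge (lam2) and lasso (lam1) penalties."""
    r = y - X @ b
    return r @ r + lam2 * (b @ b) + lam1 * np.abs(b).sum()

def lasso_objective(X, y, b, lam1):
    """Squared residual plus lasso (lam1) penalty only."""
    r = y - X @ b
    return r @ r + lam1 * np.abs(b).sum()

# Random problem instance (assumption, for the numerical check only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = rng.normal(size=30)
lam1, lam2 = 0.7, 1.3
X_aug, y_aug = augment(X, y, lam2)

# The two objectives agree for every coefficient vector b, so they share minimizers.
gaps = []
for _ in range(5):
    b = rng.normal(size=5)
    gaps.append(abs(elastic_net_objective(X, y, b, lam1, lam2)
                    - lasso_objective(X_aug, y_aug, b, lam1)))
print(max(gaps))
```

Because the two objectives coincide for every `b`, a lasso solver such as LARS applied to the augmented data would also solve this elastic net problem.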

    Automatic Recognition and Generation of Affective Movements

    Body movements are an important non-verbal communication medium through which affective states of the demonstrator can be discerned. For machines, the capability to recognize affective expressions of their users and generate appropriate actuated responses with recognizable affective content has the potential to improve their life-like attributes and to create an engaging, entertaining, and empathic human-machine interaction. This thesis develops approaches to systematically identify movement features most salient to affective expressions and to exploit these features to design computational models for automatic recognition and generation of affective movements. The proposed approaches enable 1) identifying which features of movement convey affective expressions, 2) the automatic recognition of affective expressions from movements, 3) understanding the impact of kinematic embodiment on the perception of affective movements, and 4) adapting pre-defined motion paths in order to "overlay" specific affective content. Statistical learning and stochastic modeling approaches are leveraged, extended, and adapted to derive a concise representation of the movements that isolates movement features salient to affective expressions and enables efficient and accurate affective movement recognition and generation. In particular, the thesis presents two new approaches to fixed-length affective movement representation based on 1) functional feature transformation, and 2) stochastic feature transformation (Fisher scores). The resulting representations are then exploited for recognition of affective expressions in movements and for salient movement feature identification. 
For functional representation, the thesis adapts dimensionality reduction techniques (namely, principal component analysis (PCA), Fisher discriminant analysis, Isomap) for functional datasets and applies the resulting reduction techniques to extract a minimal set of features along which affect-specific movements are best separable. Furthermore, the centroids of affect-specific clusters of movements in the resulting functional PCA subspace along with the inverse mapping of functional PCA are used to generate prototypical movements for each affective expression. The functional discriminative modeling is, however, limited to cases where affect-specific movements also have similar kinematic trajectories and does not address the interpersonal and stochastic variations inherent to bodily expression of affect. To account for these variations, the thesis presents a novel affective movement representation in terms of stochastically-transformed features referred to as Fisher scores. The Fisher scores are derived from affect-specific hidden Markov model encoding of the movements and exploited to discriminate between different affective expressions using support vector machine (SVM) classification. Furthermore, the thesis presents a new approach for systematic identification of a minimal set of movement features most salient to discriminating between different affective expressions. The salient features are identified by mapping Fisher scores to a low-dimensional subspace where dependencies between the movements and their affective labels are maximized. This is done by maximizing the Hilbert-Schmidt independence criterion between the Fisher score representation of movements and their affective labels. The resulting subspace forms a suitable basis for affective movement recognition using nearest neighbour classification and retains the high recognition rates achieved by SVM classification in the Fisher score space.
The dimensions of the subspace form a minimal set of salient features and are used to explore the movement kinematic and dynamic cues that connote affective expressions. Furthermore, the thesis proposes the use of movement notation systems from the dance community (specifically, the Laban system) for abstract coding and computational analysis of movement. A quantification approach for Laban Effort and Shape is proposed and used to develop a new computational model for affective movement generation. Using the Laban Effort and Shape components, the proposed generation approach searches a labeled dataset for movements that are kinematically similar to a desired motion path and convey a target emotion. A hidden Markov model of the identified movements is obtained and used with the desired motion path in the Viterbi state estimation. The estimated state sequence is then used to generate a novel movement that is a version of the desired motion path, modulated to convey the target emotion. Various affective human movement corpora are used to evaluate and demonstrate the efficacy of the developed approaches for the automatic recognition and generation of affective expressions in movements. Finally, the thesis assesses the human perception of affective movements and the impact of display embodiment and the observer's gender on the affective movement perception via user studies in which participants rate the expressivity of synthetically-generated and human-generated affective movements animated on anthropomorphic and non-anthropomorphic embodiments. The user studies show that the human perception of affective movements is mainly shaped by intended emotions, and that the display embodiment and the observer's gender can significantly impact the perception of affective movements.