
    Multi-Modality Human Action Recognition

    Human action recognition is useful in many application areas, e.g. video surveillance, human-computer interaction (HCI), video retrieval, gaming, and security. Recently, human action recognition has become an active research topic in computer vision and pattern recognition, and a number of action recognition approaches have been proposed. However, most of these approaches are designed for RGB image sequences, where the action data are collected by an RGB/intensity camera. The recognition performance is therefore sensitive to the occlusion, background, and lighting conditions of the image sequences. If more information can be provided along with the image sequences, so that data sources other than RGB video can be utilized, human actions could be better represented and recognized by the designed computer vision system.

    In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed; cross-spectral action recognition is then also investigated and novel approaches are proposed in our work. On the other hand, depth imaging technology has made significant progress recently, and depth information can now be captured simultaneously with RGB video, so depth-based human action recognition is also investigated. I first propose a method combining different types of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamic model.
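The fusion of RGB and depth cues described above can be illustrated with a minimal late-fusion sketch. This is a generic pattern, not the dissertation's actual pipeline; the class names, scores, and equal weighting are all hypothetical:

```python
import numpy as np

def late_fusion(scores_rgb, scores_depth, w=0.5):
    """Combine per-class confidence scores from two modalities by
    weighted averaging; w is the weight given to the RGB modality."""
    scores_rgb = np.asarray(scores_rgb, dtype=float)
    scores_depth = np.asarray(scores_depth, dtype=float)
    return w * scores_rgb + (1.0 - w) * scores_depth

# Toy example with three action classes ("wave", "kick", "sit").
rgb = [0.2, 0.5, 0.3]    # RGB classifier is unsure
depth = [0.1, 0.8, 0.1]  # depth classifier strongly favors class 1
fused = late_fusion(rgb, depth)
predicted = int(np.argmax(fused))  # fused decision
```

More sophisticated schemes learn the weights or fuse at the feature level, but the averaging above already shows how a confident modality can resolve an ambiguous one.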

    Statistical/Geometric Techniques for Object Representation and Recognition

    Object modeling and recognition are key areas of research in computer vision and graphics, with a wide range of applications. Though research in these areas is not new, traditionally most of it has focused on analyzing problems under controlled environments. The challenges posed by real-life applications demand more general and robust solutions. The wide variety of objects with large intra-class variability makes the task very challenging, and the difficulty of modeling and matching objects also varies depending on the input modality. In addition, the easy availability of sensors and storage has resulted in a tremendous increase in the amount of data that needs to be processed, which requires efficient algorithms suitable for large databases. In this dissertation, we address some of the challenges involved in modeling and matching objects in realistic scenarios. Object matching in images requires accounting for large variability in appearance due to changes in illumination and viewpoint. Any real-world object is characterized by its underlying shape and albedo, which, unlike the image intensity, are insensitive to changes in illumination conditions. We propose a stochastic filtering framework for estimating object albedo from a single intensity image by formulating albedo estimation as an image estimation problem. We also show how this albedo estimate can be used for illumination-insensitive object matching and for more accurate shape recovery from a single image using the standard shape-from-shading formulation. We start with the simpler problem where the pose of the object is known and only the illumination varies, and then extend the proposed approach to handle unknown pose in addition to illumination variations. We also use the estimated albedo maps for another important application: recognizing faces across age progression.
Many approaches that address the problem of modeling and recognizing objects from images assume that the underlying objects have diffuse texture, but most real-world objects exhibit a combination of diffuse and specular properties. We propose an approach for separating the diffuse and specular reflectance in a given color image, so that algorithms designed for objects of diffuse texture become applicable to a much wider range of real-world objects. Representing and matching the 2D and 3D geometry of objects is also an integral part of object matching, with applications in gesture recognition, activity classification, trademark and logo recognition, etc. The challenge in matching 2D/3D shapes lies in accounting for rigid and non-rigid deformations, large intra-class variability, noise, and outliers. In addition, since shapes are usually represented as collections of landmark points, a shape matching algorithm also has to deal with missing or unknown correspondences across these data points. We propose an efficient shape indexing approach in which the feature vectors representing a shape are mapped to a hash table. For a query shape, we show how similar shapes in the database can be retrieved without the need to establish correspondence, making the algorithm extremely fast and scalable. We also propose an approach for matching and registration of 3D point cloud data under unknown or missing correspondences using an implicit surface representation. Finally, we discuss possible future directions of this research.
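The correspondence-free indexing idea can be sketched as quantizing each feature vector into a discrete hash key and letting query features vote for the shapes stored under matching keys. This is a simplified illustration; the dissertation's actual features and hash function may differ, and the grid quantization here is an assumption:

```python
import numpy as np
from collections import defaultdict

def make_key(feature, cell=0.25):
    """Quantize a feature vector into a discrete grid-cell key."""
    return tuple(np.floor(np.asarray(feature) / cell).astype(int))

def build_index(shapes, cell=0.25):
    """Map quantized feature vectors to the ids of shapes containing them."""
    table = defaultdict(set)
    for shape_id, features in shapes.items():
        for f in features:
            table[make_key(f, cell)].add(shape_id)
    return table

def query(table, features, cell=0.25):
    """Vote for database shapes; no point correspondence is established."""
    votes = defaultdict(int)
    for f in features:
        for shape_id in table.get(make_key(f, cell), ()):
            votes[shape_id] += 1
    return max(votes, key=votes.get) if votes else None

# Toy database of 2-D shape feature points
db = {
    "square": [[0.1, 0.1], [0.9, 0.1], [0.9, 0.9], [0.1, 0.9]],
    "line":   [[0.1, 0.5], [0.5, 0.5], [0.9, 0.5]],
}
index = build_index(db)
best = query(index, [[0.12, 0.11], [0.88, 0.88]])  # near two square corners
```

Retrieval cost is linear in the number of query features regardless of database size, which is what makes hash-based indexing scalable.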

    Evaluating the Performance of a Large-Scale Facial Image Dataset Using Agglomerated Match Score Statistics

    Biometric systems are experiencing widespread usage in identification and access control applications. To estimate the performance of any biometric system, its characteristics need to be analyzed so that concrete conclusions can be drawn for real-world usage. Performance testing of hardware or software components of either custom or state-of-the-art commercial biometric systems is typically carried out on large datasets. Several public and private datasets are used in current biometric research. West Virginia University has completed several large-scale multimodal biometric data collections with the aim of creating research datasets that can be used by disciplines concerned with secure biometric applications. However, the demographic and image-quality properties of these datasets can potentially lead to bias when they are used in performance testing of new systems. To overcome this, the characteristics of datasets used for performance testing must be well understood prior to usage.

    This thesis answers three main questions associated with this issue:

    • For a single matcher, do the genuine and impostor match score distributions within specific demographic groups vary from those of the entire dataset?
    • What are the possible ways to compare the demographic-subset match score distributions against those of the entire dataset?
    • Based on these comparisons, what conclusions can be made about the characteristics of the dataset?

    In this work, 13,976 frontal face images from WVU's 2012 Biometric collection project, funded by the FBI and involving 1200 individuals, were used as a 'test' dataset. The goal was to evaluate the performance of this dataset by generating genuine and impostor match score distributions using commercial matching software. Further, the dataset was categorized demographically, and match score distributions were generated for these subsets in order to explore whether or not this breakdown impacted match score distributions.
The match score distributions of the overall dataset were compared against those of each demographic cohort. Using statistical measures, Area Under the Curve (AUC) and Equal Error Rate (EER) were observed by plotting Receiver Operating Characteristic (ROC) curves, measuring the performance of each demographic group with respect to the overall data and also within the cohorts of each demographic group. Also, Kullback-Leibler divergence and Jensen-Shannon divergence values were calculated for each demographic cohort (age, gender, and ethnicity) within the overall data. These statistical approaches provide a numerical value representing the amount of variation between two match score distributions. In addition, FAR and FRR were observed to estimate the error rates. These statistical measures effectively enabled the determination of the impact of different demographic breakdowns on match score distributions, and thus helped in understanding the characteristics of the dataset and how they may affect its usage in performance testing of biometric systems.
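The score statistics named above are straightforward to compute. A hedged sketch on synthetic genuine/impostor scores (not the WVU data; the histograms and thresholds are invented for illustration):

```python
import numpy as np

def roc_points(genuine, impostor, thresholds):
    """FAR and FRR at each threshold (accept when score >= threshold)."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    return far, frr

def auc(genuine, impostor):
    """Area under the ROC curve via the Wilcoxon rank statistic."""
    g = np.asarray(genuine)[:, None]
    i = np.asarray(impostor)[None, :]
    return (g > i).mean() + 0.5 * (g == i).mean()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (nats) between two score histograms."""
    p = np.asarray(p, float) + eps; p /= p.sum()
    q = np.asarray(q, float) + eps; q /= q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))  # Kullback-Leibler
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Well-separated toy scores: genuine high, impostor low
gen = [0.9, 0.8, 0.85, 0.95]
imp = [0.1, 0.2, 0.15, 0.05]
a = auc(gen, imp)                     # perfect separation gives AUC = 1.0
thr = np.linspace(0.0, 1.0, 101)
far, frr = roc_points(gen, imp, thr)
eer = float(min(np.maximum(far, frr)))  # EER where FAR and FRR cross
jsd = js_divergence([4, 0], [0, 4])     # disjoint histograms give ln 2
```

The same functions applied to a demographic cohort's scores versus the overall pool would produce exactly the per-cohort AUC, EER, and divergence values the thesis reports.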

    Robust Image Recognition Based on a New Supervised Kernel Subspace Learning Method

    Date of doctoral thesis defense: 13 September 2019.

    Image recognition is a term for computer technologies that can recognize certain people, objects, or other targeted subjects through the use of algorithms and machine learning concepts. Face recognition is one of the most popular techniques for determining the identity of a person. This study develops a new non-linear subspace learning method named "supervised kernel locality-based discriminant neighborhood embedding" (SKLDNE), which performs data classification by learning an optimal embedded subspace from a high-dimensional input space. In this approach, not only are the nonlinear and complex variations of face images effectively represented using nonlinear kernel mapping, but local structure information of data from the same class and discriminant information from distinct classes are also simultaneously preserved, further improving the final classification performance. Moreover, to evaluate the robustness of the proposed method, it was compared with several well-known pattern recognition methods through comprehensive experiments on six publicly accessible datasets. In this research we particularly focus on face recognition; however, two other types of databases besides face databases are also used to investigate the behavior of our algorithm thoroughly. Experimental results reveal that our method consistently outperforms its competitors across a wide range of dimensionalities on all the datasets. The SKLDNE method reached a 100 percent recognition rate for Tn=17 on the Sheffield, 9 on the Yale, 8 on the ORL, 7 on the Finger Vein, and 11 on the Finger Knuckle datasets, respectively, while the results are much lower for other methods. This demonstrates the robustness and effectiveness of the proposed method.
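The classification step that such kernel subspace methods ultimately feed is nearest-neighbor matching under a kernel-induced distance. The sketch below is not SKLDNE itself, only the generic pattern it builds on: distances measured in the feature space induced by a Gaussian kernel. The toy data and labels are invented:

```python
import numpy as np

def rbf(x, y, gamma=2.0):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def kernel_nn(train_X, train_y, x, gamma=2.0):
    """1-nearest-neighbor in the kernel-induced feature space.

    Squared feature-space distance is k(x,x) + k(z,z) - 2 k(x,z),
    which for the RBF kernel reduces to 2 - 2 k(x,z)."""
    dists = [2.0 - 2.0 * rbf(x, z, gamma) for z in train_X]
    return train_y[int(np.argmin(dists))]

# Two toy classes of 2-D feature vectors
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
y = ["subject_A", "subject_A", "subject_B", "subject_B"]
pred = kernel_nn(X, y, [0.05, 0.05])
```

Methods like SKLDNE go further by learning a low-dimensional embedding of this kernel space that pulls same-class neighbors together and pushes different-class neighbors apart before the nearest-neighbor step.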

    Facial Image Analysis for Body Mass Index, Makeup and Identity

    The principal aim of facial image analysis in computer vision is to extract valuable information (e.g., age, gender, ethnicity, and identity) by interpreting perceived electronic signals from face images. In this dissertation, we develop facial image analysis systems for body mass index (BMI) prediction, makeup detection, and facial identity under makeup changes and BMI variations.

    BMI is a commonly used measure of body fatness. In the first part of this thesis, we study BMI-related topics. First, we develop a computational method to predict BMI from face images automatically, formulating BMI prediction from facial features as a machine vision problem. Three regression methods, least squares estimation, Gaussian process regression, and support vector regression, are employed to predict the BMI value. Our preliminary results show that it is feasible to develop a computational system for BMI prediction from face images. Second, we address the influence of BMI changes on face identity. Both synthesized and real face images are assembled into databases to facilitate our study. Empirically, we found that large BMI alterations can significantly reduce the matching accuracy of a face recognition system. We then study whether the influence of BMI changes can be reduced to improve face recognition performance; the partial least squares (PLS) method is applied for this purpose. Experimental results show the feasibility of developing algorithms that address the influence on face recognition of facial adiposity variations caused by BMI changes.

    Makeup can markedly affect facial appearance. In the second part of this thesis, we deal with the influence of makeup on face identity. Makeup detection must be performed first in order to address this influence. Four categories of features are proposed to characterize facial makeup cues in our study: skin color tone, skin smoothness, texture, and highlight.
A patch selection scheme and discriminative mapping are presented to enhance the performance of makeup detection. Second, we study dual attributes from makeup and non-makeup faces separately to reflect, at a semantic level, the facial appearance changes caused by makeup. Cross-makeup attribute classification and accuracy-change analysis are performed to divide the dual attributes into four categories according to different makeup effects. To develop a face recognition system that is robust to facial makeup, the PLS method is applied to features extracted from local patches. We also propose a dual-attribute based method for face verification: shared dual attributes can be used to measure facial similarity, rather than directly matching low-level features. Experimental results demonstrate the feasibility of eliminating the influence of makeup on face recognition.

In summary, the contributions of this dissertation center on developing facial image analysis systems that deal effectively with newly emerged topics, i.e., BMI prediction, makeup detection, and the recognition of face identity under makeup and BMI changes. In particular, to the best of our knowledge, the BMI-related topics, BMI prediction, the influence of BMI changes on face recognition, and face recognition robust to BMI changes, are the first explorations of their kind for the biometrics community.
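Of the three regressors named for BMI prediction, the least-squares baseline is simple enough to sketch end to end. The geometric face features and BMI values below are invented for illustration (the dissertation's actual features and data are not reproduced here):

```python
import numpy as np

# Hypothetical geometric face features: [cheek-to-jaw ratio, width/height]
X = np.array([[0.80, 0.70],
              [0.85, 0.75],
              [0.90, 0.80],
              [0.95, 0.85]])
bmi = np.array([20.0, 22.5, 25.0, 27.5])  # toy targets, perfectly linear here

# Least-squares fit of bmi ~ w . x + b (append a bias column)
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, bmi, rcond=None)

def predict_bmi(features):
    """Linear BMI estimate from a face-feature vector."""
    return float(np.append(features, 1.0) @ coef)

est = predict_bmi([0.875, 0.775])  # midway between the 2nd and 3rd rows
```

Gaussian process regression and support vector regression would replace the linear fit with a kernel model, but the feature-vector-to-scalar formulation stays the same.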

    RECOGNITION OF FACES FROM SINGLE AND MULTI-VIEW VIDEOS

    Face recognition has been an active research field for decades. In recent years, with videos playing an increasingly important role in our everyday life, video-based face recognition has begun to attract considerable research interest. This opens up a wide range of potential application areas, including TV/movie search and parsing, video surveillance, access control, etc. Preliminary research results in this field suggest that by exploiting the abundant spatio-temporal information contained in videos, we can greatly improve the accuracy and robustness of a visual recognition system. On the other hand, as this research area is still in its infancy, developing an end-to-end face processing pipeline that can robustly detect, track, and recognize faces remains a challenging task. The goal of this dissertation is to study some of the related problems under different settings. We first address the video-based face association problem, in which one attempts to extract the face tracks of multiple subjects while maintaining label consistency. Traditional tracking algorithms have difficulty handling this task, especially when challenging nuisance factors such as motion blur, low resolution, or significant camera motion are present. We demonstrate that contextual features, in addition to face appearance itself, play an important role in this case. We propose principled methods to combine multiple features using Conditional Random Fields and Max-Margin Markov networks to infer labels for the detected faces. Unlike many existing approaches, our algorithms work in online mode and hence have a wider range of applications. We address issues such as parameter learning, inference, and the handling of false positives/negatives that arise in the proposed approach. Finally, we evaluate our approach on several public databases. We next propose a novel video-based face recognition framework.
We address the problem from two different aspects. To handle pose variations, we learn a structural-SVM based detector that simultaneously localizes the face fiducial points and estimates the face pose; by adopting a different optimization criterion from existing algorithms, we are able to improve localization accuracy. To model other face variations, we use intra-personal/extra-personal dictionaries. Intra-personal/extra-personal modeling of human faces has been shown to work successfully in the Bayesian face recognition framework, and it has additional advantages in scalability and generalization, which are of critical importance to real-world applications. Combining intra-personal/extra-personal models with dictionary learning enables us to achieve state-of-the-art performance on unconstrained video data, even when the training data come from a different database. Finally, we present an approach for video-based face recognition using camera networks. The focus is on handling pose variations by exploiting the strength of the multi-view camera network. However, rather than taking the typical approach of modeling these variations, which eventually requires explicit knowledge of pose parameters, we rely on a pose-robust feature that eliminates the need for pose estimation. The pose-robust feature is developed using Spherical Harmonic (SH) representation theory. It is extracted from the surface texture map of a spherical model that approximates the subject's head. Feature vectors extracted from a video are modeled as an ensemble of instances of a probability distribution in a Reproducing Kernel Hilbert Space (RKHS). The ensemble similarity measure in the RKHS improves both the robustness and the accuracy of the recognition system. The proposed approach outperforms traditional algorithms on a multi-view video database collected using a camera network.
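An ensemble similarity in an RKHS can be illustrated by comparing kernel mean embeddings of two feature sets. This is a generic MMD-style sketch, not necessarily the exact measure used in the dissertation, and the "video" feature vectors are invented:

```python
import numpy as np

def rbf_gram(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def ensemble_similarity(A, B, gamma=1.0):
    """Inner product of the kernel mean embeddings of two ensembles.

    Each video yields an ensemble (set) of feature vectors; similarity is
    the average kernel value mean_{a in A, b in B} k(a, b), i.e. the RKHS
    inner product <mu_A, mu_B> of the two mean embeddings."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    return float(rbf_gram(A, B, gamma).mean())

# Toy pose-robust features from three "videos"
v1 = [[0.00, 0.10], [0.10, 0.00]]
v2 = [[0.05, 0.05], [0.10, 0.10]]  # same subject: similar ensemble
v3 = [[2.00, 2.00], [2.10, 1.90]]  # different subject: distant ensemble
same = ensemble_similarity(v1, v2)
diff = ensemble_similarity(v1, v3)
```

Comparing whole ensembles rather than individual frames is what lends the approach its robustness: a few bad frames barely move the mean embedding.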

    Gaussian processes for modeling of facial expressions

    Automated analysis of facial expressions has been gaining significant attention over the past years. This stems from the fact that it constitutes a primary step toward developing some of the next-generation computer technologies that can make an impact in many domains, ranging from medical imaging and health assessment to marketing and education. Whatever the target application, there is an urgent need for systems that can be deployed under demanding, real-world conditions and that generalize well across the population. Hence, numerous factors have to be considered carefully when designing such a system. The work presented in this thesis focuses on tackling two important problems in the automated analysis of facial expressions: (i) view-invariant facial expression analysis, and (ii) modeling of the structural patterns in the face in terms of well-coordinated facial muscle movements. Driven by the necessity for efficient and accurate inference mechanisms, we explore machine learning techniques based on the probabilistic framework of Gaussian processes (GPs). Our ultimate goal is to design powerful models that can efficiently handle imagery with spontaneously displayed facial expressions and explain in detail the complex configurations of the human face in real-world situations. To effectively decouple head pose and expression in the presence of large out-of-plane head rotations, we introduce a manifold learning approach based on multi-view learning strategies. Contrary to the majority of existing methods, which typically treat the numerous poses as individual problems, in this model we first learn a discriminative manifold shared by multiple views of a facial expression, and subsequently perform facial expression classification in this expression manifold. The pose normalization problem is thus solved by aligning the facial expressions from different poses in a common latent space.
We demonstrate that the recovered manifold can efficiently generalize to various poses and expressions even from a small amount of training data, while also being largely robust to image features corrupted by illumination variations. State-of-the-art performance is achieved in the task of facial expression classification of basic emotions. The methods that we propose for learning the structure in the configuration of muscle movements represent some of the first attempts in the field at analysis and intensity estimation of facial expressions. In these models, we extend our multi-view approach to exploit relationships not only in the input features but also in the multi-output labels. The structure of the outputs is imposed on the recovered manifold either through heuristically defined hard constraints or in an auto-encoded manner, where the structure is learned automatically from the input data. The resulting models are robust to data with imbalanced expression categories, owing to our proposed Bayesian learning of the target manifold. We also propose a novel regression approach based on a product of GP experts, in which we take people's individual expressiveness into account in order to adapt the learned models to each subject. We demonstrate the superior performance of our proposed models on the tasks of facial expression recognition and intensity estimation.
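A minimal GP regression sketch illustrates the inference machinery underlying these models; the dissertation's multi-view GP variants are of course far richer, and the 1-D "expression intensity over time" signal below is purely illustrative:

```python
import numpy as np

def k_rbf(a, b, ell=1.0):
    """RBF (squared-exponential) covariance between 1-D input arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def gp_posterior_mean(X, y, X_star, noise=1e-6, ell=1.0):
    """Posterior mean of a zero-mean GP with RBF covariance:
    m(x*) = K(x*, X) [K(X, X) + noise I]^{-1} y."""
    K = k_rbf(X, X, ell) + noise * np.eye(len(X))
    K_s = k_rbf(X_star, X, ell)
    return K_s @ np.linalg.solve(K, y)

# Toy expression-intensity signal over time
t = np.array([0.0, 1.0, 2.0, 3.0])
intensity = np.array([0.0, 0.8, 1.0, 0.2])
mean = gp_posterior_mean(t, intensity, np.array([1.0]))
```

With a near-zero noise level the posterior mean interpolates the training observations, which is why the query at t = 1.0 recovers (almost exactly) the observed intensity 0.8. The GP framework then supplies the uncertainty estimates and kernel flexibility that the multi-view and product-of-experts extensions build on.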