132 research outputs found

    Face recognition from face motion manifolds using robust kernel resistor-average distance

    Full text link
    In this work we consider face recognition from face motion manifolds. An information-theoretic approach with Resistor-Average Distance (RAD) as a dissimilarity measure between distributions of face images is proposed. We introduce a kernel-based algorithm that retains the simplicity of the closed-form expression for the RAD between two normal distributions, while allowing for modelling of complex, nonlinear manifolds. Additionally, it is shown how errors in the face registration process can be modelled to significantly improve recognition. Recognition performance of our method is experimentally demonstrated and shown to outperform state-of-the-art algorithms. Recognition rates of 97–100% are consistently achieved on databases of 35– 90 people

    Learning over sets using boosted manifold principal angles (BoMPA)

    Full text link
    In this paper we address the problem of classifying vector sets. We motivate and introduce a novel method based on comparisons between corresponding vector subspaces. In particular, there are two main areas of novelty: (i) we extend the concept of principal angles between linear subspaces to manifolds with arbitrary nonlinearities; (ii) it is demonstrated how boosting can be used for application-optimal principal angle fusion. The strengths of the proposed method are empirically demonstrated on the task of automatic face recognition (AFR), in which it is shown to outperform state-of-the-art methods in the literature

    An illumination invariant face recognition system for access control using video

    Full text link
    Illumination and pose invariance are the most challenging aspects of face recognition. In this paper we describe a fully automatic face recognition system that uses video information to achieve illumination and pose robustness. In the proposed method, highly nonlinear manifolds of face motion are approximated using three Gaussian pose clusters. Pose robustness is achieved by comparing the corresponding pose clusters and probabilistically combining the results to derive a measure of similarity between two manifolds. Illumination is normalized on a per-pose basis. Region-based gamma intensity correction is used to correct for coarse illumination changes, while further refinement is achieved by combining a learnt linear manifold of illumination variation with constraints on face pattern distribution, derived from video. Comparative experimental evaluation is presented and the proposed method is shown to greatly outperform state-of-the-art algorithms. Consistent recognition rates of 94-100% are achieved across dramatic changes in illumination

    A framework of face recognition with set of testing images

    Get PDF
    We propose a novel framework to solve the face recognition problem base on set of testing images. Our framework can handle the case that no pose overlap between training set and query set. The main techniques used in this framework are manifold alignment, face normalization and discriminant learning. Experiments on different databases show our system outperforms some state of the art methods

    RECOGNITION OF FACES FROM SINGLE AND MULTI-VIEW VIDEOS

    Get PDF
    Face recognition has been an active research field for decades. In recent years, with videos playing an increasingly important role in our everyday life, video-based face recognition has begun to attract considerable research interest. This leads to a wide range of potential application areas, including TV/movies search and parsing, video surveillance, access control etc. Preliminary research results in this field have suggested that by exploiting the abundant spatial-temporal information contained in videos, we can greatly improve the accuracy and robustness of a visual recognition system. On the other hand, as this research area is still in its infancy, developing an end-to-end face processing pipeline that can robustly detect, track and recognize faces remains a challenging task. The goal of this dissertation is to study some of the related problems under different settings. We address the video-based face association problem, in which one attempts to extract face tracks of multiple subjects while maintaining label consistency. Traditional tracking algorithms have difficulty in handling this task, especially when challenging nuisance factors like motion blur, low resolution or significant camera motions are present. We demonstrate that contextual features, in addition to face appearance itself, play an important role in this case. We propose principled methods to combine multiple features using Conditional Random Fields and Max-Margin Markov networks to infer labels for the detected faces. Different from many existing approaches, our algorithms work in online mode and hence have a wider range of applications. We address issues such as parameter learning, inference and handling false positves/negatives that arise in the proposed approach. Finally, we evaluate our approach on several public databases. We next propose a novel video-based face recognition framework. We address the problem from two different aspects: To handle pose variations, we learn a Structural-SVM based detector which can simultaneously localize the face fiducial points and estimate the face pose. By adopting a different optimization criterion from existing algorithms, we are able to improve localization accuracy. To model other face variations, we use intra-personal/extra-personal dictionaries. The intra-personal/extra-personal modeling of human faces has been shown to work successfully in the Bayesian face recognition framework. It has additional advantages in scalability and generalization, which are of critical importance to real-world applications. Combining intra-personal/extra-personal models with dictionary learning enables us to achieve state-of-arts performance on unconstrained video data, even when the training data come from a different database. Finally, we present an approach for video-based face recognition using camera networks. The focus is on handling pose variations by applying the strength of the multi-view camera network. However, rather than taking the typical approach of modeling these variations, which eventually requires explicit knowledge about pose parameters, we rely on a pose-robust feature that eliminates the needs for pose estimation. The pose-robust feature is developed using the Spherical Harmonic (SH) representation theory. It is extracted using the surface texture map of a spherical model which approximates the subject's head. Feature vectors extracted from a video are modeled as an ensemble of instances of a probability distribution in the Reduced Kernel Hilbert Space (RKHS). The ensemble similarity measure in RKHS improves both robustness and accuracy of the recognition system. The proposed approach outperforms traditional algorithms on a multi-view video database collected using a camera network

    Physics based supervised and unsupervised learning of graph structure

    Get PDF
    Graphs are central tools to aid our understanding of biological, physical, and social systems. Graphs also play a key role in representing and understanding the visual world around us, 3D-shapes and 2D-images alike. In this dissertation, I propose the use of physical or natural phenomenon to understand graph structure. I investigate four phenomenon or laws in nature: (1) Brownian motion, (2) Gauss\u27s law, (3) feedback loops, and (3) neural synapses, to discover patterns in graphs

    Decentralized Riemannian Particle Filtering with Applications to Multi-Agent Localization

    Get PDF
    The primary focus of this research is to develop consistent nonlinear decentralized particle filtering approaches to the problem of multiple agent localization. A key aspect in our development is the use of Riemannian geometry to exploit the inherently non-Euclidean characteristics that are typical when considering multiple agent localization scenarios. A decentralized formulation is considered due to the practical advantages it provides over centralized fusion architectures. Inspiration is taken from the relatively new field of information geometry and the more established research field of computer vision. Differential geometric tools such as manifolds, geodesics, tangent spaces, exponential, and logarithmic mappings are used extensively to describe probabilistic quantities. Numerous probabilistic parameterizations were identified, settling on the efficient square-root probability density function parameterization. The square-root parameterization has the benefit of allowing filter calculations to be carried out on the well studied Riemannian unit hypersphere. A key advantage for selecting the unit hypersphere is that it permits closed-form calculations, a characteristic that is not shared by current solution approaches. Through the use of the Riemannian geometry of the unit hypersphere, we are able to demonstrate the ability to produce estimates that are not overly optimistic. Results are presented that clearly show the ability of the proposed approaches to outperform current state-of-the-art decentralized particle filtering methods. In particular, results are presented that emphasize the achievable improvement in estimation error, estimator consistency, and required computational burden

    Medical ultrasonic tomographic system

    Get PDF
    An electro-mechanical scanning assembly was designed and fabricated for the purpose of generating an ultrasound tomogram. A low cost modality was demonstrated in which analog instrumentation methods formed a tomogram on photographic film. Successful tomogram reconstructions were obtained on in vitro test objects by using the attenuation of the fist path ultrasound signal as it passed through the test object. The nearly half century tomographic methods of X-ray analysis were verified as being useful for ultrasound imaging

    Multimodaalsel emotsioonide tuvastamisel põhineva inimese-roboti suhtluse arendamine

    Get PDF
    Väitekirja elektrooniline versioon ei sisalda publikatsiooneÜks afektiivse arvutiteaduse peamistest huviobjektidest on mitmemodaalne emotsioonituvastus, mis leiab rakendust peamiselt inimese-arvuti interaktsioonis. Emotsiooni äratundmiseks uuritakse nendes süsteemides nii inimese näoilmeid kui kakõnet. Käesolevas töös uuritakse inimese emotsioonide ja nende avaldumise visuaalseid ja akustilisi tunnuseid, et töötada välja automaatne multimodaalne emotsioonituvastussüsteem. Kõnest arvutatakse mel-sageduse kepstri kordajad, helisignaali erinevate komponentide energiad ja prosoodilised näitajad. Näoilmeteanalüüsimiseks kasutatakse kahte erinevat strateegiat. Esiteks arvutatakse inimesenäo tähtsamate punktide vahelised erinevad geomeetrilised suhted. Teiseks võetakse emotsionaalse sisuga video kokku vähendatud hulgaks põhikaadriteks, misantakse sisendiks konvolutsioonilisele tehisnärvivõrgule emotsioonide visuaalsekseristamiseks. Kolme klassifitseerija väljunditest (1 akustiline, 2 visuaalset) koostatakse uus kogum tunnuseid, mida kasutatakse õppimiseks süsteemi viimasesetapis. Loodud süsteemi katsetati SAVEE, Poola ja Serbia emotsionaalse kõneandmebaaside, eNTERFACE’05 ja RML andmebaaside peal. Saadud tulemusednäitavad, et võrreldes olemasolevatega võimaldab käesoleva töö raames loodudsüsteem suuremat täpsust emotsioonide äratundmisel. Lisaks anname käesolevastöös ülevaate kirjanduses väljapakutud süsteemidest, millel on võimekus tunda äraemotsiooniga seotud ̆zeste. Selle ülevaate eesmärgiks on hõlbustada uute uurimissuundade leidmist, mis aitaksid lisada töö raames loodud süsteemile ̆zestipõhiseemotsioonituvastuse võimekuse, et veelgi enam tõsta süsteemi emotsioonide äratundmise täpsust.Automatic multimodal emotion recognition is a fundamental subject of interest in affective computing. Its main applications are in human-computer interaction. The systems developed for the foregoing purpose consider combinations of different modalities, based on vocal and visual cues. This thesis takes the foregoing modalities into account, in order to develop an automatic multimodal emotion recognition system. More specifically, it takes advantage of the information extracted from speech and face signals. From speech signals, Mel-frequency cepstral coefficients, filter-bank energies and prosodic features are extracted. Moreover, two different strategies are considered for analyzing the facial data. First, facial landmarks' geometric relations, i.e. distances and angles, are computed. Second, we summarize each emotional video into a reduced set of key-frames. Then they are taught to visually discriminate between the emotions. In order to do so, a convolutional neural network is applied to the key-frames summarizing the videos. Afterward, the output confidence values of all the classifiers from both of the modalities are used to define a new feature space. Lastly, the latter values are learned for the final emotion label prediction, in a late fusion. The experiments are conducted on the SAVEE, Polish, Serbian, eNTERFACE'05 and RML datasets. The results show significant performance improvements by the proposed system in comparison to the existing alternatives, defining the current state-of-the-art on all the datasets. Additionally, we provide a review of emotional body gesture recognition systems proposed in the literature. The aim of the foregoing part is to help figure out possible future research directions for enhancing the performance of the proposed system. More clearly, we imply that incorporating data representing gestures, which constitute another major component of the visual modality, can result in a more efficient framework
    corecore