121 research outputs found

    Face and Body gesture recognition for a vision-based multimodal analyser

    Full text link
    users, computers should be able to recognize emotions, by analyzing the human's affective state, physiology and behavior. In this paper, we present a survey of research conducted on face and body gesture and recognition. In order to make human-computer interfaces truly natural, we need to develop technology that tracks human movement, body behavior and facial expression, and interprets these movements in an affective way. Accordingly in this paper, we present a framework for a vision-based multimodal analyzer that combines face and body gesture and further discuss relevant issues

    Conditional Random Fields for Integrating Local Discriminative Classifiers

    Full text link

    An autopoietic approach to the development of speech recognition (pendekatan autopoietic dalam pembangunan pengecaman suara)

    Get PDF
    The focus of research here is on the implementation of speech recognition through an autopoietic approach. The work done here has culminated in the introduction of a neural network architecture named Homunculus Network. This network was used in the development of a speech recognition system for Bahasa Melayu. The speech recognition system is an isolated-word, phoneme-level speech recognizer that is speaker independent and has a vocabulary of 15 words. The research done has identified some issues worth further work later. These issues are also the basis for the design and the development of the new autopoietic speech recognition system

    Towards Realistic Facial Expression Recognition

    Get PDF
    Automatic facial expression recognition has attracted significant attention over the past decades. Although substantial progress has been achieved for certain scenarios (such as frontal faces in strictly controlled laboratory settings), accurate recognition of facial expression in realistic environments remains unsolved for the most part. The main objective of this thesis is to investigate facial expression recognition in unconstrained environments. As one major problem faced by the literature is the lack of realistic training and testing data, this thesis presents a web search based framework to collect realistic facial expression dataset from the Web. By adopting an active learning based method to remove noisy images from text based image search results, the proposed approach minimizes the human efforts during the dataset construction and maximizes the scalability for future research. Various novel facial expression features are then proposed to address the challenges imposed by the newly collected dataset. Finally, a spectral embedding based feature fusion framework is presented to combine the proposed facial expression features to form a more descriptive representation. This thesis also systematically investigates how the number of frames of a facial expression sequence can affect the performance of facial expression recognition algorithms, since facial expression sequences may be captured under different frame rates in realistic scenarios. A facial expression keyframe selection method is proposed based on keypoint based frame representation. Comprehensive experiments have been performed to demonstrate the effectiveness of the presented methods

    Efficient Learning Machines

    Get PDF
    Computer scienc

    Exploring variabilities through factor analysis in automatic acoustic language recognition

    Get PDF
    La problématique traitée par la Reconnaissance de la Langue (LR) porte sur la définition découverte de la langue contenue dans un segment de parole. Cette thèse se base sur des paramètres acoustiques de courte durée, utilisés dans une approche d adaptation de mélanges de Gaussiennes (GMM-UBM). Le problème majeur de nombreuses applications du vaste domaine de la re- problème connaissance de formes consiste en la variabilité des données observées. Dans le contexte de la Reconnaissance de la Langue (LR), cette variabilité nuisible est due à des causes diverses, notamment les caractéristiques du locuteur, l évolution de la parole et de la voix, ainsi que les canaux d acquisition et de transmission. Dans le contexte de la reconnaissance du locuteur, l impact de la variabilité solution peut sensiblement être réduit par la technique d Analyse Factorielle (Joint Factor Analysis, JFA). Dans ce travail, nous introduisons ce paradigme à la Reconnaissance de la Langue. Le succès de la JFA repose sur plusieurs hypothèses. La première est que l information observée est décomposable en une partie universelle, une partie dépendante de la langue et une partie de variabilité, qui elle est indépendante de la langue. La deuxième hypothèse, plus technique, est que la variabilité nuisible se situe dans un sous-espace de faible dimension, qui est défini de manière globale.Dans ce travail, nous analysons le comportement de la JFA dans le contexte d un dispositif de LR du type GMM-UBM. Nous introduisons et analysons également sa combinaison avec des Machines à Vecteurs Support (SVM). Les premières publications sur la JFA regroupaient toute information qui est amélioration nuisible à la tâche (donc ladite variabilité) dans un seul composant. Celui-ci est supposé suivre une distribution Gaussienne. Cette approche permet de traiter les différentes sortes de variabilités d une manière unique. En pratique, nous observons que cette hypothèse n est pas toujours vérifiée. Nous avons, par exemple, le cas où les données peuvent être groupées de manière logique en deux sous-parties clairement distinctes, notamment en données de sources téléphoniques et d émissions radio. Dans ce cas-ci, nos recherches détaillées montrent un certain avantage à traiter les deux types de données par deux systèmes spécifiques et d élire comme score de sortie celui du système qui correspond à la catégorie source du segment testé. Afin de sélectionner le score de l un des systèmes, nous avons besoin d un analyses détecteur de canal source. Nous proposons ici différents nouveaux designs pour engendrées de tels détecteurs automatiques. Dans ce cadre, nous montrons que les facteurs de variabilité (du sous-espace) de la JFA peuvent être utilisés avec succès pour la détection de la source. Ceci ouvre la perspective intéressante de subdiviser les5données en catégories de canal source qui sont établies de manière automatique. En plus de pouvoir s adapter à des nouvelles conditions de source, cette propriété permettrait de pouvoir travailler avec des données d entraînement qui ne sont pas accompagnées d étiquettes sur le canal de source. L approche JFA permet une réduction de la mesure de coûts allant jusqu à généraux 72% relatives, comparé au système GMM-UBM de base. En utilisant des systèmes spécifiques à la source, suivis d un sélecteur de scores, nous obtenons une amélioration relative de 81%.Language Recognition is the problem of discovering the language of a spoken definitionutterance. This thesis achieves this goal by using short term acoustic information within a GMM-UBM approach.The main problem of many pattern recognition applications is the variability of problemthe observed data. In the context of Language Recognition (LR), this troublesomevariability is due to the speaker characteristics, speech evolution, acquisition and transmission channels.In the context of Speaker Recognition, the variability problem is solved by solutionthe Joint Factor Analysis (JFA) technique. Here, we introduce this paradigm toLanguage Recognition. The success of JFA relies on several assumptions: The globalJFA assumption is that the observed information can be decomposed into a universalglobal part, a language-dependent part and the language-independent variabilitypart. The second, more technical assumption consists in the unwanted variability part to be thought to live in a low-dimensional, globally defined subspace. In this work, we analyze how JFA behaves in the context of a GMM-UBM LR framework. We also introduce and analyze its combination with Support Vector Machines(SVMs).The first JFA publications put all unwanted information (hence the variability) improvemen tinto one and the same component, which is thought to follow a Gaussian distribution.This handles diverse kinds of variability in a unique manner. But in practice,we observe that this hypothesis is not always verified. We have for example thecase, where the data can be divided into two clearly separate subsets, namely datafrom telephony and from broadcast sources. In this case, our detailed investigations show that there is some benefit of handling the two kinds of data with two separatesystems and then to elect the output score of the system, which corresponds to the source of the testing utterance.For selecting the score of one or the other system, we need a channel source related analyses detector. We propose here different novel designs for such automatic detectors.In this framework, we show that JFA s variability factors (of the subspace) can beused with success for detecting the source. This opens the interesting perspectiveof partitioning the data into automatically determined channel source categories,avoiding the need of source-labeled training data, which is not always available.The JFA approach results in up to 72% relative cost reduction, compared to the overall resultsGMM-UBM baseline system. Using source specific systems followed by a scoreselector, we achieve 81% relative improvement.AVIGNON-Bib. numérique (840079901) / SudocSudocFranceF

    顔表情からの情動推定とストレス評価への適用に関する研究

    Get PDF
    国立大学法人長岡技術科学大

    A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium

    Get PDF
    When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Differently from the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available

    A Statistical Approach to the Alignment of fMRI Data

    Get PDF
    Multi-subject functional Magnetic Resonance Image studies are critical. The anatomical and functional structure varies across subjects, so the image alignment is necessary. We define a probabilistic model to describe functional alignment. Imposing a prior distribution, as the matrix Fisher Von Mises distribution, of the orthogonal transformation parameter, the anatomical information is embedded in the estimation of the parameters, i.e., penalizing the combination of spatially distant voxels. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods
    corecore