4 research outputs found

    Decision-Making with Heterogeneous Sensors - A Copula Based Approach

    Statistical decision making has wide-ranging applications, from communications and signal processing to econometrics and finance. In contrast to the classical one-source, one-receiver paradigm, several applications have been identified in the recent past that require acquiring data from multiple sources or sensors. Information from the multiple sensors is transmitted to a remotely located receiver, known as the fusion center, which makes a global decision. Past work has largely focused on fusion of information from homogeneous sensors. This dissertation extends the formulation to the case when the local sensors may possess disparate sensing modalities. Both the theoretical and practical aspects of multimodal signal processing are considered. The first and foremost challenge is to 'adequately' model the joint statistics of such heterogeneous sensors. We propose the use of copula theory for this purpose. Copula models are general descriptors of dependence. They provide a way to characterize the nonlinear functional relationships between the multiple modalities, which are otherwise difficult to formalize. The important problem of selecting the 'best' copula function from a given set of valid copula densities is addressed, especially in the context of binary hypothesis testing problems. Both the training-testing paradigm, where a training set is assumed to be available for learning the copula models prior to system deployment, and a generalized likelihood ratio test (GLRT) based fusion rule for the online selection and estimation of copula parameters are considered. The developed theory is corroborated with extensive computer simulations as well as results on real-world data. Sensor observations (or features extracted from them) are most often quantized before their transmission to the fusion center for bandwidth and power conservation. A detection scheme is proposed for this problem assuming uniform scalar quantizers at each sensor.
The designed rule is applicable to both binary and multibit local sensor decisions. An alternative suboptimal but computationally efficient fusion rule is also designed, which involves injecting a deliberate disturbance into the local sensor decisions before fusion. The rule is based on Widrow's statistical theory of quantization. The addition of controlled noise helps to 'linearize' the highly nonlinear quantization process, thus resulting in computational savings. It is shown that although the introduction of external noise does cause a reduction in the received signal-to-noise ratio, the proposed approach can be highly accurate when the input signals have bandlimited characteristic functions and the number of quantization levels is large. The problem of quantifying neural synchrony using copula functions is also investigated. It has been widely accepted that multiple simultaneously recorded electroencephalographic signals exhibit nonlinear and non-Gaussian statistics. While existing and popular measures such as the correlation coefficient, corr-entropy coefficient, coh-entropy and mutual information are limited to being bivariate and hence applicable only to pairs of channels, measures such as Granger causality, even though multivariate, fail to account for any nonlinear inter-channel dependence. The application of copula theory helps alleviate both these limitations. The problem of distinguishing patients with mild cognitive impairment from age-matched control subjects is also considered. Results show that the copula-derived synchrony measures, when used in conjunction with other synchrony measures, improve the detection of Alzheimer's disease onset.
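
The copula idea in the abstract above can be illustrated with a minimal sketch. It is not the dissertation's method: it assumes a bivariate Gaussian copula, one standard-normal sensor and one exponential sensor, and two hypotheses that differ only in the dependence parameter. The function names and the `rate` and `rho1` parameters are hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


def gaussian_copula_logpdf(u, v, rho, eps=1e-12):
    """Log-density of a bivariate Gaussian copula at (u, v), for |rho| < 1."""
    # Clamp to the open unit interval so inv_cdf never receives 0 or 1.
    u = min(max(u, eps), 1.0 - eps)
    v = min(max(v, eps), 1.0 - eps)
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)
    d = 1.0 - rho * rho
    quad = rho * rho * (x * x + y * y) - 2.0 * rho * x * y
    return -0.5 * math.log(d) - quad / (2.0 * d)


def joint_logpdf(x1, x2, rho, rate=1.0):
    """Joint log-density of two heterogeneous sensors coupled by the copula.

    Assumed marginals (illustrative only): x1 is standard normal,
    x2 is exponential with the given rate (x2 > 0).
    """
    log_f1 = -0.5 * x1 * x1 - 0.5 * math.log(2.0 * math.pi)
    log_f2 = math.log(rate) - rate * x2
    u = _STD_NORMAL.cdf(x1)          # marginal CDF of sensor 1
    v = 1.0 - math.exp(-rate * x2)   # marginal CDF of sensor 2
    # log f(x1, x2) = log f1(x1) + log f2(x2) + log c(F1(x1), F2(x2))
    return log_f1 + log_f2 + gaussian_copula_logpdf(u, v, rho)


def log_likelihood_ratio(samples, rho1):
    """Sum of log f(x | dependent, rho1) - log f(x | independent) over samples."""
    return sum(joint_logpdf(a, b, rho1) - joint_logpdf(a, b, 0.0)
               for a, b in samples)
```

A fusion center following this sketch would compare `log_likelihood_ratio(samples, rho1)` with a threshold. Because the marginals are the same under both hypotheses here, the statistic reduces to the sum of copula log-densities, which isolates the contribution of inter-sensor dependence.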

    AUDIO-VISUAL SPEAKER IDENTIFICATION USING COUPLED HIDDEN MARKOV MODELS

    In this paper, we investigate the use of coupled hidden Markov models (CHMMs) for the task of audio-visual text-dependent speaker identification. Our system determines the identity of the user from a temporal sequence of audio and visual observations obtained from the acoustic speech and the shape of the mouth, respectively. The multimodal observation sequences are then modeled using a set of CHMMs, one for each phoneme-viseme pair and for each person in the database. The use of CHMMs in our system is justified by the capacity of this model to describe the natural audio and visual state asynchrony as well as their conditional dependency over time. To train a CHMM, we first train a speaker-independent model using expectation-maximization (EM), and then we build a speaker-dependent model using maximum a posteriori (MAP) training. Experimental results on the XM2VTS database show that our system improves on the accuracy of audio-only or video-only speaker identification at all levels of acoustic signal-to-noise ratio (SNR) from 0 to 30 dB.

    Unsupervised video indexing on audiovisual characterization of persons

    This thesis proposes a method for the unsupervised characterization of persons in audiovisual documents, exploiting data related to their physical appearance and their voice. In general, automatic identification methods, whether in video or in audio, require a large amount of a priori knowledge about the content. In this work, the goal is to study the two modalities in a correlated way and to exploit their respective properties in a collaborative and robust manner, in order to produce a reliable result that is as independent as possible of any a priori knowledge.
More specifically, we studied the characteristics of the audio stream and proposed several methods for speaker segmentation and clustering, which we evaluated in a French evaluation campaign. We then carried out an in-depth study of visual descriptors (face, clothing), which led us to propose novel approaches for detecting, tracking, and clustering people within the document. Finally, the work focused on audiovisual fusion, proposing a method based on computing a co-occurrence matrix that allowed us to establish an association between the audio and video indexes and to correct them. This enables us to produce a dynamic audiovisual model of each speaker.
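
The co-occurrence-based association mentioned in the abstract above might look roughly like the following sketch. The abstract does not specify the actual procedure, so everything here is an assumption: audio and video indexes are taken to be per-frame cluster labels of equal length, and each audio cluster is linked to the video cluster it co-occurs with most often. The function names and labels are hypothetical.

```python
from collections import Counter


def cooccurrence(audio_labels, video_labels):
    """Count how often each (audio cluster, video cluster) pair shares a frame."""
    if len(audio_labels) != len(video_labels):
        raise ValueError("label sequences must cover the same frames")
    return Counter(zip(audio_labels, video_labels))


def associate(audio_labels, video_labels):
    """Map each audio cluster to its most frequently co-occurring video cluster.

    The argmax rule is an illustrative assumption, not the thesis's method.
    """
    best = {}
    for (a, v), n in cooccurrence(audio_labels, video_labels).items():
        if a not in best or n > best[a][1]:
            best[a] = (v, n)
    return {a: v for a, (v, _) in best.items()}


# Hypothetical per-frame labels from a speaker index and a person index.
audio = ["A1", "A1", "A2", "A2", "A1", "A2"]
video = ["V1", "V1", "V2", "V1", "V1", "V2"]
mapping = associate(audio, video)
```

Frames where the two indexes disagree with the resulting mapping (the fourth frame in this toy input) are natural candidates for the index correction step the abstract describes.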