2,324 research outputs found

    Text-independent speaker recognition for Ambient Intelligence applications by using Information Set Features

    Get PDF
    Biometric systems are enabling technologies for a wide set of applications in Ambient Intelligence (AmI) environments. In this context, speaker recognition techniques are of paramount importance due to their high user acceptance and low required cooperation. Typical applications of biometric recognition in AmI environments are identification techniques designed to recognize individuals in small datasets. Biometric recognition methods are frequently deployed on embedded hardware and therefore need to be optimized in terms of computational time as well as used memory. This paper presents a text-independent speaker recognition method particularly suitable for identification in AmI environments. The proposed method first computes the Mel Frequency Cepstral Coefficients (MFCC) and then creates Information Set Features (ISF) by applying a fuzzy logic approach. Finally, it estimates the user's identity by using a hierarchical classification technique based on computational intelligence. We evaluated the performance of the speaker recognition method using signals belonging to the NIST-2003 switchboard speaker database. The achieved results showed that the proposed method reduced the size of the template with respect to traditional approaches based on Gaussian Mixture Models (GMM) and achieved better identification accuracy

    A Constructive, Incremental-Learning Network for Mixture Modeling and Classification

    Full text link
    Gaussian ARTMAP (GAM) is a supervised-learning adaptive resonance theory (ART) network that uses Gaussian-defined receptive fields. Like other ART networks, GAM incrementally learns and constructs a representation of sufficient complexity to solve a problem it is trained on. GAM's representation is a Gaussian mixture model of the input space, with learned mappings from the mixture components to output classes. We show a close relationship between GAM and the well-known Expectation-Maximization (EM) approach to mixture-modeling. GAM outperforms an EM classification algorithm on a classification benchmark, thereby demonstrating the advantage of the ART match criterion for regulating learning, and the ARTMAP match tracking operation for incorporate environmental feedback in supervised learning situations.Office of Naval Research (N00014-95-1-0409

    A Constructive, Incremental-Learning Network for Mixture Modeling and Classification

    Full text link
    Gaussian ARTMAP (GAM) is a supervised-learning adaptive resonance theory (ART) network that uses Gaussian-defined receptive fields. Like other ART networks, GAM incrementally learns and constructs a representation of sufficient complexity to solve a problem it is trained on. GAM's representation is a Gaussian mixture model of the input space, with learned mappings from the mixture components to output classes. We show a close relationship between GAM and the well-known Expectation-Maximization (EM) approach to mixture-modeling. GAM outperforms an EM classification algorithm on a classification benchmark, thereby demonstrating the advantage of the ART match criterion for regulating learning, and the ARTMAP match tracking operation for incorporate environmental feedback in supervised learning situations.Office of Naval Research (N00014-95-1-0409

    Methods for fast and reliable clustering

    Get PDF

    Who Spoke What? A Latent Variable Framework for the Joint Decoding of Multiple Speakers and their Keywords

    Full text link
    In this paper, we present a latent variable (LV) framework to identify all the speakers and their keywords given a multi-speaker mixture signal. We introduce two separate LVs to denote active speakers and the keywords uttered. The dependency of a spoken keyword on the speaker is modeled through a conditional probability mass function. The distribution of the mixture signal is expressed in terms of the LV mass functions and speaker-specific-keyword models. The proposed framework admits stochastic models, representing the probability density function of the observation vectors given that a particular speaker uttered a specific keyword, as speaker-specific-keyword models. The LV mass functions are estimated in a Maximum Likelihood framework using the Expectation Maximization (EM) algorithm. The active speakers and their keywords are detected as modes of the joint distribution of the two LVs. In mixture signals, containing two speakers uttering the keywords simultaneously, the proposed framework achieves an accuracy of 82% for detecting both the speakers and their respective keywords, using Student's-t mixture models as speaker-specific-keyword models.Comment: 6 pages, 2 figures Submitted to : IEEE Signal Processing Letter
    • …
    corecore