Text-independent speaker recognition for Ambient Intelligence applications by using Information Set Features
Biometric systems are enabling technologies for a wide set of applications in Ambient Intelligence (AmI) environments. In this context, speaker recognition techniques are of paramount importance due to their high user acceptance and low required cooperation. Typical applications of biometric recognition in AmI environments are identification techniques designed to recognize individuals in small datasets. Biometric recognition methods are frequently deployed on embedded hardware and therefore need to be optimized in terms of computational time as well as memory usage. This paper presents a text-independent speaker recognition method particularly suitable for identification in AmI environments. The proposed method first computes the Mel Frequency Cepstral Coefficients (MFCC) and then creates Information Set Features (ISF) by applying a fuzzy logic approach. Finally, it estimates the user's identity by using a hierarchical classification technique based on computational intelligence. We evaluated the performance of the speaker recognition method using signals from the NIST-2003 switchboard speaker database. The achieved results showed that the proposed method reduced the size of the template with respect to traditional approaches based on Gaussian Mixture Models (GMM) and achieved better identification accuracy.
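The fuzzy construction behind Information Set Features can be sketched as follows: each feature value is weighted by its fuzzy membership in a set fitted to the feature's distribution, and the products are pooled into a compact template. This is only a minimal illustration, assuming a Gaussian-shaped membership function and mean pooling; the paper's exact ISF construction and hierarchical classifier are not reproduced here.

```python
import numpy as np

def gaussian_membership(x, mean, spread):
    """Fuzzy membership of each value in a Gaussian-shaped fuzzy set."""
    return np.exp(-0.5 * ((x - mean) / spread) ** 2)

def information_set_features(mfcc_frames):
    """Weight each MFCC coefficient by its fuzzy membership
    (information value = attribute value x membership), then pool
    over frames to obtain a compact per-utterance template.
    Assumed construction, not the paper's exact one."""
    mean = mfcc_frames.mean(axis=0)
    spread = mfcc_frames.std(axis=0) + 1e-8  # avoid division by zero
    membership = gaussian_membership(mfcc_frames, mean, spread)
    info = mfcc_frames * membership          # element-wise information values
    return info.mean(axis=0)                 # pooled template, one value per coefficient

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))   # stand-in for 13 MFCCs over 200 frames
template = information_set_features(frames)
print(template.shape)  # (13,)
```

Pooling 200 frames of 13 coefficients into a 13-dimensional template illustrates the memory reduction the abstract claims relative to storing a full GMM per speaker.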
A Constructive, Incremental-Learning Network for Mixture Modeling and Classification
Gaussian ARTMAP (GAM) is a supervised-learning adaptive resonance theory (ART) network that uses Gaussian-defined receptive fields. Like other ART networks, GAM incrementally learns and constructs a representation of sufficient complexity to solve a problem it is trained on. GAM's representation is a Gaussian mixture model of the input space, with learned mappings from the mixture components to output classes. We show a close relationship between GAM and the well-known Expectation-Maximization (EM) approach to mixture modeling. GAM outperforms an EM classification algorithm on a classification benchmark, thereby demonstrating the advantage of the ART match criterion for regulating learning, and of the ARTMAP match tracking operation for incorporating environmental feedback in supervised learning situations.

Office of Naval Research (N00014-95-1-0409)
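For reference, the EM baseline that GAM is related to and compared against can be sketched in a few lines. This is a minimal one-dimensional EM fit of a Gaussian mixture, not the GAM network itself; the initialization and iteration count are illustrative choices.

```python
import numpy as np

def em_gmm(x, k, n_iter=50, seed=0):
    """Minimal EM for a 1-D Gaussian mixture model (the kind of
    mixture-modeling baseline GAM is compared against)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)   # initialize means from data
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
               / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture parameters from weighted data
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / x.size
    return mu, var, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
mu, var, pi = em_gmm(x, k=2)
print(np.sort(mu))  # means should land near -3 and 3
```

Unlike this batch EM loop, GAM grows its mixture incrementally and gates learning with the ART match criterion, which is the distinction the abstract's benchmark comparison probes.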
Who Spoke What? A Latent Variable Framework for the Joint Decoding of Multiple Speakers and their Keywords
In this paper, we present a latent variable (LV) framework to identify all the speakers and their keywords given a multi-speaker mixture signal. We introduce two separate LVs to denote the active speakers and the keywords uttered. The dependency of a spoken keyword on the speaker is modeled through a conditional probability mass function. The distribution of the mixture signal is expressed in terms of the LV mass functions and speaker-specific-keyword models. The proposed framework admits stochastic models, representing the probability density function of the observation vectors given that a particular speaker uttered a specific keyword, as speaker-specific-keyword models. The LV mass functions are estimated in a maximum likelihood framework using the Expectation-Maximization (EM) algorithm. The active speakers and their keywords are detected as modes of the joint distribution of the two LVs. In mixture signals containing two speakers uttering keywords simultaneously, the proposed framework achieves an accuracy of 82% for detecting both the speakers and their respective keywords, using Student's-t mixture models as speaker-specific-keyword models.

Comment: 6 pages, 2 figures. Submitted to: IEEE Signal Processing Letters
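The final decoding step, picking modes of the joint LV distribution, amounts to selecting the largest entries of the estimated joint probability mass function. A minimal sketch, with a hypothetical joint pmf standing in for the one EM would produce:

```python
import numpy as np

# Hypothetical joint pmf P(speaker s, keyword w) after EM has converged:
# rows index speakers, columns index keywords (illustrative values only).
p_joint = np.array([
    [0.02, 0.40, 0.03],   # speaker 0
    [0.35, 0.05, 0.05],   # speaker 1
    [0.04, 0.03, 0.03],   # speaker 2
])

# For a two-speaker mixture, the two largest modes of the joint pmf
# yield the (speaker, keyword) decisions.
top = np.argsort(p_joint, axis=None)[::-1][:2]
decisions = [tuple(int(j) for j in np.unravel_index(i, p_joint.shape))
             for i in top]
print(decisions)  # [(0, 1), (1, 0)]
```

Here speaker 0 is decoded as uttering keyword 1 and speaker 1 as uttering keyword 0, mirroring the abstract's two-speaker detection scenario.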