A Method of Combining Multiple Probabilistic Classifiers through Soft Competition on Different Feature Sets
A novel method is proposed for combining multiple probabilistic classifiers on different feature sets. To improve classification performance, a generalized finite mixture model is proposed as a linear combination scheme and implemented with radial basis function networks. In the linear combination scheme, soft competition on different feature sets serves as an automatic feature ranking mechanism, so that the different feature sets are always used simultaneously, in an optimal way, to determine the linear combination weights. To train the linear combination scheme, a learning algorithm is developed based on the Expectation-Maximization (EM) algorithm. The proposed method has been applied to a typical real-world problem, namely speaker identification, in which different feature sets often need to be considered simultaneously for robustness. Simulation results show that the proposed method yields good performance in speaker identification.
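The linear combination scheme described in this abstract can be illustrated with a minimal sketch. It assumes each classifier outputs a vector of class posteriors for its own feature set, and that a set of raw gate scores is turned into combination weights with a softmax. The function names and the fixed gate scores are hypothetical; in the abstract the weights come from radial basis function networks trained with EM, which this sketch does not implement.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_posteriors(posteriors, gate_scores):
    """Linearly combine K classifiers' class posteriors.

    posteriors:  list of K lists, each a probability vector over classes
                 (one classifier per feature set).
    gate_scores: list of K raw scores (hypothetical stand-in for the
                 learned, input-dependent gating outputs).
    """
    weights = softmax(gate_scores)
    n_classes = len(posteriors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, posteriors))
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Two classifiers (two feature sets), two candidate speakers.
    posteriors = [[0.8, 0.2], [0.4, 0.6]]
    combined = combine_posteriors(posteriors, gate_scores=[1.0, 0.0])
    print(combined)
```

Because the weights are a softmax, the classifiers compete softly: raising one gate score shifts weight toward that feature set without ever fully discarding the others, and the combined vector remains a valid probability distribution.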
On use of different feature sets for pattern classification: An alternative method
We propose an alternative method for using different feature sets in pattern classification. Unlike traditional methods, such as combining multiple classifiers or using a composite feature set, our method addresses the problem through soft competition on different feature sets. A modular neural network architecture is proposed to implement this idea. The proposed architecture is interpreted as a generalized finite mixture model, so parameter estimation is treated as a maximum likelihood problem, and an EM algorithm is derived for it. Moreover, we propose a heuristic model selection method to fit the proposed architecture to a specific problem. Comparative results are presented for the real-world problem of speaker identification.
PHMM based asynchronous acoustic model for Chinese large vocabulary continuous speech recognition
In this paper, we present an asynchronous multiple-stream framework for Chinese tonal acoustic modeling. In this framework, toneless phonetic units and tones are modeled separately with different acoustic features. During training and decoding, a set of models is coupled through a product hidden Markov model (PHMM) to represent whole tonal phonetic units. In this way, a compound context-dependent tonal model can be generated from a few simple models. Experiments show that this modeling scheme yields a more compact and accurate model representation and improves performance on large vocabulary speech recognition tasks.
Text-Dependent Speaker Identification Based on Input/Output HMMs: An Empirical Study
In this paper, we explore the Input/Output HMM (IOHMM) architecture for a substantial problem, that of text-dependent speaker identification. For subnetworks modeled with generalized linear models, we extend the IRLS algorithm to the M-step of the corresponding EM algorithm. Experimental results show that the improved EM algorithm yields significantly faster training than the original one. In comparison with the multilayer perceptron, the dynamic programming technique, and hidden Markov models, we empirically demonstrate that the IOHMM architecture is a promising approach to text-dependent speaker identification. Keywords: Speaker Identification, Input/Output HMM, EM algorithm, temporal processing. 1 Introduction. The speaker identification task is to classify an unlabeled voice token as belonging to one of a set of N reference speakers [1]. It is a very hard problem since a speaker's voice changes over time. There have been extensive studies in this field based upon conventional techniques of spe..
Speaker identification based on the time-delay Hierarchical Mixture of Experts
In this paper, we explore the Hierarchical Mixture of Experts (HME) architecture for a substantial problem, that of text-dependent speaker identification. For a specific multi-way classification, we propose a generalized Bernoulli density instead of the multinomial logit density. A time-delay technique is also introduced into the HME for spatio-temporal processing. Using the proposed density and the time-delay HME along with the EM algorithm, we show that the system achieves satisfactory performance and trains quickly.
A modified HME architecture for text-dependent speaker identification
A modified hierarchical mixtures of experts (HME) architecture is presented for text-dependent speaker identification. A new gating network is introduced into the original HME architecture so that both instantaneous and transitional spectral information can be used. The statistical model underlying the proposed architecture is presented and learning is treated as a maximum likelihood problem; in particular, an expectation-maximization (EM) algorithm is proposed for adjusting the parameters of the architecture. An evaluation has been carried out using a database of isolated digit utterances spoken by 10 male speakers. Experimental results demonstrate that the proposed architecture outperforms the original HME architecture in text-dependent speaker identification.