Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification
The present paper introduces a novel speaker modeling technique for text-independent speaker identification using probabilistic self-organizing maps (PbSOMs). The basic motivation behind the technique was to combine the self-organizing quality of self-organizing maps with the generative power of Gaussian mixture models. Experimental results show that the proposed modeling technique significantly outperforms the traditional approach using classical GMMs trained with the EM algorithm or its deterministic variant. More precisely, a relative accuracy improvement of roughly 39% was obtained, and the proposed technique proved much less sensitive to model-parameter initialization.
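The GMM baseline the abstract compares against can be sketched concretely: each speaker is represented by a mixture model over acoustic feature frames, and a test utterance is assigned to the speaker whose model gives it the highest average log-likelihood. A minimal numpy sketch with toy, hand-set diagonal-covariance models (the speaker names, dimensions, and parameters are illustrative assumptions, not from the paper; no SOM component or training is shown):

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under a
    diagonal-covariance Gaussian mixture model."""
    # frames: (T, D); weights: (M,); means, variances: (M, D)
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    exponent = -0.5 * np.sum(diff**2 / variances, axis=2)            # (T, M)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (M,)
    comp = np.log(weights) + log_norm + exponent                     # (T, M)
    # log-sum-exp over mixture components, then average over frames
    return np.mean(np.logaddexp.reduce(comp, axis=1))

def identify(frames, speaker_models):
    """Return the speaker whose GMM best explains the test frames."""
    return max(speaker_models,
               key=lambda s: gmm_log_likelihood(frames, *speaker_models[s]))

# Toy 1-D, 2-component models for two hypothetical speakers.
models = {
    "spk_a": (np.array([0.5, 0.5]), np.array([[0.0], [1.0]]), np.array([[0.2], [0.2]])),
    "spk_b": (np.array([0.5, 0.5]), np.array([[4.0], [5.0]]), np.array([[0.2], [0.2]])),
}
test = np.array([[4.1], [4.9], [4.4]])   # frames near speaker b's means
print(identify(test, models))            # -> spk_b
```

In a real system the per-speaker parameters would be estimated by EM (or, as the paper proposes, by a PbSOM) from that speaker's enrollment data rather than set by hand.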
Implicitly Supervised Language Model Adaptation for Meeting Transcription
We describe the use of meeting metadata, acquired with a computerized meeting organization and note-taking system, to improve automatic transcription of meetings. By applying a two-step language model adaptation process based on notes and agenda items, we reduced perplexity by 9% and word error rate by 4% relative on a set of ten meetings recorded in-house. This approach can be used to leverage other types of metadata.
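Language-model adaptation of this kind is commonly realized by interpolating a background model with a model estimated from the adaptation text, with perplexity measuring the fit. A minimal unigram sketch (the toy corpora, vocabulary, and interpolation weight are illustrative assumptions, not the paper's actual two-step procedure):

```python
import math
from collections import Counter

def unigram_probs(tokens, vocab, alpha=0.1):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def perplexity(model, tokens):
    """Perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))

# Toy data: generic background text vs. metadata-like notes.
background = "the meeting is at noon the agenda is long".split()
notes      = "budget review agenda budget forecast".split()
test_set   = "budget agenda review".split()
vocab = set(background + notes + test_set)

bg = unigram_probs(background, vocab)
ad = unigram_probs(notes, vocab)
lam = 0.5  # interpolation weight (illustrative)
mixed = {w: lam * ad[w] + (1 - lam) * bg[w] for w in vocab}

# The adapted mixture fits the in-domain test text better (lower perplexity).
print(perplexity(bg, test_set), perplexity(mixed, test_set))
```

Real adaptation would operate on n-gram models with the weight tuned on held-out data, but the mechanism, mixing metadata-derived statistics into the background model, is the same.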
A detection-based pattern recognition framework and its applications
The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation.
Inspired by studies in modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed as an alternative solution for some complicated pattern recognition problems. Primitive features are first detected and a task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined, and high-level context is incorporated as additional information at certain stages.
A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems: it decomposes a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Information fusion strategies then integrate evidence from a lower level to form evidence at a higher level, and this fusion procedure continues until the top level is reached. In general, a detection-based framework has several advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection (in such a component-based framework, any primitive component can be replaced by a new one while the other components remain unchanged); (3) incremental information integration; (4) high-level context information as an additional information source, which can be combined with bottom-up processing at any stage.
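The bottom-up fusion step can be illustrated with one simple, widely used strategy (presented here purely as an illustration, not necessarily the dissertation's method): each primitive detector emits a calibrated posterior for its event, and higher-level evidence is formed by combining them as weighted log-odds.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def fuse(detector_probs, weights, prior=0.5):
    """Fuse independent detector posteriors into one higher-level
    posterior via weighted log-odds combination."""
    z = logit(prior) + sum(w * logit(p) for p, w in zip(detector_probs, weights))
    return 1 / (1 + math.exp(-z))

# Three hypothetical primitive detectors (e.g. attribute detectors),
# each reporting the probability that its target event is present.
probs = [0.9, 0.8, 0.3]
weights = [1.0, 1.0, 0.5]   # per-detector reliabilities (illustrative)
fused = fuse(probs, weights)
print(round(fused, 3))
```

Because each detector contributes an additive term, a detector can be retrained or swapped out without touching the others, which is exactly the modularity advantage (2) above describes.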
This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on statistical detection and decision theory. In addition, evidence fusion strategies are investigated. Several novel detection algorithms and evidence fusion methods are proposed, and their effectiveness is demonstrated in automatic speech recognition and broadcast news video story segmentation systems. We believe such a detection-based framework can be employed
in more applications in the future. Ph.D. Committee Chair: Lee, Chin-Hui; Committee Members: Clements, Mark; Ghovanloo, Maysam; Romberg, Justin; Yuan, Min
Hybrid training approaches to Hidden Markov Model-based acoustic models for automatic speech recognition
Doctor of Philosophy
The information bottleneck method
We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal x requires more than just predicting y; it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y. That is, we squeeze the information that X provides about Y through a 'bottleneck' formed by a limited set of codewords X̃. This constrained optimization problem can be seen as a generalization of rate distortion theory, in which the distortion measure d(x, x̃) emerges from the joint statistics of X and Y. This approach yields an exact set of self-consistent equations for the coding rules X → X̃ and X̃ → Y. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
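The self-consistent equations and the Blahut-Arimoto-style re-estimation can be sketched directly: alternate updates of p(x̃|x), p(x̃), and p(y|x̃), where p(x̃|x) ∝ p(x̃) exp(−β KL(p(y|x) ‖ p(y|x̃))). A compact numpy sketch on a toy joint distribution (the distribution, cluster count, and β are illustrative assumptions):

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Iterative IB: alternate the three self-consistent updates for
    p(t|x), p(t), p(y|t), where t indexes the codewords (x-tilde)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                     # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]          # conditional p(y|x)

    # Random soft assignment p(t|x); each row sums to 1.
    q = rng.random((len(p_x), n_clusters))
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = q.T @ p_x                                    # p(t)
        p_y_given_t = (q * p_x[:, None]).T @ p_y_given_x   # unnormalized
        p_y_given_t /= p_t[:, None]
        # KL(p(y|x) || p(y|t)) for every (x, t) pair
        log_ratio = np.log(p_y_given_x[:, None, :] + 1e-12) \
                  - np.log(p_y_given_t[None, :, :] + 1e-12)
        kl = np.sum(p_y_given_x[:, None, :] * log_ratio, axis=2)
        # Self-consistent update: p(t|x) proportional to p(t) exp(-beta*KL)
        q = p_t[None, :] * np.exp(-beta * kl)
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy joint p(x, y): two groups of x values with distinct p(y|x).
p_xy = np.array([[0.20, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20]])
q = information_bottleneck(p_xy, n_clusters=2, beta=10.0)
print(np.round(q, 2))   # x's with similar p(y|x) share a codeword
```

At large β the assignments harden and x values with the same conditional p(y|x) collapse onto the same codeword, which is the compression-versus-relevance trade-off the bottleneck formalizes.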
Digital Signal Processing Research Program
Contains a table of contents for Section 2, an introduction, and reports on fourteen research projects. Funding: U.S. Navy - Office of Naval Research Grant N00014-91-J-1628; Defense Advanced Research Projects Agency/U.S. Navy - Office of Naval Research Grant N00014-89-J-1489; MIT - Woods Hole Oceanographic Institution Joint Program; Lockheed Sanders, Inc./U.S. Navy - Office of Naval Research Contract N00014-91-C-0125; U.S. Air Force - Office of Scientific Research Grant AFOSR-91-0034; U.S. Navy - Office of Naval Research Grant N00014-91-J-1628; AT&T Laboratories Doctoral Support Program; National Science Foundation Fellowship
- …