
    Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification

    The present paper introduces a novel speaker modeling technique for text-independent speaker identification using probabilistic self-organizing maps (PbSOMs). The basic motivation behind the introduced technique is to combine the self-organizing quality of self-organizing maps with the generative power of Gaussian mixture models. Experimental results show that the proposed modeling technique significantly outperforms the traditional approach based on classical GMMs trained with the EM algorithm or its deterministic variant. More precisely, a relative accuracy improvement of roughly 39% was obtained, and the proposed technique also proved much less sensitive to model-parameter initialization.
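The baseline this abstract compares against can be illustrated with a minimal sketch: one diagonal-covariance GMM per speaker, trained with plain EM from a random initialization (the initialization sensitivity the paper addresses), and identification by maximum log-likelihood. The toy 2-D "speakers", component count, and iteration budget below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gmm(X, k=2, iters=50):
    """Fit a diagonal-covariance GMM to feature vectors X via plain EM."""
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)]  # random init: the sensitivity PbSOMs mitigate
    var = np.ones((k, d))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities under each diagonal Gaussian (constants cancel)
        logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var).sum(-1)
                - 0.5 * np.log(var).sum(-1) + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var

def loglik(X, model):
    """Total log-likelihood of X under a fitted GMM (constant terms dropped)."""
    w, mu, var = model
    logp = (-0.5 * (((X[:, None, :] - mu) ** 2) / var).sum(-1)
            - 0.5 * np.log(var).sum(-1) + np.log(w))
    m = logp.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(logp - m).sum(axis=1))).sum())

# Two toy "speakers" with different feature distributions
spk_a = rng.normal([0, 0], 1.0, size=(300, 2))
spk_b = rng.normal([5, 5], 1.0, size=(300, 2))
models = {"A": fit_gmm(spk_a), "B": fit_gmm(spk_b)}

test_utt = rng.normal([5, 5], 1.0, size=(50, 2))  # actually speaker B
scores = {spk: loglik(test_utt, m) for spk, m in models.items()}
identified = max(scores, key=scores.get)
print(identified)
```

A PbSOM-based model would replace the flat mixture with map-structured components; this sketch only shows the classical GMM scoring pipeline being compared against.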

    Implicitly Supervised Language Model Adaptation for Meeting Transcription

    We describe the use of meeting metadata, acquired using a computerized meeting organization and note-taking system, to improve automatic transcription of meetings. By applying a two-step language model adaptation process based on notes and agenda items, we were able to reduce perplexity by 9% and word error rate by 4% relative on a set of ten meetings recorded in-house. This approach can be used to leverage other types of metadata.
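The core idea of metadata-based adaptation can be sketched very simply: estimate a small model from the meeting notes, interpolate it with the background model, and check perplexity on the transcript. The unigram models, toy corpora, and interpolation weight below are illustrative stand-ins for the paper's actual n-gram setup.

```python
import math
from collections import Counter

def unigram(tokens, vocab, alpha=0.5):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def perplexity(model, tokens):
    """Perplexity of a token sequence under a unigram model."""
    return math.exp(-sum(math.log(model[w]) for w in tokens) / len(tokens))

background = "the project meeting is on monday the budget is large".split()
notes      = "budget review action items budget deadline".split()   # meeting metadata
test_set   = "the budget review is on monday".split()               # transcript sample

vocab = set(background) | set(notes) | set(test_set)
p_bg    = unigram(background, vocab)
p_notes = unigram(notes, vocab)

lam = 0.7  # weight on the background model; tuned on held-out data in practice
p_adapted = {w: lam * p_bg[w] + (1 - lam) * p_notes[w] for w in vocab}

ppl_bg = perplexity(p_bg, test_set)
ppl_ad = perplexity(p_adapted, test_set)
print(ppl_bg, ppl_ad)
```

Here the adapted model assigns higher probability to note-related words ("budget", "review"), so its perplexity on the transcript drops relative to the background model alone.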

    A detection-based pattern recognition framework and its applications

    The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation. Inspired by studies in modern cognitive psychology and by real-world pattern recognition systems, the framework is proposed as an alternative solution for some complicated pattern recognition problems. Primitive features are first detected and a task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined, and high-level context is incorporated as additional information at certain stages. A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems: it decomposes a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Information fusion strategies are employed to integrate the evidence from a lower level into evidence at a higher level, and this fusion procedure continues until the top level is reached. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection (in such a component-based framework, any primitive component can be replaced by a new one while the other components remain unchanged); (3) incremental information integration; (4) high-level context information as additional information sources, which can be combined with bottom-up processing at any stage. This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on statistical detection and decision theory. In addition, evidence fusion strategies were investigated.
Several novel detection algorithms and evidence fusion methods were proposed, and their effectiveness was demonstrated in automatic speech recognition and broadcast news video segmentation systems. We believe such a detection-based framework can be employed in more applications in the future.
Ph.D. Committee Chair: Lee, Chin-Hui; Committee Members: Clements, Mark; Ghovanloo, Maysam; Romberg, Justin; Yuan, Min
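The bottom-up evidence-fusion step described above can be sketched as independent low-level detectors each emitting a score (here a log-likelihood ratio), which a higher-level decision stage fuses and thresholds. The detector names, scores, and weighted-sum fusion rule are illustrative assumptions, not the dissertation's specific algorithms.

```python
def fuse(llrs, weights=None):
    """Weighted log-likelihood-ratio fusion; assumes detector independence."""
    if weights is None:
        weights = [1.0] * len(llrs)
    return sum(w * s for w, s in zip(weights, llrs))

def decide(llrs, threshold=0.0):
    """Accept the higher-level hypothesis when fused evidence clears the threshold."""
    return fuse(llrs) > threshold

# Toy detector outputs for one candidate segment
# (e.g. hypothetical voicing, energy, and spectral-shape detectors)
scores = [1.2, -0.3, 0.8]
print(decide(scores))  # fused evidence is about 1.7 > 0
```

Because the detectors and the fusion rule are decoupled, any single detector can be replaced or re-weighted without touching the others, which is the modularity advantage the abstract emphasizes.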

    The information bottleneck method

    We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal x requires more than just predicting y; it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y. That is, we squeeze the information that X provides about Y through a 'bottleneck' formed by a limited set of codewords X̃. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure d(x, x̃) emerges from the joint statistics of X and Y. This approach yields an exact set of self-consistent equations for the coding rules X → X̃ and X̃ → Y. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
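The self-consistent equations mentioned above can be iterated directly on a small discrete joint distribution: update p(x̃|x) ∝ p(x̃) exp(-β KL(p(y|x) ‖ p(y|x̃))), then recompute the marginal p(x̃) and the decoder p(y|x̃). The toy joint p(x, y), the number of codewords, and the trade-off parameter β below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy joint distribution p(x, y): 4 values of x, 2 of y
p_xy = np.array([[0.30, 0.05],
                 [0.25, 0.05],
                 [0.05, 0.15],
                 [0.05, 0.10]])
p_x = p_xy.sum(axis=1)             # marginal p(x)
p_y_given_x = p_xy / p_x[:, None]  # rows: p(y|x)

n_t, beta = 2, 5.0                 # |X~| codewords, information/compression trade-off
q_t_given_x = rng.dirichlet(np.ones(n_t), size=len(p_x))  # random init of p(x~|x)

for _ in range(200):
    q_t = p_x @ q_t_given_x  # codeword marginal p(x~)
    # decoder: p(y|x~) = sum_x p(x~|x) p(x) p(y|x) / p(x~)
    q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x / q_t[:, None]
    # KL( p(y|x) || p(y|x~) ) for every (x, x~) pair
    kl = (p_y_given_x[:, None, :] *
          np.log(p_y_given_x[:, None, :] / q_y_given_t[None, :, :])).sum(-1)
    # encoder update, then normalize over codewords
    q_t_given_x = q_t * np.exp(-beta * kl)
    q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)

print(np.round(q_t_given_x, 3))
```

For large β the encoder hardens toward a deterministic clustering of x by its predictive distribution over y; for small β everything collapses into one codeword, tracing out the rate-distortion-like trade-off the abstract describes.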

    Digital Signal Processing Research Program

    Contains table of contents for Section 2, an introduction, and reports on fourteen research projects.
    Supported by: U.S. Navy - Office of Naval Research Grant N00014-91-J-1628; Defense Advanced Research Projects Agency/U.S. Navy - Office of Naval Research Grant N00014-89-J-1489; MIT - Woods Hole Oceanographic Institution Joint Program; Lockheed Sanders, Inc./U.S. Navy - Office of Naval Research Contract N00014-91-C-0125; U.S. Air Force - Office of Scientific Research Grant AFOSR-91-0034; AT&T Laboratories Doctoral Support Program; National Science Foundation Fellowship