Probabilistic Self-Organizing Maps for Text-Independent Speaker Identification
The present paper introduces a novel speaker modeling technique for text-independent speaker identification using probabilistic self-organizing maps (PbSOMs). The basic motivation behind the technique was to combine the self-organizing quality of self-organizing maps with the generative power of Gaussian mixture models. Experimental results show that the proposed modeling technique significantly outperforms the traditional approach using classical GMMs trained with the EM algorithm or its deterministic variant. More precisely, a relative accuracy improvement of roughly 39% was obtained, and the proposed technique proved much less sensitive to model-parameter initialization.
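The GMM baseline the abstract compares against can be sketched concretely: each speaker is represented by a mixture model over acoustic feature frames, and a test utterance is assigned to the speaker whose model gives it the highest average log-likelihood. A minimal numpy sketch with toy, hand-set diagonal-covariance models (the speaker names, dimensions, and parameters are illustrative assumptions, not from the paper; no SOM component or training is shown):

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under a
    diagonal-covariance Gaussian mixture model."""
    # frames: (T, D); weights: (M,); means, variances: (M, D)
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    exponent = -0.5 * np.sum(diff**2 / variances, axis=2)            # (T, M)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (M,)
    comp = np.log(weights) + log_norm + exponent                     # (T, M)
    # log-sum-exp over mixture components, then average over frames
    return np.mean(np.logaddexp.reduce(comp, axis=1))

def identify(frames, speaker_models):
    """Return the speaker whose GMM best explains the test frames."""
    return max(speaker_models,
               key=lambda s: gmm_log_likelihood(frames, *speaker_models[s]))

# Toy 1-D, 2-component models for two hypothetical speakers.
models = {
    "spk_a": (np.array([0.5, 0.5]), np.array([[0.0], [1.0]]), np.array([[0.2], [0.2]])),
    "spk_b": (np.array([0.5, 0.5]), np.array([[4.0], [5.0]]), np.array([[0.2], [0.2]])),
}
test = np.array([[4.1], [4.9], [4.4]])   # frames near speaker b's means
print(identify(test, models))            # -> spk_b
```

In a real system the per-speaker parameters would be estimated by EM (or, as the paper proposes, by a PbSOM) from that speaker's enrollment data rather than set by hand.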
Implicitly Supervised Language Model Adaptation for Meeting Transcription
We describe the use of meeting metadata, acquired with a computerized meeting organization and note-taking system, to improve automatic transcription of meetings. By applying a two-step language model adaptation process based on notes and agenda items, we reduced perplexity by 9% and word error rate by 4% relative on a set of ten meetings recorded in-house. This approach can be used to leverage other types of metadata.
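Language-model adaptation of this kind is commonly realized by interpolating a background model with a model estimated from the adaptation text, with perplexity measuring the fit. A minimal unigram sketch (the toy corpora, vocabulary, and interpolation weight are illustrative assumptions, not the paper's actual two-step procedure):

```python
import math
from collections import Counter

def unigram_probs(tokens, vocab, alpha=0.1):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def perplexity(model, tokens):
    """Perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))

# Toy data: generic background text vs. metadata-like notes.
background = "the meeting is at noon the agenda is long".split()
notes      = "budget review agenda budget forecast".split()
test_set   = "budget agenda review".split()
vocab = set(background + notes + test_set)

bg = unigram_probs(background, vocab)
ad = unigram_probs(notes, vocab)
lam = 0.5  # interpolation weight (illustrative)
mixed = {w: lam * ad[w] + (1 - lam) * bg[w] for w in vocab}

# The adapted mixture fits the in-domain test text better (lower perplexity).
print(perplexity(bg, test_set), perplexity(mixed, test_set))
```

Real adaptation would operate on n-gram models with the weight tuned on held-out data, but the mechanism, mixing metadata-derived statistics into the background model, is the same.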
A detection-based pattern recognition framework and its applications
The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation.
Inspired by studies in modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed as an alternative solution for some complicated pattern recognition problems. Primitive features are first detected and a task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined, and high-level context is incorporated as additional information at certain stages.
A detection-based framework is a "divide-and-conquer" design paradigm for pattern recognition problems: it decomposes a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Information fusion strategies then integrate evidence from a lower level to form evidence at a higher level, and this fusion procedure continues until the top level is reached. In general, a detection-based framework has several advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection (in such a component-based framework, any primitive component can be replaced by a new one while the other components remain unchanged); (3) incremental information integration; (4) high-level context information as an additional information source, which can be combined with bottom-up processing at any stage.
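The bottom-up fusion step can be illustrated with one simple, widely used strategy (presented here purely as an illustration, not necessarily the dissertation's method): each primitive detector emits a calibrated posterior for its event, and higher-level evidence is formed by combining them as weighted log-odds.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def fuse(detector_probs, weights, prior=0.5):
    """Fuse independent detector posteriors into one higher-level
    posterior via weighted log-odds combination."""
    z = logit(prior) + sum(w * logit(p) for p, w in zip(detector_probs, weights))
    return 1 / (1 + math.exp(-z))

# Three hypothetical primitive detectors (e.g. attribute detectors),
# each reporting the probability that its target event is present.
probs = [0.9, 0.8, 0.3]
weights = [1.0, 1.0, 0.5]   # per-detector reliabilities (illustrative)
fused = fuse(probs, weights)
print(round(fused, 3))
```

Because each detector contributes an additive term, a detector can be retrained or swapped out without touching the others, which is exactly the modularity advantage (2) above describes.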
This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on statistical detection and decision theory. In addition, evidence fusion strategies are investigated. Several novel detection algorithms and evidence fusion methods are proposed, and their effectiveness is demonstrated in automatic speech recognition and broadcast news video story segmentation systems. We believe such a detection-based framework can be employed
in more applications in the future. Ph.D. Committee Chair: Lee, Chin-Hui; Committee Members: Clements, Mark; Ghovanloo, Maysam; Romberg, Justin; Yuan, Min
Hybrid training approaches to Hidden Markov Model-based acoustic models for automatic speech recognition
Doctor of Philosophy
The information bottleneck method
We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal x requires more than just predicting y; it also requires specifying which features of X play a role in the prediction. We formalize this problem as that of finding a short code for X that preserves the maximum information about Y. That is, we squeeze the information that X provides about Y through a 'bottleneck' formed by a limited set of codewords X̃. This constrained optimization problem can be seen as a generalization of rate distortion theory, in which the distortion measure d(x, x̃) emerges from the joint statistics of X and Y. This approach yields an exact set of self-consistent equations for the coding rules X → X̃ and X̃ → Y. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
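The self-consistent equations and the Blahut-Arimoto-style re-estimation can be sketched directly: alternate updates of p(x̃|x), p(x̃), and p(y|x̃), where p(x̃|x) ∝ p(x̃) exp(−β KL(p(y|x) ‖ p(y|x̃))). A compact numpy sketch on a toy joint distribution (the distribution, cluster count, and β are illustrative assumptions):

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Iterative IB: alternate the three self-consistent updates for
    p(t|x), p(t), p(y|t), where t indexes the codewords (x-tilde)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                     # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]          # conditional p(y|x)

    # Random soft assignment p(t|x); each row sums to 1.
    q = rng.random((len(p_x), n_clusters))
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = q.T @ p_x                                    # p(t)
        p_y_given_t = (q * p_x[:, None]).T @ p_y_given_x   # unnormalized
        p_y_given_t /= p_t[:, None]
        # KL(p(y|x) || p(y|t)) for every (x, t) pair
        log_ratio = np.log(p_y_given_x[:, None, :] + 1e-12) \
                  - np.log(p_y_given_t[None, :, :] + 1e-12)
        kl = np.sum(p_y_given_x[:, None, :] * log_ratio, axis=2)
        # Self-consistent update: p(t|x) proportional to p(t) exp(-beta*KL)
        q = p_t[None, :] * np.exp(-beta * kl)
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy joint p(x, y): two groups of x values with distinct p(y|x).
p_xy = np.array([[0.20, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20]])
q = information_bottleneck(p_xy, n_clusters=2, beta=10.0)
print(np.round(q, 2))   # x's with similar p(y|x) share a codeword
```

At large β the assignments harden and x values with the same conditional p(y|x) collapse onto the same codeword, which is the compression-versus-relevance trade-off the bottleneck formalizes.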
Digital Signal Processing Research Program
Contains a table of contents for Section 2, an introduction, and reports on fourteen research projects. Funding: U.S. Navy - Office of Naval Research Grant N00014-91-J-1628; Defense Advanced Research Projects Agency/U.S. Navy - Office of Naval Research Grant N00014-89-J-1489; MIT - Woods Hole Oceanographic Institution Joint Program; Lockheed Sanders, Inc./U.S. Navy - Office of Naval Research Contract N00014-91-C-0125; U.S. Air Force - Office of Scientific Research Grant AFOSR-91-0034; U.S. Navy - Office of Naval Research Grant N00014-91-J-1628; AT&T Laboratories Doctoral Support Program; National Science Foundation Fellowship
- …