Towards Single-Channel Unsupervised Source Separation of Speech Mixtures: The Layered Harmonics/Formants Separation-Tracking Model
Speaker models for blind source separation are typically based on HMMs with vast numbers of states to capture source spectral variation, trained on large amounts of isolated speech. Since observations can be similar between sources, inference relies on sequential constraints from the state transition matrix, which are, however, quite weak. To avoid these problems, we propose a strategy of capturing local deformations of the time-frequency energy distribution. Since consecutive spectral frames are highly correlated, each frame can be accurately described as a nonuniform deformation of its predecessor. A smooth pattern of deformations is indicative of a single speaker, and cliffs in the deformation fields may indicate a speaker switch. Further, the log-spectrum of speech can be decomposed into two additive layers that separately describe the harmonics and the formant structure. We model smooth deformations as hidden transformation variables in both layers, using Markov random fields (MRFs) with overlapping subwindows as observations, assumed to be a noisy sum of the two layers. Loopy belief propagation provides efficient inference. Without any pre-trained speech or speaker models, this approach can be used to fill in missing time-frequency observations, and the local entropy of the deformation fields indicates source boundaries for separation.
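The core idea above — describe each spectral frame as a local deformation of its predecessor, and use the entropy of the deformation field as a boundary cue — can be illustrated with a deliberately simplified sketch. The function names (`best_local_shifts`, `shift_entropy`) and the per-bin nearest-match search are illustrative inventions, not the paper's MRF/belief-propagation formulation:

```python
import math

def best_local_shifts(prev, curr, max_shift=2):
    """Toy deformation field: for each bin i of the current frame, find the
    shift k (|k| <= max_shift) such that prev[i + k] best predicts curr[i].
    The paper infers such shifts jointly with an MRF; here each bin is
    matched independently for illustration only."""
    shifts = []
    n = len(curr)
    for i in range(n):
        best_k, best_err = 0, float("inf")
        for k in range(-max_shift, max_shift + 1):
            j = i + k
            if 0 <= j < n:
                err = abs(curr[i] - prev[j])
                if err < best_err:
                    best_err, best_k = err, k
        shifts.append(best_k)
    return shifts

def shift_entropy(shifts):
    """Empirical entropy of the shift distribution: low when the deformation
    field is smooth (single speaker), higher near a speaker switch."""
    counts = {}
    for k in shifts:
        counts[k] = counts.get(k, 0) + 1
    n = len(shifts)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

For a frame that is simply its predecessor shifted up by one bin, the recovered field is nearly constant and its entropy is low; a frame drawn from a different source would produce a ragged field with higher entropy.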
Deformable Spectrograms
Speech and other natural sounds show high temporal correlation and smooth spectral evolution, punctuated by a few irregular, abrupt changes. In a conventional hidden Markov model (HMM), such structure is represented weakly and indirectly through transitions between explicit states representing 'steps' along such smooth changes. It is more efficient and informative to model successive spectra as transformations of their immediate predecessors, and we present a model that focuses on local deformations of adjacent bins in a time-frequency surface to explain an observed sound, using explicit representation only for those bins that cannot be predicted from their context. We further decompose the log-spectrum into two additive layers, which separately explain and model the evolution of the harmonic excitation and the formant filtering of speech and similar sounds. Smooth deformations are modeled with hidden transformation variables in both layers, using Markov random fields (MRFs) with overlapping subwindows as observations; inference is performed efficiently via loopy belief propagation. The model can fill in deleted time-frequency cells without any signal model, and an entire signal can be represented compactly with a few specific states along with the deformation maps for both layers. We discuss several possible applications for this new model, including source separation.
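The fill-in capability described above — imputing deleted time-frequency cells from the predecessor frame under the inferred deformation — can be sketched in a toy form. The function name `impute_missing` and the convention of marking deleted cells with `None` are assumptions for this sketch; the paper's model infers the deformation field and missing values jointly via belief propagation rather than in this copy-forward fashion:

```python
def impute_missing(prev, curr, shifts):
    """Fill cells marked None in the current frame by copying the
    predecessor-frame bin selected by the per-bin deformation shift.
    shifts[i] = k means curr[i] is predicted by prev[i + k]."""
    out = []
    n = len(curr)
    for i, v in enumerate(curr):
        if v is None:
            j = min(max(i + shifts[i], 0), n - 1)  # clamp at the edges
            out.append(prev[j])
        else:
            out.append(v)
    return out
```

Because the prediction comes purely from the frame-to-frame deformation, no pre-trained signal model is needed — which is the point the abstract makes.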
Merging Belief Propagation and the Mean Field Approximation: A Free Energy Approach
We present a joint message passing approach that combines belief propagation and the mean field approximation. Our analysis is based on the region-based free energy approximation method proposed by Yedidia et al. We show that the message passing fixed-point equations obtained with this combination correspond to stationary points of a constrained region-based free energy approximation. Moreover, we present a convergent implementation of these message passing fixed-point equations, provided that the underlying factor graph fulfills certain technical conditions. In addition, we show how to include hard constraints in the part of the factor graph corresponding to belief propagation. Finally, we demonstrate an application of our method to iterative channel estimation and decoding in an orthogonal frequency division multiplexing (OFDM) system.
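To make the mean-field half of this combination concrete, here is a minimal sketch of naive mean-field fixed-point iteration on a pairwise (Ising-style) model, m_i ← tanh(h_i + Σ_j J_ij m_j). This is only the mean-field ingredient on a toy model; the paper's contribution is the principled combination of such updates with belief propagation messages under a region-based free energy, which this sketch does not attempt:

```python
import math

def mean_field_ising(h, J, iters=200):
    """Naive mean-field fixed-point iteration for a pairwise binary model.
    h[i] is the local field on variable i, J[i][j] the pairwise coupling;
    the returned m[i] approximates the marginal mean of variable i."""
    n = len(h)
    m = [0.0] * n
    for _ in range(iters):
        m = [math.tanh(h[i] + sum(J[i][j] * m[j] for j in range(n)))
             for i in range(n)]
    return m
```

With zero couplings the update decouples and recovers the exact marginals m_i = tanh(h_i); positive couplings pull the means toward each other, as expected. Such updates are the stationary-point equations of the (mean-field) free energy, which is the sense in which the paper interprets its joint message passing scheme.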
Robust audiovisual speech recognition using noise-adaptive linear discriminant analysis
© 2016 IEEE. Automatic speech recognition (ASR) has become a widespread and convenient mode of human-machine interaction, but it is still not sufficiently reliable when used under highly noisy or reverberant conditions. One option for achieving far greater robustness is to include another modality that is unaffected by acoustic noise, such as video information. Currently the most successful approaches for such audiovisual ASR systems, coupled hidden Markov models (HMMs) and turbo decoding, both allow for slight asynchrony between audio and video features, and significantly improve recognition rates in this way. However, both typically still neglect residual errors in the estimation of audio features, so-called observation uncertainties. This paper compares two strategies for incorporating these observation uncertainties into the decoder, and shows that significant recognition-rate improvements are achievable for both coupled HMMs and turbo decoding.
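One standard way to account for observation uncertainties in a Gaussian-based decoder is uncertainty decoding: the estimated feature-error variance is added to the model variance before evaluating each state's likelihood. The abstract does not specify which two strategies it compares, so the following is only a generic single-dimension sketch of that idea, with hypothetical function names:

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of scalar observation x under N(mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def uncertainty_loglik(x, mean, var, obs_var):
    """Uncertainty decoding (generic form): inflate the state's model
    variance by the estimated observation-error variance obs_var, so
    unreliable features contribute less sharply to the state score."""
    return gaussian_loglik(x, mean, var + obs_var)
```

With `obs_var = 0` this reduces to the ordinary likelihood; as `obs_var` grows, an outlying observation is penalized less severely, which is what makes the decoder robust to noisy feature estimates.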