172 research outputs found

    Model-Based Multiple Pitch Tracking Using Factorial HMMs: Model Adaptation and Inference

    Deformable Spectrograms

    Speech and other natural sounds show high temporal correlation and smooth spectral evolution punctuated by a few irregular, abrupt changes. In a conventional Hidden Markov Model (HMM), such structure is represented weakly and indirectly through transitions between explicit states representing 'steps' along such smooth changes. It would be more efficient and informative to model successive spectra as transformations of their immediate predecessors, and we present a model which focuses on local deformations of adjacent bins in a time-frequency surface to explain an observed sound, using explicit representation only for those bins that cannot be predicted from their context. We further decompose the log-spectrum into two additive layers, which separately explain and model the evolution of the harmonic excitation and the formant filtering of speech and similar sounds. Smooth deformations are modeled with hidden transformation variables in both layers, using Markov Random Fields (MRFs) with overlapping subwindows as observations; inference is performed efficiently via loopy belief propagation. The model can fill in deleted time-frequency cells without any signal model, and an entire signal can be compactly represented with a few specific states along with the deformation maps for both layers. We discuss several possible applications for this new model, including source separation.

    Merging Belief Propagation and the Mean Field Approximation: A Free Energy Approach

    We present a joint message passing approach that combines belief propagation and the mean field approximation. Our analysis is based on the region-based free energy approximation method proposed by Yedidia et al. We show that the message passing fixed-point equations obtained with this combination correspond to stationary points of a constrained region-based free energy approximation. Moreover, we present a convergent implementation of these message passing fixed-point equations, provided that the underlying factor graph fulfills certain technical conditions. In addition, we show how to include hard constraints in the part of the factor graph corresponding to belief propagation. Finally, we demonstrate an application of our method to iterative channel estimation and decoding in an orthogonal frequency division multiplexing (OFDM) system.

    Robust audiovisual speech recognition using noise-adaptive linear discriminant analysis

    © 2016 IEEE. Automatic speech recognition (ASR) has become a widespread and convenient mode of human-machine interaction, but it is still not sufficiently reliable when used under highly noisy or reverberant conditions. One option for achieving far greater robustness is to include another modality that is unaffected by acoustic noise, such as video information. Currently the most successful approaches for such audiovisual ASR systems, coupled hidden Markov models (HMMs) and turbo decoding, both allow for slight asynchrony between audio and video features, and significantly improve recognition rates in this way. However, both typically still neglect residual errors in the estimation of audio features, so-called observation uncertainties. This paper compares two strategies for adding these observation uncertainties into the decoder, and shows that significant recognition rate improvements are achievable for both coupled HMMs and turbo decoding.