A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition
This article provides a unifying Bayesian network view on various approaches
for acoustic model adaptation, missing feature, and uncertainty decoding that
are well-known in the literature of robust automatic speech recognition. The
representatives of these classes can often be deduced from a Bayesian network
that extends the conventional hidden Markov models used in speech recognition.
These extensions, in turn, can in many cases be motivated from an underlying
observation model that relates clean and distorted feature vectors. By
converting the observation models into a Bayesian network representation, we
formulate the corresponding compensation rules leading to a unified view on
known derivations as well as to new formulations for certain approaches. The
generic Bayesian perspective provided in this contribution thus highlights
structural differences and similarities between the analyzed approaches.
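As a small illustration of how an observation model can lead to a compensation rule, the block below works through one textbook case. It is a sketch under stated assumptions, not the article's own formulation: the distortion is taken to be additive and Gaussian, each HMM state q_t is modeled by a single Gaussian, and the symbols y_t (distorted feature), x_t (clean feature), b_t, and the covariance \Sigma_b are introduced here for illustration only.

```latex
% Illustrative assumption: additive Gaussian distortion y_t = x_t + b_t,
% with b_t ~ N(0, \Sigma_b) independent of the clean feature x_t,
% and a single Gaussian N(\mu_{q_t}, \Sigma_{q_t}) per HMM state q_t.
\begin{align}
p(y_t \mid q_t)
  &= \int p(y_t \mid x_t)\, p(x_t \mid q_t)\, \mathrm{d}x_t \\
  &= \int \mathcal{N}(y_t; x_t, \Sigma_b)\,
          \mathcal{N}(x_t; \mu_{q_t}, \Sigma_{q_t})\, \mathrm{d}x_t \\
  &= \mathcal{N}(y_t; \mu_{q_t}, \Sigma_{q_t} + \Sigma_b)
\end{align}
```

Under these assumptions the clean feature is marginalized out, and the compensation amounts to adding the distortion covariance to the state covariance.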
Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees
This paper addresses the problem of ad hoc microphone array calibration where
only partial information about the distances between microphones is available.
We construct a matrix consisting of the pairwise distances and propose to
estimate the missing entries based on a novel Euclidean distance matrix
completion algorithm that alternates low-rank matrix completion with projection
onto the Euclidean distance space. This approach confines the recovered matrix
to the EDM cone at each iteration of the matrix completion algorithm. The
theoretical guarantees of the calibration performance are obtained considering
the random and locally structured missing entries as well as the measurement
noise on the known distances. This study elucidates the links between the
calibration error and the number of microphones along with the noise level and
the ratio of missing distances. Thorough experiments on real data recordings
and simulated setups are conducted to demonstrate these theoretical insights. A
significant improvement is achieved by the proposed Euclidean distance matrix
completion algorithm over the state-of-the-art techniques for ad hoc microphone
array calibration.
Comment: In Press, available online, August 1, 2014.
http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal
Processing, 201
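The alternation described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's algorithm: it assumes squared distances, a symmetric boolean `mask` of known entries, a truncated SVD as the low-rank step, and a classical multidimensional scaling step as a stand-in for the projection onto the EDM cone. All function names (`edm_from_points`, `low_rank`, `project_edm`, `complete_edm`) are hypothetical.

```python
import numpy as np

def edm_from_points(P):
    """Squared Euclidean distance matrix of the rows of P."""
    G = P @ P.T
    g = np.diag(G)
    return g[:, None] + g[None, :] - 2.0 * G

def low_rank(X, rank):
    """Best rank-`rank` approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def project_edm(D, dim):
    """Map D to a valid EDM of points in R^dim (classical MDS step).

    Assumption: this stands in for the EDM-cone projection; it is not
    necessarily the projection used in the paper.
    """
    n = D.shape[0]
    D = 0.5 * (D + D.T)                    # enforce symmetry
    np.fill_diagonal(D, 0.0)               # zero self-distances
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    G = -0.5 * J @ D @ J                   # Gram matrix estimate
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]        # keep the `dim` largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)     # discard negative eigenvalues
    P = V[:, idx] * np.sqrt(w_top)         # point configuration
    return edm_from_points(P)

def complete_edm(D_obs, mask, dim=3, n_iter=200):
    """Alternate a low-rank step, data consistency, and the EDM mapping."""
    # Fill unknown entries with the mean of the known ones to start.
    X = np.where(mask, D_obs, D_obs[mask].mean())
    for _ in range(n_iter):
        X = low_rank(X, dim + 2)           # an EDM in R^dim has rank <= dim + 2
        X = np.where(mask, D_obs, X)       # keep the measured distances
        X = project_edm(X, dim)            # return to a valid EDM
    return X
```

A call such as `complete_edm(D_obs, mask, dim=3)` returns a full matrix of squared distances. How well the missing entries are recovered depends on the noise level and the pattern of missing distances, which is what the theoretical guarantees in the abstract address.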
Normalization of Dutch user-generated content
This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system where a token-based module is followed by a translation at the character level gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.
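For readers unfamiliar with the metric, word error rate is commonly computed as the token-level edit distance between a reference and a system output, divided by the reference length. The sketch below shows that standard definition. The function name `word_error_rate` and the example strings are hypothetical, and the paper's exact evaluation setup may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Token-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference tokens seen
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four
# reference tokens.
print(word_error_rate("ik ben er morgen", "ik bn er"))  # 0.5
```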
Predicting Remaining Useful Life using Time Series Embeddings based on Recurrent Neural Networks
We consider the problem of estimating the remaining useful life (RUL) of a
system or a machine from sensor data. Many approaches for RUL estimation based
on sensor data make assumptions about how machines degrade. Additionally,
sensor data from machines is noisy and often suffers from missing values in
many practical settings. We propose Embed-RUL: a novel approach for RUL
estimation from sensor data that does not rely on any degradation-trend
assumptions, is robust to noise, and handles missing values. Embed-RUL utilizes
a sequence-to-sequence model based on Recurrent Neural Networks (RNNs) to
generate embeddings for multivariate time series subsequences. The embeddings
for normal and degraded machines tend to be different, and are therefore found
to be useful for RUL estimation. We show that the embeddings capture the
overall pattern in the time series while filtering out the noise, so that the
embeddings of two machines with similar operational behavior are close to each
other, even when their sensor readings have significant and varying levels of
noise content. We perform experiments on a publicly available turbofan engine
dataset and a proprietary real-world dataset, and demonstrate that Embed-RUL
outperforms the previously reported state-of-the-art on several metrics.
Comment: Presented at 2nd ML for PHM Workshop at SIGKDD 2017, Halifax, Canad
Studies on noise robust automatic speech recognition
Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.