286 research outputs found
Missing data mask models with global frequency and temporal constraints
Missing data recognition has been developped in order to increase noise robustness in automatic speech recognition. Many different factors, including the speech decoding process itself, shall be considered to locate the masks. In this work, we are considering Bayesian models of the masks, where every spectral feature is classified as reliable or masked, and is independent from the rest of the signal. This classification strategy can produce unrelated small ``spots'', while experiments suggest that oracle reliable and unreliable features tend to be clustered into time-frequency blocks. We call this undesired effect: the ``checkerboard'' effect. In this paper, we propose a new Bayesian missing data classifier that integrates frequency and temporal constraints in order to reduce, or avoid, this ``checkerboard'' effect. The proposed classifier is evaluated on the Aurora2 connected digit corpora. Integrating such constraints in the missing data classification leads to significant improvements in recognition accuracy
Audio Source Separation with Discriminative Scattering Networks
In this report we describe an ongoing line of research for solving
single-channel source separation problems. Many monaural signal decomposition
techniques proposed in the literature operate on a feature space consisting of
a time-frequency representation of the input data. A challenge faced by these
approaches is to effectively exploit the temporal dependencies of the signals
at scales larger than the duration of a time-frame. In this work we propose to
tackle this problem by modeling the signals using a time-frequency
representation with multiple temporal resolutions. The proposed representation
consists of a pyramid of wavelet scattering operators, which generalizes
Constant Q Transforms (CQT) with extra layers of convolution and complex
modulus. We first show that learning standard models with this multi-resolution
setting improves source separation results over fixed-resolution methods. As
study case, we use Non-Negative Matrix Factorizations (NMF) that has been
widely considered in many audio application. Then, we investigate the inclusion
of the proposed multi-resolution setting into a discriminative training regime.
We discuss several alternatives using different deep neural network
architectures
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, of advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin
Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme
Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologie
Exploiting primitive grouping constraints for noise robust automatic speech recognition : studies with simultaneous speech.
Significant strides have been made in the field of automatic speech recognition over the past three decades. However, the systems are not robust; their performance degrades in the presence of even moderate amounts of noise. This thesis presents an approach to developing a speech recognition system that takes inspiration firom the approach of human speech recognition
- …