7,415 research outputs found
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that stills
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks
Jointly Tracking and Separating Speech Sources Using Multiple Features and the generalized labeled multi-Bernoulli Framework
This paper proposes a novel joint multi-speaker tracking-and-separation
method based on the generalized labeled multi-Bernoulli (GLMB) multi-target
tracking filter, using sound mixtures recorded by microphones. Standard
multi-speaker tracking algorithms usually only track speaker locations, and
ambiguity occurs when speakers are spatially close. The proposed multi-feature
GLMB tracking filter treats the set of vectors of associated speaker features
(location, pitch and sound) as the multi-target multi-feature observation,
characterizes transitioning features with corresponding transition models and
overall likelihood function, thus jointly tracks and separates each
multi-feature speaker, and addresses the spatial ambiguity problem. Numerical
evaluation verifies that the proposed method can correctly track locations of
multiple speakers and meanwhile separate speech signals
- …