
2,059 research outputs found

Investigating NMF Speech Enhancement for Neural Network based Acoustic Models

Author: Björn Schuller
Gerhard Rigoll
Jort F. Gemmeke
Jürgen T. Geiger

In light of the improvements achieved in recent years with neural network-based acoustic models, it is an interesting question whether these models are also suited for noise-robust recognition. This has not yet been fully explored, although first experiments suggest that they are. Furthermore, preprocessing techniques that improve robustness should be re-evaluated with these new models. In this work, we present experimental results to address these questions. Acoustic models based on Gaussian mixture models (GMMs), deep neural networks (DNNs), and long short-term memory (LSTM) recurrent neural networks (which have an improved ability to exploit context) are evaluated for their robustness after clean or multi-condition training. In addition, the influence of non-negative matrix factorization (NMF) for speech enhancement is investigated. Experiments are performed with the Aurora-4 database, and the results show that DNNs perform slightly better than LSTMs and that, as expected, both beat GMMs. Furthermore, speech enhancement is capable of improving the DNN result.

Index Terms: robust speech recognition, long short-term memory, speech enhancement

CiteSeerX

ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning

Author: Volker Barth
Christian Bergler
Rachael Xi Cheng
Heribert Hofer
Andreas Maier
Elmar Nöth
Hendrik Schröter
Michael Weber
Publication date: 01/01/2019

Large bioacoustic archives of wild animals are an important source for identifying reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication of non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis – particularly important for species with advanced social systems and complex vocalizations. In this study, deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit, ORCA-SPOT, was tested on a large-scale bioacoustic repository – the Orchive – comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years) took approximately 8 days. It achieved a time-based precision or positive-predictive-value (PPV) of 93.2% and an area-under-the-curve (AUC) of 0.9523. This approach enables an automated annotation procedure of large bioacoustic databases to extract killer whale sounds, which are essential for subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.

Institutional Repository of the Freie Universität Berlin

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

Effectiveness of Single-Channel BLSTM Enhancement for Language Identification

Author: Najim Dehak
Peter Sibbern Frederiksen
Zheng-Hua Tan
Jesus Villalba
Shinji Watanabe
Publication venue: International Speech Communication Association
Publication date: 01/09/2018

Crossref

VBN

Normalization and Transformation Techniques for Robust Speaker Recognition

Author: Baojie Li
Dalei Wu
Hui Jiang
Publication venue: IntechOpen
Publication date: 01/11/2008

IntechOpen

Crossref

Face Recognition in Ideal and Noisy Conditions Using Support Vector Machines, PCA and LDA

Author: Fedor Lehocki
Jan Mazanec
Jarmila Pavlovicova
Milos Oravec
Pavel Eiben
Publication venue: IntechOpen
Publication date: 01/04/2010

IntechOpen

Multi-candidate missing data imputation for robust speech recognition

Author: Hugo Van hamme
Yujun Wang
Publication venue: Springer Nature
Publication date: 01/01/2012

The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations imply solving many constrained least squares (CLSQ) optimization problems. As an alternative, researchers have proposed frontend MDT or have made oversimplifying independence assumptions for the backend acoustic model. In this article, we propose a fast Multi-Candidate (MC) approach that solves the per-Gaussian CLSQ problems approximately by selecting the best from a small set of candidate solutions, which are generated as the MDT solutions on a reduced set of cluster Gaussians. Experiments show that the MC MDT runs as fast as the uncompensated recognizer while achieving the accuracy of the full backend optimization approach. The experiments also show that exploiting the more accurate acoustic model of the backend does pay off in terms of accuracy when compared to frontend MDT.

© 2012 Wang and Van hamme; licensee Springer.

Wang Y., Van hamme H., "Multi-candidate missing data imputation for robust speech recognition", EURASIP Journal on Audio, Speech, and Music Processing, vol. 17, 20 pp., 2012. Status: published

Lirias

Springer - Publisher Connector

A Comprehensive Noise Robust Speech Parameterization Algorithm Using Wavelet Packet Decomposition-Based Denoising and Speech Feature Representation Techniques

Publication venue: Springer

Springer - Publisher Connector