Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
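As an illustration of the log-mel representation mentioned in the abstract above, the following is a minimal NumPy sketch, not code from the article. The function names, the HTK-style mel formula, and the defaults (16 kHz sampling rate, 512-point FFT, 10 ms hop, 40 mel bands) are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed choice; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=160, n_mels=40, eps=1e-10):
    # Frame the signal, apply a Hann window, and take the power spectrum.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop: t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    # Pool power into mel bands, then compress with a logarithm.
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T  # (n_frames, n_mels)
    return np.log(mel + eps)

# One second of a 440 Hz tone at 16 kHz gives a (97, 40) feature matrix.
tone = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
features = log_mel_spectrogram(tone)
```

Models that operate on the raw waveform replace this fixed front end with learned filters, which is the alternative the abstract contrasts it with.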
Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations
Speech recognition in noisy and channel-distorted scenarios is often
challenging as the current acoustic modeling schemes are not adaptive to the
changes in the signal distribution in the presence of noise. In this work, we
develop a novel acoustic modeling framework for noise-robust speech recognition
based on a relevance weighting mechanism. The relevance weighting is achieved
using a sub-network approach that performs feature selection. A relevance
sub-network is applied on the output of the first layer of a convolutional network
model operating on raw speech signals while a second relevance sub-network is
applied on the second convolutional layer output. The relevance weights for the
first layer correspond to an acoustic filterbank selection while the relevance
weights in the second layer perform modulation filter selection. The model is
trained for a speech recognition task on noisy and reverberant speech. The
speech recognition experiments on multiple datasets (Aurora-4, CHiME-3, VOiCES)
reveal that the incorporation of relevance weighting in the neural network
architecture significantly improves speech recognition word error rates
(average relative improvement of 10% over the baseline systems).
Comment: arXiv admin note: text overlap with arXiv:2001.0706
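The relevance weighting idea described in the abstract above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the sub-network size, the tanh hidden layer, the softmax normalization across filters, and all names (`relevance_weights`, `apply_relevance`, `W1`, `b1`, `w2`, `b2`) are assumptions made for the example.

```python
import numpy as np

def relevance_weights(feats, W1, b1, w2, b2):
    """Score each filter's output with a small shared sub-network.

    feats: (n_filters, n_frames) output of one convolutional layer.
    Returns one non-negative weight per filter, summing to 1.
    """
    hidden = np.tanh(feats @ W1 + b1)   # (n_filters, n_hidden)
    scores = hidden @ w2 + b2           # (n_filters,)
    scores = scores - scores.max()      # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()  # softmax across filters (assumed)

def apply_relevance(feats, weights):
    # Scale each filter's trajectory by its relevance weight (soft feature selection).
    return feats * weights[:, None]

# Hypothetical sizes: 80 filters, 101 frames, 16 hidden units.
rng = np.random.default_rng(0)
feats = rng.standard_normal((80, 101))
W1 = 0.1 * rng.standard_normal((101, 16))
b1 = np.zeros(16)
w2 = rng.standard_normal(16)
b2 = 0.0

weights = relevance_weights(feats, W1, b1, w2, b2)
weighted = apply_relevance(feats, weights)
```

In the framework the abstract describes, one such sub-network weights the first-layer (acoustic filterbank) outputs and a second one weights the second-layer (modulation filter) outputs, with both trained jointly with the recognition model; the sketch shows only the forward pass with random parameters.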
Deep Learning For Sequential Pattern Recognition
Project carried out within the framework of a mobility programme with the Technische Universität München (TUM).
In recent years, deep learning has opened a new line of research in pattern recognition tasks. It has been hypothesized that this kind of learning captures more abstract patterns concealed in data. It is motivated both by new findings about the biology of the brain and by hardware developments that have made parallel processing possible. Deep learning methods, combined with conventional algorithms for optimization and training, are efficient for a variety of applications in signal processing and pattern recognition. This thesis explores these novel techniques and their related algorithms. It addresses and compares different attributes of these methods and outlines their possible advantages and disadvantages.