781 research outputs found

Deep Learning for Audio Signal Processing

Author: Chang Shuo-yiin
Li Bo
Purwins Hendrik
Sainath Tara
Schlüter Jan
Virtanen Tuomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 pdf figures

arXiv.org e-Print Archive

VBN

Studies on noise robust automatic speech recognition

Author: Kurimo Mikko
Palomäki Kalle J.
Remes Ulpu
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009

Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

Aaltodoc Publication Archive

The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes

Author: Anguera
Baby
Bagchi
Barker
Barker
Castro Martinez
DiBiase
Du
Emmanuel Vincent
Fletcher
Frigge
Fujita
Garofalo
Hermansky
Heymann
Hirsch
Hori
Jalalvand
Jon Barker
Kim
Loesch
Ma
Mestre
Mikolov
Moritz
Mostefa
Parihar
Pfeifenberger
Povey
Prudnikov
Renals
Ricard Marxer
Shinji Watanabe
Sivasankaran
Taal
Tachioka
Taghia
Veselý
Vincent
Vincent
Vu
Yoshioka
Zhao
Zhuang
Publication venue: 'Elsevier BV'
Publication date: 15/10/2016

This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline, resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various 'axes of difficulty' by correlating various estimated signal properties with typical system performance on a per-session and per-utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations.

Crossref

INRIA a CCSD electronic archive server

White Rose Research Online

HAL-Rennes 1

Multi-candidate missing data imputation for robust speech recognition

Author: Hugo Van hamme
Yujun Wang
Publication venue: Springer Nature
Publication date: 01/01/2012

The application of Missing Data Techniques (MDT) to increase the noise robustness of HMM/GMM-based large vocabulary speech recognizers is hampered by a large computational burden. The likelihood evaluations imply solving many constrained least squares (CLSQ) optimization problems. As an alternative, researchers have proposed frontend MDT or have made oversimplifying independence assumptions for the backend acoustic model. In this article, we propose a fast Multi-Candidate (MC) approach that solves the per-Gaussian CLSQ problems approximately by selecting the best from a small set of candidate solutions, which are generated as the MDT solutions on a reduced set of cluster Gaussians. Experiments show that the MC MDT runs as fast as the uncompensated recognizer while achieving the accuracy of the full backend optimization approach. The experiments also show that exploiting the more accurate acoustic model of the backend does pay off in terms of accuracy when compared to frontend MDT. © 2012 Wang and Van hamme; licensee Springer. Wang Y., Van hamme H., "Multi-candidate missing data imputation for robust speech recognition", EURASIP Journal on Audio, Speech, and Music Processing, vol. 17, 20 pp., 2012. Status: published

Lirias

Springer - Publisher Connector

Development of the Carbon Nanotube Thermoacoustic Loudspeaker

Author: Bouman Troy
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2021

Traditional speakers make sound by attaching a coil to a cone and moving that coil back and forth in a magnetic field (aka moving coil loudspeakers). The physics behind how to generate sound via this velocity boundary condition has largely been unchanged for over a hundred years. Interestingly, around the time moving coil loudspeakers were first investigated, the idea of using heat to generate sound was also known. These thermoacoustic speakers heat and cool a thin material at acoustic frequencies to generate the pressure wave (i.e. they use a thermal boundary condition). Unfortunately, when the thermoacoustic principle was initially discovered there was no material with the right properties to heat and cool fast enough. Carbon nanotube (CNT) loudspeakers first generated sound early in the 21st century. At that time there were many unanswered questions about their place in the sound generation toolbox of an engineer. The main goal of this dissertation was to continue the development of the CNT loudspeaker with a focus on practical usage for an acoustic engineer. Prior to 2014, when this effort began, most of the published development work was from material scientists, and the objective acoustic performance data presented was not useful beyond the scope of that particular publication. For example, low sound pressure levels in the nearfield at low power inputs were a common metric. Therefore, this effort had three main objectives, with emphasis placed on acquiring data at levels and in nomenclature that would be useful to acoustic engineers so they could bring the technology to market, if adequate: (1) investigation into the true power efficiency of CNT loudspeakers; (2) investigation into alternative methods to linearize the pressure response of CNT loudspeakers; and (3) investigation into the sound quality of CNT loudspeakers. Overall, it was found that CNT loudspeakers are approximately four orders of magnitude less power efficient than traditional moving coil loudspeakers.
The non-linear pressure output of the CNT loudspeakers can be linearized with a variety of drive signal processing methods, but the selection of which method to use depends on a variety of factors (e.g. amplification architecture available). In general, all methods studied are on the same order of magnitude of power efficiency, but the direct current offset and amplitude modulation drive signal processing methods are superior in terms of sound quality.

Michigan Technological University

Real-time Hardware Feature Extraction with Embedded Signal Enhancement for Automatic Speech Recognition

Author: Vinh Vu Ngoc
James Whittington
John Devlin
Publication venue: 'IntechOpen'
Publication date: 13/06/2011

IntechOpen

Algorithms for Source Separation - with Cocktail Party Applications

Author: Olsson Rasmus Kongsgaard
Publication date: 01/11/2007

Online Research Database In Technology