Deep Learning for Audio Signal Processing

Authors: Chang Shuo-yiin; Li Bo; Purwins Hendrik; Sainath Tara; Schlüter Jan; Virtanen Tuomas
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2019

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.

Comment: 15 pages, 2 pdf figures

arXiv.org e-Print Archive

VBN

Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates

Authors: Macias-Guarasa Javier; Pizarro Daniel; Vera-Diaz Juan Manuel
Publication venue: MDPI AG
Publication date: 29/07/2018

This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). The proposed solution is, to the best of our knowledge, the first published work in which the CNN is designed to directly estimate the three-dimensional position of an acoustic source, using the raw audio signal as the input information and avoiding the use of handcrafted audio features. Given the limited amount of available localization data, we propose in this paper a training strategy based on two steps. We first train our network using semi-synthetic data generated from close-talk speech recordings, in which we simulate the time delays and distortion suffered by the signal as it propagates from the source to the array of microphones. We then fine-tune this network using a small amount of real data. Our experimental results show that this strategy is able to produce networks that significantly improve on existing localization methods based on SRP-PHAT strategies. In addition, our experiments show that our CNN method exhibits better resistance against varying gender of the speaker and different window sizes compared with the other methods.

Comment: 18 pages, 3 figures, 8 tables
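The delay-simulation step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the speed of sound, the microphone geometry, the source position, and the function names `propagation_delays` and `delay_signal` are all assumptions made for illustration, and only whole-sample delays are modeled (no attenuation, reverberation, or other distortion).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def propagation_delays(source, mics, sample_rate):
    """Return the source-to-microphone delay, in samples, for each microphone.

    `source` and each entry of `mics` are (x, y, z) positions in metres.
    """
    return [math.dist(source, m) / SPEED_OF_SOUND * sample_rate for m in mics]


def delay_signal(signal, delay_samples):
    """Delay a signal by a rounded whole number of samples.

    The output is zero-padded at the start and truncated to the input length.
    """
    d = int(round(delay_samples))
    return ([0.0] * d + list(signal))[:len(signal)]


# Hypothetical 4-microphone array and source position (metres).
mics = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0), (0.0, 0.0, 0.2)]
source = (1.0, 2.0, 1.5)

delays = propagation_delays(source, mics, sample_rate=16000)
clean = [1.0] + [0.0] * 255  # stand-in for a close-talk recording
channels = [delay_signal(clean, d) for d in delays]
```

Each entry of `channels` is the same stand-in signal shifted by the delay to one microphone, which is the kind of multichannel input a localization network could be trained on under these assumptions.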

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Dynamic imaging of coherent sources reveals different network connectivity underlying the generation and perpetuation of epileptic seizures

Authors: Anwar Abdul Rauf; Deuschl Günther; Elshoff Lydia; Muthuraman Muthuraman; Raethjen Jan; Siniatchkin Michael; Stephani Ulrich
Publication date: 01/01/2013

The concept of focal epilepsies includes a seizure origin in brain regions with hypersynchronous activity (epileptogenic zone and seizure onset zone) and a complex epileptic network of different brain areas involved in the generation, propagation, and modulation of seizures. The purpose of this work was to study functional and effective connectivity between regions involved in networks of epileptic seizures. The beginning and middle part of focal seizures from ictal surface EEG data were analyzed using dynamic imaging of coherent sources (DICS), an inverse solution in the frequency domain which describes neuronal networks and coherences of oscillatory brain activities. The information flow (effective connectivity) between coherent sources was investigated using the renormalized partial directed coherence (RPDC) method. In 8/11 patients, the first and second sources of epileptic activity as found by DICS were concordant with the operative resection site; these patients became seizure free after epilepsy surgery. In the remaining 3 patients, the results of DICS / RPDC calculations and the resection site were discordant; these patients had a poorer post-operative outcome. The first sources as found by DICS were located predominantly in cortical structures; subsequent sources included some subcortical structures: thalamus, subthalamic nucleus, and cerebellum. DICS seems to be a powerful tool to define the seizure onset zone and the epileptic networks involved. Seizure generation seems to be related to the propagation of epileptic activity from the primary source in the seizure onset zone, and maintenance of seizures is attributed to the perpetuation of epileptic activity between nodes in the epileptic network. Despite these promising results, this proof-of-principle study needs further confirmation prior to the use of the described methods in clinical practice.

OPUS Augsburg

Crossref

Directory of Open Access Journals

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

FigShare

Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks

Authors: Ferguson Eric L.; Jin Craig T.; Williams Stefan B.
Publication date: 26/10/2017

The propagation of sound in a shallow-water environment is characterized by boundary reflections from the sea surface and sea floor. These reflections result in multiple (indirect) sound propagation paths, which can degrade the performance of passive sound source localization methods. This paper proposes the use of convolutional neural networks (CNNs) for the localization of sources of broadband acoustic radiated noise (such as motor vessels) in shallow-water multipath environments. It is shown that CNNs operating on cepstrogram and generalized cross-correlogram inputs are able to more reliably estimate the instantaneous range and bearing of transiting motor vessels when the source localization performance of conventional passive ranging methods is degraded. The ensuing improvement in source localization performance is demonstrated using real data collected during an at-sea experiment.

Comment: 5 pages, 5 figures, Final draft of paper submitted to 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 15-20 April 2018 in Calgary, Alberta, Canada. arXiv admin note: text overlap with arXiv:1612.0350
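As background for the cross-correlation inputs mentioned in this abstract, the following Python sketch estimates the time delay between two sensor signals with a plain, unweighted cross-correlation. This is a simpler stand-in, not the generalized cross-correlogram or the CNN described in the paper; the function name `estimate_delay` and the toy signals are assumptions made for illustration.

```python
def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) that maximizes sum(x[n] * y[n + lag]).

    A positive result means y is a delayed copy of x. Lags are searched
    in the range [-max_lag, max_lag] by brute force.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, xn in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):
                score += xn * y[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: the same pulse arrives two samples later at the second sensor.
sensor_a = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
sensor_b = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
lag = estimate_delay(sensor_a, sensor_b, max_lag=3)
```

In a multipath setting, reflections add secondary peaks to the correlation, which is one reason a single peak-picking estimate like this one can become unreliable.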

arXiv.org e-Print Archive

Crossref