Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
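The log-mel spectrum named above as a dominant feature representation can be illustrated with a minimal NumPy sketch. It is not taken from the reviewed article. The frame length, hop size, number of mel bands, the common 2595/700 mel formula, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_edges = mel_to_hz(mel_edges)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate, n_fft=512, hop=256, n_mels=40, eps=1e-10):
    """Return log-mel features of shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # short-time power spectrum
    mel_power = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_power + eps)                          # eps avoids log(0)

# Usage: one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 1000.0 * t)
features = log_mel_spectrogram(tone, sample_rate)
print(features.shape)
```

The resulting two-dimensional array (time frames by mel bands) is the kind of input commonly fed to a convolutional network. A raw-waveform model would skip this step and consume `tone` directly.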
Towards End-to-End Acoustic Localization using Deep Learning: from Audio Signal to Source Position Coordinates
This paper presents a novel approach to indoor acoustic source localization
using microphone arrays, based on a Convolutional Neural Network (CNN). The
proposed solution is, to the best of our knowledge, the first published work in
which the CNN is designed to directly estimate the three-dimensional position
of an acoustic source using the raw audio signal as input, avoiding the use of
hand-crafted audio features. Given the limited amount of
available localization data, we propose in this paper a training strategy based
on two steps. We first train our network using semi-synthetic data generated
from close-talk speech recordings, in which we simulate the time delays and
distortion suffered by the signal as it propagates from the source to the array
of microphones. We then fine-tune this network using a small amount of real
data. Our experimental results show that this strategy is able to produce
networks that significantly outperform existing localization methods based on
SRP-PHAT strategies. In addition, our experiments show that our CNN
method is more robust to variation in speaker gender and to
different window sizes than the other methods.
Comment: 18 pages, 3 figures, 8 tables
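For context on the SRP-PHAT baseline mentioned above: SRP-PHAT builds on the generalized cross-correlation with phase transform (GCC-PHAT) computed between microphone pairs. The sketch below is not the paper's CNN or its data pipeline. It only shows GCC-PHAT recovering a simulated inter-microphone delay, loosely in the spirit of the semi-synthetic data described in the abstract. The white-noise source, noise level, delay value, and the function name `gcc_phat` are illustrative assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, max_delay):
    """Estimate the delay of `sig` relative to `ref`, in samples, using GCC-PHAT."""
    n = len(sig) + len(ref)                  # zero-pad to avoid circular wrap-around
    sig_fft = np.fft.rfft(sig, n=n)
    ref_fft = np.fft.rfft(ref, n=n)
    cross = sig_fft * np.conj(ref_fft)
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the lags run from -max_delay to +max_delay.
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(cc)) - max_delay

# Simulated two-microphone recording (hypothetical values).
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)           # stand-in for a close-talk recording
true_delay = 7                               # extra propagation delay at microphone B
mic_a = source + 0.05 * rng.standard_normal(source.size)
mic_b = np.concatenate((np.zeros(true_delay), source[:-true_delay]))
mic_b = mic_b + 0.05 * rng.standard_normal(source.size)

estimated = gcc_phat(mic_b, mic_a, max_delay=32)
print(true_delay, estimated)
```

A full SRP-PHAT localizer would accumulate such correlations over all microphone pairs for each candidate source position. That search is omitted here.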
Dynamic imaging of coherent sources reveals different network connectivity underlying the generation and perpetuation of epileptic seizures
The concept of focal epilepsies includes a seizure origin in brain regions with hypersynchronous activity (epileptogenic zone and seizure onset zone) and a complex epileptic network of different brain areas involved in the generation, propagation, and modulation of seizures. The purpose of this work was to study functional and effective connectivity between regions involved in networks of epileptic seizures. The beginning and middle parts of focal seizures from ictal surface EEG data were analyzed using dynamic imaging of coherent sources (DICS), an inverse solution in the frequency domain which describes neuronal networks and coherences of oscillatory brain activities. The information flow (effective connectivity) between coherent sources was investigated using the renormalized partial directed coherence (RPDC) method. In 8/11 patients, the first and second sources of epileptic activity as found by DICS were concordant with the operative resection site; these patients became seizure free after epilepsy surgery. In the remaining 3 patients, the results of DICS / RPDC calculations and the resection site were discordant; these patients had a poorer post-operative outcome. The first sources as found by DICS were located predominantly in cortical structures; subsequent sources included some subcortical structures: thalamus, subthalamic nucleus, and cerebellum. DICS seems to be a powerful tool to define the seizure onset zone and the epileptic networks involved. Seizure generation seems to be related to the propagation of epileptic activity from the primary source in the seizure onset zone, and maintenance of seizures is attributed to the perpetuation of epileptic activity between nodes in the epileptic network. Despite these promising results, this proof-of-principle study needs further confirmation before the described methods are used in clinical practice.
Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks
The propagation of sound in a shallow water environment is characterized by
boundary reflections from the sea surface and sea floor. These reflections
result in multiple (indirect) sound propagation paths, which can degrade the
performance of passive sound source localization methods. This paper proposes
the use of convolutional neural networks (CNNs) for the localization of sources
of broadband acoustic radiated noise (such as motor vessels) in shallow water
multipath environments. It is shown that CNNs operating on cepstrogram and
generalized cross-correlogram inputs are able to more reliably estimate the
instantaneous range and bearing of transiting motor vessels when the source
localization performance of conventional passive ranging methods is degraded.
The ensuing improvement in source localization performance is demonstrated
using real data collected during an at-sea experiment.
Comment: 5 pages, 5 figures, Final draft of paper submitted to 2018 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP)
15-20 April 2018 in Calgary, Alberta, Canada. arXiv admin note: text overlap
with arXiv:1612.0350
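To illustrate why a cepstral representation is informative in a multipath setting like the one described above, the sketch below computes the real cepstrum of a broadband signal plus one delayed reflection. It is not the authors' method. The single-echo model, the echo gain and delay, the search range, and the function name `real_cepstrum` are assumptions. A cepstrogram would stack such cepstra over successive frames.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small constant avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame))

# Hypothetical direct path plus a single boundary reflection.
rng = np.random.default_rng(1)
direct = rng.standard_normal(8192)               # broadband noise stand-in for vessel noise
echo_delay = 60                                  # reflection lag in samples
echo_gain = 0.6
received = direct.copy()
received[echo_delay:] += echo_gain * direct[:-echo_delay]

cepstrum = real_cepstrum(received)
min_quefrency, max_quefrency = 10, 200           # skip the lowest quefrencies
peak = int(np.argmax(cepstrum[min_quefrency:max_quefrency]))
estimated_delay = min_quefrency + peak
print(echo_delay, estimated_delay)
```

The reflection adds a periodic ripple to the log spectrum, so the cepstrum shows a peak at the quefrency equal to the delay between the direct and reflected paths. That delay depends on the source geometry, which is why it can carry range information.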