A framework for invertible, real-time constant-Q transforms
Audio signal processing frequently requires time-frequency representations,
and in many applications a non-linear spacing of frequency bands is
preferable. This paper introduces a framework for efficient implementation of
invertible signal transforms allowing for non-uniform and in particular
non-linear frequency resolution. Non-uniformity in frequency is realized by
applying nonstationary Gabor frames with adaptivity in the frequency domain.
The realization of a perfectly invertible constant-Q transform is described in
detail. To achieve real-time processing, independent of signal length,
slice-wise processing of the full input signal is proposed and referred to as
sliCQ transform.
By applying frame theory and FFT-based processing, the presented approach
overcomes computational inefficiency and lack of invertibility of classical
constant-Q transform implementations. Numerical simulations evaluate the
efficiency of the proposed algorithm, and the method's applicability is
illustrated by experiments on real-life audio signals.
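The geometric frequency spacing at the heart of any constant-Q transform can be sketched in a few lines (a minimal illustration of the spacing property only, not the paper's sliCQ implementation; the 55 Hz starting frequency and 12 bins per octave are arbitrary choices):

```python
import numpy as np

def cqt_frequencies(fmin, n_bins, bins_per_octave=12):
    """Geometrically spaced center frequencies f_k = fmin * 2^(k / B)."""
    return fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)

def q_factor(bins_per_octave=12):
    """Q = f_k / (f_{k+1} - f_k); identical for every bin by construction."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

# 24 bins starting at 55 Hz (A1), 12 bins per octave
freqs = cqt_frequencies(fmin=55.0, n_bins=24)
# Adjacent center frequencies keep a fixed ratio of 2^(1/12), which is
# exactly the property that makes the Q factor constant across bins.
```

Because bandwidth grows proportionally with center frequency, the analysis windows shrink at high frequencies, which is why naive implementations are costly and why the paper's frame-theoretic, FFT-based construction matters.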
A Generative Product-of-Filters Model of Audio
We propose the product-of-filters (PoF) model, a generative model that
decomposes audio spectra as sparse linear combinations of "filters" in the
log-spectral domain. PoF makes similar assumptions to those used in the classic
homomorphic filtering approach to signal processing, but replaces hand-designed
decompositions built of basic signal processing operations with a learned
decomposition based on statistical inference. This paper formulates the PoF
model and derives a mean-field method for posterior inference and a variational
EM algorithm to estimate the model's free parameters. We demonstrate PoF's
potential for audio processing on a bandwidth expansion task, and show that PoF
can serve as an effective unsupervised feature extractor for a speaker
identification task.
Comment: ICLR 2014 conference-track submission. Added link to the source code.
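The core modeling assumption, that log-spectra are sparse linear combinations of filters (so spectra are products of exponentiated filters), can be illustrated with a toy example (hypothetical random filters and hand-picked sparse activations, not the paper's learned model or its variational inference):

```python
import numpy as np

rng = np.random.default_rng(0)
F, K = 64, 8                       # frequency bins, number of filters
U = rng.normal(size=(F, K))        # "filters" in the log-spectral domain
a = np.zeros(K)
a[[1, 4]] = [0.7, 0.3]             # sparse activations: only 2 filters active

log_spectrum = U @ a               # linear combination in the log domain
spectrum = np.exp(log_spectrum)    # ... is a *product of filters* linearly

# Product-of-filters identity: exp(sum_k U[:, k] * a[k]) equals the
# elementwise product of each filter's exponentiated contribution.
per_filter = np.exp(U * a)         # shape (F, K), one column per filter
```

The toy example only demonstrates the generative decomposition; the paper's contribution is learning U and inferring sparse activations via mean-field variational EM rather than hand-designing the decomposition.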
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures.
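As a concrete example of the log-mel representation the review highlights as a dominant feature, here is a minimal from-scratch sketch (the sample rate, FFT size, hop, and filter count are arbitrary assumptions; practical code would typically use a library such as librosa):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Power spectrogram via windowed, framed FFTs
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (T, n_fft//2 + 1)

    # Triangular filters with centers equally spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge

    return np.log(power @ fb.T + 1e-10)                # (T, n_mels)

sr = 16000
t = np.arange(sr) / sr
S = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
```

The mel warping compresses high frequencies, mirroring auditory perception; the log compresses dynamic range, which is why this representation feeds convolutional and recurrent models so well.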
Adaptive DCTNet for Audio Signal Classification
In this paper, we investigate DCTNet for audio signal classification. Its
output feature is related to Cohen's class of time-frequency distributions. We
introduce the use of adaptive DCTNet (A-DCTNet) for audio signal feature
extraction. The A-DCTNet applies the idea of constant-Q transform, with its
center frequencies of filterbanks geometrically spaced. The A-DCTNet is
adaptive to different acoustic scales, and it captures low-frequency
acoustic information, to which human auditory perception is sensitive, better
than features such as Mel-frequency spectral coefficients (MFSC). We use
features extracted
by the A-DCTNet as input for classifiers. Experimental results show that the
A-DCTNet and recurrent neural networks (RNN) achieve state-of-the-art
bird song classification performance and improve artist identification
accuracy on music data, demonstrating the A-DCTNet's applicability to signal
processing problems.
Comment: International Conference on Acoustics, Speech and Signal Processing
(ICASSP). New Orleans, United States, March, 201
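The payoff of geometrically spaced center frequencies, denser coverage of the low-frequency range the abstract emphasizes, can be seen in a toy comparison against linear spacing (the filter count and frequency range are arbitrary assumptions, not parameters from the paper):

```python
import numpy as np

n_filters, fmin, fmax = 32, 50.0, 8000.0

# Linear spacing: a fixed Hz step between adjacent center frequencies
linear = np.linspace(fmin, fmax, n_filters)

# Geometric spacing (constant-Q style): a fixed *ratio* between neighbors
geometric = fmin * (fmax / fmin) ** (np.arange(n_filters) / (n_filters - 1))

# Count how many filters land below 500 Hz under each scheme
low_linear = int((linear < 500).sum())
low_geometric = int((geometric < 500).sum())
```

Nearly half of the geometric filters fall below 500 Hz versus only two linear ones, which is why a geometrically spaced filterbank resolves perceptually important low-frequency structure better.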
On the Mathematics of Music: From Chords to Fourier Analysis
Mathematics is a far-reaching discipline, and its tools appear in many
applications. In this paper we discuss its role in music and signal processing
by revisiting the use of mathematics in algorithms that can extract chord
information from recorded music. We begin with a light introduction to the
theory of music and motivate the use of Fourier analysis in audio processing.
We introduce the discrete and continuous Fourier transforms and investigate
their use in extracting important information from audio data.
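As a toy illustration of that idea, the discrete Fourier transform can recover the tones of a synthesized chord (the equal-temperament frequencies of a hypothetical C major triad; a sketch of the principle, not an algorithm from the paper):

```python
import numpy as np

sr, dur = 8000, 1.0                      # 1 s of audio gives 1 Hz resolution
t = np.arange(int(sr * dur)) / sr

# C major triad: C4, E4, G4 in equal temperament
chord_freqs = [261.63, 329.63, 392.00]
x = sum(np.sin(2 * np.pi * f * t) for f in chord_freqs)

spectrum = np.abs(np.fft.rfft(x))
freq_axis = np.fft.rfftfreq(len(x), d=1.0 / sr)

# The three largest DFT magnitude peaks sit at (or within one bin of)
# the chord tones, from which pitch classes and the chord label follow.
peaks = freq_axis[np.argsort(spectrum)[-3:]]
detected = sorted(int(round(f)) for f in peaks)
```

Real recordings add harmonics, transients, and noise, which is why practical chord-extraction systems build on this DFT core with chroma features and temporal smoothing.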
