32,111 research outputs found
Coding overcomplete representations of audio using the MCLT
We propose a system for audio coding using the modulated complex
lapped transform (MCLT). In general, it is difficult to encode signals using
overcomplete representations without avoiding a penalty in rate-distortion
performance. We show that the penalty can be significantly reduced for
MCLT-based representations, without the need for iterative methods of
sparsity reduction. We achieve that via a magnitude-phase polar quantization
and the use of magnitude and phase prediction. Compared to systems based
on quantization of orthogonal representations such as the modulated lapped
transform (MLT), the new system allows for reduced warbling artifacts and
more precise computation of frequency-domain auditory masking functions
Extraction of vocal-tract system characteristics from speechsignals
We propose methods to track natural variations in the characteristics of the vocal-tract system from speech signals. We are especially interested in the cases where these characteristics vary over time, as happens in dynamic sounds such as consonant-vowel transitions. We show that the selection of appropriate analysis segments is crucial in these methods, and we propose a selection based on estimated instants of significant excitation. These instants are obtained by a method based on the average group-delay property of minimum-phase signals. In voiced speech, they correspond to the instants of glottal closure. The vocal-tract system is characterized by its formant parameters, which are extracted from the analysis segments. Because the segments are always at the same relative position in each pitch period, in voiced speech the extracted formants are consistent across successive pitch periods. We demonstrate the results of the analysis for several difficult cases of speech signals
Sound Source Separation
This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.26file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.2
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.Comment: 15 pages, 2 pdf figure
Improving subband spectral estimation using modified AR model
It has already been shown that spectral estimation can be improved when applied to subband outputs of an adapted filterbank rather than to the original fullband signal. In the present paper, this procedure is applied jointly to a novel predictive autoregressive (AR) model. The model exploits time-shifting and is therefore referred to as time-shift AR (TSAR)
model. Estimators are proposed for the unknown TS-AR parameters and the spectrum of the observed signal. The TS-AR model yields improved spectrum estimation by taking advantage of the correlation between subseries that after decimation. Simulation results on signals with continuous and line spectra that demonstrate the performance of the proposed method are provided
- âŠ