FRIDA: FRI-Based DOA Estimation for Arbitrary Array Layouts
In this paper we present FRIDA---an algorithm for estimating directions of
arrival of multiple wideband sound sources. FRIDA combines multi-band
information coherently and achieves state-of-the-art resolution at extremely
low signal-to-noise ratios. It works for arbitrary array layouts, but unlike
the various steered response power and subspace methods, it does not require a
grid search. FRIDA leverages recent advances in sampling signals with a finite
rate of innovation. It is based on the insight that for any array layout, the
entries of the spatial covariance matrix can be linearly transformed into a
uniformly sampled sum of sinusoids.
Comment: Submitted to ICASSP201
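The insight the abstract mentions can be checked numerically for the simplest case, a uniform linear array, where the covariance entries already form a uniformly sampled sum of sinusoids in the sensor lag (FRIDA's contribution is a linear transform that yields this form for arbitrary layouts; the array and source parameters below are illustrative, not from the paper):

```python
import numpy as np

# Toy check: for a ULA, the noise-free spatial covariance entry R[m, n]
# depends only on the lag m - n, and along that lag it is a sum of K
# complex sinusoids whose frequencies encode the DOAs.
M, d = 8, 0.5                              # sensors, spacing in wavelengths
doas = np.deg2rad([-10.0, 25.0])           # two hypothetical sources
powers = np.array([1.0, 0.5])

# Array manifold and noise-free covariance R = A diag(p) A^H.
a = np.exp(-2j * np.pi * d * np.arange(M)[:, None] * np.sin(doas))
R = (a * powers) @ a.conj().T

# Model: r[k] = sum_j p_j * exp(-2j*pi*d*k*sin(theta_j)) -- a uniformly
# sampled sum of sinusoids in the lag k.
lags = np.arange(M)
r_model = (powers[None, :]
           * np.exp(-2j * np.pi * d * lags[:, None] * np.sin(doas))).sum(axis=1)

# Measured: average each sub-diagonal of R; every entry at lag k agrees.
r_measured = np.array([np.mean(np.diag(R, -k)) for k in lags])
```

Once the covariance entries are in this form, standard finite-rate-of-innovation machinery can recover the sinusoid frequencies, and hence the DOAs, without a grid search.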
Application of sound source separation methods to advanced spatial audio systems
This thesis is related to the field of Sound Source Separation (SSS). It addresses the development
and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by
means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel
stereo format, special up-converters are required to use advanced spatial audio reproduction formats,
such as WFS. This is due to the fact that WFS needs the original source signals to be available, in order to
accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is
required.
Source separation problems in digital signal processing are those in which several signals have been mixed
together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied
to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately,
most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This
condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to
the sparsity of the sources under some signal transformation.
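The sparsity assumption described above is what makes underdetermined separation tractable: if each time-frequency bin is dominated by a single source, binary masking in the transform domain can pull the sources apart. A minimal sketch, using hypothetical panned test tones rather than the thesis's actual algorithm:

```python
import numpy as np

# Two sources panned into a stereo mixture, separated by thresholding the
# per-bin inter-channel level ratio (toy signals, rectangular STFT frames).
fs, n = 8000, 8192
t = np.arange(n) / fs
s1 = np.sin(2 * np.pi * 440 * t)         # source 1: 440 Hz tone
s2 = np.sin(2 * np.pi * 1320 * t)        # source 2: 1320 Hz tone

# Instantaneous mixture via amplitude panning (source 1 left-heavy).
left = 0.9 * s1 + 0.3 * s2
right = 0.3 * s1 + 0.9 * s2

# Frame-wise FFT as a minimal time-frequency transform.
frame = 512
L = np.fft.rfft(left.reshape(-1, frame), axis=1)
R = np.fft.rfft(right.reshape(-1, frame), axis=1)

# Sparsity assumption: each bin is dominated by one source, so the level
# ratio |L|/|R| clusters around that source's panning ratio (3 or 1/3 here).
ratio = np.abs(L) / (np.abs(R) + 1e-12)
mask1 = ratio > 1.0                       # bins assigned to source 1
mask2 = ~mask1                            # remaining bins -> source 2

# Reconstruct each source from its masked bins of the left channel.
est1 = np.fft.irfft(L * mask1, frame, axis=1).ravel()
est2 = np.fft.irfft(L * mask2, frame, axis=1).ravel()
```

The thesis's methods replace the fixed threshold with multi-level thresholding segmentation, but the underlying mechanism, clustering localization cues per bin and masking, is the same.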
This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result,
its contributions can be categorized within these two areas. First, two underdetermined SSS methods are
proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a
multi-level thresholding segmentation approach, which enables fast, unsupervised separation of
sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the
features considered by each of them are related to different localization cues, enabling the separation
of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at
improving the isolation of the separated sources are proposed. The performance achieved by
several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of
listening tests, paying special attention to the change observed in the perceived spatial attributes.
Although the estimated sources are distorted versions of the original ones, the masking effects
involved in their spatial remixing make artifacts less perceptible, which improves the overall
assessed quality. Finally, some novel developments related to the application of time-frequency
processing to source localization and enhanced sound reproduction are presented.
Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
Sound Source Separation
This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
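The log-mel spectrogram named above as a dominant feature representation is easy to sketch from scratch: frame the signal, take magnitude FFTs, pool the power spectrum with triangular mel filters, and compress with a log. A minimal NumPy version (HTK-style mel formula; all parameter values here are common defaults, not prescribed by the review):

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(x, fs=16000, n_fft=512, hop=256, n_mels=40):
    # Frame with a Hann window and compute the power spectrum.
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2+1)

    # Triangular mel filterbank, equally spaced on the mel axis.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge

    return np.log(power @ fb.T + 1e-10)                # (n_frames, n_mels)

# Example: a 1 s, 1 kHz tone peaks in a low-to-mid mel band.
x = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
S = log_mel_spectrogram(x)
```

Raw-waveform front ends, the other representation the review highlights, simply hand `x` to the network and let learned filters replace this fixed filterbank.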
Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques
Source Separation has been a subject of intense research in many signal processing applications, ranging
from speech processing to medical image analysis. Applied to spatial audio systems, it can be used to
overcome one fundamental limitation in 3D scene resynthesis: the need of having the independent
signals for each source available. Wave-field Synthesis is a spatial sound reproduction system that can
synthesize an acoustic field by means of loudspeaker arrays and it is also capable of positioning several
sources in space. However, the individual signals corresponding to these sources must be available and
this is often a difficult problem. In this work, we propose to use Sound Source Separation techniques
in order to obtain different tracks from stereo and mono mixtures. Some separation methods have
been implemented and tested, one of which was developed by the author. Although existing
algorithms are far from achieving hi-fi quality, subjective tests show that an optimal separation
is not necessary to obtain acceptable results in 3D scene reproduction.
Cobos Serrano, M. (2007). Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques. http://hdl.handle.net/10251/12515
Space Time MUSIC: Consistent Signal Subspace Estimation for Wide-band Sensor Arrays
Wide-band Direction of Arrival (DOA) estimation with sensor arrays is an
essential task in sonar, radar, acoustics, biomedical and multimedia
applications. Many state-of-the-art wide-band DOA estimators coherently process
frequency binned array outputs by approximate Maximum Likelihood, Weighted
Subspace Fitting or focusing techniques. This paper shows that bin signals
obtained by filter-bank approaches do not obey the finite rank narrow-band
array model, because spectral leakage and the change of the array response with
frequency within the bin create \emph{ghost sources} dependent on the
particular realization of the source process. Therefore, existing DOA
estimators based on binning cannot claim consistency even with the perfect
knowledge of the array response. In this work, a more realistic array model
with a finite length of the sensor impulse responses is assumed, which still
has finite rank under a space-time formulation. It is shown that signal
subspaces at arbitrary frequencies can be consistently recovered under mild
conditions by applying MUSIC-type (ST-MUSIC) estimators to the dominant
eigenvectors of the wide-band space-time sensor cross-correlation matrix. A
novel Maximum Likelihood based ST-MUSIC subspace estimate is developed in order
to recover consistency. The number of sources active at each frequency is
estimated by Information Theoretic Criteria. The sample ST-MUSIC subspaces can
be fed to any subspace fitting DOA estimator at single or multiple frequencies.
Simulations confirm that the new technique clearly outperforms binning
approaches at sufficiently high signal to noise ratio, when model mismatches
exceed the noise floor.
Comment: 15 pages, 10 figures. Accepted in a revised form by the IEEE Trans. on Signal Processing on 12 February 2018.
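As background for the subspace machinery the abstract builds on, a minimal narrow-band MUSIC pseudospectrum for a uniform linear array can be written in a few lines (8 half-wavelength-spaced sensors and two sources at hypothetical angles; this is the classical narrow-band baseline, not the paper's ST-MUSIC):

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, snapshots = 8, 0.5, 2000            # sensors, spacing (wavelengths)
doas = np.deg2rad([-20.0, 35.0])          # illustrative source directions

def steering(theta):
    # Plane-wave steering vectors for a ULA, one column per angle.
    return np.exp(-2j * np.pi * d * np.arange(M)[:, None] * np.sin(theta))

# Simulated snapshots: two uncorrelated complex sources plus sensor noise.
A = steering(doas)
S = rng.standard_normal((2, snapshots)) + 1j * rng.standard_normal((2, snapshots))
N = 0.1 * (rng.standard_normal((M, snapshots))
           + 1j * rng.standard_normal((M, snapshots)))
X = A @ S + N

# Sample spatial covariance and its eigendecomposition (ascending order).
R = X @ X.conj().T / snapshots
w, V = np.linalg.eigh(R)
En = V[:, :M - 2]                         # noise subspace (2 sources assumed)

# MUSIC pseudospectrum: large where a(theta) is orthogonal to the noise
# subspace, i.e. at the true DOAs.
grid = np.deg2rad(np.arange(-90.0, 90.0, 0.5))
a = steering(grid)
P = 1.0 / np.linalg.norm(En.conj().T @ a, axis=0) ** 2
```

The paper's point is that this finite-rank narrow-band model breaks down for filter-bank bin signals of wide-band sources; ST-MUSIC instead recovers consistent signal subspaces from the wide-band space-time cross-correlation matrix before any such pseudospectrum search.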
Database of audio records
Thesis and practical part