Sound Source Separation
This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14
Signal Processing in Large Systems: a New Paradigm
For a long time, detection and parameter estimation methods for signal
processing have relied on asymptotic statistics as the number $n$ of
observations of a population grows large compared with the population size
$N$, i.e. $n/N \to \infty$. Modern technological and societal advances now
demand the study of sometimes extremely large populations and simultaneously
require fast signal processing due to accelerated system dynamics. This results
in not-so-large practical ratios $n/N$, sometimes even smaller than one. A
disruptive change in classical signal processing methods has therefore been
initiated in the past ten years, mostly spurred by the field of large
dimensional random matrix theory. The early works in random matrix theory for
signal processing applications are however scarce and highly technical. This
tutorial provides an accessible methodological introduction to the modern tools
of random matrix theory and to the signal processing methods derived from them,
with an emphasis on simple illustrative examples.
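The regime this abstract describes can be illustrated numerically. The sketch below is not taken from the tutorial: it assumes i.i.d. Gaussian observations with identity covariance, writes N for the population size (dimension) and n for the number of observations, and uses hypothetical function names. It shows that the eigenvalues of the sample covariance matrix cluster near 1 only when n is much larger than N, and otherwise spread over roughly the Marchenko-Pastur support.

```python
import numpy as np


def sample_covariance_eigenvalues(N, n, seed=0):
    """Eigenvalues of the sample covariance of n i.i.d. N(0, I_N) observations."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, n))      # columns are observations
    S = X @ X.T / n                      # N x N sample covariance matrix
    return np.linalg.eigvalsh(S)         # real eigenvalues, ascending order


def marchenko_pastur_edges(N, n):
    """Limiting support edges for identity covariance (formula used here for N <= n)."""
    c = N / n
    return (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2


if __name__ == "__main__":
    # First case: n only twice N. Second case: n a thousand times N.
    for N, n in [(100, 200), (20, 20000)]:
        eig = sample_covariance_eigenvalues(N, n)
        lo, hi = marchenko_pastur_edges(N, n)
        print(f"n/N = {n / N:g}: eigenvalues in [{eig[0]:.3f}, {eig[-1]:.3f}], "
              f"limiting support [{lo:.3f}, {hi:.3f}]")
```

Although the true covariance has all eigenvalues equal to 1, the first case yields a wide eigenvalue spread, which is why methods built on the classical large-n assumption can be misleading when n/N is moderate.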
Exact Conditional and Unconditional Cramér-Rao Bounds for Near Field Localization
This paper considers the Cramér-Rao lower bound (CRB) for the source
localization problem in the near field. More specifically, we use the exact
expression of the delay parameter for the CRB derivation and show how this
exact CRB can be significantly different from the one given in the literature
and based on an approximate time delay expression (usually considered in the
Fresnel region). This CRB derivation is then generalized by considering the
exact expression of the received power profile (i.e., variable gain case)
which, to the best of our knowledge, has been ignored in the literature. Finally, we
exploit the CRB expression to introduce the new concept of Near Field
Localization (NFL) region for a target localization performance associated with
the application at hand. We illustrate the usefulness of the proposed CRB
derivation and its developments as well as the NFL region concept through
numerical simulations in different scenarios.
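The gap between an exact delay and its Fresnel approximation can be made concrete with a small sketch. The geometry is an assumption, not taken from the paper: a sensor at position x on a line through the reference sensor, a source at range r and angle theta measured from broadside, and a propagation speed chosen arbitrarily. The Fresnel expression is the second-order Taylor expansion of the exact path difference in x/r. Function names are hypothetical.

```python
import math

SPEED = 343.0  # assumed propagation speed in m/s (sound in air); not from the paper


def exact_delay(x, r, theta, speed=SPEED):
    """Delay at sensor position x relative to the reference sensor at the origin."""
    d = math.sqrt(r * r + x * x - 2.0 * r * x * math.sin(theta))  # exact source-sensor distance
    return (d - r) / speed


def fresnel_delay(x, r, theta, speed=SPEED):
    """Second-order (Fresnel) approximation of the same delay."""
    return (-x * math.sin(theta) + (x * math.cos(theta)) ** 2 / (2.0 * r)) / speed


if __name__ == "__main__":
    theta = math.radians(30.0)
    for r in (1.0, 10.0, 100.0):
        err = abs(exact_delay(0.5, r, theta) - fresnel_delay(0.5, r, theta))
        print(f"range {r:6.1f} m: |exact - Fresnel| = {err:.3e} s")
```

The approximation error shrinks quickly as the range grows, so any discrepancy between bounds derived from the two delay models would be expected to matter mainly for sources close to the array.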
A Fast DOA Estimation Algorithm Based on Polarization MUSIC
A fast DOA estimation algorithm developed from MUSIC, which also benefits from processing the signals' polarization information, is presented. Besides improving precision and resolution, the proposed algorithm can be applied to various forms of polarization-sensitive arrays, with no specific requirement on the array pattern. By relying on the continuity of the spatial spectrum, the large amount of computation otherwise incurred in calculating the 4-D spatial spectrum is avoided. The performance and computational complexity of the proposed algorithm are analyzed, and simulation results are presented. Compared with conventional MUSIC, the results indicate that the proposed algorithm has a considerable advantage in precision and resolution, with a low computational complexity proportional to that of conventional 2-D MUSIC.
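For orientation, the conventional MUSIC baseline mentioned in this abstract can be sketched in its simplest 1-D form. This is not the polarization algorithm of the paper: the sketch assumes a far-field uniform linear array with half-wavelength spacing, uncorrelated narrowband sources, white noise and a known number of sources. All function names and parameter values are hypothetical.

```python
import numpy as np


def steering_vector(theta, num_sensors, spacing=0.5):
    """Far-field ULA response; spacing in wavelengths, theta in radians from broadside."""
    k = np.arange(num_sensors)
    return np.exp(-2j * np.pi * spacing * k * np.sin(theta))


def simulate_snapshots(angles_deg, num_sensors, num_snapshots, noise_std, seed=0):
    """Synthetic array data: unit-power complex Gaussian sources plus white noise."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([steering_vector(np.deg2rad(a), num_sensors) for a in angles_deg])
    shape_s = (len(angles_deg), num_snapshots)
    shape_w = (num_sensors, num_snapshots)
    S = (rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)) / np.sqrt(2)
    W = noise_std * (rng.standard_normal(shape_w) + 1j * rng.standard_normal(shape_w)) / np.sqrt(2)
    return A @ S + W


def music_spectrum(X, num_sources, grid):
    """MUSIC pseudo-spectrum evaluated on a grid of candidate angles (radians)."""
    M, T = X.shape
    R = X @ X.conj().T / T                    # sample covariance matrix
    _, vecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    noise = vecs[:, : M - num_sources]        # noise-subspace eigenvectors
    spec = np.empty(len(grid))
    for i, theta in enumerate(grid):
        a = steering_vector(theta, M)
        spec[i] = 1.0 / np.linalg.norm(noise.conj().T @ a) ** 2
    return spec


def estimate_doas(X, num_sources, grid):
    """Return the grid angles of the num_sources largest local maxima, sorted."""
    spec = music_spectrum(X, num_sources, grid)
    interior = spec[1:-1]
    is_peak = (interior > spec[:-2]) & (interior > spec[2:])
    idx = np.flatnonzero(is_peak) + 1
    best = idx[np.argsort(spec[idx])[-num_sources:]]
    return np.sort(grid[best])


if __name__ == "__main__":
    X = simulate_snapshots([-20.0, 30.0], num_sensors=8, num_snapshots=500, noise_std=0.1)
    grid = np.deg2rad(np.arange(-90.0, 90.1, 0.1))
    print(np.rad2deg(estimate_doas(X, 2, grid)))
```

The grid search here runs over a single angle. Adding a second angle and two polarization parameters would turn it into a 4-D search, which is the computational cost the abstract says the proposed algorithm avoids.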
Tensor Analysis and Fusion of Multimodal Brain Images
Current high-throughput data acquisition technologies probe dynamical systems
with different imaging modalities, generating massive data sets at different
spatial and temporal resolutions posing challenging problems in multimodal data
fusion. A case in point is the attempt to parse out the brain structures and
networks that underpin human cognitive processes by analysis of different
neuroimaging modalities (functional MRI, EEG, NIRS etc.). We emphasize that the
multimodal, multi-scale nature of neuroimaging data is well reflected by a
multi-way (tensor) structure where the underlying processes can be summarized
by a relatively small number of components or "atoms". We introduce
Markov-Penrose diagrams - an integration of Bayesian DAG and tensor network
notation in order to analyze these models. These diagrams not only clarify
matrix and tensor EEG and fMRI time/frequency analysis and inverse problems,
but also help understand multimodal fusion via Multiway Partial Least Squares
and Coupled Matrix-Tensor Factorization. We show here, for the first time, that
Granger causal analysis of brain networks is a tensor regression problem, thus
allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI
recordings shows the potential of the methods and suggests their use in other
scientific domains.
Comment: 23 pages, 15 figures, submitted to Proceedings of the IEE
Impact of Visual Design Elements and Principles in Human Electroencephalogram Brain Activity Assessed with Spectral Methods and Convolutional Neural Networks
The visual design elements and principles (VDEPs) can trigger behavioural changes and emotions in the viewer, but their effects on brain activity are not clearly understood. In this paper, we explore the relationships between brain activity and colour (cold/warm), light (dark/bright), movement (fast/slow), and balance (symmetrical/asymmetrical) VDEPs. We used the public DEAP dataset with the electroencephalogram signals of 32 participants recorded while watching music videos. The characteristic VDEPs for each second of the videos were manually tagged by a team of two visual communication experts. Results show that variations in the light/value, rhythm/movement, and balance in the music video sequences produce a statistically significant effect on the mean absolute power of the Delta, Theta, Alpha, Beta, and Gamma EEG bands (p < 0.05). Furthermore, we trained a Convolutional Neural Network that successfully predicts the VDEP of a video fragment solely from the EEG signal of the viewer with an accuracy ranging from 0.7447 for Colour VDEP to 0.9685 for Movement VDEP. Our work shows evidence that VDEPs affect brain activity in a variety of distinguishable ways and that a deep learning classifier can infer visual VDEP properties of the videos from EEG activity.
Influence of the listening context on the perceived realism of binaural recordings
Binaural recordings and audio are becoming an interesting resource for composers, live performances and augmented reality. This paper focuses on the audience's acceptance and perceived quality of such spatial recordings. We present the results of a preliminary psychoacoustic perception study in which N=26 listeners reported on the realism and quality of different pairs of sounds taken from two different rooms with peculiar reverb. The sounds were recorded with a self-made dummy head. The stimuli are grouped into classes with respect to characteristics highlighted as potentially important for the task. The listening condition was fixed: all stimuli were presented over headphones. Participants are divided into musically trained and naive subjects. Results show that there are differences between the two groups of participants and that the “semantic relevance” of a sound plays a central role.
Low is large: spatial location and pitch interact in voice-based body size estimation
The binding of incongruent cues poses a challenge for multimodal perception. Indeed, although taller objects emit sounds from higher elevations, low-pitched sounds are perceptually mapped both to large size and to low elevation. In the present study, we examined how these incongruent vertical spatial cues (up is more) and pitch cues (low is large) to size interact, and whether similar biases influence size perception along the horizontal axis. In Experiment 1, we measured listeners’ voice-based judgments of human body size using pitch-manipulated voices projected from a high versus a low, and a right versus a left, spatial location. Listeners associated low spatial locations with largeness for lowered-pitch but not for raised-pitch voices, demonstrating that pitch overrode vertical-elevation cues. Listeners associated rightward spatial locations with largeness, regardless of voice pitch. In Experiment 2, listeners performed the task while sitting or standing, allowing us to examine self-referential cues to elevation in size estimation. Listeners associated vertically low and rightward spatial cues with largeness more for lowered- than for raised-pitch voices. These correspondences were robust to sex (of both the voice and the listener) and head elevation (standing or sitting); however, horizontal correspondences were amplified when participants stood. Moreover, when participants were standing, their judgments of how much larger men’s voices sounded than women’s increased when the voices were projected from the low speaker. Our results provide novel evidence for a multidimensional spatial mapping of pitch that is generalizable to human voices and that affects performance in an indirect, ecologically relevant spatial task (body size estimation). These findings suggest that crossmodal pitch correspondences evoke both low-level and higher-level cognitive processes.