3,258 research outputs found

    Sound Source Separation

    Get PDF
    This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.26file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.2

    Signal Processing in Large Systems: a New Paradigm

    Full text link
    For a long time, detection and parameter estimation methods for signal processing have relied on asymptotic statistics as the number nn of observations of a population grows large comparatively to the population size NN, i.e. n/Nn/N\to \infty. Modern technological and societal advances now demand the study of sometimes extremely large populations and simultaneously require fast signal processing due to accelerated system dynamics. This results in not-so-large practical ratios n/Nn/N, sometimes even smaller than one. A disruptive change in classical signal processing methods has therefore been initiated in the past ten years, mostly spurred by the field of large dimensional random matrix theory. The early works in random matrix theory for signal processing applications are however scarce and highly technical. This tutorial provides an accessible methodological introduction to the modern tools of random matrix theory and to the signal processing methods derived from them, with an emphasis on simple illustrative examples

    Exact Conditional and Unconditional Cram\`er-Rao Bounds for Near Field Localization

    Full text link
    This paper considers the Cram\`er-Rao lower Bound (CRB) for the source localization problem in the near field. More specifically, we use the exact expression of the delay parameter for the CRB derivation and show how this exact CRB can be significantly different from the one given in the literature and based on an approximate time delay expression (usually considered in the Fresnel region). This CRB derivation is then generalized by considering the exact expression of the received power profile (i.e., variable gain case) which, to our best knowledge, has been ignored in the literature. Finally, we exploit the CRB expression to introduce the new concept of Near Field Localization (NFL) region for a target localization performance associated to the application at hand. We illustrate the usefulness of the proposed CRB derivation and its developments as well as the NFL region concept through numerical simulations in different scenarios

    A Fast DOA Estimation Algorithm Based on Polarization MUSIC

    Get PDF
    A fast DOA estimation algorithm developed from MUSIC, which also benefits from the processing of the signals' polarization information, is presented. Besides performance enhancement in precision and resolution, the proposed algorithm can be exerted on various forms of polarization sensitive arrays, without specific requirement on the array's pattern. Depending on the continuity property of the space spectrum, a huge amount of computation incurred in the calculation of 4-D space spectrum is averted. Performance and computation complexity analysis of the proposed algorithm is discussed and the simulation results are presented. Compared with conventional MUSIC, it is indicated that the proposed algorithm has considerable advantage in aspects of precision and resolution, with a low computation complexity proportional to a conventional 2-D MUSIC

    Tensor Analysis and Fusion of Multimodal Brain Images

    Get PDF
    Current high-throughput data acquisition technologies probe dynamical systems with different imaging modalities, generating massive data sets at different spatial and temporal resolutions posing challenging problems in multimodal data fusion. A case in point is the attempt to parse out the brain structures and networks that underpin human cognitive processes by analysis of different neuroimaging modalities (functional MRI, EEG, NIRS etc.). We emphasize that the multimodal, multi-scale nature of neuroimaging data is well reflected by a multi-way (tensor) structure where the underlying processes can be summarized by a relatively small number of components or "atoms". We introduce Markov-Penrose diagrams - an integration of Bayesian DAG and tensor network notation in order to analyze these models. These diagrams not only clarify matrix and tensor EEG and fMRI time/frequency analysis and inverse problems, but also help understand multimodal fusion via Multiway Partial Least Squares and Coupled Matrix-Tensor Factorization. We show here, for the first time, that Granger causal analysis of brain networks is a tensor regression problem, thus allowing the atomic decomposition of brain networks. Analysis of EEG and fMRI recordings shows the potential of the methods and suggests their use in other scientific domains.Comment: 23 pages, 15 figures, submitted to Proceedings of the IEE

    Impact of Visual Design Elements and Principles in Human Electroencephalogram Brain Activity Assessed with Spectral Methods and Convolutional Neural Networks

    Get PDF
    The visual design elements and principles (VDEPs) can trigger behavioural changes and emotions in the viewer, but their effects on brain activity are not clearly understood. In this paper, we explore the relationships between brain activity and colour (cold/warm), light (dark/bright), movement (fast/slow), and balance (symmetrical/asymmetrical) VDEPs. We used the public DEAP dataset with the electroencephalogram signals of 32 participants recorded while watching music videos. The characteristic VDEPs for each second of the videos were manually tagged for by a team of two visual communication experts. Results show that variations in the light/value, rhythm/movement, and balance in the music video sequences produce a statistically significant effect over the mean absolute power of the Delta, Theta, Alpha, Beta, and Gamma EEG bands (p < 0.05). Furthermore, we trained a Convolutional Neural Network that successfully predicts the VDEP of a video fragment solely by the EEG signal of the viewer with an accuracy ranging from 0.7447 for Colour VDEP to 0.9685 for Movement VDEP. Our work shows evidence that VDEPs affect brain activity in a variety of distinguishable ways and that a deep learning classifier can infer visual VDEP properties of the videos from EEG activity

    Influence of the listening context on the perceived realism of binaural recordings

    Get PDF
    Binaural recordings and audio are becoming an interesting resource for composers, live performances and augmented reality. This paper focuses on the acceptance and the perceived quality by the audience of such spatial recordings. We present the results of a preliminary study of psychoacoustic perception where N=26 listeners had to report on the realism and the quality of different couples of sounds taken from two different rooms with peculiar reverb. Sounds are recorded with a self-made dummy head. The stimuli are grouped into classes with respects to some characteristics highlighted as potentially important for the task. Listening condition is fixed with headphones. Participants are divided into musically trained and naive subjects. Results show that there exists differences between the two groups of participants and that the “semantic relevance” of a sound plays a central role

    Low is large: spatial location and pitch interact in voice-based body size estimation

    Get PDF
    The binding of incongruent cues poses a challenge for multimodal perception. Indeed, although taller objects emit sounds from higher elevations, low-pitched sounds are perceptually mapped both to large size and to low elevation. In the present study, we examined how these incongruent vertical spatial cues (up is more) and pitch cues (low is large) to size interact, and whether similar biases influence size perception along the horizontal axis. In Experiment 1, we measured listeners’ voice-based judgments of human body size using pitch-manipulated voices projected from a high versus a low, and a right versus a left, spatial location. Listeners associated low spatial locations with largeness for lowered-pitch but not for raised-pitch voices, demonstrating that pitch overrode vertical-elevation cues. Listeners associated rightward spatial locations with largeness, regardless of voice pitch. In Experiment 2, listeners performed the task while sitting or standing, allowing us to examine self-referential cues to elevation in size estimation. Listeners associated vertically low and rightward spatial cues with largeness more for lowered- than for raised-pitch voices. These correspondences were robust to sex (of both the voice and the listener) and head elevation (standing or sitting); however, horizontal correspondences were amplified when participants stood. Moreover, when participants were standing, their judgments of how much larger men’s voices sounded than women’s increased when the voices were projected from the low speaker. Our results provide novel evidence for a multidimensional spatial mapping of pitch that is generalizable to human voices and that affects performance in an indirect, ecologically relevant spatial task (body size estimation). These findings suggest that crossmodal pitch correspondences evoke both low-level and higher-level cognitive processes
    corecore