15,743 research outputs found

    Probabilistic Modeling Paradigms for Audio Source Separation

    Get PDF
    This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They also,conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems

    Sound Source Separation

    Get PDF
    This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.26file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.2

    Representation of Time-Varying Stimuli by a Network Exhibiting Oscillations on a Faster Time Scale

    Get PDF
    Sensory processing is associated with gamma frequency oscillations (30–80 Hz) in sensory cortices. This raises the question whether gamma oscillations can be directly involved in the representation of time-varying stimuli, including stimuli whose time scale is longer than a gamma cycle. We are interested in the ability of the system to reliably distinguish different stimuli while being robust to stimulus variations such as uniform time-warp. We address this issue with a dynamical model of spiking neurons and study the response to an asymmetric sawtooth input current over a range of shape parameters. These parameters describe how fast the input current rises and falls in time. Our network consists of inhibitory and excitatory populations that are sufficient for generating oscillations in the gamma range. The oscillations period is about one-third of the stimulus duration. Embedded in this network is a subpopulation of excitatory cells that respond to the sawtooth stimulus and a subpopulation of cells that respond to an onset cue. The intrinsic gamma oscillations generate a temporally sparse code for the external stimuli. In this code, an excitatory cell may fire a single spike during a gamma cycle, depending on its tuning properties and on the temporal structure of the specific input; the identity of the stimulus is coded by the list of excitatory cells that fire during each cycle. We quantify the properties of this representation in a series of simulations and show that the sparseness of the code makes it robust to uniform warping of the time scale. We find that resetting of the oscillation phase at stimulus onset is important for a reliable representation of the stimulus and that there is a tradeoff between the resolution of the neural representation of the stimulus and robustness to time-warp. Author Summary Sensory processing of time-varying stimuli, such as speech, is associated with high-frequency oscillatory cortical activity, the functional significance of which is still unknown. One possibility is that the oscillations are part of a stimulus-encoding mechanism. Here, we investigate a computational model of such a mechanism, a spiking neuronal network whose intrinsic oscillations interact with external input (waveforms simulating short speech segments in a single acoustic frequency band) to encode stimuli that extend over a time interval longer than the oscillation's period. The network implements a temporally sparse encoding, whose robustness to time warping and neuronal noise we quantify. To our knowledge, this study is the first to demonstrate that a biophysically plausible model of oscillations occurring in the processing of auditory input may generate a representation of signals that span multiple oscillation cycles.National Science Foundation (DMS-0211505); Burroughs Wellcome Fund; U.S. Air Force Office of Scientific Researc
    • …
    corecore