744 research outputs found

    Sound Source Separation

    This is the author's accepted pre-print of the article, first published as: G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent, "Sound source separation," in U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14

    Auditory Streaming: Behavior, Physiology, and Modeling

    Auditory streaming is a fundamental aspect of auditory perception. It refers to the ability to parse mixed acoustic events into meaningful streams, where each stream is assumed to originate from a separate source. Despite wide interest and increasing scientific investigation over the last decade, the neural mechanisms underlying streaming remain largely unknown. A simple example of this mystery concerns the streaming of simple tone sequences, and the general assumption that separation along the tonotopic axis is sufficient for stream segregation. However, this dissertation research casts doubt on the validity of this assumption. First, behavioral measures of auditory streaming in ferrets establish them as an animal model for studying auditory streaming. Second, responses from neurons in the primary auditory cortex (A1) of ferrets show that spectral components that are well separated in frequency produce comparably segregated responses along the tonotopic axis, whether presented synchronously or consecutively, despite the substantial differences in their streaming percepts when measured psychoacoustically in humans. These results argue against the notion that tonotopic separation per se is a sufficient neural correlate of stream segregation. Third, comparing responses during behavior with those during passive listening shows that the spiking activity of neurons belonging to the same stream becomes more temporally correlated, while the activity of neurons belonging to different streams becomes less correlated. Rapid task-related plasticity of neural receptive fields shows a pattern that is consistent with the changes in correlation. Taken together, these results indicate that temporal coherence is a plausible neural correlate of auditory streaming. Finally, inspired by the above biological findings, we propose a computational model of auditory scene analysis, which uses temporal coherence as the primary criterion for predicting stream formation. The promising results of this dissertation research significantly advance our understanding of auditory streaming and perception.
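
    The temporal-coherence criterion lends itself to a compact illustration: group frequency channels whose energy envelopes rise and fall together. The following sketch is purely illustrative (plain NumPy, with crude FFT-band energies standing in for the cortical filterbank an actual model of this kind would use): an alternating two-tone sequence produces anti-correlated envelopes in the two tones' channels, consistent with their segregation into separate streams.

```python
import numpy as np

def channel_envelopes(signal, sr, n_channels=8, frame=0.05):
    """Crude stand-in for a cochlear filterbank: FFT-band energies per frame."""
    hop = int(frame * sr)
    n_frames = len(signal) // hop
    env = np.zeros((n_channels, n_frames))
    for i in range(n_frames):
        spec = np.abs(np.fft.rfft(signal[i * hop:(i + 1) * hop]))
        env[:, i] = [band.mean() for band in np.array_split(spec, n_channels)]
    return env

def coherence_matrix(env):
    """Pairwise correlation of channel envelopes over time: channels whose
    envelopes rise and fall together belong, by this criterion, to one stream."""
    e = env - env.mean(axis=1, keepdims=True)
    e /= np.linalg.norm(e, axis=1, keepdims=True) + 1e-12
    return e @ e.T

sr = 16000
t = np.arange(sr) / sr
# Alternating A-B tone sequence (100 ms tones): the 440 Hz and 3 kHz
# channels are anti-correlated, so a coherence criterion splits them
# into two streams even though both are always tonotopically separated.
gate = (np.floor(t / 0.1) % 2).astype(float)
mix = gate * np.sin(2 * np.pi * 440 * t) + (1 - gate) * np.sin(2 * np.pi * 3000 * t)
coh = coherence_matrix(channel_envelopes(mix, sr))
print(np.round(coh, 2))
```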

    Source separation with one ear : proposition for an anthropomorphic approach

    Abstract: We present an example of an anthropomorphic approach, in which auditory-based cues are combined with temporal correlation to implement a source separation system. The auditory features are based on spectral amplitude modulation and energy information obtained through 256 cochlear filters. Segmentation and binding of auditory objects are performed with a two-layered spiking neural network. The first layer performs the segmentation of the auditory images into objects, while the second layer binds the auditory objects belonging to the same source. The binding is further used to generate a mask (binary gain) to suppress the undesired sources from the original signal. Results are presented for a double-voiced (2 speakers) speech segment and for sentences corrupted with different noise sources. Comparative results are also given using PESQ (perceptual evaluation of speech quality) scores. The spiking neural network is fully adaptive and unsupervised.
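
    The final suppression step, a binary gain applied in the time-frequency plane, can be demonstrated in isolation. The sketch below is a simplified illustration only: it uses an oracle mask computed from the known sources, rather than the spiking-network binding the paper describes, to show how such a mask suppresses the undesired source.

```python
import numpy as np
from scipy.signal import stft, istft

sr = 8000
t = np.arange(2 * sr) / sr
# Toy "voice" (slowly modulated tone) plus a tonal interferer.
target = (1 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 300 * t)
noise = 0.8 * np.sin(2 * np.pi * 1200 * t)
mix = target + noise

_, _, T = stft(target, sr, nperseg=256)
_, _, N = stft(noise, sr, nperseg=256)
_, _, M = stft(mix, sr, nperseg=256)

# Binary gain: pass a time-frequency cell only where the target dominates.
# (Oracle mask for illustration; the paper derives it from neural binding.)
mask = (np.abs(T) > np.abs(N)).astype(float)
_, recovered = istft(mask * M, sr, nperseg=256)

n = min(len(recovered), len(target))
print("residual power:", round(float(np.mean((target[:n] - recovered[:n]) ** 2)), 4))
print("interferer power:", round(float(np.mean(noise ** 2)), 4))
```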

    Frame Based Single Channel Speech Separation using Summary Autocorrelation Function

    Single-channel speech separation is widely used in many applications. The pre-processing stages of automatic speech recognition, telecommunication systems and hearing aid design all require speech separation to enhance the speech. This paper proposes a separation system that extracts the dominant speech from a noisy environment, based on pitch range estimation in the modulation frequency domain using summary autocorrelation function (SACF) analysis. Performance evaluation of the proposed system shows a better response compared to existing methods.
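
    The heart of an SACF analysis is straightforward to sketch: autocorrelate each filterbank channel's envelope, sum the normalized autocorrelations across channels, and read the pitch period off the strongest peak within a plausible lag range. The code below is a minimal stand-in, not the paper's implementation; the Butterworth bands replace the gammatone front end such systems typically use, and all parameters are illustrative.

```python
import numpy as np
from scipy.signal import butter, lfilter

def sacf_pitch(frame, sr, bands=((80, 500), (500, 1500), (1500, 4000))):
    """Sum normalized channel autocorrelations, then pick the peak lag."""
    sacf = np.zeros(len(frame))
    for lo, hi in bands:
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        env = np.abs(lfilter(b, a, frame))            # crude channel envelope
        ac = np.correlate(env, env, "full")[len(env) - 1:]
        sacf += ac / (ac[0] + 1e-12)                  # normalize before summing
    lo_lag, hi_lag = int(sr / 400), int(sr / 60)      # search 60-400 Hz pitches
    lag = lo_lag + np.argmax(sacf[lo_lag:hi_lag])
    return sr / lag

sr = 16000
t = np.arange(int(0.05 * sr)) / sr                    # one 50 ms analysis frame
voiced = sum(np.sin(2 * np.pi * 120 * k * t) for k in range(1, 6))
print(round(sacf_pitch(voiced, sr), 1), "Hz")         # ~120 Hz
```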

    Exploiting correlogram structure for robust speech recognition with multiple speech sources

    This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture in the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located on the delay that corresponds to multiple pitch periods. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together with the spectral representation, are passed to a `speech fragment decoder' which employs `missing data' techniques with clean speech models to simultaneously search for the acoustic evidence that best matches model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared to a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments over different conditions, which results in significantly better recognition accuracy.
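
    The grouping stage can be pictured with a toy correlogram: compute each channel's autocorrelation, locate the dominant pitch lag in the channel-summed summary, and assign to the periodic source those channels that respond strongly at that lag. The sketch below is a deliberately crude stand-in (a Butterworth filterbank, a single analysis frame, a hand-picked threshold), not the system described in the paper.

```python
import numpy as np
from scipy.signal import butter, lfilter

sr = 16000
t = np.arange(int(0.064 * sr)) / sr                   # one analysis frame
voiced = sum(np.sin(2 * np.pi * 100 * k * t) for k in (1, 2, 3))
hiss = np.random.default_rng(0).normal(0, 0.5, len(t))
mix = voiced + hiss                                   # periodic + aperiodic

bands = [(60, 300), (300, 600), (600, 1200), (1200, 2400), (2400, 4800)]
correlogram = []
for lo, hi in bands:
    b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
    y = lfilter(b, a, mix)
    ac = np.correlate(y, y, "full")[len(y) - 1:]
    correlogram.append(ac / (ac[0] + 1e-12))          # normalized channel AC
correlogram = np.array(correlogram)

# Dominant pitch lag from the summary (channel-summed) correlogram.
summary = correlogram.sum(axis=0)
lag = 80 + np.argmax(summary[80:300])                 # search roughly 53-200 Hz
print("pitch ~", round(sr / lag, 1), "Hz")

# A channel joins the periodic source's region if its autocorrelation is
# strong at the common pitch lag; the rest is left to other sources.
print("grouped with the pitch:", correlogram[:, lag] > 0.3)
```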

    The directional effect of target position on spatial selective auditory attention

    Spatial selective auditory attention plays a crucial role in listening to a mixture of competing speech sounds. Previous neuroimaging studies have reported alpha-band neural activity modulated by auditory attention, along with alpha lateralization corresponding to attentional focus. A greater cortical representation of the attended speech envelope compared to the ignored speech envelope has also been found, a phenomenon known as 'neural speech tracking'. However, little is known about neural activity when attentional focus is directed to speech sounds from behind the listener, even though understanding speech from behind is a common and essential aspect of daily life. The objectives of this study are to investigate the impact of four distinct target positions (left, right, front, and particularly, behind) on spatial selective auditory attention by concurrently assessing 1) spatial selective speech identification, 2) oscillatory alpha-band power, and 3) neural speech tracking. Fifteen young adults with normal hearing (NH) were enrolled in this study (M = 21.40, ages 18-29; 10 females). The selective speech identification task indicated that the back target position was the most challenging condition, followed by the front condition, with the lateral conditions being the least demanding. The normalized alpha power was modulated by target position and was significantly lateralized for the left and right conditions, but not for the front and back conditions. The parieto-occipital alpha power in the front-back configuration was significantly lower than in the left-right listening configuration, and the normalized alpha power in the back condition was significantly higher than in the front condition. The speech tracking of the to-be-attended speech envelope was affected by the direction of the target stream. The behavioral outcome (selective speech identification) was correlated with parieto-occipital alpha power and with the neural speech tracking correlation coefficient as neural correlates of auditory attention, but there was no significant correlation between alpha power and neural speech tracking. The results suggest that, in addition to existing mechanistic theories, it may be necessary to consider how the brain responds depending on the location of a sound in order to interpret the neural correlates and behavioral consequences meaningfully, as well as a potential application of neural speech tracking in studies on spatial selective hearing.
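
    Of the three measures, neural speech tracking is the most direct to express in code: correlate the neural response with the attended and the ignored speech envelopes and compare the coefficients. The sketch below is entirely synthetic; a simulated response follows the attended envelope at a ~100 ms lag, whereas real studies estimate this relationship from EEG/MEG recordings, typically with linear decoders. It is meant only to make the measure concrete.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, dur = 64, 60                              # 64 Hz envelope rate, 60 s
n = fs * dur
smooth = np.ones(16) / 16                     # crude low-pass for toy envelopes
attended = np.convolve(rng.random(n), smooth, mode="same")
ignored = np.convolve(rng.random(n), smooth, mode="same")

# Simulated cortical response: follows the attended envelope at ~100 ms,
# buried in noise. (Real studies estimate this from EEG/MEG data.)
lag = int(0.1 * fs)
response = np.roll(attended, lag) + rng.normal(0, 0.2, n)

def tracking_corr(resp, env, lag):
    """Correlation between the response and a lag-aligned envelope."""
    return np.corrcoef(resp[lag:], env[:len(env) - lag])[0, 1]

# Stronger tracking of the attended than the ignored stream is the
# signature the study quantifies.
print("attended:", round(tracking_corr(response, attended, lag), 3))
print("ignored: ", round(tracking_corr(response, ignored, lag), 3))
```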

    Single channel speech separation with a frame-based pitch range estimation method in modulation frequency

    Computational Auditory Scene Analysis (CASA) has attracted a lot of interest for segregating speech from monaural mixtures. In this paper, we propose a new method for single-channel speech separation with frame-based pitch range estimation in the modulation frequency domain. This range is estimated in each frame of the modulation spectrum of speech by analyzing onsets and offsets. In the proposed method, the target speaker is separated from the interfering speaker by filtering the mixture signal with a mask extracted from the modulation spectrogram of the mixture signal. Systematic evaluation shows an acceptable level of separation compared with classic methods.
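
    Structurally, such a system is a double transform: an acoustic STFT, then a second spectral analysis along each frequency row's magnitude trajectory, with a mask applied at the modulation-frequency stage. The sketch below is a bare-bones illustration under stated assumptions: a hand-picked 0-6 Hz pass range stands in for the paper's onset/offset-based pitch range estimation, and the mixture phase is reused for resynthesis.

```python
import numpy as np
from scipy.signal import stft, istft

sr = 8000
t = np.arange(2 * sr) / sr
target = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 500 * t)   # 4 Hz modulation
interf = (1 + np.sin(2 * np.pi * 12 * t)) * np.sin(2 * np.pi * 900 * t)  # 12 Hz modulation
mix = target + interf

_, _, Z = stft(mix, sr, nperseg=256)          # acoustic spectrogram
frame_rate = sr / 128                         # default hop is nperseg // 2
mod = np.fft.rfft(np.abs(Z), axis=1)          # modulation spectrum per frequency row
mod_f = np.fft.rfftfreq(Z.shape[1], 1 / frame_rate)

keep = mod_f <= 6.0                           # hand-picked target modulation range
env = np.fft.irfft(mod * keep, n=Z.shape[1], axis=1)

# Pair the masked magnitudes with the mixture phase and resynthesize.
_, y = istft(np.maximum(env, 0) * np.exp(1j * np.angle(Z)), sr, nperseg=256)
print("resynthesized", len(y), "samples")
```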

    BLUES from Music: BLind Underdetermined Extraction of Sources from Music

    In this paper we propose to use an instantaneous ICA method (BLUES) to separate the instruments in a real music stereo recording. We combine two strong separation techniques to segregate instruments from a mixture: ICA and binary time-frequency masking. By combining the methods, we are able to exploit the fact that the sources are distributed differently in space, time and frequency. Our method is able to segregate an arbitrary number of instruments, and the segregated sources are maintained as stereo signals. We have evaluated our method on real stereo recordings and can segregate instruments that are spatially distinct from the other instruments.
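
    The two-stage combination is easy to sketch for the determined two-source, two-channel case (the underdetermined setting the paper targets, with more instruments than channels, leans on the masking stage to go further). Below, scikit-learn's FastICA stands in for the BLUES separation stage, followed by a binary time-frequency mask over the ICA outputs; the sources and mixing matrix are invented for illustration.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import FastICA

sr = 8000
t = np.arange(2 * sr) / sr
s1 = np.sin(2 * np.pi * 440 * t)                      # instrument 1 (made up)
s2 = np.sign(np.sin(2 * np.pi * 250 * t))             # instrument 2 (made up)
A = np.array([[1.0, 0.4],                             # instantaneous stereo mixing
              [0.3, 1.0]])
x = A @ np.vstack([s1, s2])                           # 2-channel "recording"

# Stage 1: instantaneous ICA recovers the spatially distinct sources.
est = FastICA(n_components=2, random_state=0).fit_transform(x.T).T

# Stage 2: binary time-frequency masking sharpens the separation by
# assigning each cell to whichever ICA output dominates there.
_, _, E1 = stft(est[0], sr, nperseg=256)
_, _, E2 = stft(est[1], sr, nperseg=256)
mask1 = np.abs(E1) > np.abs(E2)
print("cells assigned to output 1:", int(mask1.sum()), "of", mask1.size)
```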

    Source Separation for Hearing Aid Applications

    Low Complexity Bayesian Single Channel Source Separation

    • …