
    Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds

    In this paper we address the problems of modeling the acoustic space generated by a full-spectrum sound source and of using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A non-linear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound source direction. We extend this solution to deal with missing data and redundancy in real-world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL), yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.
    Comment: 19 pages, 9 figures, 3 tables
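    The core PPAM idea, a collection of local affine maps from 2D source directions to high-dimensional interaural features that can be inverted to localize a source, can be illustrated with a small synthetic sketch. Hard region assignments and least-squares fits stand in for the paper's EM-estimated soft assignments, and all data and names here are illustrative, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the binaural-manifold setting: 2D source directions x
# map to high-dimensional "interaural" features y through a function that
# is piecewise affine over 4 quadrant regions.
D, K, N = 8, 4, 400
X = rng.uniform(-1.0, 1.0, size=(N, 2))              # 2D directions
A_true = rng.normal(size=(K, D, 2))
b_true = rng.normal(size=(K, D))
labels = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0)
Y = np.stack([A_true[k] @ x + b_true[k]
              for k, x in zip(labels, X)]) + 0.01 * rng.normal(size=(N, D))

# Fit one affine map per region by least squares (in the full PPAM an EM
# procedure alternates this with re-estimating soft region assignments).
A_hat, b_hat = np.zeros((K, D, 2)), np.zeros((K, D))
for k in range(K):
    Xk, Yk = X[labels == k], Y[labels == k]
    Xa = np.hstack([Xk, np.ones((len(Xk), 1))])      # append bias column
    Wk, *_ = np.linalg.lstsq(Xa, Yk, rcond=None)     # shape (3, D)
    A_hat[k], b_hat[k] = Wk[:2].T, Wk[2]

# Simplified "Bayes inversion": score each region's affine model against a
# new observation y and invert the best-fitting one to recover a direction.
def localize(y):
    best, best_err = None, np.inf
    for k in range(K):
        x_hat, *_ = np.linalg.lstsq(A_hat[k], y - b_hat[k], rcond=None)
        err = np.linalg.norm(A_hat[k] @ x_hat + b_hat[k] - y)
        if err < best_err:
            best, best_err = x_hat, err
    return best

x_test = np.array([0.4, -0.6])
k_test = int(x_test[0] > 0) + 2 * int(x_test[1] > 0)
y_test = A_true[k_test] @ x_test + b_true[k_test]
x_rec = localize(y_test)
print(np.round(x_rec, 2))
```

    In the full model the region assignments are latent and estimated in closed form by EM, and inversion yields a full posterior density over directions rather than this single best-region point estimate.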

    Efficient coding of spectrotemporal binaural sounds leads to emergence of the auditory space representation

    To date a number of studies have shown that receptive field shapes of early sensory neurons can be reproduced by optimizing the coding efficiency of natural stimulus ensembles. A still unresolved question is whether the efficient coding hypothesis explains the formation of neurons which explicitly represent environmental features of different functional importance. This paper proposes that the spatial selectivity of higher auditory neurons emerges as a direct consequence of learning efficient codes for natural binaural sounds. Firstly, it is demonstrated that a linear efficient coding transform, Independent Component Analysis (ICA), trained on spectrograms of naturalistic simulated binaural sounds extracts spatial information present in the signal. A simple hierarchical ICA extension allowing for decoding of sound position is proposed. Furthermore, it is shown that units revealing spatial selectivity can be learned from a binaural recording of a natural auditory scene. In both cases a relatively small subpopulation of learned spectrogram features suffices to perform accurate sound localization. Representation of the auditory space is therefore learned in a purely unsupervised way by maximizing the coding efficiency and without any task-specific constraints. These results imply that efficient coding is a useful strategy for learning structures which allow for making behaviorally vital inferences about the environment.
    Comment: 22 pages, 9 figures
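    The linear efficient-coding step can be sketched with a minimal symmetric FastICA written directly in NumPy. The two synthetic super-Gaussian sources and the 2x2 mixing below are toy stand-ins for the paper's binaural spectrogram ensembles, not its actual data or code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent, super-Gaussian sources (stand-ins for sparse
# spectrogram features) and a fixed linear mixing ("two ears").
n = 5000
S = np.vstack([np.sign(rng.normal(size=n)) * rng.exponential(size=n),
               rng.laplace(size=n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Center and whiten the observations.
X = X - X.mean(axis=1, keepdims=True)
cov = X @ X.T / n
d, E = np.linalg.eigh(cov)
Xw = (E / np.sqrt(d)) @ E.T @ X

# Symmetric FastICA with the tanh nonlinearity:
# w <- E[x g(w^T x)] - E[g'(w^T x)] w, then decorrelate all rows at once.
W = rng.normal(size=(2, 2))
for _ in range(200):
    G = np.tanh(W @ Xw)
    W_new = G @ Xw.T / n - np.diag((1 - G**2).mean(axis=1)) @ W
    u, s, vt = np.linalg.svd(W_new)   # symmetric decorrelation
    W = u @ vt
Y = W @ Xw

# Each recovered component should correlate strongly with one true source
# (up to sign and permutation, the usual ICA ambiguities).
C = np.abs(np.corrcoef(np.vstack([Y, S]))[:2, 2:])
print(np.round(C.max(axis=1), 2))
```

    In the paper's setting the rows of the learned unmixing matrix play the role of spectrotemporal receptive fields, and the spatial information is read out from the activations of those learned features.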

    Enhanced IVA for audio separation in highly reverberant environments

    Blind Audio Source Separation (BASS), inspired by the "cocktail-party problem", has been a leading research application for blind source separation (BSS). This thesis concerns the enhancement of frequency domain convolutive blind source separation (FDCBSS) techniques for audio separation in highly reverberant room environments. Independent component analysis (ICA) is a higher order statistics (HOS) approach commonly used in the BSS framework. When applied to audio FDCBSS, ICA-based methods suffer from the permutation problem across the frequency bins of each source. Independent vector analysis (IVA) is a frequency domain BSS algorithm that theoretically solves the permutation problem by using a multivariate source prior, where the sources are considered to be random vectors. The algorithm assumes independence between the multivariate source signals while retaining the dependency between the frequency components within each source vector. The source prior adopted to model the nonlinear dependency structure within the source vectors is crucial to the separation performance of the IVA algorithm. The focus of this thesis is on improving the separation performance of the IVA algorithm in the application of BASS. An alternative multivariate Student's t distribution is proposed as the source prior for the batch IVA algorithm. A Student's t probability density function can better model certain frequency domain speech signals due to its tail dependency property. The nonlinear score function for the IVA is then derived from the proposed source prior. A novel energy-driven mixed super Gaussian and Student's t source prior is proposed for the IVA and FastIVA algorithms. The Student's t distribution in the mixed source prior can model the high-amplitude data points, whereas the super Gaussian distribution can model the lower-amplitude information in the speech signals.
The ratio of the two distributions can be adjusted according to the energy of the observed mixtures to adapt to different types of speech signals. A particular multivariate generalized Gaussian distribution is adopted as the source prior for the online IVA algorithm. The nonlinear score function derived from this proposed source prior contains fourth-order relationships between different frequency bins, which provides a more informative and stronger dependency structure and thereby improves the separation performance. An adaptive learning scheme is developed to improve the performance of the online IVA algorithm. The scheme adjusts the learning rate as a function of proximity to the target solutions. It is also accompanied by a novel switched source prior technique that takes the best performance properties of the super Gaussian source prior and the generalized Gaussian source prior as the algorithm converges. The methods and techniques proposed in this thesis are evaluated with real speech source signals in different simulated and real reverberant acoustic environments. A variety of measures are used within the evaluation criteria of the various algorithms. The experimental results demonstrate improved performance of the proposed methods and their robustness in a wide range of situations.
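    The mechanics described above, per-frequency unmixing matrices updated with a score function that couples all frequency bins of a source, can be sketched with a generic natural-gradient IVA using the common spherical super-Gaussian prior (not the Student's t or mixed priors proposed in the thesis). The frequency-domain sources and mixing below are synthetic toys:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy frequency-domain setup: 2 sources, F bins, T frames. Each source has
# one amplitude envelope shared across all bins, which is exactly the
# cross-frequency dependency a multivariate IVA source prior exploits.
F, T = 6, 3000
env = rng.exponential(size=(2, 1, T))
S = env * (rng.normal(size=(2, F, T)) + 1j * rng.normal(size=(2, F, T)))

# Well-conditioned per-bin mixing matrices with random phases.
A = np.zeros((F, 2, 2), dtype=complex)
A[:, 0, 0] = A[:, 1, 1] = 1.0
A[:, 0, 1] = 0.5 * np.exp(1j * rng.uniform(0, 2 * np.pi, F))
A[:, 1, 0] = 0.5 * np.exp(1j * rng.uniform(0, 2 * np.pi, F))
X = np.einsum('fij,jft->ift', A, S)

# Natural-gradient IVA with the spherical super-Gaussian score
# phi(y_kf) = y_kf / ||y_k||, the norm taken across frequency per frame.
W = np.stack([np.eye(2, dtype=complex) for _ in range(F)])
eta = 0.1
for _ in range(400):
    Y = np.einsum('fij,jft->ift', W, X)
    norms = np.sqrt((np.abs(Y) ** 2).sum(axis=1, keepdims=True)) + 1e-9
    Phi = Y / norms
    for f in range(F):
        R = Phi[:, f, :] @ Y[:, f, :].conj().T / T
        W[f] = W[f] + eta * (np.eye(2) - R) @ W[f]
Y = np.einsum('fij,jft->ift', W, X)

# Each output's magnitude envelope should track one true source across
# ALL bins, i.e. the frequency permutation stays aligned.
corr = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        corr[i, j] = np.corrcoef(np.abs(Y[i]).ravel(),
                                 np.abs(S[j]).ravel())[0, 1]
print(np.round(corr, 2))
```

    Swapping in a different source prior only changes the `Phi` line: the score function derived from the prior is where the Student's t, mixed, and generalized Gaussian variants studied in the thesis differ.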

    Adjustment of interaural-time-difference analysis to sound level

    To localize low-frequency sound sources in azimuth, the binaural system compares the timing of sound waves at the two ears with microsecond precision. A similarly high precision is also seen in the binaural processing of the envelopes of high-frequency complex sounds. Both for low- and high-frequency sounds, interaural time difference (ITD) acuity is to a large extent independent of sound level. The mechanisms underlying this level-invariant extraction of ITDs by the binaural system are, however, only poorly understood. We use high-frequency pip trains with asymmetric and dichotic pip envelopes in a combined psychophysical, electrophysiological, and modeling approach. Although the dichotic envelopes cannot be physically matched in terms of ITD, the match produced perceptually by humans is very reliable, and it depends systematically on the overall sound level. These data are reflected in neural responses from the gerbil lateral superior olive and lateral lemniscus. The results are predicted by an existing temporal-integration model extended with a level-dependent threshold criterion. These data provide a very sensitive quantification of how the peripheral temporal code is conditioned for binaural analysis.
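    Computationally, the microsecond-precision timing comparison amounts to a sub-sample delay estimate. A hypothetical sketch with broadband noise and a known 250 microsecond ITD (the signals and sample rate are illustrative assumptions, not the study's stimuli):

```python
import numpy as np

fs = 48000
rng = np.random.default_rng(3)

# Broadband noise token with a 250-microsecond ITD (right ear lagging),
# applied as a frequency-domain phase shift so the delay need not be an
# integer number of samples.
n = 4096
src = rng.normal(size=n)
itd = 250e-6
freqs = np.fft.rfftfreq(n, 1 / fs)
left = src
right = np.fft.irfft(np.fft.rfft(src) * np.exp(-2j * np.pi * freqs * itd), n)

# Cross-correlate the two ear signals and refine the peak with parabolic
# interpolation, a standard trick for sub-sample delay estimates.
lags = np.arange(-50, 51)
cc = np.array([np.dot(left[50:-50], right[50 + k: n - 50 + k])
               for k in lags])
k0 = int(np.argmax(cc))
y0, y1, y2 = cc[k0 - 1], cc[k0], cc[k0 + 1]
delta = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)
itd_hat = (lags[k0] + delta) / fs
print(round(itd_hat * 1e6, 1))  # estimated ITD in microseconds
```

    The study's point is that the biological analogue of this computation stays accurate across sound levels, which the extended temporal-integration model captures with a level-dependent threshold criterion.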

    Time-Frequency Masking Performance for Improved Intelligibility with Microphone Arrays

    Time-Frequency (TF) masking is an audio processing technique useful for isolating an audio source from interfering sources. TF masking has been applied and studied in monaural and binaural applications, but has only recently been applied to distributed microphone arrays. This work focuses on evaluating the TF masking technique's ability to isolate human speech and improve speech intelligibility in an immersive cocktail party environment. In particular, an upper bound on TF masking performance is established and compared to the traditional delay-sum and generalized sidelobe canceler (GSC) beamformers. Additionally, the novel technique of combining the GSC with TF masking is investigated and its performance evaluated. This work presents a resource-efficient method for studying the performance of these isolation techniques and evaluates their performance using both virtually simulated data and data recorded in a real-life acoustical environment. Further, methods are presented to analyze speech intelligibility post-processing, and automated objective intelligibility measurements are applied alongside informal subjective assessments to evaluate the performance of these processing techniques. Finally, the causes of disagreement between subjective and objective intelligibility measurements are discussed; the results show that TF masking enhances intelligibility beyond delay-sum beamforming and that adaptive beamforming can be beneficial.
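    An oracle upper bound of the kind described, TF masking computed with access to the clean source, can be sketched with an ideal ratio mask on a toy two-tone mixture. All signals and the simple STFT here are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

# Minimal STFT / inverse pair (hann analysis and synthesis windows with
# 50% overlap, normalized overlap-add).
def stft(x, win=256, hop=128):
    w = np.hanning(win)
    return np.array([np.fft.rfft(w * x[i:i + win])
                     for i in range(0, len(x) - win, hop)])

def istft(spec, win=256, hop=128):
    w = np.hanning(win)
    out = np.zeros(hop * len(spec) + win)
    norm = np.zeros_like(out)
    for i, frame in enumerate(spec):
        out[i * hop:i * hop + win] += w * np.fft.irfft(frame, win)
        norm[i * hop:i * hop + win] += w ** 2
    return out / np.maximum(norm, 1e-8)

fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)          # "speech" stand-in
interf = 0.8 * np.sin(2 * np.pi * 1800 * t)   # interfering source
mix = target + interf

# Oracle (ideal ratio) mask: requires the clean source, hence an
# upper bound on what any blind TF-masking system could achieve.
T, I, M = stft(target), stft(interf), stft(mix)
mask = np.abs(T) ** 2 / (np.abs(T) ** 2 + np.abs(I) ** 2 + 1e-12)
est = istft(mask * M)

# Masking should remove most of the interference (trim window edges).
sl = slice(512, 7000)
err_mix = np.mean((mix[sl] - target[sl]) ** 2)
err_est = np.mean((est[sl] - target[sl]) ** 2)
print(err_est < 0.1 * err_mix)
```

    A practical system must estimate the mask blindly, e.g. from array geometry or a beamformer output; the oracle mask bounds how much intelligibility such estimation could ever recover.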

    Speech Perception in Reverberated Condition By Cochlear Implants

    Previous studies of bilateral cochlear-implant users examined cocktail-party settings under anechoic listening conditions. In the real world, however, listeners routinely encounter reverberation, which can significantly degrade speech intelligibility for all listeners, independent of their hearing status. The objective of this study is to investigate the effects of reverberation on the binaural benefits for speech recognition by bilateral cochlear-implant (CI) listeners. A bilateral CI subject was tested under different reverberation conditions. IEEE sentences recorded from one male speaker, mixed at different signal-to-noise ratios with either speech-shaped noise (SSN; energetic masking) or two female competing talkers (2FSN; informational masking), were used as stimuli. The male target speech was always presented at 90° azimuth (i.e., from the front), while the masker was placed at 0°, 90°, or 180° azimuth (0° implied left, 180° implied right). The generated stimuli were presented to the bilateral CI subject via an auxiliary input connected to the sound processor, in a double-walled sound-attenuated booth. In each condition the subject was tested with each ear alone as well as with both ears. Prior studies predict a decrease in speech intelligibility in reverberant conditions relative to an anechoic environment. As predicted, we observed such a decrease, since a more reverberant environment produces more masking than a less reverberant one. We also observed a benefit of spatial hearing in the reverberant environment: when the masker was placed at the better ear, the subject performed better than when it was placed at the other ear. We further examined the effect of reverberation on energetic and informational masking. We observed that when the target and interferer were spatially separated, reverberation had a greater detrimental effect on informational masking than on energetic masking, and when the target and interferer were co-located, performance was better under energetic masking than under informational masking. Due to time limitations and subject availability, testing was done with one CI subject. Further testing and research on this topic would help clarify the effects of informational versus energetic masking in reverberant conditions.

    Communications Biophysics

    Contains research objectives and reports on eight research projects split into three sections.
    National Institutes of Health (Grant 2 PO1 NS13126)
    National Institutes of Health (Grant 5 RO1 NS18682)
    National Institutes of Health (Grant 5 RO1 NS20322)
    National Institutes of Health (Grant 1 RO1 NS 20269)
    National Institutes of Health (Grant 5 T32 NS 07047)
    Symbion, Inc.
    National Institutes of Health (Grant 5 R01 NS10916)
    National Institutes of Health (Grant 1 RO NS 16917)
    National Science Foundation (Grant BNS83-19874)
    National Science Foundation (Grant BNS83-19887)
    National Institutes of Health (Grant 5 RO1 NS12846)
    National Institutes of Health (Grant 1 RO1 NS21322-01)
    National Institutes of Health (Grant 5 T32-NS07099-07)
    National Institutes of Health (Grant 1 RO1 NS14092-06)
    National Science Foundation (Grant BNS77-21751)
    National Institutes of Health (Grant 5 RO1 NS11080)

    Model-based speech enhancement for hearing aids

    • …