155 research outputs found

    Bootstrap averaging for model-based source separation in reverberant conditions

    Recently proposed model-based methods use time-frequency (T-F) masking for source separation, where the T-F masks are derived from various cues described by a frequency-domain Gaussian mixture model (GMM). These methods work well for separating mixtures recorded at low-to-medium levels of reverberation; however, their performance degrades as the level of reverberation increases. We note that the relatively poor performance of these methods under reverberant conditions can be attributed to the high variance of the frequency-dependent GMM parameter estimates. To address this limitation, a novel bootstrap-based approach is proposed to improve the accuracy of expectation-maximization (EM) estimates of a frequency-dependent GMM based on an a priori chosen initialization scheme. It is shown how the proposed technique allows us to construct time-frequency masks which lead to improved model-based source separation for reverberant speech mixtures. Experiments and analysis are performed on speech mixtures formed using real room-recorded impulse responses.
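
    To make the bootstrap-averaging idea concrete, the following is a minimal sketch rather than the paper's implementation: it assumes a single scalar cue per time frame within one frequency band, a two-component GMM (one component per source), and scikit-learn's EM routine in place of the authors' frequency-dependent model. Soft masks (posteriors) from each bootstrap replicate are evaluated on the original frames and averaged.

```python
# Hedged sketch of bootstrap-averaged GMM posteriors for T-F masking.
# Assumptions (not from the paper): one scalar cue per T-F point in a single
# frequency band, two GMM components, and scikit-learn's EM as a stand-in.
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_averaged_mask(cues, n_boot=20, n_sources=2, seed=0):
    """cues: (n_frames,) cue values in one band -> (n_frames, n_sources) soft mask."""
    rng = np.random.default_rng(seed)
    X = np.asarray(cues).reshape(-1, 1)
    base = GaussianMixture(n_components=n_sources, random_state=seed).fit(X)
    order = np.argsort(base.means_.ravel())            # fix component order across replicates
    posterior_sum = np.zeros((len(X), n_sources))
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))          # resample frames with replacement
        gmm = GaussianMixture(n_components=n_sources,
                              means_init=base.means_[order],
                              random_state=seed).fit(X[idx])
        posterior_sum += gmm.predict_proba(X)          # posteriors on the original frames
    return posterior_sum / n_boot                      # bootstrap-averaged soft T-F mask
```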

    Improving model-based convolutive blind source separation techniques via bootstrap

    Blind source separation for underdetermined reverberant mixtures is often achieved by assuming a statistical model for cues of interest, where the unknown parameters of the statistical model depend on hidden variables. Here, the expectation-maximization (EM) algorithm is employed to compute maximum-likelihood estimates of the unknown model parameters. A by-product of the EM algorithm is a time-frequency (T-F) mask which allows the estimation of the target source from the given mixture. In this paper, we propose the idea of bootstrap averaging to improve separation quality for mixtures recorded under reverberant conditions. Our experiments on real speech mixture signals show an increase in the signal-to-distortion ratio (SDR) over a state-of-the-art baseline algorithm, to our knowledge currently the best-performing technique in this class of methods.
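
    For context, the T-F mask obtained from such an EM procedure is typically applied multiplicatively in the short-time Fourier transform (STFT) domain and the masked spectrogram resynthesized. The sketch below shows only that generic masking step, using SciPy's STFT/ISTFT and a hypothetical pre-computed mask array; it is not the paper's algorithm.

```python
# Generic mask-and-resynthesize step (assumed workflow, not the paper's code):
# multiply the mixture STFT by a soft mask in [0, 1] and invert back to time.
import numpy as np
from scipy.signal import stft, istft

def apply_tf_mask(mixture, mask, fs, nperseg=1024):
    """mixture: 1-D signal; mask: array matching the STFT grid of the mixture."""
    f, t, Z = stft(mixture, fs=fs, nperseg=nperseg)
    assert mask.shape == Z.shape, "mask must match the STFT grid"
    _, target = istft(Z * mask, fs=fs, nperseg=nperseg)   # masked spectrogram -> waveform
    return target
```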

    Statistics of natural reverberation enable perceptual separation of sound and space

    In everyday listening, sound reaches our ears directly from a source as well as indirectly via reflections known as reverberation. Reverberation profoundly distorts the sound from a source, yet humans can both identify sound sources and distinguish environments from the resulting sound, via mechanisms that remain unclear. The core computational challenge is that the acoustic signatures of the source and environment are combined in a single signal received by the ear. Here we ask whether our recognition of sound sources and spaces reflects an ability to separate their effects and whether any such separation is enabled by statistical regularities of real-world reverberation. To first determine whether such statistical regularities exist, we measured impulse responses (IRs) of 271 spaces sampled from the distribution encountered by humans during daily life. The sampled spaces were diverse, but their IRs were tightly constrained, exhibiting exponential decay at frequency-dependent rates: mid frequencies reverberated longest whereas higher and lower frequencies decayed more rapidly, presumably due to absorptive properties of materials and air. To test whether humans leverage these regularities, we manipulated IR decay characteristics in simulated reverberant audio. Listeners could discriminate sound sources and environments from these signals, but their abilities degraded when reverberation characteristics deviated from those of real-world environments. Subjectively, atypical IRs were mistaken for sound sources. The results suggest the brain separates sound into contributions from the source and the environment, constrained by a prior on natural reverberation. This separation process may contribute to robust recognition while providing information about spaces around us.
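
    The frequency-dependent decay rates referred to above can be illustrated with a short sketch (the study's analysis code is not reproduced here): each octave band of a measured impulse response is backward-integrated (Schroeder method) and the decay slope fitted to give a per-band reverberation time. The band centres and the -5 to -25 dB fitting range are conventional choices, not values taken from the paper.

```python
# Sketch: per-octave-band reverberation time from a room impulse response via
# band-pass filtering, Schroeder backward integration, and a linear decay fit.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_rt(ir, fs, centers=(125, 250, 500, 1000, 2000, 4000)):
    rts = {}
    for fc in centers:
        sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)], btype="bandpass",
                     fs=fs, output="sos")
        band = sosfiltfilt(sos, ir)
        edc = np.cumsum(band[::-1] ** 2)[::-1]           # Schroeder backward integration
        edc_db = 10 * np.log10(edc / edc.max() + 1e-12)  # energy decay curve in dB
        i5 = np.argmax(edc_db <= -5)                     # start of fit region (-5 dB)
        i25 = np.argmax(edc_db <= -25)                   # end of fit region (-25 dB)
        t = np.arange(i5, i25) / fs
        slope, _ = np.polyfit(t, edc_db[i5:i25], 1)      # decay rate in dB per second
        rts[fc] = -60.0 / slope                          # extrapolate to -60 dB (T20-style)
    return rts
```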

    Auditory Displays and Assistive Technologies: the use of head movements by visually impaired individuals and their implementation in binaural interfaces

    Visually impaired people rely upon audition for a variety of purposes, among them the use of sound to identify the position of objects in their surrounding environment. This is not limited to localising sound-emitting objects: obstacles and environmental boundaries can also be located, thanks to the ability to extract information from reverberation and sound reflections, all of which can contribute to effective and safe navigation, as well as serving a function in certain assistive technologies made possible by binaural auditory virtual reality. It is known that head movements in the presence of sound elicit changes in the acoustical signals arriving at each ear, and these changes can mitigate common auditory localisation problems in headphone-based auditory virtual reality, such as front-to-back reversals. The goal of the work presented here is to investigate whether visually impaired people naturally use head movement to facilitate auditory perception, and to what extent this may be applicable to the design of virtual auditory assistive technology. Three novel experiments are presented: a field study of head-movement behaviour during navigation, a questionnaire assessing the self-reported use of head movement in auditory perception by visually impaired individuals (each comparing visually impaired and sighted participants), and an acoustical analysis of interaural differences and cross-correlations as a function of head angle and sound-source distance. It is found that visually impaired people self-report using head movement for auditory distance perception. This is supported by the head movements observed during the field study, whilst the acoustical analysis showed that interaural correlations for sound sources within 5 m of the listener were reduced as head angle or distance to the sound source increased, and that interaural differences and correlations in reflected sound were generally lower than those of the direct sound. Relevant guidelines for designers of assistive auditory virtual reality are subsequently proposed.
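
    The acoustical analysis described above can be sketched in generic form as a normalized cross-correlation between equal-length left- and right-ear signals over the roughly +/-1 ms lag range relevant to interaural time differences; the peak value (IACC) and its lag can then be compared across head angles and source distances. This is an illustrative formulation, not the thesis's exact procedure.

```python
# Normalized interaural cross-correlation over a limited lag range (sketch).
# Assumes equal-length, simultaneously recorded left/right ear signals at rate fs.
import numpy as np

def interaural_cross_correlation(left, right, fs, max_lag_ms=1.0):
    left = np.asarray(left, float) - np.mean(left)
    right = np.asarray(right, float) - np.mean(right)
    max_lag = int(fs * max_lag_ms / 1000)
    full = np.correlate(left, right, mode="full")              # lags -(N-1) .. +(N-1)
    mid = len(right) - 1                                       # index of zero lag
    corr = full[mid - max_lag: mid + max_lag + 1]
    corr = corr / np.sqrt(np.sum(left**2) * np.sum(right**2))  # normalize to [-1, 1]
    lags = np.arange(-max_lag, max_lag + 1) / fs               # lag axis in seconds
    return lags, corr                                          # max(corr) is the IACC
```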

    Accurate Sound Localization in Reverberant Environments Is Mediated by Robust Encoding of Spatial Cues in the Auditory Midbrain

    In reverberant environments, acoustic reflections interfere with the direct sound arriving at a listener's ears, distorting the spatial cues for sound localization. Yet, human listeners have little difficulty localizing sounds in most settings. Because reverberant energy builds up over time, the source location is represented relatively faithfully during the early portion of a sound, but this representation becomes increasingly degraded later in the stimulus. We show that the directional sensitivity of single neurons in the auditory midbrain of anesthetized cats follows a similar time course, although onset dominance in temporal response patterns results in more robust directional sensitivity than expected, suggesting a simple mechanism for improving directional sensitivity in reverberation. In parallel behavioral experiments, we demonstrate that human lateralization judgments are consistent with predictions from a population rate model decoding the observed midbrain responses, suggesting a subcortical origin for robust sound localization in reverberant environments. Funding: National Institutes of Health (U.S.) grants R01 DC002258, R01 DC05778-02, P30 DC005209 (Eaton Peabody Laboratory core grant), and T32 DC0003.

    Reverberation impairs brainstem temporal representations of voiced vowel sounds: challenging "periodicity-tagged" segregation of competing speech in rooms.

    The auditory system typically processes information from concurrently active sound sources (e.g., two voices speaking at once), in the presence of multiple delayed, attenuated and distorted sound-wave reflections (reverberation). Brainstem circuits help segregate these complex acoustic mixtures into "auditory objects." Psychophysical studies demonstrate a strong interaction between reverberation and fundamental-frequency (F0) modulation, leading to impaired segregation of competing vowels when segregation is on the basis of F0 differences. Neurophysiological studies of complex-sound segregation have concentrated on sounds with steady F0s, in anechoic environments. However, F0 modulation and reverberation are quasi-ubiquitous. We examine the ability of 129 single units in the ventral cochlear nucleus (VCN) of the anesthetized guinea pig to segregate the concurrent synthetic vowel sounds /a/ and /i/, based on temporal discharge patterns under closed-field conditions. We address the effects of added real-room reverberation, F0 modulation, and the interaction of these two factors, on brainstem neural segregation of voiced speech sounds. A firing-rate representation of single-vowels' spectral envelopes is robust to the combination of F0 modulation and reverberation: local firing-rate maxima and minima across the tonotopic array code vowel-formant structure. However, single-vowel F0-related periodicity information in shuffled inter-spike interval distributions is significantly degraded in the combined presence of reverberation and F0 modulation. Hence, segregation of double-vowels' spectral energy into two streams (corresponding to the two vowels), on the basis of temporal discharge patterns, is impaired by reverberation; specifically when F0 is modulated. All unit types (primary-like, chopper, onset) are similarly affected. These results offer neurophysiological insights into the perceptual organization of complex acoustic scenes under realistically challenging listening conditions. This work was supported by a grant from the BBSRC to Ian M. Winter. Mark Sayles received a University of Cambridge MB/PhD studentship. Tony Watkins (University of Reading, UK) provided the real-room impulse responses. Portions of the data analysis and manuscript preparation were performed by Mark Sayles during the course of an Action on Hearing Loss funded UK–US Fulbright Commission professional scholarship held in the Auditory Neurophysiology and Modeling Laboratory at Purdue University, USA. Mark Sayles is currently supported by a post-doctoral fellowship from Fonds Wetenschappelijk Onderzoek—Vlaanderen, held in the Laboratory of Auditory Neurophysiology at KU Leuven, Belgium. This paper was originally published in Frontiers in Systems Neuroscience (Sayles M, Stasiak A, Winter IM, Frontiers in Systems Neuroscience 2015, 8, 248, doi:10.3389/fnsys.2014.00248)
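
    The shuffled inter-spike interval analysis mentioned above can be sketched as follows: intervals are measured between spikes drawn from different presentations of the same stimulus, which retains stimulus-locked (e.g. F0-related) periodicity while discarding within-train refractory structure. The bin width and maximum interval below are illustrative choices, not those used in the study.

```python
# Sketch of a shuffled all-order inter-spike interval histogram: only
# across-presentation spike pairs contribute, so periodic peaks reflect
# stimulus-locked timing (e.g. at multiples of 1/F0).
import numpy as np

def shuffled_interval_histogram(spike_trains, bin_s=1e-4, max_interval_s=0.02):
    """spike_trains: list of 1-D arrays of spike times (s), one per presentation."""
    trains = [np.asarray(st, float) for st in spike_trains]
    edges = np.arange(0.0, max_interval_s + bin_s, bin_s)
    counts = np.zeros(len(edges) - 1)
    for i, a in enumerate(trains):
        for j, b in enumerate(trains):
            if i == j:
                continue                                 # skip within-train intervals
            diffs = np.abs(a[:, None] - b[None, :]).ravel()
            counts += np.histogram(diffs, bins=edges)[0]
    return edges[:-1], counts
```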

    Acoustic Speaker Localization with Strong Reverberation and Adaptive Feature Filtering with a Bayes RFS Framework

    The thesis investigates the challenges of speaker localization in the presence of strong reverberation, multi-speaker tracking, and multi-feature multi-speaker state filtering, using sound recordings from microphones. Novel reverberation-robust speaker localization algorithms are derived from the signal and room-acoustics models. A multi-speaker tracking filter and a multi-feature multi-speaker state filter are developed based upon the generalized labeled multi-Bernoulli random finite set framework. Experiments and comparative studies have verified and demonstrated the benefits of the proposed methods.
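
    The thesis's localization algorithms and GLMB filtering pipeline are not reproduced here; as a generic point of reference for microphone-pair localization under reverberation, the sketch below shows GCC-PHAT, a standard time-delay estimator commonly used as a front end or baseline in this setting.

```python
# GCC-PHAT time-delay estimate between two microphone signals (a common
# reverberation-robust baseline; not the method developed in the thesis).
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    R = X1 * np.conj(X2)
    R /= np.abs(R) + 1e-12                      # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs
    return tau                                  # estimated inter-microphone delay (s)
```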

    Blind estimation of room acoustic parameters from speech and music signals

    The acoustic character of a space is often quantified using objective room acoustic parameters. The measurement of these parameters is difficult in occupied conditions, so measurements are usually performed when the space is unoccupied, even though occupancy is known to significantly affect the measured parameter values. Within this thesis, new methods are developed by which naturalistic signals such as speech and music can be used to measure acoustic parameters. Adopting naturalistic signals enables passive measurement during orchestral performances and spoken announcements, thus facilitating easy in-situ measurement. Two methods are described within this work: (1) a method utilising artificial neural networks, where a network is taught to recognise acoustic parameters from received, reverberated signals, and (2) a method based on maximum likelihood estimation of the decay curve of the room, from which the parameters are then calculated.
    (1) The development of the neural network method focuses on a new pre-processor for use with music signals. The pre-processor utilises a narrow-band filter bank with centre frequencies chosen based on the equal-temperament scale. The success of a machine learning method is linked to the quality of the training data, and therefore realistic acoustic simulation algorithms were used to generate a large database of room impulse responses. Room models were defined with realistic, randomly generated geometries and surface properties; these models were then used to predict the room impulse responses.
    (2) In the second approach, a statistical model of the decay of sound in a room was further developed. This model uses a maximum likelihood (ML) framework to yield a number of decay curve estimates from a received reverberant signal. The success of the method depends on a number of stages developed for the algorithm: (a) a pre-processor to select appropriate decay phases for estimation purposes, (b) a rigorous optimisation algorithm to ensure the correct maximum likelihood estimate is found, and (c) a method to yield a single optimum decay curve estimate from which the parameters are calculated.
    The ANN and ML methods were tested using orchestral music and speech signals. The ANN method tended to perform well when estimating the early decay time (EDT); for speech and music signals the error was within the subjective difference limens. However, accuracy was reduced for the reverberation time (Rt) and other parameters. By contrast, the ML method performed well for Rt, with results for both speech and music within the difference limens for reasonable (<4 s) reverberation times. In addition, reasonable accuracy was found for EDT, clarity (C80), centre time (Ts) and Deutlichkeit (D). The ML method is also capable of producing accurate estimates of the binaural parameters early lateral energy fraction (LEF) and late lateral strength (LG). A number of real-world measurements were carried out in concert halls, where the ML accuracy was shown to be sufficient for most parameters. The ML method has the advantage over the ANN method of being truly blind (the ANN method requires a period of learning and is therefore semi-blind). The ML method uses gaps of silence between notes or utterances; when these silent regions are not present, the method does not produce an estimate. Accurate estimation requires a long recording (hours of music or many minutes of speech) to ensure that at least some silent regions are present.
    This thesis shows that, given a sufficiently long recording, accurate estimates of many acoustic parameters can be obtained directly from speech and music. Further extensions to the ML method detailed in this thesis combine the ML-estimated decay curve with cepstral methods which detect the locations of early reflections, improving the accuracy of many of the parameter estimates.
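
    As a rough illustration of reading decay information from the gaps in a reverberant recording, the sketch below detects runs of falling short-term energy (candidate free-decay phases after note or utterance offsets) and fits a straight line to each in the log-energy domain. The thesis's method is a maximum-likelihood decay model with a dedicated pre-processor and optimiser; this simplified linear-regression stand-in only conveys the general idea.

```python
# Crude blind decay-rate estimate from a reverberant speech/music recording:
# frame-wise log energy, detection of monotonically falling runs, and a linear
# fit per run. Frame length and the required dynamic range are illustrative.
import numpy as np

def rough_decay_estimate(signal, fs, frame=0.02, min_drop_db=15.0):
    hop = int(frame * fs)
    n_frames = len(signal) // hop
    frames = np.asarray(signal)[: n_frames * hop].reshape(n_frames, hop)
    log_e = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)   # frame energy in dB
    slopes, start = [], None
    for i in range(1, n_frames):
        if log_e[i] < log_e[i - 1]:                   # energy still falling: extend the run
            start = i - 1 if start is None else start
            continue
        if start is not None and log_e[start] - log_e[i - 1] >= min_drop_db:
            t = np.arange(start, i) * frame
            slopes.append(np.polyfit(t, log_e[start:i], 1)[0])   # dB/s for this decay phase
        start = None
    if not slopes:
        return None                                   # no usable gaps found in the recording
    return -60.0 / np.median(slopes)                  # crude reverberation-time estimate (s)
```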