223 research outputs found

    Binaural Sound Localization Based on Reverberation Weighting and Generalized Parametric Mapping


    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019


    Ambisonics

    This open access book provides a concise explanation of the fundamentals and background of the surround sound recording and playback technology Ambisonics. It equips readers with the psychoacoustical, signal processing, acoustical, and mathematical knowledge needed to understand the inner workings of modern processing utilities and of special equipment for recording, manipulation, and reproduction in the higher-order Ambisonic format. The book comes with various practical examples based on free software tools and open scientific data for reproducible research. Its introductory section offers a perspective on Ambisonics spanning from the origins of coincident recording in the 1930s to the Ambisonic concepts of the 1970s, as well as the classical ways of applying first-order Ambisonics to coincident sound-scene recording and reproduction that have been practiced since the 1980s. Because the underlying mathematics occasionally becomes quite involved, the book includes an extensive mathematical appendix so that the main text remains comprehensive without sacrificing readability. The book offers readers a deeper understanding of Ambisonic technologies and will especially benefit scientists, audio-system engineers, and audio-recording engineers. The advanced sections explain fundamentals and modern techniques such as higher-order Ambisonic decoding, 3D audio effects, and higher-order recording. These techniques are shown to be suitable for audience areas ranging from studio-sized rooms to venues with hundreds of listeners, as well as for headphone-based playback, regardless of whether the 3D audio material is live, interactive, or studio-produced.
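The coincident first-order encoding this tradition builds on can be sketched in a few lines. The following is a minimal illustration (not taken from the book; the function name and conventions are ours) of traditional B-format encoding, where a mono source at a given azimuth and elevation is projected onto the omnidirectional W channel and three figure-of-eight channels X, Y, Z:

```python
import numpy as np

def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal into traditional first-order B-format (W, X, Y, Z).

    Angles in radians: azimuth counter-clockwise from straight ahead,
    elevation upward from the horizontal plane.
    """
    w = signal / np.sqrt(2.0)                          # omnidirectional, -3 dB
    x = signal * np.cos(azimuth) * np.cos(elevation)   # front-back figure-of-eight
    y = signal * np.sin(azimuth) * np.cos(elevation)   # left-right figure-of-eight
    z = signal * np.sin(elevation)                     # up-down figure-of-eight
    return np.stack([w, x, y, z])

# A source straight ahead on the horizon lands entirely in W and X:
b = encode_foa(np.ones(4), azimuth=0.0, elevation=0.0)
```

Higher-order encoding generalizes this by adding further spherical-harmonic channels; the structure of the code stays the same, with more rows in the output.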

    Sound Event Localization, Detection, and Tracking by Deep Neural Networks

    In this thesis, we present novel sound representations and classification methods for the task of sound event localization, detection, and tracking (SELDT). The human auditory system has evolved to localize multiple sound events, recognize them, and track their motion individually in an acoustic environment. This ability makes humans context-aware and enables them to interact with their surroundings naturally. Developing similar methods for machines will provide an automatic description of the social and human activities around them and enable machines to be context-aware in a similar way. Such methods can be employed to assist the hearing impaired in visualizing sounds, for robot navigation, and to monitor biodiversity, the home, and cities. A real-life acoustic scene is complex in nature, with multiple sound events that are temporally and spatially overlapping, including stationary and moving events with varying angular velocities. Additionally, each individual sound event class can vary considerably: different cars have different horns, for example, and even for the same car model, the duration and temporal structure of the horn sound depend on the driver. Performing SELDT robustly in such overlapping and dynamic sound scenes is challenging for machines. Hence, in this thesis we investigate the SELDT task with a data-driven approach based on deep neural networks (DNNs). The sound event detection (SED) task requires detecting the onset and offset times of individual sound events together with their corresponding labels. In this regard, we propose spatial and perceptual features extracted from multichannel audio for SED using two different DNNs, recurrent neural networks (RNNs) and convolutional recurrent neural networks (CRNNs). We show that multichannel audio features improve SED performance for overlapping sound events compared to traditional single-channel audio features.
    The proposed novel features and methods produced state-of-the-art performance for the real-life SED task and won the IEEE AASP DCASE challenge in two consecutive years, 2016 and 2017. Sound event localization is the task of spatially locating the position of individual sound events. Traditionally, this has been approached with parametric methods. In this thesis, we propose a CRNN for detecting the azimuth and elevation angles of multiple temporally overlapping sound events. This is the first DNN-based method performing localization over the complete azimuth and elevation space. In contrast to parametric methods, which require the number of active sources as prior information, the proposed method learns this information directly from the input data and estimates the sources' respective spatial locations. Further, the proposed CRNN is shown to be more robust than parametric methods in reverberant scenarios. Finally, the detection and localization tasks are performed jointly using a CRNN. This method additionally tracks the spatial location over time, thus producing the SELDT results. This is the first DNN-based SELDT method, and it is shown to perform on par with stand-alone baselines for SED, localization, and tracking. The proposed SELDT method is evaluated on nine datasets that represent anechoic and reverberant sound scenes, stationary and moving sources with varying velocities, different numbers of overlapping sound events, and different microphone array formats. The results show that the SELDT method can track multiple overlapping sound events that are both spatially stationary and moving.
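The tracking step that turns per-frame localization into SELDT output can be illustrated with a toy greedy association scheme: each frame's direction-of-arrival estimates are matched to the nearest live track within an angular gate, and unmatched detections open new tracks. This is a hypothetical sketch for intuition only, not the thesis's actual method (the function names, the gate value, and the greedy strategy are all our assumptions):

```python
import numpy as np

def angular_distance(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs in degrees."""
    az1, el1, az2, el2 = np.radians([a[0], a[1], b[0], b[1]])
    cos_d = (np.sin(el1) * np.sin(el2)
             + np.cos(el1) * np.cos(el2) * np.cos(az1 - az2))
    return np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))

def associate_tracks(frames, gate=20.0):
    """Greedily link per-frame DOA detections into track IDs.

    frames: list of per-frame lists of (azimuth, elevation) detections, degrees.
    Returns a parallel list of per-frame lists of integer track IDs.
    """
    last_seen = {}          # track id -> most recent direction
    next_id = 0
    all_ids = []
    for detections in frames:
        frame_ids, used = [], set()
        for det in detections:
            # closest unclaimed track within the angular gate, if any
            best, best_d = None, gate
            for tid, prev in last_seen.items():
                if tid in used:
                    continue
                d = angular_distance(det, prev)
                if d < best_d:
                    best, best_d = tid, d
            if best is None:            # no match: start a new track
                best = next_id
                next_id += 1
            last_seen[best] = det
            used.add(best)
            frame_ids.append(best)
        all_ids.append(frame_ids)
    return all_ids
```

With this sketch, a source drifting a few degrees per frame keeps one ID, while a detection far outside the gate spawns a new one; the DNN approach in the thesis instead learns the joint detection-localization mapping end to end.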

    Binaural scene analysis : localization, detection and recognition of speakers in complex acoustic scenes

    The human auditory system has the striking ability to robustly localize and recognize a specific target source in complex acoustic environments while ignoring interfering sources. Surprisingly, this remarkable capability, referred to as auditory scene analysis, is achieved by analyzing only the waveforms reaching the two ears. Computers, however, are presently not able to compete with the performance achieved by the human auditory system, even when a computer algorithm based on binaural signals is confronted with a highly constrained version of auditory scene analysis, such as localizing a sound source in a reverberant environment or recognizing a speaker in the presence of interfering noise. In particular, the problem of focusing on an individual speech source in the presence of competing speakers, termed the cocktail party problem, has proven extremely challenging for computer algorithms. The primary objective of this thesis is the development of a binaural scene analyzer that is able to jointly localize, detect, and recognize multiple speech sources in the presence of reverberation and interfering noise. The processing of the proposed system is divided into three main stages: a localization stage, a speech detection stage, and a speaker recognition stage. The only information assumed to be known a priori is the number of target speech sources present in the acoustic mixture. Furthermore, the aim of this work is to reduce the performance gap between humans and machines by improving the performance of the individual building blocks of the binaural scene analyzer. First, a binaural front-end inspired by auditory processing is designed to robustly determine the azimuth of multiple, simultaneously active sound sources in the presence of reverberation. The localization model builds on the supervised learning of azimuth-dependent binaural cues, namely interaural time and level differences.
    Multi-conditional training is performed to incorporate the uncertainty of these binaural cues resulting from reverberation and the presence of competing sound sources. Second, a speech detection module that exploits the distinct spectral characteristics of speech and noise signals is developed to automatically select azimuthal positions that are likely to correspond to speech sources. Through the established link between the localization stage and the recognition stage, realized by the speech detection module, the proposed binaural scene analyzer is able to selectively focus on a predefined number of speech sources positioned at unknown spatial locations, while ignoring interfering noise sources emerging from other spatial directions. Third, the speaker identities of all detected speech sources are recognized in the final stage of the model. To reduce the impact of environmental noise on speaker recognition performance, a missing data classifier is combined with the adaptation of speaker models using a universal background model. This combination is particularly beneficial in nonstationary background noise.
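The two binaural cues the localization front-end learns from can be computed in a simple broadband form. The sketch below (a toy illustration under our own conventions, not the thesis's auditory front-end, which operates per frequency band with multi-conditional training) estimates the interaural time difference via cross-correlation and the interaural level difference as an energy ratio:

```python
import numpy as np

def binaural_cues(left, right, fs):
    """Broadband ITD (seconds) and ILD (dB) from a binaural signal pair.

    Sign convention (ours): positive ITD/ILD mean the source favors the left ear.
    """
    # ILD: energy ratio between the ear signals, in dB
    ild = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    # ITD: lag that maximizes the interaural cross-correlation
    cc = np.correlate(left, right, mode="full")
    lag = np.argmax(cc) - (len(right) - 1)   # lag of left relative to right, samples
    itd = -lag / fs                          # positive: left ear leads
    return itd, ild
```

In practice these cues are ambiguous in reverberation, which is exactly why the thesis trains the azimuth mapping on cues corrupted by reverberation and competing sources rather than relying on such clean-signal estimates.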

    Estimating uncertainty models for speech source localization in real-world environments

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 131-140). By Kevin William Wilson.
    This thesis develops improved solutions to the problems of audio source localization and speech source separation in real reverberant environments. For source localization, it develops a new time- and frequency-dependent weighting function for the generalized cross-correlation framework for time-delay estimation. This weighting function is derived from the speech spectrogram via a transformation designed to optimally predict localization-cue accuracy. By structuring the problem in this way, we take advantage of the nonstationarity of speech in a way that parallels the psychoacoustics of the precedence effect. For source separation, we use the same weighting function as part of a simple probabilistic generative model of localization cues. We combine this localization-cue model with a mixture model of speech log-spectra and use the combined model to perform speech source separation. For both source localization and source separation, we show significant performance improvements over existing techniques on both real and simulated data in a range of acoustic environments.
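The thesis's learned weighting replaces the fixed weighting in the generalized cross-correlation framework. As a reference point, the standard fixed-weighting baseline, GCC-PHAT, can be sketched as follows (a minimal illustration with our own function name and sign convention, not the thesis's learned method):

```python
import numpy as np

def gcc_phat_delay(x, y, fs):
    """Estimate the delay (seconds) of signal y relative to x with GCC-PHAT."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-15          # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift   # lag of x relative to y, in samples
    return -lag / fs                          # positive: y lags x
```

PHAT weights every frequency equally regardless of reliability; the thesis's contribution is precisely to replace this uniform weighting with one predicted from the speech spectrogram.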

    Measurement-Based Automatic Parameterization of a Virtual Acoustic Room Model

    Modern auralization techniques can make the headphone listening experience resemble loudspeaker listening, the reproduction method most recordings are produced for. Room acoustic modeling is an essential part of a plausible auralization system, but specifying the parameters for a room model requires expertise and time. In this thesis, a system is developed for deriving these parameters automatically from room acoustic measurements. The parameterization is based on room impulse responses measured with a microphone array and can be divided into two parts: the analysis of the direct sound and early reflections, and the analysis of the late reverberation. The direct sounds are separated from the impulse responses using various signal processing techniques and used in a matching pursuit algorithm to find the reflections in the impulse responses. The sound sources and their image sources are localized using time-difference-of-arrival based localization, and frequency-dependent propagation-path effects are estimated for use in an image source model. The late reverberation of the auralization is implemented with a feedback delay network, whose parameterization requires analyzing the frequency-dependent reverberation time and the frequency response of the late reverberation. Normalized echo density is used to determine the beginning of the late reverberation in the measurements and to set the starting point of the modeled late field. The reverberation times are analyzed using the energy decay relief. A formal listening test shows that the automatic parameterization system outperforms parameters set manually from approximate geometrical data. Problems remain, especially in the precision of the late-reverberation equalization, but the system works well considering the relative simplicity of the processing methods used.
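The energy decay relief mentioned above generalizes Schroeder backward integration to individual frequency bands. The broadband version of that analysis, estimating a reverberation time from a measured impulse response, can be sketched as follows (a simplified single-band illustration with our own function names, not the thesis's full per-band implementation):

```python
import numpy as np

def schroeder_edc_db(rir):
    """Schroeder backward-integrated energy decay curve, in dB re its start."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

def estimate_rt60(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """T20-style RT60: fit the decay between -5 and -25 dB, extrapolate to -60 dB."""
    edc = schroeder_edc_db(rir)
    idx = np.where((edc <= lo_db) & (edc >= hi_db))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, edc[idx], 1)    # decay rate in dB per second
    return -60.0 / slope
```

Repeating this per octave or third-octave band yields the frequency-dependent reverberation times needed to parameterize the feedback delay network's attenuation filters.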

    Localization of sound sources : a systematic review

    Sound localization is a vast field of research used in many applications, including communication, radar, medical aids, and speech enhancement, to name but a few. Many methods have been presented in this field in recent years, and various types of microphone arrays serve to sense the incoming sound. This paper presents an overview of the importance of sound localization in different applications, along with the uses and limitations of ad-hoc microphone arrays compared with other arrays, and approaches for overcoming these limitations. Detailed explanations are given of some of the existing methods for sound localization with microphone arrays in the recent literature. The existing methods are studied comparatively, along with the factors that influence the choice of one method over another. This review is intended to form a basis for choosing the best-fitting method for our use case.
