184 research outputs found

    An audio-visual system for object-based audio : from recording to listening

    Get PDF
    Object-based audio is an emerging representation for audio content, where content is represented in a reproduction format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listenertracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system’s capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluate

    OBJECTIVE AND SUBJECTIVE EVALUATION OF DEREVERBERATION ALGORITHMS

    Get PDF
    Reverberation significantly impacts the quality and intelligibility of speech. Several dereverberation algorithms have been proposed in the literature to combat this problem. A majority of these algorithms utilize a single channel and are developed for monaural applications, and as such do not preserve the cues necessary for sound localization. This thesis describes a blind two-channel dereverberation technique that improves the quality of speech corrupted by reverberation while preserving cues that affect localization. The method is based by combining a short term (2ms) and long term (20ms) weighting function of the linear prediction (LP) residual of the input signal. The developed and other dereverberation algorithms are evaluated objectively and subjectively in terms of sound quality and localization accuracy. The binaural adaptation provides a significant increase in sound quality while removing the loss in localization ability found in the bilateral implementation

    Spatial Hearing with Simultaneous Sound Sources: A Psychophysical Investigation

    Get PDF
    This thesis provides an overview of work conducted to investigate human spatial hearing in situations involving multiple concurrent sound sources. Much is known about spatial hearing with single sound sources, including the acoustic cues to source location and the accuracy of localisation under different conditions. However, more recently interest has grown in the behaviour of listeners in more complex environments. Concurrent sound sources pose a particularly difficult problem for the auditory system, as their identities and locations must be extracted from a common set of sensory receptors and shared computational machinery. It is clear that humans have a rich perception of their auditory world, but just how concurrent sounds are processed, and how accurately, are issues that are poorly understood. This work attempts to fill a gap in our understanding by systematically examining spatial resolution with multiple sound sources. A series of psychophysical experiments was conducted on listeners with normal hearing to measure performance in spatial localisation and discrimination tasks involving more than one source. The general approach was to present sources that overlapped in both frequency and time in order to observe performance in the most challenging of situations. Furthermore, the role of two primary sets of location cues in concurrent source listening was probed by examining performance in different spatial dimensions. The binaural cues arise due to the separation of the two ears, and provide information about the lateral position of sound sources. The spectral cues result from location-dependent filtering by the head and pinnae, and allow vertical and front-rear auditory discrimination. Two sets of experiments are described that employed relatively simple broadband noise stimuli. In the first of these, two-point discrimination thresholds were measured using simultaneous noise bursts. It was found that the pair could be resolved only if a binaural difference was present; spectral cues did not appear to be sufficient. In the second set of experiments, the two stimuli were made distinguishable on the basis of their temporal envelopes, and the localisation of a designated target source was directly examined. Remarkably robust localisation was observed, despite the simultaneous masker, and both binaural and spectral cues appeared to be of use in this case. Small but persistent errors were observed, which in the lateral dimension represented a systematic shift away from the location of the masker. The errors can be explained by interference in the processing of the different location cues. Overall these experiments demonstrated that the spatial perception of concurrent sound sources is highly dependent on stimulus characteristics and configurations. This suggests that the underlying spatial representations are limited by the accuracy with which acoustic spatial cues can be extracted from a mixed signal. Three sets of experiments are then described that examined spatial performance with speech, a complex natural sound. The first measured how well speech is localised in isolation. This work demonstrated that speech contains high-frequency energy that is essential for accurate three-dimensional localisation. In the second set of experiments, spatial resolution for concurrent monosyllabic words was examined using similar approaches to those used for the concurrent noise experiments. It was found that resolution for concurrent speech stimuli was similar to resolution for concurrent noise stimuli. Importantly, listeners were limited in their ability to concurrently process the location-dependent spectral cues associated with two brief speech sources. In the final set of experiments, the role of spatial hearing was examined in a more relevant setting containing concurrent streams of sentence speech. It has long been known that binaural differences can aid segregation and enhance selective attention in such situations. The results presented here confirmed this finding and extended it to show that the spectral cues associated with different locations can also contribute. As a whole, this work provides an in-depth examination of spatial performance in concurrent source situations and delineates some of the limitations of this process. In general, spatial accuracy with concurrent sources is poorer than with single sound sources, as both binaural and spectral cues are subject to interference. Nonetheless, binaural cues are quite robust for representing concurrent source locations, and spectral cues can enhance spatial listening in many situations. The findings also highlight the intricate relationship that exists between spatial hearing, auditory object processing, and the allocation of attention in complex environments

    Spatial Hearing with Simultaneous Sound Sources: A Psychophysical Investigation

    Get PDF
    This thesis provides an overview of work conducted to investigate human spatial hearing in situations involving multiple concurrent sound sources. Much is known about spatial hearing with single sound sources, including the acoustic cues to source location and the accuracy of localisation under different conditions. However, more recently interest has grown in the behaviour of listeners in more complex environments. Concurrent sound sources pose a particularly difficult problem for the auditory system, as their identities and locations must be extracted from a common set of sensory receptors and shared computational machinery. It is clear that humans have a rich perception of their auditory world, but just how concurrent sounds are processed, and how accurately, are issues that are poorly understood. This work attempts to fill a gap in our understanding by systematically examining spatial resolution with multiple sound sources. A series of psychophysical experiments was conducted on listeners with normal hearing to measure performance in spatial localisation and discrimination tasks involving more than one source. The general approach was to present sources that overlapped in both frequency and time in order to observe performance in the most challenging of situations. Furthermore, the role of two primary sets of location cues in concurrent source listening was probed by examining performance in different spatial dimensions. The binaural cues arise due to the separation of the two ears, and provide information about the lateral position of sound sources. The spectral cues result from location-dependent filtering by the head and pinnae, and allow vertical and front-rear auditory discrimination. Two sets of experiments are described that employed relatively simple broadband noise stimuli. In the first of these, two-point discrimination thresholds were measured using simultaneous noise bursts. It was found that the pair could be resolved only if a binaural difference was present; spectral cues did not appear to be sufficient. In the second set of experiments, the two stimuli were made distinguishable on the basis of their temporal envelopes, and the localisation of a designated target source was directly examined. Remarkably robust localisation was observed, despite the simultaneous masker, and both binaural and spectral cues appeared to be of use in this case. Small but persistent errors were observed, which in the lateral dimension represented a systematic shift away from the location of the masker. The errors can be explained by interference in the processing of the different location cues. Overall these experiments demonstrated that the spatial perception of concurrent sound sources is highly dependent on stimulus characteristics and configurations. This suggests that the underlying spatial representations are limited by the accuracy with which acoustic spatial cues can be extracted from a mixed signal. Three sets of experiments are then described that examined spatial performance with speech, a complex natural sound. The first measured how well speech is localised in isolation. This work demonstrated that speech contains high-frequency energy that is essential for accurate three-dimensional localisation. In the second set of experiments, spatial resolution for concurrent monosyllabic words was examined using similar approaches to those used for the concurrent noise experiments. It was found that resolution for concurrent speech stimuli was similar to resolution for concurrent noise stimuli. Importantly, listeners were limited in their ability to concurrently process the location-dependent spectral cues associated with two brief speech sources. In the final set of experiments, the role of spatial hearing was examined in a more relevant setting containing concurrent streams of sentence speech. It has long been known that binaural differences can aid segregation and enhance selective attention in such situations. The results presented here confirmed this finding and extended it to show that the spectral cues associated with different locations can also contribute. As a whole, this work provides an in-depth examination of spatial performance in concurrent source situations and delineates some of the limitations of this process. In general, spatial accuracy with concurrent sources is poorer than with single sound sources, as both binaural and spectral cues are subject to interference. Nonetheless, binaural cues are quite robust for representing concurrent source locations, and spectral cues can enhance spatial listening in many situations. The findings also highlight the intricate relationship that exists between spatial hearing, auditory object processing, and the allocation of attention in complex environments

    Multiple source localization using spherical microphone arrays

    Get PDF
    Direction-of-Arrival (DOA) estimation is a fundamental task in acoustic signal processing and is used in source separation, localization, tracking, environment mapping, speech enhancement and dereverberation. In applications such as hearing aids, robot audition, teleconferencing and meeting diarization, the presence of multiple simultaneously active sources often occurs. Therefore DOA estimation which is robust to Multi-Source (MS) scenarios is of particular importance. In the past decade, interest in Spherical Microphone Arrays (SMAs) has been rapidly grown due to its ability to analyse the sound field with equal resolution in all directions. Such symmetry makes SMAs suitable for applications in robot audition where potential variety of heights and positions of the talkers are expected. Acoustic signal processing for SMAs is often formulated in the Spherical Harmonic Domain (SHD) which describes the sound field in a form that is independent of the geometry of the SMA. DOA estimation methods for the real-world scenarios address one or more performance degrading factors such as noise, reverberation, multi-source activity or tackled problems such as source counting or reducing computational complexity. This thesis addresses various problems in MS DOA estimation for speech sources each of which focuses on one or more performance degrading factor(s). Firstly a narrowband DOA estimator is proposed utilizing high order spatial information in two computationally efficient ways. Secondly, an autonomous source counting technique is proposed which uses density-based clustering in an evolutionary framework. Thirdly, a confidence metric for validity of Single Source (SS) assumption in a Time-Frequency (TF) bin is proposed. It is based on MS assumption in a short time interval where the number and the TF bin of active sources are adaptively estimated. Finally two analytical narrowband MS DOA estimators are proposed based on MS assumption in a TF bin. The proposed methods are evaluated using simulations and real recordings. Each proposed technique outperforms comparative baseline methods and performs at least as accurately as the state-of-the-art.Open Acces

    AcousticFusion: Fusing Sound Source Localization to Visual SLAM in Dynamic Environments

    Get PDF

    Binaural scene analysis : localization, detection and recognition of speakers in complex acoustic scenes

    Get PDF
    The human auditory system has the striking ability to robustly localize and recognize a specific target source in complex acoustic environments while ignoring interfering sources. Surprisingly, this remarkable capability, which is referred to as auditory scene analysis, is achieved by only analyzing the waveforms reaching the two ears. Computers, however, are presently not able to compete with the performance achieved by the human auditory system, even in the restricted paradigm of confronting a computer algorithm based on binaural signals with a highly constrained version of auditory scene analysis, such as localizing a sound source in a reverberant environment or recognizing a speaker in the presence of interfering noise. In particular, the problem of focusing on an individual speech source in the presence of competing speakers, termed the cocktail party problem, has been proven to be extremely challenging for computer algorithms. The primary objective of this thesis is the development of a binaural scene analyzer that is able to jointly localize, detect and recognize multiple speech sources in the presence of reverberation and interfering noise. The processing of the proposed system is divided into three main stages: localization stage, detection of speech sources, and recognition of speaker identities. The only information that is assumed to be known a priori is the number of target speech sources that are present in the acoustic mixture. Furthermore, the aim of this work is to reduce the performance gap between humans and machines by improving the performance of the individual building blocks of the binaural scene analyzer. First, a binaural front-end inspired by auditory processing is designed to robustly determine the azimuth of multiple, simultaneously active sound sources in the presence of reverberation. The localization model builds on the supervised learning of azimuthdependent binaural cues, namely interaural time and level differences. Multi-conditional training is performed to incorporate the uncertainty of these binaural cues resulting from reverberation and the presence of competing sound sources. Second, a speech detection module that exploits the distinct spectral characteristics of speech and noise signals is developed to automatically select azimuthal positions that are likely to correspond to speech sources. Due to the established link between the localization stage and the recognition stage, which is realized by the speech detection module, the proposed binaural scene analyzer is able to selectively focus on a predefined number of speech sources that are positioned at unknown spatial locations, while ignoring interfering noise sources emerging from other spatial directions. Third, the speaker identities of all detected speech sources are recognized in the final stage of the model. To reduce the impact of environmental noise on the speaker recognition performance, a missing data classifier is combined with the adaptation of speaker models using a universal background model. This combination is particularly beneficial in nonstationary background noise

    Sensory Communication

    Get PDF
    Contains table of contents for Section 2, an introduction and reports on fourteen research projects.National Institutes of Health Grant RO1 DC00117National Institutes of Health Grant RO1 DC02032National Institutes of Health/National Institute on Deafness and Other Communication Disorders Grant R01 DC00126National Institutes of Health Grant R01 DC00270National Institutes of Health Contract N01 DC52107U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-95-K-0014U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-96-K-0003U.S. Navy - Office of Naval Research Grant N00014-96-1-0379U.S. Air Force - Office of Scientific Research Grant F49620-95-1-0176U.S. Air Force - Office of Scientific Research Grant F49620-96-1-0202U.S. Navy - Office of Naval Research Subcontract 40167U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-96-K-0002National Institutes of Health Grant R01-NS33778U.S. Navy - Office of Naval Research Grant N00014-92-J-184

    Effects of Coordinated Bilateral Hearing Aids and Auditory Training on Sound Localization

    Get PDF
    This thesis has two main objectives: 1) evaluating the benefits of the bilateral coordination of the hearing aid Digital Signal Processing (DSP) features by measuring and comparing the auditory performance with and without the activation of this coordination, and 2) evaluating the benefits of acclimatization and auditory training on such auditory performance and, determining whether receiving training in one aspect of auditory performance (sound localization) would generalize to an improvement in another aspect of auditory performance (speech intelligibility in noise), and to what extent. Two studies were performed. The first study evaluated the speech intelligibility in noise and horizontal sound localization abilities in HI listeners using hearing aids that apply bilateral coordination of WDRC. A significant improvement was noted in sound localization with bilateral coordination on when compared to off, while speech intelligibility in noise did not seem to be affected. The second study was an extension of the first study, with a suitable period for acclimatization provided and then the participants were divided into training and control groups. Only the training group received auditory training. The training group performance was significantly better than the control group performance in some conditions, in both the speech intelligibility and the localization tasks. The bilateral coordination did not have significant effects on the results of the second study. This work is among the early literature to investigate the impact of bilateral coordination in hearing aids on the users’ auditory performance. Also, this work is the first to demonstrate the effect of auditory training in sound localization on the speech intelligibility performance

    Three-dimensional point-cloud room model in room acoustics simulations

    Get PDF
    • …
    corecore