
    Acoustic sensor network geometry calibration and applications

    In the modern world, we are increasingly surrounded by computing devices with communication links and one or more microphones, for example smartphones, tablets, laptops, or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications: ASN-based speech enhancement, source localization, and event detection can be applied to teleconferencing, camera control, automation, or assisted living. For such applications, two properties are key: awareness of auditory objects and awareness of their spatial positions. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method, and information on the position of human speakers is provided by a novel localization and tracking method. In order to localize with respect to the ASN, the relative arrangement of the sensor nodes has to be known; therefore, different novel geometry calibration methods were developed.
    Sound classification: The first method addresses the identification of auditory objects. A novel application of the bag-of-features (BoF) paradigm to acoustic event classification and detection was introduced. It can be used for event and speech detection as well as for speaker identification. Using both mel frequency cepstral coefficient (MFCC) and Gammatone frequency cepstral coefficient (GFCC) features improves the classification accuracy, and soft quantization together with supervised training of the BoF model yields superior accuracy. The method generalizes well from limited training data, works online, and runs in a fraction of real time. A dedicated training strategy based on a hierarchy of stationarity enables the detection of speech in mixtures with noise, making the method robust against severe noise levels corrupting the speech signal. It is thus possible to provide control information to a beamformer in order to realize blind speech enhancement; a reliable improvement is achieved in the presence of one or more stationary noise sources.
    Speaker localization: The localization method enables each node to determine the direction of arrival (DoA) of concurrent sound sources. The author's neuro-biologically inspired speaker localization method for microphone arrays was refined for use in ASNs. By implementing a dedicated cochlear and midbrain model, it is robust against the reverberation found in indoor rooms. In order to better model the unknown number of concurrent speakers, an application of the EM algorithm that realizes probabilistic clustering according to auditory scene analysis (ASA) principles was introduced. Based on this approach, a system for Euclidean tracking in ASNs was designed. Each node applies the node-wise localization method and shares probabilistic DoA estimates, together with an estimate of the spectral distribution, with the network. As this information is relatively sparse, it can be transmitted with low bandwidth, and the system is robust against jitter and transmission errors. The information from all nodes is integrated according to spectral similarity in order to correctly associate concurrent speakers, and incorporating the intersection angle in the triangulation improves the precision of the Euclidean localization. Tracks of concurrent speakers are computed over time, as is shown with recordings in a reverberant room.
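    As a rough, hypothetical illustration of the triangulation step mentioned above (this is not the thesis implementation; all poses, angles, and names are made up), the sketch below intersects two DoA bearings from nodes with known positions and orientations and reports the intersection angle, which can be used to down-weight ill-conditioned, near-parallel bearings:

```python
# Minimal sketch (not the thesis code): triangulate a source position in the
# plane from two DoA estimates and compute the intersection angle, which can
# be used to weight or reject ill-conditioned, near-parallel bearings.
import numpy as np

def bearing_vector(orientation, doa):
    """Unit vector of a bearing: node orientation plus local DoA (radians)."""
    angle = orientation + doa
    return np.array([np.cos(angle), np.sin(angle)])

def triangulate(p1, o1, doa1, p2, o2, doa2):
    """Intersect two bearings; returns (estimated position, intersection angle)."""
    d1 = bearing_vector(o1, doa1)
    d2 = bearing_vector(o2, doa2)
    # Solve p1 + t1 * d1 = p2 + t2 * d2 for the range parameters t1, t2.
    A = np.column_stack((d1, -d2))
    t = np.linalg.solve(A, np.asarray(p2, float) - np.asarray(p1, float))
    position = np.asarray(p1, float) + t[0] * d1
    # Intersection angle between the bearings: pi/2 is ideal, 0 means parallel
    # rays whose intersection point is numerically unreliable.
    angle = np.arccos(abs(np.clip(np.dot(d1, d2), -1.0, 1.0)))
    return position, angle

if __name__ == "__main__":
    # Hypothetical node poses (position, orientation) and DoA estimates.
    pos, angle = triangulate((0.0, 0.0), 0.0, np.deg2rad(45.0),
                             (4.0, 0.0), np.pi, np.deg2rad(-45.0))
    print(pos, np.rad2deg(angle))   # ~ [2. 2.] and 90 degrees
```

    In this made-up example, two nodes four metres apart each see the source at 45 degrees to their headings; the bearings intersect at (2, 2) with a 90-degree intersection angle, the best-conditioned case.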
    Geometry calibration: The central task of geometry calibration has been solved with special focus on sensor nodes equipped with multiple microphones, and novel methods were developed for different scenarios. An audio-visual method was introduced for the calibration of ASNs in video conferencing scenarios: the DoA estimates are fused with visual speaker tracking in order to provide the sensor positions in a common coordinate system. A novel acoustic calibration method determines the relative positioning of the nodes from ambient sounds alone. Unlike previous methods, which only infer the positions of distributed microphones, the DoA is incorporated, so the orientation of the nodes can be calibrated with high accuracy. This is very important for all applications using the spatial information, as the triangulation error increases dramatically with bad orientation estimates. Since speech events can be used, calibration becomes possible without playing dedicated calibration sounds. Based on this, an online method employing a genetic algorithm with incremental measurements was introduced. By using the robust speech localization method, the calibration is computed in parallel to the tracking. The online method is able to calibrate ASNs in real time, as is shown with recordings of natural speakers in a reverberant room.
    The informed acoustic sensor network: All new methods are important building blocks for the use of ASNs. The online methods for localization and calibration both make use of the neuro-biologically inspired processing in the nodes, which leads to state-of-the-art results even in reverberant enclosures. The robustness and reliability can be improved further by including the event detection method in order to exclude non-speech events. When all methods are combined, both semantic information on what is happening in the acoustic scene and spatial information on the positions of the speakers and sensor nodes are acquired automatically in real time. This realizes truly informed audio processing in ASNs. Practical applicability is shown by application to recordings in reverberant rooms. The contribution of this thesis is thus not only to advance the state of the art in automatically acquiring information on the acoustic scene, but also to push the practical applicability of such methods.
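    The role of node orientation in the geometry calibration described above can be illustrated with a deliberately simplified sketch (hypothetical, not the genetic-algorithm method of the thesis): if source positions are already available, for example from the tracking stage, a node's orientation offset follows from the circular mean of the differences between the global bearings and the locally measured DoAs. All positions, noise levels, and names below are assumptions for illustration:

```python
# Simplified sketch (not the thesis method): estimate a node's orientation by
# comparing its local DoA measurements with bearings computed from known (or
# previously estimated) source positions. All values are made-up examples.
import numpy as np

def estimate_orientation(node_pos, source_positions, measured_doas):
    """Circular estimate of the node orientation in radians.

    Model: measured_doa = global_bearing - orientation + noise.
    """
    diffs = np.asarray(source_positions, float) - np.asarray(node_pos, float)
    global_bearings = np.arctan2(diffs[:, 1], diffs[:, 0])
    residuals = global_bearings - np.asarray(measured_doas, float)
    # The circular mean handles the 2*pi wrap-around of angular residuals.
    return np.angle(np.mean(np.exp(1j * residuals)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_orientation = np.deg2rad(35.0)              # unknown in practice
    node = (1.0, 2.0)
    sources = rng.uniform(-5.0, 5.0, size=(20, 2))   # hypothetical speaker positions
    d = sources - np.asarray(node)
    doas = (np.arctan2(d[:, 1], d[:, 0]) - true_orientation
            + rng.normal(0.0, np.deg2rad(2.0), size=20))
    print(np.rad2deg(estimate_orientation(node, sources, doas)))  # close to 35
```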

    Human Machine Interfaces for Teleoperators and Virtual Environments

    In March 1990, a meeting was held around the general theme of teleoperation research into virtual environment display technology. This is a collection of conference-related fragments that gives a glimpse of the potential of the following fields and how they interplay: sensorimotor performance; human-machine interfaces; teleoperation; virtual environments; performance measurement and evaluation methods; and design principles and predictive models.

    Computational Audiovisual Scene Analysis

    Yan R. Computational Audiovisual Scene Analysis. Bielefeld: Universitätsbibliothek Bielefeld; 2014.
    In most real-world situations, a robot is interacting with multiple people, and understanding the dialogs is essential. However, dialog scene analysis is missing from most existing human-robot interaction systems: either only one speaker can talk to the robot, or each speaker must wear an attached microphone or a headset. The goal of Computational AudioVisual Scene Analysis (CAVSA) is therefore to make dialogs between humans and robots more natural and flexible. The CAVSA system is able to learn how many speakers are in the scenario, where the speakers are, and who is currently speaking. CAVSA is a challenging task due to the complexity of dialog scenarios. First, speakers are unknown in advance, so no database is available for training high-level features beforehand to recognize faces or voices. Second, people can dynamically enter and leave the scene, may move all the time, and may even change their locations outside the camera's field of view. Third, the robot cannot see all the people at the same time due to its limited camera field of view and head movements. Moreover, a sound could be related to a person who stands outside the camera field of view and has never been seen. I will show that the CAVSA system is able to assign words to the corresponding speakers. A speaker is recognized again when he leaves and re-enters the scene or changes his position, even when a new person appears.
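    As a loose, hypothetical illustration of one sub-problem described above, deciding who is currently speaking, a sound event can be assigned to the tracked face whose azimuth is closest to the acoustic estimate, or flagged as coming from an unseen speaker when no face is close enough. The threshold and values below are made up; this is not the CAVSA implementation:

```python
# Illustrative sketch (not the CAVSA system): assign a sound event to the
# closest tracked face by azimuth, or report an unseen speaker when no face
# is within the tolerance. Threshold and example values are hypothetical.
import numpy as np

def assign_speaker(sound_azimuth_deg, face_azimuths_deg, max_diff_deg=15.0):
    """Return the index of the matching face, or None for an unseen speaker."""
    if len(face_azimuths_deg) == 0:
        return None
    diffs = np.abs(np.asarray(face_azimuths_deg, float) - sound_azimuth_deg)
    best = int(np.argmin(diffs))
    return best if diffs[best] <= max_diff_deg else None

print(assign_speaker(12.0, [10.0, -40.0]))   # 0: the first tracked face
print(assign_speaker(80.0, [10.0, -40.0]))   # None: speaker outside the view
```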

    Spatial Hearing with Simultaneous Sound Sources: A Psychophysical Investigation

    This thesis provides an overview of work conducted to investigate human spatial hearing in situations involving multiple concurrent sound sources. Much is known about spatial hearing with single sound sources, including the acoustic cues to source location and the accuracy of localisation under different conditions. However, more recently interest has grown in the behaviour of listeners in more complex environments. Concurrent sound sources pose a particularly difficult problem for the auditory system, as their identities and locations must be extracted from a common set of sensory receptors and shared computational machinery. It is clear that humans have a rich perception of their auditory world, but just how concurrent sounds are processed, and how accurately, are issues that are poorly understood. This work attempts to fill a gap in our understanding by systematically examining spatial resolution with multiple sound sources. A series of psychophysical experiments was conducted on listeners with normal hearing to measure performance in spatial localisation and discrimination tasks involving more than one source. The general approach was to present sources that overlapped in both frequency and time in order to observe performance in the most challenging of situations. Furthermore, the role of two primary sets of location cues in concurrent source listening was probed by examining performance in different spatial dimensions. The binaural cues arise due to the separation of the two ears, and provide information about the lateral position of sound sources. The spectral cues result from location-dependent filtering by the head and pinnae, and allow vertical and front-rear auditory discrimination. Two sets of experiments are described that employed relatively simple broadband noise stimuli. In the first of these, two-point discrimination thresholds were measured using simultaneous noise bursts. It was found that the pair could be resolved only if a binaural difference was present; spectral cues did not appear to be sufficient. In the second set of experiments, the two stimuli were made distinguishable on the basis of their temporal envelopes, and the localisation of a designated target source was directly examined. Remarkably robust localisation was observed, despite the simultaneous masker, and both binaural and spectral cues appeared to be of use in this case. Small but persistent errors were observed, which in the lateral dimension represented a systematic shift away from the location of the masker. The errors can be explained by interference in the processing of the different location cues. Overall these experiments demonstrated that the spatial perception of concurrent sound sources is highly dependent on stimulus characteristics and configurations. This suggests that the underlying spatial representations are limited by the accuracy with which acoustic spatial cues can be extracted from a mixed signal. Three sets of experiments are then described that examined spatial performance with speech, a complex natural sound. The first measured how well speech is localised in isolation. This work demonstrated that speech contains high-frequency energy that is essential for accurate three-dimensional localisation. In the second set of experiments, spatial resolution for concurrent monosyllabic words was examined using similar approaches to those used for the concurrent noise experiments. 
It was found that resolution for concurrent speech stimuli was similar to resolution for concurrent noise stimuli. Importantly, listeners were limited in their ability to concurrently process the location-dependent spectral cues associated with two brief speech sources. In the final set of experiments, the role of spatial hearing was examined in a more relevant setting containing concurrent streams of sentence speech. It has long been known that binaural differences can aid segregation and enhance selective attention in such situations. The results presented here confirmed this finding and extended it to show that the spectral cues associated with different locations can also contribute. As a whole, this work provides an in-depth examination of spatial performance in concurrent source situations and delineates some of the limitations of this process. In general, spatial accuracy with concurrent sources is poorer than with single sound sources, as both binaural and spectral cues are subject to interference. Nonetheless, binaural cues are quite robust for representing concurrent source locations, and spectral cues can enhance spatial listening in many situations. The findings also highlight the intricate relationship that exists between spatial hearing, auditory object processing, and the allocation of attention in complex environments
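    As background for the binaural cues referred to above (a standard textbook approximation, not a result of this thesis), the interaural time difference produced by a distant source at azimuth θ can be approximated with Woodworth's spherical-head model, where a is the head radius (about 0.0875 m) and c the speed of sound (about 343 m/s):

```latex
% Woodworth's far-field approximation of the interaural time difference (ITD)
% for a rigid spherical head of radius a and speed of sound c:
\mathrm{ITD}(\theta) \approx \frac{a}{c}\,\bigl(\theta + \sin\theta\bigr),
\qquad -\tfrac{\pi}{2} \le \theta \le \tfrac{\pi}{2}
```

    At θ = 90° this gives roughly 0.65 ms, the familiar maximum ITD for an average adult head.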

    Three-dimensional point-cloud room model in room acoustics simulations


    Sonic Interactions in Virtual Environments

    This open access book tackles the design of 3D spatial interactions from an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are: immersive audio, the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies; sonic interaction, the human-computer interplay through auditory feedback in VE; and VR systems, which naturally support multimodal integration, impacting different application domains. Sonic Interactions in Virtual Environments features state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of the experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging new field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread across different audio communities, and to raise awareness among VR communities, researchers, and practitioners of the importance of sonic elements when designing immersive environments.
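    As a minimal, hypothetical illustration of the auralization idea mentioned above, a dry (anechoic) signal can be convolved offline with a room impulse response; this generic example is not the real-time method treated in the book, and the file names are placeholders:

```python
# Minimal offline auralization sketch: convolve a dry (anechoic) signal with a
# room impulse response (RIR). File names are hypothetical placeholders.
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

rate_dry, dry = wavfile.read("dry.wav")     # anechoic source recording (mono)
rate_rir, rir = wavfile.read("rir.wav")     # measured or simulated RIR (mono)
assert rate_dry == rate_rir, "signals must share one sample rate"

# Convolution places the dry source 'in' the room described by the RIR.
wet = fftconvolve(dry.astype(np.float64), rir.astype(np.float64))
wet /= np.max(np.abs(wet)) + 1e-12          # normalize to avoid clipping
wavfile.write("auralized.wav", rate_dry, (wet * 32767).astype(np.int16))
```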

    Acoustical measurements on stages of nine U.S. concert halls

