
    Source Separation for Hearing Aid Applications


    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019


    Audio head pose estimation using the direct to reverberant speech ratio

    Head pose is an important cue in many applications, such as speech recognition and face recognition. Most approaches to head pose estimation to date have focussed on the use of visual information about a subject's head. These visual approaches have a number of limitations, such as an inability to cope with occlusions, changes in the appearance of the head, and low-resolution images. We present here a novel method for determining coarse head pose orientation purely from audio information, exploiting the direct-to-reverberant speech energy ratio (DRR) within a reverberant room environment. Our hypothesis is that a speaker facing towards a microphone will have a higher DRR and a speaker facing away from the microphone will have a lower DRR. This method has the advantage of actually exploiting the reverberations within a room rather than trying to suppress them. It also has the practical advantage that most enclosed living spaces, such as meeting rooms or offices, are highly reverberant environments. In order to test this hypothesis, we also present a new data set featuring 56 subjects recorded in three different rooms with different acoustic properties, adopting 8 different head poses in 4 different room positions, captured with a 16-element microphone array. As far as the authors are aware, this data set is unique and will make a significant contribution to further work in the area of audio head pose estimation. Using this data set, we demonstrate that our proposed method of using the DRR for audio head pose estimation provides a significant improvement over previous methods.
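    As an illustration of the quantity at the heart of this hypothesis, the sketch below computes a DRR by splitting the energy of a room impulse response at a short window around the direct-path peak. This is a minimal sketch, not the paper's estimator (which must work blindly from reverberant speech rather than from a measured impulse response); the 2.5 ms window and the toy impulse response are assumptions for illustration.

```python
import numpy as np

def estimate_drr(rir, fs, direct_window_ms=2.5):
    """Estimate the direct-to-reverberant ratio (DRR) in dB from a room
    impulse response by splitting its energy at a short window around
    the direct-path peak. The window half-width is an assumption."""
    t_peak = np.argmax(np.abs(rir))               # direct-path arrival
    half = int(direct_window_ms * 1e-3 * fs)      # half-width in samples
    lo, hi = max(0, t_peak - half), t_peak + half
    direct_energy = np.sum(rir[lo:hi] ** 2)
    reverb_energy = np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct_energy / reverb_energy)

# Toy example: an ideal direct path plus an exponentially decaying tail.
fs = 16000
rir = np.zeros(fs)
rir[100] = 1.0                                    # direct path
decay = np.exp(-np.arange(fs - 200) / 4000.0)
rir[200:] = 0.05 * np.random.randn(fs - 200) * decay
print(f"DRR: {estimate_drr(rir, fs):.1f} dB")     # higher ~ facing the mic
```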

    Eigenbeamforming array systems for sound source localization


    Spherical microphone array processing for acoustic parameter estimation and signal enhancement

    In many distant speech acquisition scenarios, such as hands-free telephony or teleconferencing, the desired speech signal is corrupted by noise and reverberation. This degrades both the speech quality and intelligibility, making communication difficult or even impossible. Speech enhancement techniques seek to mitigate these effects and extract the desired speech signal. This objective is commonly achieved through the use of microphone arrays, which take advantage of the spatial properties of the sound field in order to reduce noise and reverberation. Spherical microphone arrays, where the microphones are arranged in a spherical configuration, usually mounted on a rigid baffle, are able to analyze the sound field in three dimensions; the captured sound field can then be efficiently described in the spherical harmonic domain (SHD). In this thesis, a number of novel spherical array processing algorithms, formulated in the SHD, are proposed. In order to comprehensively evaluate these algorithms under a variety of conditions, a method is developed for simulating the acoustic impulse responses between a sound source and microphones positioned on a rigid spherical array placed in a reverberant environment. The performance of speech enhancement algorithms can often be improved by taking advantage of additional a priori information, obtained by estimating various acoustic parameters. Methods for estimating two such parameters, the direction of arrival (DOA) of a source (static or moving) and the signal-to-diffuse energy ratio, are introduced. Finally, the signals received by a microphone array can be filtered and summed by a beamformer. A tradeoff beamformer is proposed, which achieves a balance between speech distortion and noise reduction. The beamformer weights depend on the noise statistics, which cannot be directly observed and must be estimated. An estimation algorithm is developed for this purpose, exploiting the DOA estimates previously obtained to differentiate between desired and interfering coherent sources.
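    The balance between speech distortion and noise reduction described above is commonly parameterized as a speech-distortion-weighted multichannel Wiener filter. The sketch below shows this standard textbook form under a rank-1 desired-source model; it is not necessarily the exact tradeoff beamformer derived in the thesis, and the steering vector, source PSD, and noise covariance are assumed known here rather than estimated.

```python
import numpy as np

def tradeoff_beamformer(d, phi_s, Phi_v, mu=1.0):
    """Speech-distortion / noise-reduction tradeoff weights (SDW-MWF
    form) for a rank-1 desired-source model.
    d: (M,) steering vector, phi_s: source PSD at the reference point,
    Phi_v: (M, M) noise covariance, mu: tradeoff parameter.
    mu = 0 recovers the distortionless MVDR beamformer, mu = 1 the
    multichannel Wiener filter; larger mu trades more speech
    distortion for extra noise reduction."""
    Phi_v_inv_d = np.linalg.solve(Phi_v, d)
    denom = mu + phi_s * np.vdot(d, Phi_v_inv_d).real
    return (phi_s / denom) * Phi_v_inv_d

# Example with assumed (known) statistics for an 8-microphone array.
M = 8
d = np.ones(M, dtype=complex)                 # assumed steering vector
Phi_v = np.eye(M) + 0.1 * np.ones((M, M))     # assumed noise covariance
w = tradeoff_beamformer(d, phi_s=1.0, Phi_v=Phi_v, mu=2.0)
```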

    Efficient binaural rendering of virtual acoustic realities: technical and perceptual concepts

    Binaural rendering aims to immerse the listener in a virtual acoustic scene, making it an essential method for spatial audio reproduction in virtual or augmented reality (VR/AR) applications. The growing interest and research in VR/AR solutions has yielded many different methods for the binaural rendering of virtual acoustic realities, yet all of them share the fundamental idea that the auditory experience of any sound field can be reproduced by reconstructing its sound pressure at the listener's eardrums. This thesis addresses various state-of-the-art methods for binaural rendering with 3 or 6 degrees of freedom (DoF), technical approaches applied in the context of headphone-based virtual acoustic realities, and recent technical and psychoacoustic research questions in the field of binaural technology. The publications collected in this dissertation focus on technical or perceptual concepts and methods for efficient binaural rendering, which has become increasingly important in research and development due to the rising popularity of mobile consumer VR/AR devices and applications. The thesis is organized into five research topics: Head-Related Transfer Function Processing and Interpolation, Parametric Spatial Audio, Auditory Distance Perception of Nearby Sound Sources, Binaural Rendering of Spherical Microphone Array Data, and Voice Directivity. The results of the included studies extend the current state of research in their respective topics, answer specific psychoacoustic research questions, thereby yielding a better understanding of basic spatial hearing processes, and provide concepts, methods, and design parameters for the future implementation of technically and perceptually efficient binaural rendering.
    Funding: BMBF, 03FH014IX5, Natürliche raumbezogene Darbietung selbsterzeugter Schallereignisse in virtuellen auditiven Umgebungen (NarDasS).
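    The fundamental idea shared by these methods, reconstructing the sound pressure at the listener's eardrums, reduces in its simplest static form to convolving a source signal with a pair of head-related impulse responses. The sketch below shows only this basic step as a minimal illustration; the interpolation, parametric, and 6-DoF techniques studied in the thesis build on top of it, and the HRIR inputs are assumed to come from measured data.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono source for headphone playback by convolving it
    with the head-related impulse response (HRIR) pair measured for
    the source direction. Static, nearest-neighbour rendering only;
    interactive systems interpolate HRIRs and update them with head
    and source movement (3/6 DoF)."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right])   # (2, len(mono) + len(hrir) - 1)
```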

    Chasing the bird: 3D acoustic tracking of aerial flight displays with a minimal planar microphone array

    Tracking the flight patterns of birds and bats in three-dimensional space is central to key questions in evolutionary ecology but remains a difficult technical challenge. For example, complex aerial flight displays are common among birds breeding in open habitats, but information on flight performance is limited. Here, we demonstrate the feasibility of using a large ground-based 4-microphone planar array to track the aerial flight displays of the cryptic Jack Snipe Lymnocryptes minimus. The main element of male display flights resembles the sound of a galloping horse heard at a distance. Under conditions of sufficient signal-to-noise ratio and of vertical alignment with the microphone array, we successfully tracked male snipe in 3D space for up to 25 seconds, with a total flight path of 280 m. During the 'gallop' phase, male snipe dropped from ca. 141 m to 64 m above ground at an average velocity of 77 km/h, reaching up to 92 km/h. Our project is one of the first applications of bioacoustics to measure 3D flight paths of birds under field conditions, and our results were consistent with our visual observations. Our microphone array and post-processing workflow provide a standardised protocol that could be used to collect comparative data on birds with complex aerial flight displays. Keywords: acoustic display; animal flight; flight tracking; Jack Snipe; Lymnocryptes minimus; microphone array.
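    Acoustic 3D tracking with a small array typically rests on two steps: estimating time differences of arrival (TDOAs) between microphone pairs and solving for the position that best explains them. The sketch below is a generic GCC-PHAT and least-squares version of this pipeline, offered as an illustration rather than the authors' exact workflow; note that a planar array cannot distinguish above from below, consistent with the vertical-alignment condition mentioned above.

```python
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # speed of sound in m/s

def gcc_phat_tdoa(sig, ref, fs):
    """Time difference of arrival between two microphones via GCC-PHAT
    (cross-correlation with phase-transform weighting)."""
    n = len(sig) + len(ref)
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12                   # PHAT weighting
    cc = np.fft.fftshift(np.fft.irfft(cross, n=n))
    return (np.argmax(np.abs(cc)) - n // 2) / fs     # delay in seconds

def locate_3d(mic_pos, tdoas, x0):
    """Least-squares 3D source position from TDOAs relative to mic 0.
    mic_pos: (M, 3) microphone coordinates, tdoas: (M-1,) delays."""
    def residual(x):
        dists = np.linalg.norm(mic_pos - x, axis=1)
        return (dists[1:] - dists[0]) / C - tdoas
    return least_squares(residual, x0).x
```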

    Design and Evaluation of a Scalable and Reconfigurable Multi-Platform System for Acoustic Imaging

    This paper proposes a scalable and multi-platform framework for signal acquisition and processing, which allows for the generation of acoustic images using planar arrays of MEMS (Micro-Electro-Mechanical Systems) microphones with low development and deployment costs. An acoustic characterization of the MEMS sensors was performed, and the beam pattern of a module based on an 8 × 8 planar array, and of several clusters of such modules, was obtained. A flexible framework, formed by an FPGA, an embedded processor, a desktop computer, and a graphics processing unit, was defined. The processing times of the algorithms used to obtain the acoustic images, including signal processing and wideband beamforming via FFT, were evaluated in each subsystem of the framework. Based on this analysis, three frameworks are proposed, defined by the specific subsystems used and the algorithms they share. Finally, a set of acoustic images obtained from sound reflected off a person is presented as a case study in the field of biometric identification. These results demonstrate the feasibility of the proposed system. Funding: Spanish research project SAM: TEC 2015-68170-R (MINECO/FEDER, UE).
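    The wideband FFT beamforming stage whose processing time the paper evaluates can be illustrated with a frequency-domain delay-and-sum beamformer: each candidate look direction is scanned by phase-aligning the microphone spectra and mapping the summed power. The following is a minimal NumPy sketch under a far-field assumption, not the deployed FPGA/GPU implementation.

```python
import numpy as np

C = 343.0  # speed of sound in m/s

def acoustic_image(frames, mic_pos, fs, directions):
    """Wideband delay-and-sum beamforming via FFT: phase-align the
    microphone spectra for each candidate look direction and map the
    summed output power.
    frames: (M, N) samples, mic_pos: (M, 2) positions in the array
    plane, directions: (D, 3) unit look vectors (far field assumed)."""
    M, N = frames.shape
    spectra = np.fft.rfft(frames, axis=1)            # (M, F)
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)           # (F,)
    power = np.empty(len(directions))
    for i, u in enumerate(directions):
        delays = mic_pos @ u[:2] / C                 # per-mic delays (s)
        steering = np.exp(2j * np.pi * np.outer(delays, freqs))
        y = (spectra * steering).sum(axis=0)         # aligned sum
        power[i] = np.sum(np.abs(y) ** 2)
    return power  # reshape over the direction grid to form the image
```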

    Robust acoustic beamforming in the presence of channel propagation uncertainties

    Beamforming is a popular multichannel signal processing technique used in conjunction with microphone arrays to spatially filter a sound field. Conventional optimal beamformers assume that the propagation channel between each source and microphone pair is a deterministic function of the source and microphone geometry. However, in real acoustic environments several mechanisms give rise to unpredictable variations in the phases and amplitudes of the propagation channels, and in the presence of these uncertainties the performance of beamformers degrades. Robust beamformers are designed to reduce this performance degradation, but they rely on tuning parameters that are not closely related to the array geometry. By modeling the uncertainty in the acoustic channels explicitly, we can derive more accurate expressions for the source-microphone channel variability, and thus beamformers that are well suited to realistic acoustic environments. Through experiments we validate the acoustic channel models, and through simulations we show the performance gains of the associated robust beamformer. Furthermore, by modeling the speech short-time Fourier transform coefficients we design a beamformer framework in the power domain; utilising spectral subtraction yields performance benefits over ideal conventional beamformers. Including the channel uncertainty models in the weight design improves robustness.
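    A common baseline for the tuning-parameter style of robustness discussed above is diagonal loading of the noise covariance in an MVDR beamformer. The sketch below shows that baseline for context; it is not the channel-uncertainty-aware design proposed in the thesis.

```python
import numpy as np

def robust_mvdr(d, Phi_v, loading=1e-2):
    """MVDR beamformer with diagonal loading: inflating the noise
    covariance by a scaled identity keeps the weights bounded when the
    presumed steering vector d deviates from the true channel. The
    loading level is exactly the kind of geometry-agnostic tuning
    parameter that an explicit channel-uncertainty model replaces."""
    M = len(d)
    Phi_loaded = Phi_v + loading * (np.trace(Phi_v).real / M) * np.eye(M)
    w = np.linalg.solve(Phi_loaded, d)
    return w / np.vdot(d, w)   # unit (distortionless) response towards d
```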