375 research outputs found

    On the plausibility of simplified acoustic room representations for listener translation in dynamic binaural auralizations

    This thesis investigates the perception of simplified acoustic room representations in position-dynamic binaural reproduction for listener translation. Dynamic binaural synthesis is an audio reproduction method that creates spatial auditory illusions over headphones for virtual, augmented, and mixed reality (VR/AR/MR), where exploring immersive content in six degrees of freedom (6DOF) has become a typical demand. Realizing dynamic binaural sound field imitations with high physical accuracy requires high computational effort. However, previous psychoacoustic research indicates that humans have limited sensitivity to the details of the sound field, particularly in the late reverberation. This bears the potential for physical simplifications in position-dynamic room auralization. For example, concepts based on the perceptual mixing time or the audibility threshold of early reflections have been proposed, but a thorough psychoacoustic evaluation of these concepts was still pending. First, a setup for position-dynamic binaural room auralization was implemented and evaluated. With it, the thesis examines essential system parameters such as the required spatial resolution of the position grid for dynamic adaptation. Because generally established test methods for the perceptual evaluation of spatial auditory illusions under interactive listener translation were lacking, the thesis explores different approaches for measuring plausibility. On this foundation, the work examines physical impairments and simplifications in the progress of the sound field in position-dynamic binaural auralizations of room acoustics.
For the main experiments, sets of binaural room impulse responses (BRIRs) were measured along a line for listener translation in a relatively dry listening laboratory and in a reverberant seminar room of similar size. The datasets include scenarios of walking towards a virtual sound source, past it, away from it, or behind it. The studies also consider two extreme cases of source orientation to account for the effects of variations in source directivity. The BRIR sets were systematically impaired and simplified to evaluate the perceptual effects. In particular, the concept of the perceptual mixing time and manipulated spatiotemporal patterns of early reflections served as test cases in the psychoacoustic studies. The results reveal a high potential for simplification but also underline the relevance of accurately imitating prominent early reflections. The findings confirm the concept of the perceptual mixing time for the considered cases of position-dynamic binaural audio. The observations highlight that common test scenarios for auralization, interpolation, and extrapolation are not sufficiently critical to draw general conclusions about the suitability of the tested rendering approaches. The thesis proposes strategies to address this.
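The core simplification examined here, splicing a position-dependent early BRIR segment onto a shared late reverberation tail at the perceptual mixing time, can be sketched in a few lines. This is a minimal NumPy illustration of the general idea, not the thesis's implementation; the function name, the 80 ms mixing time, and the 10 ms crossfade are illustrative assumptions.

```python
import numpy as np

def splice_brir(early_brir, late_common, fs, mixing_time_ms=80.0, fade_ms=10.0):
    """Combine a position-dependent early BRIR segment with a shared late
    reverberation tail, crossfaded around an assumed perceptual mixing time.

    early_brir, late_common: arrays of shape (samples, 2) for left/right ears.
    """
    n_mix = int(fs * mixing_time_ms / 1000)
    n_fade = int(fs * fade_ms / 1000)
    n = max(len(early_brir), len(late_common))
    # crossfade gain: 1 before the mixing time, ramping to 0 afterwards
    fade_out = np.ones(n)
    fade_out[n_mix:n_mix + n_fade] = np.linspace(1.0, 0.0, n_fade)
    fade_out[n_mix + n_fade:] = 0.0
    # zero-pad both parts to a common length before mixing
    e = np.zeros((n, 2)); e[:len(early_brir)] = early_brir
    t = np.zeros((n, 2)); t[:len(late_common)] = late_common
    return e * fade_out[:, None] + t * (1.0 - fade_out)[:, None]
```

In a dynamic renderer, only the early segment would then need to be updated per listener position, while the late tail is rendered once for the whole room.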

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

    International audience

    A High Resolution and Full-Spherical Head-Related Transfer Function Database for Different Head-Above-Torso Orientations

    Head-related transfer functions (HRTFs) capture the free-field sound transmission from a sound source to the listener's ears, incorporating all the cues for sound localization, such as interaural time and level differences as well as the spectral cues that originate from scattering, diffraction, and reflection on the human pinnae, head, and body. In this study, HRTFs were acoustically measured and numerically simulated for the FABIAN head-and-torso simulator on a full-spherical, high-resolution sampling grid. HRTFs were acquired for 11 horizontal head-above-torso orientations, covering the typical range of motion of +/-50°, which makes it possible to account for head movements in dynamic binaural auralizations. In the absence of an external reference for the HRTFs, the measured and simulated data sets were cross-validated by applying auditory models for localization performance and spectral coloration. The results indicate a high degree of similarity between the two data sets in all tested aspects, suggesting that both are free of systematic errors.
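The interaural cues mentioned above can be estimated directly from a measured head-related impulse response (HRIR) pair. The following is a simplified broadband sketch (assuming NumPy): cross-correlation for the interaural time difference and an energy ratio for the interaural level difference. This is an illustration only, not the auditory models used for the cross-validation in the study.

```python
import numpy as np

def interaural_cues(hrir_left, hrir_right, fs):
    """Estimate broadband interaural cues from an HRIR pair.

    Returns (itd, ild): ITD in seconds (positive when the left-ear signal
    arrives later than the right-ear signal) and ILD in dB (positive when
    the left-ear signal is stronger).
    """
    # lag of the cross-correlation peak gives the time offset in samples
    xcorr = np.correlate(hrir_left, hrir_right, mode="full")
    lag = np.argmax(np.abs(xcorr)) - (len(hrir_right) - 1)
    itd = lag / fs
    # broadband energy ratio between the two ears
    ild = 10.0 * np.log10(np.sum(hrir_left ** 2) / np.sum(hrir_right ** 2))
    return itd, ild
```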

    Tools for urban sound quality assessment


    Ambisonics

    This open access book provides a concise explanation of the fundamentals and background of the surround sound recording and playback technology Ambisonics. It equips readers with the psychoacoustical, signal processing, acoustical, and mathematical knowledge needed to understand the inner workings of modern processing utilities, special equipment for recording, manipulation, and reproduction in the higher-order Ambisonic format. The book comes with various practical examples based on free software tools and open scientific data for reproducible research. The book's introductory section offers a perspective on Ambisonics spanning from the origins of coincident recordings in the 1930s to the Ambisonic concepts of the 1970s, as well as the classical ways of applying Ambisonics in first-order coincident sound scene recording and reproduction that have been practiced since the 1980s. As the underlying mathematics can, at times, become quite involved, the book includes an extensive mathematical appendix so that the presentation remains comprehensive without sacrificing readability. The book offers readers a deeper understanding of Ambisonic technologies, and will especially benefit scientists, audio-system and audio-recording engineers. In the advanced sections of the book, fundamentals and modern techniques such as higher-order Ambisonic decoding, 3D audio effects, and higher-order recording are explained. These techniques are shown to be suitable for supplying audience areas ranging from studio-sized rooms to venues with hundreds of listeners, as well as headphone-based playback, regardless of whether the material is live, interactive, or studio-produced 3D audio.
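The basic first-order operations that such texts build on can be illustrated in a few lines. This sketch assumes ACN channel ordering with SN3D normalization (the AmbiX convention) and a simple "sampling" decode to one virtual cardioid microphone, which is only one of several decoding strategies; function names are illustrative.

```python
import numpy as np

def encode_first_order(signal, azimuth, elevation):
    """Encode a mono signal into first-order Ambisonics (ACN order W, Y, Z, X
    with SN3D normalization) for a plane wave from (azimuth, elevation) in
    radians. Returns an array of shape (4, len(signal))."""
    w = signal * 1.0                                  # omnidirectional
    y = signal * np.sin(azimuth) * np.cos(elevation)  # left-right dipole
    z = signal * np.sin(elevation)                    # up-down dipole
    x = signal * np.cos(azimuth) * np.cos(elevation)  # front-back dipole
    return np.stack([w, y, z, x])

def decode_cardioid(bformat, azimuth, elevation):
    """Sampling decode: one virtual cardioid aimed at (azimuth, elevation)."""
    w, y, z, x = bformat
    return 0.5 * (w
                  + x * np.cos(azimuth) * np.cos(elevation)
                  + y * np.sin(azimuth) * np.cos(elevation)
                  + z * np.sin(elevation))
```

A cardioid aimed at the encoded direction recovers the signal at full gain, while one aimed at the opposite direction sits in the cardioid's null.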

    High Frequency Reproduction in Binaural Ambisonic Rendering

    Humans can localise sounds in all directions using three main auditory cues: the differences in time and level between signals arriving at the left and right eardrums (interaural time difference and interaural level difference, respectively), and the spectral characteristics of the signals due to reflections and diffractions off the body and ears. These auditory cues can be recorded for a position in space using the head-related transfer function (HRTF), and binaural synthesis at this position can then be achieved through convolution of a sound signal with the measured HRTF. However, reproducing soundfields with multiple sources, or at multiple locations, requires a highly dense set of HRTFs. Ambisonics is a spatial audio technology that decomposes a soundfield into a weighted set of directional functions, which can be utilised binaurally in order to spatialise audio in any direction using far fewer HRTFs. A limitation of low-order Ambisonic rendering is poor high frequency reproduction, which reduces the accuracy of the resulting binaural synthesis. This thesis presents novel HRTF pre-processing techniques, such that when using the augmented HRTFs in the binaural Ambisonic rendering stage, the high frequency reproduction is a closer approximation of direct HRTF rendering. These techniques include Ambisonic Diffuse-Field Equalisation, to improve spectral reproduction over all directions; Ambisonic Directional Bias Equalisation, to further improve spectral reproduction toward a specific direction; and Ambisonic Interaural Level Difference Optimisation, to improve lateralisation and interaural level difference reproduction. Evaluation of the presented techniques compares binaural Ambisonic rendering to direct HRTF rendering numerically, using perceptually motivated spectral difference calculations, auditory cue estimations and localisation prediction models, and perceptually, using listening tests assessing similarity and plausibility.
The results show that the individual pre-processing techniques produce modest improvements to the high frequency reproduction of binaural Ambisonic rendering, and that combining multiple pre-processing techniques can produce cumulative, statistically significant improvements.
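As an illustration of the general principle behind diffuse-field equalisation, flattening the RMS energy average of a set of transfer functions over all directions, here is a simplified NumPy sketch. The thesis's Ambisonic Diffuse-Field Equalisation operates specifically on rendered binaural Ambisonic responses, so this shows only the underlying idea, not the actual method; the function name and uniform direction weighting are assumptions.

```python
import numpy as np

def diffuse_field_eq_filter(tf_set, eps=1e-8):
    """Given complex transfer functions of shape (directions, freq_bins),
    return a real inverse-filter magnitude per frequency bin that flattens
    the RMS average over all directions (assumes uniform direction weights)."""
    # RMS magnitude across directions approximates the diffuse-field response
    diffuse = np.sqrt(np.mean(np.abs(tf_set) ** 2, axis=0))
    # regularised inversion to avoid boosting near-zero bins excessively
    return 1.0 / np.maximum(diffuse, eps)

# applying the filter: equalized = tf_set * diffuse_field_eq_filter(tf_set)
```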

    Movements in Binaural Space: Issues in HRTF Interpolation and Reverberation, with applications to Computer Music

    This thesis deals broadly with the topic of Binaural Audio. After reviewing the literature, a reappraisal of the minimum-phase plus linear delay model for HRTF representation and interpolation is offered. A rigorous analysis of threshold-based phase unwrapping is also performed. The results and conclusions drawn from these analyses motivate the development of two novel methods for HRTF representation and interpolation. Empirical data is used directly in a Phase Truncation method. A Functional Model for phase is used in the second method, based on the psychoacoustical nature of Interaural Time Differences. Both methods are validated; most significantly, both perform better than a minimum-phase method in subjective testing. The accurate, artefact-free dynamic source processing afforded by the above methods is harnessed in a binaural reverberation model, based on an early reflection image model and a Feedback Delay Network diffuse field, with accurate interaural coherence. In turn, these flexible environmental processing algorithms are used in the development of a multi-channel binaural application, which allows the audition of multi-channel setups over headphones. Both source and listener are dynamic in this paradigm. A GUI is offered for intuitive use of the application. HRTF processing is thus re-evaluated and updated after a review of accepted practice. Novel solutions are presented and validated. Binaural reverberation is recognised as a crucial tool for convincing artificial spatialisation, and is developed on similar principles. Emphasis is placed on transparency of development practices, with the aim of wider dissemination and uptake of binaural technology.
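The minimum-phase component of the minimum-phase plus linear delay model reappraised above is conventionally obtained via the real cepstrum (the homomorphic method); the pure delay, equal to the desired ITD, is then appended separately. A NumPy-only sketch of the standard reconstruction, not the thesis's specific procedure:

```python
import numpy as np

def minimum_phase(h, n_fft=None):
    """Minimum-phase reconstruction of an impulse response via the real
    cepstrum (homomorphic method). The output has the same magnitude
    spectrum as the input, with minimum-phase response."""
    n = n_fft or 8 * len(h)  # oversampled FFT reduces cepstral aliasing
    mag = np.abs(np.fft.fft(h, n))
    log_mag = np.log(np.maximum(mag, 1e-12))  # floor avoids log(0)
    cep = np.fft.ifft(log_mag).real
    # fold the anti-causal half of the cepstrum onto the causal half
    fold = np.zeros(n)
    fold[0] = cep[0]
    fold[1:n // 2] = 2 * cep[1:n // 2]
    fold[n // 2] = cep[n // 2]
    h_min = np.fft.ifft(np.exp(np.fft.fft(fold))).real
    return h_min[:len(h)]
```

For a pure delayed impulse, the minimum-phase version collapses the delay to time zero while preserving the flat magnitude spectrum, which is exactly why the linear delay must be modelled as a separate term.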

    Mixed Structural Models for 3D Audio in Virtual Environments

    In the world of ICT, strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D media environments. One of the major challenges that such applications have to address is user-centricity, reflected, for example, in the development of complexity-hiding services so that people can personalize their own service delivery. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of the new technology by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission, and reconstruction, and teleconferencing systems, to name but a few. The concurrent presence of multimodal senses and activities makes multimodal virtual environments potentially flexible and adaptive, allowing users to switch between modalities as conditions of use continuously change. Augmentation through additional modalities and sensory substitution techniques are compelling ingredients for presenting information non-visually: when the visual bandwidth is overloaded, when data are visually occluded, or when the visual channel is not available to the user (e.g., for visually impaired people). Multimodal systems for the representation of spatial information will largely benefit from the implementation of audio engines that have extensive knowledge of spatial hearing and virtual acoustics. Models for spatial audio can provide accurate dynamic information about the relation between the sound source and the surrounding environment, including the listener and his/her body, which acts as an additional filter.
Indeed, this information cannot be substituted by any other modality (i.e., visual or tactile). Nevertheless, today's spatial representation of audio within sonification tends to be simplistic and to offer poor interaction capabilities, since current multimedia systems focus mostly on graphics processing and integrate only simple stereo or multi-channel surround sound. On a very different level lie binaural rendering approaches based on headphone reproduction, whose possible disadvantages (e.g., invasiveness, non-flat frequency responses) are counterbalanced by a number of desirable features. Indeed, these systems can control and/or eliminate reverberation and other acoustic effects of the real listening space, reduce background noise, and provide adaptable and portable audio displays, all relevant aspects especially in enhanced contexts. Most of the binaural sound rendering techniques currently exploited in research rely on the use of Head-Related Transfer Functions (HRTFs), i.e. filters that capture the acoustic effects of the human head and ears. HRTFs allow faithful simulation of the audio signal arriving at the entrance of the ear canal as a function of the sound source's spatial position. HRTF filters are usually provided in the form of acoustic measurements acquired on dummy heads built according to mean anthropometric measurements. Nevertheless, anthropometric features of the human body play a key role in HRTF shaping: several studies have shown that listening to non-individual binaural sounds results in evident localization errors. On the other hand, individual HRTF measurements on a significant number of subjects are both time- and resource-expensive. Several techniques for synthetic HRTF design have been proposed during the last two decades, and the most promising one relies on structural HRTF models.
In this approach, the most important effects involved in spatial sound perception (acoustic delays and shadowing due to head diffraction, reflections on pinna contours and shoulders, resonances inside the ear cavities) are isolated and modeled separately with a corresponding filtering element. HRTF selection and modeling procedures can be guided by physical interpretation: the parameters of each rendering block, as well as selection criteria, can be estimated from real and simulated data and related to anthropometric geometries. Effective personal auditory displays represent an innovative breakthrough for a plethora of applications, and the structural approach also allows for effective scalability depending on the available computational resources or bandwidth. Scenes with multiple highly realistic audiovisual objects can be managed by exploiting the parallelism of increasingly ubiquitous GPUs (graphics processing units). Building individual headphone equalization with perceptually robust inverse filtering techniques represents a fundamental step towards the creation of personal virtual auditory displays (VADs). In this regard, several examples might benefit from these considerations: multi-channel downmix over headphones, personal cinema, spatial audio rendering in mobile devices, computer-game engines, and individual binaural audio standards for movie and music production. This thesis presents a family of approaches that overcome the current limitations of headphone-based 3D audio systems, aiming at building personal auditory displays through structural binaural audio models for immersive sound reproduction. The resulting models allow for an interesting form of content adaptation and personalization, since they include parameters related to the user's anthropometry in addition to those related to the sound sources and the environment.
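One classic structural component of the kind described above, the acoustic delay due to head diffraction, is commonly approximated with Woodworth's spherical-head formula. A sketch under that assumption (the default head radius and speed of sound are illustrative values, not parameters from this thesis):

```python
import numpy as np

def woodworth_itd(azimuth, head_radius=0.0875, c=343.0):
    """Woodworth's spherical-head approximation of the interaural time
    difference: ITD(theta) = (a / c) * (sin(theta) + theta), valid for
    azimuth theta in [-pi/2, pi/2] radians (0 = straight ahead).

    head_radius: assumed head radius in metres; c: speed of sound in m/s.
    """
    return (head_radius / c) * (np.sin(azimuth) + azimuth)
```

In a mixed structural model, such a formula-based block could be fitted to an individual listener (e.g., via a measured head width) or replaced by a recorded component, which is exactly the kind of per-component choice the MSM approach formalizes.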
The covered research directions converge to a novel framework for synthetic HRTF design and customization that combines the structural modeling paradigm with other HRTF selection techniques (inspired by non-individualized HRTF selection procedures) and represents the main novel contribution of this thesis: the Mixed Structural Modeling (MSM) approach considers the global HRTF as a combination of structural components, each of which can be either synthetic or recorded. In both cases, customization is based on individual anthropometric data, which are used either to fit the model parameters or to select a measured/simulated component within a set of available responses. The definition and experimental validation of the MSM approach addresses several pivotal issues in the acquisition and delivery of binaural sound scenes, and yields design guidelines for personalized 3D audio virtual environments, holding the potential for novel forms of customized communication and interaction with sound and music content. The thesis also presents a multimodal interactive system which is used to conduct subjective tests on multi-sensory integration in virtual environments. Four experimental scenarios are proposed in order to test the capabilities of auditory feedback jointly with tactile or visual modalities. 3D audio feedback related to the user's movements during simple target-following tasks is tested as an applicative example of an audio-visual rehabilitation system. The perception of the direction of footstep sounds, interactively generated during walking and delivered through headphones, highlights how spatial information can clarify the semantic congruence between movement and multimodal feedback. A real-time, physically informed audio-tactile interactive system encodes spatial information in the context of virtual map presentation, with particular attention to orientation and mobility (O&M) learning processes for visually impaired people.
Finally, an experiment analyzes the haptic estimation of the size of a virtual 3D object (a stair-step) while the exploration is accompanied by real-time generated auditory feedback whose parameters vary as a function of the height of the interaction point. The data collected from these experiments suggest that well-designed multimodal feedback exploiting 3D audio models can be used to improve performance in virtual reality and learning processes in orientation and complex motor tasks, thanks to the high level of attention, engagement, and presence provided to the user. The research framework, based on the MSM approach, serves as an important evaluation tool with the aim of progressively determining the relevant spatial attributes of sound for each application domain. From this perspective, such studies represent a novelty in the current literature on virtual and augmented reality, especially concerning the use of sonification techniques in several aspects of spatial cognition and internal multisensory representation of the body. This thesis is organized as follows. An overview of spatial hearing and binaural technology through headphones is given in Chapter 1. Chapter 2 is devoted to the Mixed Structural Modeling formalism and philosophy. In Chapter 3, topics in structural modeling for each body component are studied; previous research and two new models, i.e. near-field distance dependency and external-ear spectral cues, are presented. Chapter 4 deals with a complete case study of the mixed structural modeling approach and provides insights into the main innovative aspects of this modus operandi. Chapter 5 gives an overview of a number of proposed tools for the analysis and synthesis of HRTFs. System architectural guidelines and constraints are discussed in terms of real-time issues, mobility requirements, and customized audio delivery.
In Chapter 6, two case studies investigate the behavioral importance of the spatial attributes of sound and how continuous interaction with virtual environments can benefit from using spatial audio algorithms. Chapter 7 describes a set of experiments aimed at assessing the contribution of binaural audio through headphones to learning processes of spatial cognitive maps and exploration of virtual objects. Finally, conclusions are drawn and new research horizons for further work are outlined in Chapter 8.