997 research outputs found

    Capturing and Reproducing Spatial Audio Based on a Circular Microphone Array


    The Far-Field Equatorial Array for Binaural Rendering

    We present a method for obtaining a spherical harmonic representation of a sound field based on a microphone array along the equator of a rigid spherical scatterer. The two-dimensional plane wave decomposition of the incoming sound field is computed from the microphone signals. The influence of the scatterer is removed under the assumption of distant sound sources, and the result is converted to a spherical harmonic (SH) representation, which in turn can be rendered binaurally. The approach requires an order of magnitude fewer microphones than conventional spherical arrays operating at the same SH order, at the expense of not being able to accurately represent non-horizontally-propagating sound fields. Although the scattering removal is not perfect at high frequencies for low harmonic orders, a numerical evaluation demonstrates the effectiveness of the approach.
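    As an illustration of the kind of processing the abstract describes, the following sketch performs a two-dimensional plane wave decomposition for an open (non-scattering) circular array; the rigid scatterer and its removal are omitted, and all array parameters are assumptions for the example, not taken from the paper.

```python
import numpy as np
from scipy.special import jv

# Sketch: circular harmonic decomposition of signals from M equally
# spaced microphones on a ring, for a single frequency bin. The rigid
# scatterer of the paper is omitted (open array); all values here are
# illustrative assumptions, not the paper's implementation.

M = 8                                  # microphones, equally spaced on the ring
phi = 2 * np.pi * np.arange(M) / M     # capsule azimuth angles
kr = 1.0                               # wavenumber times array radius
src_az = np.deg2rad(30.0)              # plane wave arrival azimuth

# Pressure of a single far-field plane wave at each capsule
p = np.exp(1j * kr * np.cos(phi - src_az))

# Spatial DFT -> circular harmonic coefficients c_m up to |m| = M/2 - 1
orders = np.arange(-(M // 2 - 1), M // 2)
c = np.array([np.mean(p * np.exp(-1j * m * phi)) for m in orders])

# Radial filtering (divide by the open-array mode strength i^m J_m(kr))
# yields the plane wave density, which peaks at the source azimuth.
d = c / (1j ** orders * jv(orders, kr))
scan = np.deg2rad(np.arange(360))
pwd = np.abs(sum(dm * np.exp(1j * m * scan) for dm, m in zip(d, orders)))
peak_az = np.rad2deg(scan[np.argmax(pwd)])
```

    With exact coefficients the reconstruction is a Dirichlet kernel centered on the source azimuth, so the detected peak lands at 30 degrees up to small spatial aliasing errors.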

    Binaural Rendering of Spherical Microphone Array Signals

    The presentation of extended reality for consumer and professional applications requires major advancements in the capture and reproduction of its auditory component to provide a plausible listening experience. A spatial representation of the acoustic environment needs to be considered to allow for movement within, or interaction with, the augmented or virtual reality. This thesis focuses on capturing a real-world acoustic environment by means of a spherical microphone array, with subsequent head-tracked binaural reproduction to a single listener via headphones. The introduction establishes the fundamental concepts and relevant terminology for non-experts of the field. Furthermore, the specific challenges of the method due to spatial undersampling of the sound field as well as physical limitations and imperfections of the microphone array are presented to the reader. The first objective of this thesis was to develop software in the Python programming language that is capable of performing all required computations for the acoustic rendering of the captured signals in real time. The implemented processing pipeline was made publicly available under an open-source license. Secondly, specific parameters of the microphone array hardware as well as the rendering software that are required for a perceptually high reproduction quality were identified and investigated by means of multiple user studies. Lastly, the results provide insights into how unwanted additive noise components in the captured microphone signals from different spherical array configurations contribute to the reproduced ear signals.
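    The head-tracked SH-domain rendering step that such a pipeline performs per audio block can be sketched roughly as follows; the complex-SH yaw rotation, the dimensions, and the placeholder data are all assumptions for illustration, not the thesis implementation.

```python
import numpy as np

# Sketch: per-block head-tracked binaural rendering in the SH domain.
# The sound field coefficients are rotated by the listener's yaw and
# multiplied with HRTF SH coefficients, then summed over all SH
# channels to give one frequency-domain signal per ear. Complex SH
# conventions are assumed; all data are random placeholders.

rng = np.random.default_rng(0)
N = 3                                   # SH rendering order
n_sh = (N + 1) ** 2                     # number of SH channels
n_bins = 129                            # frequency bins per block

def crand(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

snd = crand(n_sh, n_bins)               # captured sound field, SH domain
hrtf_l = crand(n_sh, n_bins)            # left-ear HRTF SH coefficients
hrtf_r = crand(n_sh, n_bins)            # right-ear HRTF SH coefficients

# Order index m of each SH channel (degree l runs 0..N, order -l..l)
m_idx = np.concatenate([np.arange(-l, l + 1) for l in range(N + 1)])

def render(yaw):
    """Yaw-rotate the sound field and render both ear signals."""
    rot = snd * np.exp(-1j * m_idx * yaw)[:, None]   # rotation about z
    return np.sum(rot * hrtf_l, axis=0), np.sum(rot * hrtf_r, axis=0)

left, right = render(np.deg2rad(45.0))
```

    A yaw of zero reduces to a plain SH-domain dot product between the sound field and HRTF coefficients, which is a useful sanity check for such a renderer.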

    Near-field evaluation of reproducible speech sources

    The spatial speech reproduction capabilities of a KEMAR mouth simulator, a loudspeaker, the piston-on-the-sphere model, and a circular harmonic fitting are evaluated in the near-field. The speech directivity of 24 human subjects, both male and female, is measured using a semicircular microphone array with a radius of 36.5 cm in the horizontal plane. Impulse responses are captured for the two devices, and filters are generated for the two numerical models to emulate their directional effect on speech reproduction. The four repeatable speech sources are evaluated through comparison to the recorded human speech both objectively, through directivity pattern and spectral magnitude differences, and subjectively, through a listening test on perceived coloration. Results show that the repeatable sources perform relatively well under the metric of directivity, but irregularities in their directivity patterns introduce audible coloration for off-axis directions.

    Peer reviewed
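    The two objective metrics mentioned in the abstract can be sketched as follows, on synthetic data and with an assumed array geometry; this is not the study's actual evaluation code.

```python
import numpy as np

# Sketch of two objective comparison metrics: magnitude responses on a
# semicircular array are normalized to the on-axis microphone to form a
# directivity pattern (in dB), and a device is compared to the human
# reference via the mean absolute spectral magnitude difference in dB.
# All data below are synthetic placeholders.

angles = np.linspace(-90, 90, 13)            # mic positions (degrees)
freqs = np.array([500.0, 1000.0, 2000.0, 4000.0])

# Synthetic magnitude responses |H(angle, freq)|: a reference "human"
# pattern and a "device" with mild angle-dependent deviations
human = 1.0 / (1.0 + (np.abs(angles)[:, None] / 90.0) * (freqs[None, :] / 4000.0))
device = human * (1.0 + 0.1 * np.sin(np.deg2rad(angles))[:, None])

def directivity(mag):
    """Normalize each frequency's pattern to the on-axis (0 deg) mic."""
    on_axis = mag[np.argmin(np.abs(angles))]
    return 20 * np.log10(mag / on_axis)

def spectral_diff_db(a, b):
    """Mean absolute magnitude difference in dB across angles and freqs."""
    return np.mean(np.abs(20 * np.log10(a / b)))

err = spectral_diff_db(device, human)
```

    By construction the directivity pattern is 0 dB on axis at every frequency, so differences between sources show up only off axis, matching the coloration result reported above.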

    End-to-End Magnitude Least Squares Binaural Rendering for Equatorial Microphone Arrays

    We recently presented an end-to-end magnitude least squares (eMagLS) binaural rendering method for spherical microphone array (SMA) signals that integrates a comprehensive array model into a magnitude least squares objective to minimize reproduction errors. The introduced signal model addresses impairments due to practical limitations of spherical harmonic (SH) domain rendering, namely spatial aliasing, truncation of the SH decomposition order, and regularized radial filtering. In this work, we improve the processing model when applied to the recently proposed equatorial microphone array (EMA) to facilitate three-degrees-of-freedom head rotations during the rendering. EMAs provide similar accuracy to SMAs for sound fields from sources inside the horizontal plane while requiring far fewer microphones. We compare the proposed end-to-end renderers for both array types against a given binaural reference magnitude response. In addition to anechoic array simulations, the evaluation includes measured array room impulse responses to show the method's effectiveness in minimizing high-frequency magnitude errors for all head orientations from SMAs and EMAs under practical room conditions. The published reference implementation of the method has been refined and now includes the solution for EMAs.
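    The magnitude least squares idea underlying eMagLS can be illustrated with a toy per-frequency-bin solve; the variable-exchange iteration shown here is a commonly used approach for such objectives, and all matrices are random placeholders rather than an actual array or HRTF model.

```python
import numpy as np

# Toy sketch of the MagLS principle: at low frequencies rendering
# filters solve an ordinary complex least squares problem; at high
# frequencies only the target magnitude is matched, via variable
# exchange (the target phase is taken from the previous solution).
# Matrix sizes and data are illustrative assumptions.

rng = np.random.default_rng(1)
n_dirs, n_mics = 16, 6
A = rng.standard_normal((n_dirs, n_mics)) + 1j * rng.standard_normal((n_dirs, n_mics))
b = rng.standard_normal(n_dirs) + 1j * rng.standard_normal(n_dirs)

# Complex least squares (low-frequency regime)
w_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Magnitude least squares via variable exchange (high-frequency regime)
w = w_ls.copy()
for _ in range(50):
    phase = np.exp(1j * np.angle(A @ w))       # keep the current phase
    w, *_ = np.linalg.lstsq(A, np.abs(b) * phase, rcond=None)

mag_err_ls = np.linalg.norm(np.abs(A @ w_ls) - np.abs(b))
mag_err_magls = np.linalg.norm(np.abs(A @ w) - np.abs(b))
```

    Since each exchange step cannot increase the surrogate cost, the MagLS solution never has a larger magnitude error than the plain least squares one, which is the motivation for switching objectives above the aliasing frequency.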

    Measurement of head-related transfer functions: A review

    A head-related transfer function (HRTF) describes an acoustic transfer function between a point sound source in the free-field and a defined position in the listener's ear canal, and plays an essential role in creating immersive virtual acoustic environments (VAEs) reproduced over headphones or loudspeakers. HRTFs are highly individual and depend on direction and distance (near-field HRTFs). However, the measurement of high-density HRTF datasets is usually time-consuming, especially for human subjects. Over the years, various novel measurement setups and methods have been proposed for the fast acquisition of individual HRTFs while maintaining high measurement accuracy. This review paper provides an overview of various HRTF measurement systems and some insights into trends in individual HRTF measurements.
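    Many of the measurement systems such reviews cover rely on exponential-sweep deconvolution to acquire impulse responses quickly. A minimal sketch, with an assumed toy "HRIR" standing in for a real measurement:

```python
import numpy as np

# Sketch of exponential-sine-sweep (Farina) deconvolution: the recorded
# response is convolved with an amplitude-compensated, time-reversed
# inverse sweep to recover the impulse response. Parameters are
# illustrative, and the "measurement" is simulated by convolving the
# sweep with a known toy two-tap impulse response.

fs = 4000
T = 1.0
f1, f2 = 50.0, 1500.0
t = np.arange(int(fs * T)) / fs
R = np.log(f2 / f1)

# Exponential sine sweep and its amplitude-compensated inverse
sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1))
inv = sweep[::-1] * np.exp(-t * R / T)   # time-reversed, -6 dB/oct comp.

# Simulated measurement: toy "HRIR" with taps at samples 10 and 30
hrir = np.zeros(64)
hrir[10] = 1.0
hrir[30] = -0.5
rec = np.convolve(sweep, hrir)

# Deconvolution by convolution with the inverse sweep
ir = np.convolve(rec, inv)
ir /= np.max(np.abs(ir))
peak = np.argmax(np.abs(ir))             # expected near len(sweep)-1+10
```

    The sweep convolved with its inverse approximates a band-limited impulse at lag len(sweep) - 1, so the dominant tap of the toy response reappears at that lag plus its own delay.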

    Reconstructing the Dynamic Directivity of Unconstrained Speech

    This article presents a method for estimating and reconstructing the spatial energy distribution pattern of natural speech, which is crucial for achieving realistic vocal presence in virtual communication settings. The method comprises two stages. First, recordings of speech captured by a real, static microphone array are used to create an egocentric virtual array that tracks the movement of the speaker over time. This virtual array is used to measure and encode the high-resolution directivity pattern of the speech signal as it evolves dynamically with natural speech and movement. In the second stage, the encoded directivity representation is used to train a machine learning model that can estimate the full, dynamic directivity pattern given a limited set of speech signals, such as those recorded using the microphones on a head-mounted display. Our results show that neural networks can accurately estimate the full directivity pattern of natural, unconstrained speech from limited information. The proposed method, along with the evaluation of various machine learning models and training paradigms, provides an important contribution to the development of realistic vocal presence in virtual communication settings.

    Comment: In proceedings of I3DA 2023 - The 2023 International Conference on Immersive and 3D Audio. DOI coming soon.
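    The estimation stage can be illustrated with a deliberately simple linear stand-in for the paper's neural networks; all data, dimensions, and the ridge-regression model below are assumptions for the sketch, not the article's method.

```python
import numpy as np

# Sketch: a ridge regression (a simple stand-in for the article's
# neural networks) maps the magnitudes observed at a few microphone
# positions to the full directivity pattern sampled at many directions.
# All data are synthetic; patterns lie on a low-rank subspace so a
# linear map can recover them from few observations.

rng = np.random.default_rng(2)
n_frames, n_mics, n_dirs, rank = 500, 4, 64, 4

basis = rng.standard_normal((rank, n_dirs))
full = rng.standard_normal((n_frames, rank)) @ basis   # (frames, dirs)
mics = full[:, [0, 16, 32, 48]].copy()                 # few observed dirs
mics += 0.01 * rng.standard_normal(mics.shape)         # sensor noise

# Ridge regression: W = (X^T X + lam I)^-1 X^T Y on a train split
lam = 1e-3
X, Y = mics[:400], full[:400]
W = np.linalg.solve(X.T @ X + lam * np.eye(n_mics), X.T @ Y)

pred = mics[400:] @ W                                  # held-out estimate
rel_err = np.linalg.norm(pred - full[400:]) / np.linalg.norm(full[400:])
```

    With real speech the mapping is nonlinear and time-varying, which is why the article resorts to neural networks, but the low-rank intuition of reconstructing a full pattern from sparse observations carries over.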