77 research outputs found

    Pyroomacoustics: A Python package for audio room simulations and array processing algorithms

    We present pyroomacoustics, a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components: an intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms; a fast C implementation of the image source model for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers; and finally, reference implementations of popular algorithms for beamforming, direction finding, and adaptive filtering. Together, they form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step.
    Comment: 5 pages, 5 figures, describes a software package
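    The three components map onto a short workflow in the package's documented ShoeBox interface. The sketch below is illustrative only; the room dimensions, absorption coefficient, and source/microphone positions are arbitrary placeholders.

```python
import numpy as np
import pyroomacoustics as pra

fs = 16000
# 3D shoebox room; dimensions and absorption are arbitrary examples.
room = pra.ShoeBox(
    [6.0, 4.0, 3.0], fs=fs,
    materials=pra.Material(0.3),  # flat energy absorption on all walls
    max_order=10,                 # image source reflection order
)

# One source emitting a half-second noise burst, and a two-mic array.
room.add_source([2.0, 3.0, 1.5], signal=np.random.randn(fs // 2))
R = np.c_[[4.0, 2.0, 1.2], [4.0, 2.2, 1.2]]  # (3, n_mics) positions
room.add_microphone_array(pra.MicrophoneArray(R, fs))

room.compute_rir()  # image source model RIRs (fast C core)
room.simulate()     # convolve the source signal with each RIR
signals = room.mic_array.signals  # shape: (n_mics, n_samples)
```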

    Perceptual and Room Acoustical Evaluation of a Computational Efficient Binaural Room Impulse Response Simulation Method

    A fast and perceptually plausible method for synthesizing binaural room impulse responses (BRIRs) is presented. The method is principally suited for application in dynamic and interactive evaluation environments (e.g., for hearing aid development), psychophysics with adaptively changing room reverberation, or simulation and computer games. To achieve a low computational cost, the proposed method is based on a hybrid approach. Using the image source model (ISM; Allen and Berkley [J. Acoust. Soc. Am. 66(4), 1979]), early reflections are computed in a geometrically exact way, taking into account source and listener positions as well as wall absorption and room geometry approximated by a "shoebox". The ISM is restricted to a low order, and the reverberant tail is generated by a feedback delay network (FDN; Jot and Chaigne [Proc. 90th AES Conv., 1991]), which offers the advantages of low computational complexity on the one hand and explicit control of the frequency-dependent decay characteristics on the other. The FDN approach was extended to take spatial room properties into account, such as room dimensions and the different absorption characteristics of the walls. Moreover, the listener's orientation and position in the room are considered to achieve a realistic spatial reverberant field. Technical and subjective evaluations were performed by comparing measured and synthesized BRIRs for various rooms. A high accuracy was found for most common room acoustical parameters and subjective sound properties. In addition, an analysis of several methods to include room geometry in the FDN is presented.
    Funding: DFG FOR 1732 (Individualisierte Hörakustik: Modelle, Algorithmen und Systeme für die Sicherstellung der akustischen Wahrnehmung für alle in allen Situationen); DFG EXC 1077/1 (Hören für alle: Modelle, Technologien und Lösungsansätze für Diagnostik, Wiederherstellung und Unterstützung des Hörens)
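    The FDN tail generator can be sketched generically (this is not the authors' implementation): a few delay lines feed back through an orthogonal mixing matrix, and each line's gain is set so that a signal recirculating through a delay of d samples decays by 60 dB over the target reverberation time, g = 10^(-3 d / (T60 * fs)). The delay lengths and T60 below are arbitrary.

```python
import numpy as np

def fdn_tail(x, fs, t60=1.2, delays=(1031, 1327, 1523, 1811)):
    """Generic 4-line feedback delay network; a sketch, not the paper's code."""
    n = len(delays)
    # Orthogonal feedback matrix: normalized 4x4 Hadamard.
    A = np.array([[1,  1,  1,  1],
                  [1, -1,  1, -1],
                  [1,  1, -1, -1],
                  [1, -1, -1,  1]]) / 2.0
    # Per-line gain giving a 60 dB decay over t60 seconds (Jot's rule).
    g = 10.0 ** (-3.0 * np.asarray(delays) / (t60 * fs))
    bufs = [np.zeros(d) for d in delays]  # circular delay-line buffers
    idx = [0] * n
    y = np.zeros(len(x))
    for t in range(len(x)):
        taps = np.array([bufs[i][idx[i]] for i in range(n)])  # delayed outputs
        y[t] = taps.sum()
        fb = A @ (g * taps)  # attenuate, then mix through the matrix
        for i in range(n):
            bufs[i][idx[i]] = x[t] + fb[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return y
```

    In the hybrid scheme described above, such a tail complements the low-order ISM early reflections; making the gains frequency dependent (per-band filters in the loop) yields the frequency-dependent decay control the abstract mentions, which this sketch omits.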

    GWA: A Large High-Quality Acoustic Dataset for Audio Processing

    We present the Geometric-Wave Acoustic (GWA) dataset, a large-scale audio dataset of over 2 million synthetic room impulse responses (IRs) and their corresponding detailed geometric and simulation configurations. Our dataset samples acoustic environments from over 6.8K high-quality, diverse, and professionally designed houses represented as semantically labeled 3D meshes. We also present a novel real-world acoustic material assignment scheme based on semantic matching that uses a sentence transformer model. We compute high-quality impulse responses with accurate low-frequency and high-frequency wave effects by automatically calibrating geometric acoustic ray tracing against a finite-difference time-domain wave solver. We demonstrate the higher accuracy of our IRs by comparing them with recorded IRs from complex real-world environments. The code and the full dataset will be released at the time of publication. Moreover, we highlight the benefits of GWA on audio deep learning tasks such as automatic speech recognition, speech enhancement, and speech separation, observing significant improvements over prior synthetic IR datasets on all tasks.
    Comment: Project webpage https://gamma.umd.edu/pro/sound/gw
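    The semantic material assignment step could look roughly like the following sentence-transformers sketch; the model name, mesh labels, and material vocabulary are placeholders, not GWA's actual choices.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder model and vocabularies; GWA's actual choices may differ.
model = SentenceTransformer("all-MiniLM-L6-v2")

mesh_labels = ["kitchen countertop", "sofa cushion", "window pane"]
materials = ["granite", "upholstered fabric", "glass", "painted drywall"]

label_emb = model.encode(mesh_labels, convert_to_tensor=True)
mat_emb = model.encode(materials, convert_to_tensor=True)

# Cosine similarity between every mesh label and every material name;
# each label gets the acoustically annotated material it matches best.
sim = util.cos_sim(label_emb, mat_emb)
assignment = {lbl: materials[int(sim[i].argmax())]
              for i, lbl in enumerate(mesh_labels)}
print(assignment)  # e.g. {'kitchen countertop': 'granite', ...}
```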

    Implementation and Perceptual Evaluation of a Simulation Method for Coupled Rooms in Higher Order Ambisonics

    A fast and perceptually plausible method for rendering acoustic scenarios with moving sources and moving listeners is presented. The method is principally suited for application in dynamic and interactive evaluation environments (e.g., for hearing aid development), psychophysics with adaptively changing spatial configurations, or simulation and computer games. The simulation distinguishes between the direct sound; sound reflected and diffracted by objects of limited size; diffuse sound surrounding the listener, e.g., diffuse background sounds and diffuse reverberation; and 'radiating holes' for the simulation of coupled adjacent rooms. Instead of providing its own simulation of room reverberation, the proposed method generates appropriate output signals for external room reverberation simulators (e.g., see the contribution by Wendt et al.). The output of such a room reverberation simulator is then taken either as diffuse surrounding sound, if the listener is within the simulated room, or as input to a 'radiating hole', if the listener is in an adjacent room. Subjective evaluations are performed by comparing measured and synthesized transitions between coupled rooms.
    Funding: DFG FOR 1732 (Individualisierte Hörakustik: Modelle, Algorithmen und Systeme für die Sicherstellung der akustischen Wahrnehmung für alle in allen Situationen)
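    The 'radiating hole' routing can be reduced to an illustrative sketch: the external reverberator's output is played back as a diffuse field while the listener is inside the simulated room, and otherwise is fed to a virtual point source at the aperture. The geometry test, 1/r attenuation, and delay model below are assumptions for illustration, not the authors' renderer.

```python
import numpy as np

def route_reverb(reverb_out, listener_pos, room_bounds, hole_pos,
                 c=343.0, fs=48000):
    """Illustrative routing of an external reverberator's output:
    diffuse playback inside the simulated room, point-source-like
    radiation from the aperture ('radiating hole') outside it."""
    pos = np.asarray(listener_pos, float)
    lo, hi = (np.asarray(v, float) for v in room_bounds)  # axis-aligned room
    if np.all((pos >= lo) & (pos <= hi)):
        return reverb_out  # listener inside: render as enveloping diffuse field
    # Listener in the adjacent room: radiate from the hole with a
    # propagation delay and simple 1/r attenuation (assumed model).
    r = float(np.linalg.norm(pos - np.asarray(hole_pos, float)))
    delay = int(round(r / c * fs))
    out = np.zeros(len(reverb_out) + delay)
    out[delay:] = reverb_out / max(r, 1.0)
    return out
```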

    SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning

    We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments. Given a 3D mesh of a real-world environment, SoundSpaces can generate highly realistic acoustics for arbitrary sounds captured from arbitrary microphone locations. Together with existing 3D visual assets, it supports an array of audio-visual research tasks, such as audio-visual navigation, mapping, source localization and separation, and acoustic matching. Compared to existing resources, SoundSpaces 2.0 has the advantages of allowing continuous spatial sampling, generalization to novel environments, and configurable microphone and material properties. To our knowledge, this is the first geometry-based acoustic simulation that offers high fidelity and realism while also being fast enough to use for embodied learning. We showcase the simulator's properties and benchmark its performance against real-world audio measurements. In addition, we demonstrate two downstream tasks -- embodied navigation and far-field automatic speech recognition -- and highlight sim2real performance for the latter. SoundSpaces 2.0 is publicly available to facilitate wider research for perceptual systems that can both see and hear.
    Comment: Camera-ready version. Website: https://soundspaces.org. Project page: https://vision.cs.utexas.edu/projects/soundspaces
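    Independent of SoundSpaces' own API, the core rendering operation the abstract refers to, convolving a dry source signal with a simulated (possibly multichannel) impulse response for a given microphone location, fits in a few lines; the signals below are synthetic placeholders.

```python
import numpy as np
from scipy.signal import fftconvolve

def render(dry, ir):
    """Convolve a mono dry signal with an (n_channels, n_taps) impulse response."""
    return np.stack([fftconvolve(dry, ir[ch]) for ch in range(ir.shape[0])])

# Placeholder data: 1 s of noise through a synthetic decaying 2-channel IR.
fs = 16000
dry = np.random.randn(fs)
ir = np.random.randn(2, 4000) * np.exp(-np.linspace(0.0, 8.0, 4000))
wet = render(dry, ir)  # shape: (2, fs + 4000 - 1)
```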

    Comparison of Acoustic Simulation Tools for Shoebox-Shaped Rooms

    Objectives: investigate the available Matlab-based room simulation tools; investigate other free room simulation tools; create and define test scenarios; perform the implementations and simulations; evaluate the results.
    Sapiña Cárcel, J. (2020). Comparación de herramientas de simulación acústica para salas con forma de caja de zapatos (TFG). Universitat Politècnica de València. http://hdl.handle.net/10251/179554

    Realistic sources, receivers and walls improve the generalisability of virtually-supervised blind acoustic parameter estimators

    Blind acoustic parameter estimation consists of inferring the acoustic properties of an environment from recordings of unknown sound sources. Recent works in this area have utilized deep neural networks trained either partially or exclusively on simulated data, due to the limited availability of real annotated measurements. In this paper, we study whether a model trained purely on a fast image-source room impulse response simulator can generalize to real data. We present an ablation study on carefully crafted simulated training sets that account for different levels of realism in source, receiver, and wall responses. The extent of realism is controlled by the sampling of wall absorption coefficients and by applying measured directivity patterns to microphones and sources. A state-of-the-art model trained on these datasets is evaluated on the task of jointly estimating the room's volume, total surface area, and octave-band reverberation times from multiple multichannel speech recordings. Results reveal that every added layer of simulation realism at training time significantly improves the estimation of all quantities on real signals.
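    The kind of per-wall absorption sampling and directional source/receiver modeling the ablation varies can be illustrated with pyroomacoustics; the sampling ranges are arbitrary, and the analytic cardioid pattern stands in for the measured directivities used in the paper.

```python
import numpy as np
import pyroomacoustics as pra
from pyroomacoustics.directivities import (
    CardioidFamily, DirectivityPattern, DirectionVector)

rng = np.random.default_rng(0)

# Independent absorption coefficient per wall (ranges are illustrative).
walls = ["east", "west", "north", "south", "floor", "ceiling"]
materials = {w: pra.Material(float(rng.uniform(0.02, 0.5))) for w in walls}

room = pra.ShoeBox(rng.uniform([3, 3, 2.4], [10, 8, 4]), fs=16000,
                   materials=materials, max_order=17)

# Directional source and microphone instead of ideal omnidirectional ones;
# an analytic cardioid stands in for the measured patterns used in the paper.
pattern = CardioidFamily(
    orientation=DirectionVector(azimuth=90, colatitude=90, degrees=True),
    pattern_enum=DirectivityPattern.CARDIOID)
room.add_source([2.0, 2.0, 1.6], directivity=pattern)
room.add_microphone([4.0, 3.0, 1.5], directivity=pattern)
room.compute_rir()
```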

    Interactive physically-based sound simulation

    The realization of interactive, immersive virtual worlds requires the ability to present a realistic audio experience that convincingly complements their visual rendering. Physical simulation is a natural way to achieve such realism, enabling deeply immersive virtual worlds. However, physically-based sound simulation is very computationally expensive owing to the high-frequency, transient oscillations underlying audible sounds. The increasing computational power of desktop computers has served to reduce the gap between required and available computation, and it has become possible to bridge this gap further by using a combination of algorithmic improvements that exploit the physical as well as perceptual properties of audible sounds. My dissertation is a step in this direction and concentrates on developing real-time techniques for both sub-problems of sound simulation: synthesis and propagation. Sound synthesis is concerned with generating the sounds produced by objects due to elastic surface vibrations upon interaction with the environment, such as collisions. I present novel techniques that exploit human auditory perception to simulate scenes with hundreds of sounding objects undergoing impact and rolling in real time. Sound propagation is the complementary problem of modeling the high-order scattering and diffraction of sound in an environment as it travels from source to listener. I discuss my work on a novel numerical acoustic simulator (ARD) that is a hundred times faster and consumes ten times less memory than a high-accuracy finite-difference technique, allowing acoustic simulations on previously intractable spaces, such as a cathedral, on a desktop computer. Lastly, I present my work on interactive sound propagation that leverages my ARD simulator to render the acoustics of arbitrary static scenes for multiple moving sources and a moving listener in real time, while accounting for scene-dependent effects such as low-pass filtering and smooth attenuation behind obstructions, reverberation, scattering from complex geometry, and sound focusing. This is enabled by a novel compact representation that takes a thousand times less memory than a direct scheme, thus reducing memory footprints to within available main memory. To the best of my knowledge, this is the only technique and system in existence to demonstrate auralization of physical wave-based effects in real time on large, complex 3D scenes.
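    The core of the ARD idea, solving the wave equation exactly inside a rectangular partition using a cosine (DCT) modal basis, can be sketched in 1D. This follows the published modal update rule m(n+1) = 2 cos(w dt) m(n) - m(n-1) for a source-free partition; it is a sketch, not the dissertation's implementation, and interface handling between partitions is omitted.

```python
import numpy as np
from scipy.fft import dct, idct

def ard_partition_1d(n=256, L=3.43, c=343.0, dt=1e-4, steps=400):
    """Exact modal update of the source-free 1D wave equation inside one
    rectangular partition (the building block of ARD); a sketch, not the
    dissertation's implementation."""
    i = np.arange(n)
    w = c * np.pi * i / L       # modal angular frequencies of the partition
    cos_wdt = np.cos(w * dt)
    # Initial pressure: a Gaussian pulse, transformed to mode space (DCT).
    x = np.linspace(0.0, L, n)
    m_cur = dct(np.exp(-((x - L / 2.0) ** 2) / 0.01), norm="ortho")
    m_prev = m_cur * cos_wdt    # exact zero-initial-velocity start
    frames = []
    for _ in range(steps):
        # Exact harmonic-oscillator update per mode: no numerical dispersion,
        # stable for any dt (unlike explicit finite-difference stencils).
        m_next = 2.0 * m_cur * cos_wdt - m_prev
        m_prev, m_cur = m_cur, m_next
        frames.append(idct(m_cur, norm="ortho"))  # back to the pressure field
    return np.array(frames)
```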

    Efficient geometric sound propagation using visibility culling

    Simulating the propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques that are faster than prior methods and map well to parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths in exchange for higher computational efficiency. FastV and our from-region visibility algorithm are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with a simultaneously moving source and moving receiver (MS-MR), which incurs less than 25% overhead compared to the static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenarios.
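    At the heart of image-source methods is a per-wall validity test: mirror the source, then check that the specular path actually reflects off the wall segment. Visibility culling (FastV, AD-Frustum) pays off by skipping this test for provably hidden walls. Below is a minimal 2D sketch of the test itself, with illustrative geometry only.

```python
import numpy as np

def specular_path(src, lis, a, b):
    """Mirror src across the wall segment (a, b) and test whether the
    specular path src -> wall -> lis actually reflects off the segment.
    2D sketch of the per-image validity test; occlusion checks omitted."""
    src, lis, a, b = (np.asarray(p, float) for p in (src, lis, a, b))
    d = b - a                                   # wall direction
    n = np.array([-d[1], d[0]])
    n /= np.linalg.norm(n)                      # unit wall normal
    img = src - 2.0 * np.dot(src - a, n) * n    # image source position
    r = lis - img                               # image-to-listener ray
    e = a - img
    denom = r[0] * d[1] - r[1] * d[0]
    if abs(denom) < 1e-12:
        return None                             # ray parallel to the wall
    t = (e[0] * d[1] - e[1] * d[0]) / denom     # fraction along img -> lis
    u = (e[0] * r[1] - e[1] * r[0]) / denom     # fraction along the segment
    if 0.0 < t < 1.0 and 0.0 <= u <= 1.0:
        return img, a + u * d                   # valid: image and hit point
    return None                                 # image source is invalid

# Example: a wall on the x-axis from (0, 0) to (4, 0).
print(specular_path([1, 1], [3, 1], [0, 0], [4, 0]))  # hit point (2, 0)
```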