4,951 research outputs found

    GWA: A Large High-Quality Acoustic Dataset for Audio Processing

    Full text link
    We present the Geometric-Wave Acoustic (GWA) dataset, a large-scale audio dataset of over 2 million synthetic room impulse responses (IRs) and their corresponding detailed geometric and simulation configurations. Our dataset samples acoustic environments from over 6.8K high-quality diverse and professionally designed houses represented as semantically labeled 3D meshes. We also present a novel real-world acoustic materials assignment scheme based on semantic matching that uses a sentence transformer model. We compute high-quality impulse responses corresponding to accurate low-frequency and high-frequency wave effects by automatically calibrating geometric acoustic ray-tracing with a finite-difference time-domain wave solver. We demonstrate the higher accuracy of our IRs by comparing with recorded IRs from complex real-world environments. The code and the full dataset will be released at the time of publication. Moreover, we highlight the benefits of GWA on audio deep learning tasks such as automated speech recognition, speech enhancement, and speech separation. We observe significant improvement over prior synthetic IR datasets in all tasks due to using our dataset.Comment: Project webpage https://gamma.umd.edu/pro/sound/gw

    Broadband DOA estimation using Convolutional neural networks trained with noise signals

    Full text link
    A convolution neural network (CNN) based classification method for broadband DOA estimation is proposed, where the phase component of the short-time Fourier transform coefficients of the received microphone signals are directly fed into the CNN and the features required for DOA estimation are learnt during training. Since only the phase component of the input is used, the CNN can be trained with synthesized noise signals, thereby making the preparation of the training data set easier compared to using speech signals. Through experimental evaluation, the ability of the proposed noise trained CNN framework to generalize to speech sources is demonstrated. In addition, the robustness of the system to noise, small perturbations in microphone positions, as well as its ability to adapt to different acoustic conditions is investigated using experiments with simulated and real data.Comment: Published in Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 201

    Spatial Sound Rendering – A Survey

    Get PDF
    Simulating propagation of sound and audio rendering can improve the sense of realism and the immersion both in complex acoustic environments and dynamic virtual scenes. In studies of sound auralization, the focus has always been on room acoustics modeling, but most of the same methods are also applicable in the construction of virtual environments such as those developed to facilitate computer gaming, cognitive research, and simulated training scenarios. This paper is a review of state-of-the-art techniques that are based on acoustic principles that apply not only to real rooms but also in 3D virtual environments. The paper also highlights the need to expand the field of immersive sound in a web based browsing environment, because, despite the interest and many benefits, few developments seem to have taken place within this context. Moreover, the paper includes a list of the most effective algorithms used for modelling spatial sound propagation and reports their advantages and disadvantages. Finally, the paper emphasizes in the evaluation of these proposed works

    Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation

    Full text link
    We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets. Our approach is designed to recover the reverb-free signal from a reverberant speech signal. We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation. We use the GWA dataset that consists of synthetic RIRs generated in a hybrid fashion: an accurate wave-based solver is used to simulate the lower frequencies and geometric ray tracing methods simulate the higher frequencies. We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods on four real-world RIR datasets.Comment: Submitted to ICASSP 202

    Analysis, modeling and wide-area spatiotemporal control of low-frequency sound reproduction

    Get PDF
    This research aims to develop a low-frequency response control methodology capable of delivering a consistent spectral and temporal response over a wide listening area. Low-frequency room acoustics are naturally plagued by room-modes, a result of standing waves at frequencies with wavelengths that are integer multiples of one or more room dimension. The standing wave pattern is different for each modal frequency, causing a complicated sound field exhibiting a highly position-dependent frequency response. Enhanced systems are investigated with multiple degrees of freedom (independently-controllable sound radiating sources) to provide adequate low-frequency response control. The proposed solution, termed a chameleon subwoofer array or CSA, adopts the most advantageous aspects of existing room-mode correction methodologies while emphasizing efficiency and practicality. Multiple degrees of freedom are ideally achieved by employing what is designated a hybrid subwoofer, which provides four orthogonal degrees of freedom configured within a modest-sized enclosure. The CSA software algorithm integrates both objective and subjective measures to address listener preferences including the possibility of individual real-time control. CSAs and existing techniques are evaluated within a novel acoustical modeling system (FDTD simulation toolbox) developed to meet the requirements of this research. Extensive virtual development of CSAs has led to experimentation using a prototype hybrid subwoofer. The resulting performance is in line with the simulations, whereby variance across a wide listening area is reduced by over 50% with only four degrees of freedom. A supplemental novel correction algorithm addresses correction issues at select narrow frequency bands. These frequencies are filtered from the signal and replaced using virtual bass to maintain all aural information, a psychoacoustical effect giving the impression of low-frequency. Virtual bass is synthesized using an original hybrid approach combining two mainstream synthesis procedures while suppressing each method‟s inherent weaknesses. This algorithm is demonstrated to improve CSA output efficiency while maintaining acceptable subjective performance

    A round robin on room acoustical simulation and auralization

    Get PDF
    A round robin was conducted to evaluate the state of the art of room acoustic modeling software both in the physical and perceptual realms. The test was based on six acoustic scenes highlighting specific acoustic phenomena and for three complex, “real-world” spatial environments. The results demonstrate that most present simulation algorithms generate obvious model errors once the assumptions of geometrical acoustics are no longer met. As a consequence, they are neither able to provide a reliable pattern of early reflections nor do they provide a reliable prediction of room acoustic parameters outside a medium frequency range. In the perceptual domain, the algorithms under test could generate mostly plausible but not authentic auralizations, i.e., the difference between simulated and measured impulse responses of the same scene was always clearly audible. Most relevant for this perceptual difference are deviations in tone color and source position between measurement and simulation, which to a large extent can be traced back to the simplified use of random incidence absorption and scattering coefficients and shortcomings in the simulation of early reflections due to the missing or insufficient modeling of diffraction.DFG, 174776315, FOR 1557: Simulation and Evaluation of Acoustical Environments (SEACEN
    corecore