Learning Neural Acoustic Fields
Our environment is filled with rich and dynamic acoustic information. When we
walk into a cathedral, the reverberations as much as appearance inform us of
the sanctuary's wide open space. Similarly, as an object moves around us, we
expect the sound emitted to also exhibit this movement. While recent advances
in learned implicit functions have led to increasingly higher quality
representations of the visual world, there have not been commensurate advances
in learning spatial auditory representations. To address this gap, we introduce
Neural Acoustic Fields (NAFs), an implicit representation that captures how
sounds propagate in a physical scene. By modeling acoustic propagation in a
scene as a linear time-invariant system, NAFs learn to continuously map all
emitter and listener location pairs to a neural impulse response function that
can then be applied to arbitrary sounds. We demonstrate that the continuous
nature of NAFs enables us to render spatial acoustics for a listener at an
arbitrary location and to predict sound propagation at novel locations. We
further show that the representation learned by NAFs can help improve visual
learning with sparse views. Finally, we show that a representation informative
of scene structure emerges during the learning of NAFs.
Comment: Project page: https://www.andrew.cmu.edu/user/afluo/Neural_Acoustic_Fields
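The core mapping lends itself to a compact sketch. Below is a minimal PyTorch toy, assuming a plain MLP over concatenated emitter/listener coordinates and a time-domain impulse response; the names (TinyNAF, ir_len) are hypothetical, and the published NAF architecture (local geometric features, time-frequency outputs) differs in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNAF(nn.Module):
    """Toy neural acoustic field: maps an (emitter, listener) position pair
    to a fixed-length time-domain impulse response."""
    def __init__(self, ir_len=4096, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, ir_len),
        )

    def forward(self, emitter_xyz, listener_xyz):
        return self.net(torch.cat([emitter_xyz, listener_xyz], dim=-1))

# Because propagation is modeled as a linear time-invariant system,
# rendering any sound at a queried pose reduces to convolving the dry
# signal with the predicted impulse response.
naf = TinyNAF()
ir = naf(torch.rand(1, 3), torch.rand(1, 3))       # (1, ir_len)
dry = torch.rand(1, 1, 16000)                      # arbitrary source audio
kernel = ir.flip(-1).unsqueeze(0)                  # flip: correlation -> convolution
wet = F.conv1d(dry, kernel, padding=ir.shape[-1] - 1)
```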
Few-Shot Audio-Visual Learning of Environment Acoustics
Room impulse response (RIR) functions capture how the surrounding physical
environment transforms the sounds heard by a listener, with implications for
various applications in AR, VR, and robotics. Whereas traditional methods to
estimate RIRs assume dense geometry and/or sound measurements throughout the
environment, we explore how to infer RIRs based on a sparse set of images and
echoes observed in the space. Towards that goal, we introduce a
transformer-based method that uses self-attention to build a rich acoustic
context, then predicts RIRs of arbitrary query source-receiver locations
through cross-attention. Additionally, we design a novel training objective
that improves the match in the acoustic signature between the RIR predictions
and the targets. In experiments using a state-of-the-art audio-visual simulator
for 3D environments, we demonstrate that our method successfully generates
arbitrary RIRs, outperforming state-of-the-art methods and -- in a major
departure from traditional methods -- generalizing to novel environments in a
few-shot manner. Project: http://vision.cs.utexas.edu/projects/fs_rir
Comment: Accepted to NeurIPS 202
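The query mechanism can be sketched compactly. The PyTorch toy below assumes the sparse images and echoes have already been encoded by a self-attention encoder into a context tensor; the module and argument names (RIRQueryDecoder, context) are hypothetical, and the published model differs in architecture and in its acoustic-signature training objective.

```python
import torch
import torch.nn as nn

class RIRQueryDecoder(nn.Module):
    """Toy query step: an arbitrary source-receiver pair is embedded and
    cross-attends to the encoded audio-visual context to predict an RIR."""
    def __init__(self, d_model=256, n_heads=4, ir_len=2048):
        super().__init__()
        self.pos_embed = nn.Linear(6, d_model)        # (source_xyz, receiver_xyz)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, ir_len)

    def forward(self, context, src_xyz, rcv_xyz):
        # context: (B, N, d_model), from self-attention over sparse images+echoes
        q = self.pos_embed(torch.cat([src_xyz, rcv_xyz], dim=-1)).unsqueeze(1)
        fused, _ = self.cross_attn(q, context, context)
        return self.head(fused.squeeze(1))            # predicted RIR, (B, ir_len)

decoder = RIRQueryDecoder()
ctx = torch.rand(2, 8, 256)                           # 8 observations per scene
rir = decoder(ctx, torch.rand(2, 3), torch.rand(2, 3))
```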
Computationally-efficient and perceptually-motivated rendering of diffuse reflections in room acoustics simulation
Geometrical acoustics is well suited for simulating room reverberation in
interactive real-time applications. While the image source model (ISM) is
exceptionally fast, the restriction to specular reflections impacts its
perceptual plausibility. To account for diffuse late reverberation, hybrid
approaches have been proposed, e.g., using a feedback delay network (FDN) in
combination with the ISM. Here, a computationally-efficient, digital-filter
approach is suggested to account for effects of non-specular reflections in the
ISM and to couple scattered sound into a diffuse reverberation model using a
spatially rendered FDN. Depending on the scattering coefficient of a room
boundary, energy of each image source is split into a specular and a scattered
part which is added to the diffuse sound field. Temporal effects as observed
for an infinite ideal diffuse (Lambertian) reflector are simulated using
cascaded all-pass filters. Effects of scattering and multiple (inter-)
reflections caused by larger geometric disturbances at walls and by objects in
the room are accounted for in a highly simplified manner. Using a single
parameter to quantify deviations from an empty shoebox room, each reflection is
temporally smeared using cascaded all-pass filters. The proposed method was
perceptually evaluated against dummy head recordings of real rooms.
Comment: This work has been submitted to Forum Acusticum 2023 for publication
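Two of the building blocks are simple enough to sketch. The NumPy/SciPy toy below illustrates, under stated assumptions, the split of an image source's energy by the boundary's scattering coefficient and the temporal smearing with cascaded first-order all-pass filters; the function names and coefficient values are illustrative, not the paper's tuned parameters.

```python
import numpy as np
from scipy.signal import lfilter

def split_energy(ir_specular, s):
    """Split an image source's contribution by the boundary's scattering
    coefficient s: a (1 - s) fraction of the energy stays specular, the s
    fraction is diverted to the diffuse field (fed to the spatial FDN).
    Energy scales with amplitude squared, hence the square roots."""
    return np.sqrt(1.0 - s) * ir_specular, np.sqrt(s) * ir_specular

def allpass_cascade(x, coeffs):
    """Temporally smear a reflection with cascaded first-order all-pass
    filters H(z) = (a + z^-1) / (1 + a z^-1): magnitude is preserved
    while dispersive group delay spreads the pulse over time."""
    for a in coeffs:
        x = lfilter([a, 1.0], [1.0, a], x)
    return x

pulse = np.zeros(512); pulse[0] = 1.0
specular, scattered = split_energy(pulse, s=0.3)             # wall scattering coeff.
smeared = allpass_cascade(scattered, coeffs=[0.6, 0.5, 0.4]) # example values
```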
Efficient Acoustic Simulation for Immersive Media and Digital Fabrication
Sound is a crucial part of our lives. Well-designed acoustic behaviors can significantly improve both physical and virtual interactions. In computer graphics, however, most existing methods have focused primarily on improving accuracy; how to develop efficient acoustic simulation algorithms for interactive, practical applications has remained underexplored.
The challenge arises from the dilemma between expensive accurate simulations and the fast feedback demanded by intuitive user interaction: traditional physics-based acoustic simulations are computationally expensive, yet for end users to benefit from them, prompt feedback during interaction is crucial.
In this thesis, I investigate how to develop efficient acoustic simulations for real-world applications such as immersive media and digital fabrication. To address the above challenges, I leverage precomputation and optimization to significantly improve speed while preserving the accuracy of complex acoustic phenomena. This work discusses three efforts along this research direction. First, to ease the sound designer's workflow, we developed a fast keypoint-based precomputation algorithm that enables interactive evaluation of acoustic transfer values in virtual sound simulations. Second, for realistic audio editing in 360° videos, we proposed an inverse material optimization based on fast sound simulation and a hybrid ambisonic audio synthesis that exploits directional isotropy in spatial audio. Third, we devised a modular approach to efficiently simulate and optimize fabrication-ready acoustic filters, achieving orders-of-magnitude speedups while maintaining simulation accuracy. Through this series of projects, I demonstrate a wide range of applications made possible by efficient acoustic simulations.
A Computer Vision Inspired Automatic Acoustic Material Tagging System for Virtual Environments
This paper presents ongoing work on an approach to material information retrieval in virtual environments (VEs). Our approach uses convolutional neural networks to classify materials by performing semantic segmentation on images captured in the VE. The class maps obtained are then re-projected onto the environment. We use transfer learning and fine-tune a pretrained segmentation model on images captured in our VEs. The geometry and semantic information can then be used to create mappings between objects in the VE and acoustic absorption coefficients, which can serve as input to physically based audio renderers, allowing a significant reduction in manual material tagging.
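The final mapping step is straightforward to sketch. The NumPy toy below assumes a per-pixel class map already produced by the segmentation network; the class names and absorption values are placeholders, where real coefficients would come from measured material tables.

```python
import numpy as np

# Hypothetical lookup from segmentation classes to absorption coefficients
# (six octave bands, 125 Hz to 4 kHz); values are placeholders.
ABSORPTION = {
    "concrete": [0.01, 0.01, 0.02, 0.02, 0.03, 0.04],
    "carpet":   [0.08, 0.24, 0.57, 0.69, 0.71, 0.73],
    "glass":    [0.35, 0.25, 0.18, 0.12, 0.07, 0.04],
}
CLASSES = ["concrete", "carpet", "glass"]

def class_map_to_absorption(class_map):
    """Turn a per-pixel class-index map (H, W) into a per-pixel absorption
    map (H, W, 6) ready to be re-projected onto scene geometry and consumed
    by a physically based audio renderer."""
    table = np.array([ABSORPTION[c] for c in CLASSES])   # (n_classes, 6)
    return table[class_map]

seg = np.random.randint(0, len(CLASSES), size=(4, 4))    # stand-in for CNN output
alpha = class_map_to_absorption(seg)                     # (4, 4, 6)
```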
Real-time Microphone Array Processing for Sound-field Analysis and Perceptually Motivated Reproduction
This thesis details real-time implementations of sound-field analysis and perceptually motivated reproduction methods for visualisation and auralisation purposes. For the former, various methods for visualising the relative distribution of sound energy from one point in space are investigated and contrasted, including a novel reformulation of the cross-pattern coherence (CroPaC) algorithm that integrates a new side-lobe suppression technique. For auralisation applications, listening tests were conducted to compare ambisonics reproduction with a novel headphone formulation of the directional audio coding (DirAC) method. The results indicate that the side-lobe-suppressed CroPaC method offers greater spatial selectivity in reverberant conditions than other popular approaches, and that the new DirAC formulation yields higher perceived spatial accuracy than the ambisonics method.
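To give a flavour of the analysis side, the NumPy toy below computes a CroPaC-like directional energy map from first-order (B-format) signals for one time-frequency tile, assuming W is the omni component and X, Y the horizontal dipoles; the normalization and rectification here are illustrative, and the thesis's estimator and its side-lobe suppression differ in detail.

```python
import numpy as np

def cropac_like_map(W, X, Y, azimuths):
    """Toy directional map in the spirit of cross-pattern coherence: for each
    look direction, steer a figure-of-eight beam from the dipole components
    and measure its normalized real cross-spectrum with the omni signal."""
    out = []
    for az in azimuths:
        dipole = np.cos(az) * X + np.sin(az) * Y        # beam toward azimuth az
        cross = np.mean(np.real(W * np.conj(dipole)))
        norm = 0.5 * np.mean(np.abs(W) ** 2 + np.abs(dipole) ** 2)
        out.append(max(cross / (norm + 1e-12), 0.0))    # rectify negative lobes
    return np.array(out)

# Random STFT-bin data standing in for one time-frequency tile
W, X, Y = (np.random.randn(64) + 1j * np.random.randn(64) for _ in range(3))
energy = cropac_like_map(W, X, Y, azimuths=np.linspace(0, 2 * np.pi, 36))
```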