
    Real-time 3-D Mapping with Estimating Acoustic Materials

    This paper proposes a real-time system that integrates acoustic material estimation from visual appearance with on-the-fly 3-D mapping. The proposed method estimates the acoustic materials of the surroundings in indoor scenes and incorporates them into a 3-D occupancy map as a robot moves around the environment. To estimate the acoustic material from the visual cue, we apply a state-of-the-art semantic segmentation CNN, based on the assumption that visual appearance and acoustic materials are strongly associated. Furthermore, we introduce an update policy to handle the material estimates during the online mapping process. As a result, our environment map with acoustic materials can be used for sound-related robotics applications, such as sound source localization that accounts for various acoustic propagation effects (e.g., reflection).
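
    As an illustration of the mapping step described above, here is a minimal sketch of fusing per-frame material estimates into a voxel map. The label-to-absorption table, the confidence-weighted averaging rule, and all names are assumptions for illustration, not the paper's actual update policy.

    import numpy as np

    # Hypothetical mapping from semantic labels to acoustic absorption coefficients.
    MATERIAL_ABSORPTION = {"concrete": 0.02, "glass": 0.04, "carpet": 0.30, "curtain": 0.45}

    class AcousticVoxelMap:
        def __init__(self, shape, voxel_size):
            self.occupancy = np.zeros(shape)    # log-odds occupancy per voxel
            self.absorption = np.zeros(shape)   # running acoustic-material estimate
            self.confidence = np.zeros(shape)   # accumulated label confidence
            self.voxel_size = voxel_size

        def update(self, points_world, labels, label_conf):
            """Blend labelled surface points (Nx3) into the map as the robot moves."""
            idx = np.floor(points_world / self.voxel_size).astype(int)
            for (i, j, k), lab, c in zip(idx, labels, label_conf):
                a = MATERIAL_ABSORPTION[lab]
                w = self.confidence[i, j, k]
                # Simple confidence-weighted running average as an update policy.
                self.absorption[i, j, k] = (w * self.absorption[i, j, k] + c * a) / (w + c)
                self.confidence[i, j, k] = w + c
                self.occupancy[i, j, k] += 0.85  # log-odds "hit" marking the voxel occupied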

    Reflection-Aware Sound Source Localization

    We present a novel, reflection-aware method for 3D sound localization in indoor environments. Unlike prior approaches, which are mainly based on continuous sound signals from a stationary source, our formulation is designed to localize the position instantaneously from signals within a single frame. We consider direct sound and indirect sound signals that reach the microphones after reflecting off surfaces such as ceilings or walls. We then generate and trace direct and reflected acoustic paths using inverse acoustic ray tracing and utilize these paths with Monte Carlo localization to estimate a 3D sound source position. We have implemented our method on a robot with a cube-shaped microphone array and tested it in different settings with continuous and intermittent sound signals from a stationary or a mobile source. Across these settings, our approach localizes the sound with an average distance error of 0.8 m, tested in a room of 7 m by 7 m area with 3 m height, including a mobile and a non-line-of-sight sound source. We also show that modeling indirect rays increases the localization accuracy by 40% compared to using only direct acoustic rays. Comment: Submitted to ICRA 2018. The working video is available at https://youtu.be/TkQ36lMEC-M.
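
    The particle-weighting idea behind combining inverse ray tracing with Monte Carlo localization can be sketched as follows. The ray representation, the Gaussian weighting model, and the room bounds are assumptions for illustration, not the authors' implementation.

    import numpy as np

    def localize_source(doa_rays, room_min, room_max, n_particles=2000, sigma=0.3):
        """doa_rays: list of (origin, unit direction) pairs traced back from the
        microphone array, including reflected paths recovered by inverse ray tracing."""
        # Sample candidate source positions uniformly inside the room volume.
        particles = np.random.uniform(room_min, room_max, size=(n_particles, 3))
        weights = np.ones(n_particles)
        for origin, direction in doa_rays:
            # Perpendicular distance from each particle to the (possibly reflected) ray.
            v = particles - origin
            t = np.clip(v @ direction, 0.0, None)
            closest = origin + np.outer(t, direction)
            d = np.linalg.norm(particles - closest, axis=1)
            weights *= np.exp(-0.5 * (d / sigma) ** 2)  # rays passing nearby up-weight a particle
        return particles[np.argmax(weights)]  # most consistent 3D source position estimate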

    Shaping the auditory peripersonal space with motor planning in immersive virtual reality

    Immersive audio technologies require personalized binaural synthesis through headphones to provide perceptually plausible virtual and augmented reality (VR/AR) simulations. We introduce and, for the first time in VR contexts, apply the quantitative measure called premotor reaction time (pmRT) to characterize sonic interactions between humans and the technology through motor planning. In the proposed basic virtual acoustic scenario, listeners are asked to react to a virtual sound approaching from different directions and stopping at different distances within their peripersonal space (PPS). The PPS is highly sensitive to embodied and environmentally situated interactions, anticipating motor system activation for prompt preparation for action. Since immersive VR applications benefit from spatial interactions, modeling the PPS around the listener is crucial to reveal individual behaviors and performances. Our methodology, centered around the pmRT, provides a compact description and approximation of the spatiotemporal PPS processing and boundaries around the head by replicating several well-known neurophysiological phenomena related to PPS, such as auditory asymmetry, front/back calibration and confusion, and ellipsoidal action fields.
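
    As a rough illustration of how a PPS boundary might be extracted from pmRT measured at several sound stop distances, one could fit a sigmoid and read off its midpoint. The model and data shapes below are assumptions, not the study's analysis pipeline.

    import numpy as np
    from scipy.optimize import curve_fit

    def pmrt_sigmoid(d, rt_far, rt_near, d_center, slope):
        """pmRT as a function of sound stop distance d (m): faster responses inside the PPS."""
        return rt_far + (rt_near - rt_far) / (1.0 + np.exp((d - d_center) / slope))

    def fit_pps_boundary(distances_m, pmrt_ms):
        p0 = [max(pmrt_ms), min(pmrt_ms), float(np.median(distances_m)), 0.2]
        params, _ = curve_fit(pmrt_sigmoid, distances_m, pmrt_ms, p0=p0)
        return params[2]  # d_center: estimated PPS boundary along this direction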

    Efficient Wave-based Sound Propagation and Optimization for Computer-Aided Design

    Acoustic phenomena have a large impact on our everyday lives, from influencing our enjoyment of music in a concert hall, to affecting our concentration at school or work, to potentially harming our health through deafening noises. The sound that reaches our ears is absorbed, reflected, and filtered by the shape, topology, and materials present in the environment. However, many computer simulation techniques for solving these sound propagation problems are either computationally expensive or inaccurate. Additionally, the costs of some methods are dramatically increased in design optimization processes in which several iterations of sound propagation evaluation are necessary. The primary goal of this dissertation is to present techniques for efficiently solving the sound propagation problem and related optimization problems for computer-aided design. First, we propose a parallel method for solving large acoustic propagation problems, scalable to tens of thousands of cores. Second, we present two novel techniques for optimizing certain acoustic characteristics, such as reverberation time or sound clarity, using wave-based sound propagation. Finally, we show how hybrid sound propagation algorithms can be used to improve the performance of acoustic optimization problems, and present two algorithms for noise minimization and speech intelligibility improvement that use this hybrid approach. All the algorithms we present are evaluated on various benchmarks that are computer models of architectural scenes. These benchmarks include environments that are challenging for existing sound propagation algorithms, such as large indoor or outdoor scenes, structurally complex scenes, or scenes dominated by difficult-to-model sound propagation phenomena. Using the techniques put forth in this dissertation, we can solve many challenging sound propagation and optimization problems on these scenes efficiently. We are able to accurately model sound propagation using wave-based approaches up to 10 kHz (the full range of human speech) and, using our hybrid approach, for the full range of human hearing (22 kHz). Our noise minimization methods show improvements of up to 13 dB in noise reduction on some scenes, and we show a 71% improvement in speech intelligibility using our algorithm.
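
    One of the acoustic characteristics mentioned above, reverberation time, can be estimated from a simulated impulse response via Schroeder backward integration. The sketch below is a generic illustration of that metric, not the dissertation's optimization code.

    import numpy as np

    def rt60_from_impulse_response(ir, fs):
        """Estimate T60 from an impulse response (numpy array) sampled at fs Hz."""
        energy = np.asarray(ir, dtype=float) ** 2
        edc = np.cumsum(energy[::-1])[::-1]             # Schroeder energy decay curve
        edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
        t = np.arange(len(ir)) / fs
        fit = (edc_db <= -5.0) & (edc_db >= -25.0)      # fit the -5 dB to -25 dB decay region
        slope, _ = np.polyfit(t[fit], edc_db[fit], 1)   # decay rate in dB per second (negative)
        return -60.0 / slope                            # time to decay by 60 dB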

    MULTIMODAL LEARNING FOR AUDIO AND VISUAL PROCESSING

    The world contains vast amounts of information which can be sensed and captured in a variety of ways and formats. Virtual environments also lend themselves to endless possibilities and diversity of data. Often our experiences draw from these separate but complementary parts, which can be combined in a way that provides a comprehensive representation of events. Multimodal learning focuses on these types of combinations. By fusing multiple modalities, multimodal learning can improve results beyond individual-mode performance. However, many of today's state-of-the-art techniques in computer vision, robotics, and machine learning rely solely or primarily on visual inputs, even when the visual data is obtained from video where corresponding audio may also be readily available to augment learning. Vision-only approaches can struggle with highly reflective, transparent, or occluded objects and scenes, where audio, used alone or in conjunction with vision, may improve task performance. To address these challenges, this thesis explores coupling multimodal information to enhance task performance through learning-based methods for audio and visual processing using real and synthetic data. Physically-based graphics pipelines can naturally be extended for audio and visual synthetic data generation. To enhance the rigid-body sound synthesis pipeline for objects containing a liquid, I used an added-mass operator for fluid-structure coupling as a pre-processing step. My method is fast and practical for use in interactive 3D systems where live sound synthesis is desired. By fusing audio and visual data from real and synthetic videos, we also demonstrate enhanced processing and performance for object classification, tracking, and reconstruction tasks. As has been shown in visual question answering and other related work, multiple modalities can complement one another and outperform single-modality systems. To the best of my knowledge, I introduced the first use of audio-visual neural networks to analyze liquid pouring sequences by classifying their weight, liquid, and receiving container. Prior work often required predefined source weights or visual data. My contribution was to use the sound from a pouring sequence (a liquid being poured into a target container) to train a multimodal convolutional neural network (CNN) that fuses mel-scaled spectrograms as audio inputs with corresponding visual data based on video images. I described the first use of an audio-visual neural network for tracking tabletop-sized objects and enhancing visual object trackers. Like object detectors facing reflective surfaces, object trackers can run into challenges when objects collide, occlude, appear similar, or come close to one another. By using the impact sounds of the objects during collision, my audio-visual object tracking (AVOT) neural network can correct trackers that drift from the objects they were assigned before collision. Reflective and textureless surfaces are not only difficult to detect and classify, but are also often poorly reconstructed, filled with depth discontinuities and holes. I proposed the first use of an audio-visual method that uses the reflections of sound to aid in geometry and audio reconstruction, referred to as "Echoreconstruction". The mobile phone prototype emits pulsed audio while recording video for RGB-based 3D reconstruction and audio-visual classification.
    Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. EchoCNN inferences from these classifications enhance scene 3D reconstructions containing open spaces and reflective surfaces by depth filtering, inpainting, and placement of unmixed sound sources in the scene. In addition to enhancing scene reconstructions, I proposed a multimodal single- and multi-frame reconstruction LSTM autoencoder for 3D reconstruction from audio-visual inputs. Our neural network produces high-quality 3D reconstructions using a voxel representation; it is the first audio-visual reconstruction neural network for 3D geometry and material representation. Contributions of this thesis include new neural network designs, new enhancements to real and synthetic audio-visual datasets, and prototypes that demonstrate audio and audio-augmented performance for sound synthesis, inference, and reconstruction.
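
    The audio-visual fusion recurring throughout this abstract (a mel-spectrogram audio branch plus an image branch, fused for classification) can be sketched roughly as below; the layer sizes, input resolutions, and late-fusion choice are illustrative assumptions rather than the thesis networks.

    import torch
    import torch.nn as nn

    class AudioVisualFusionNet(nn.Module):
        """Toy late-fusion CNN: one branch per modality, features concatenated."""
        def __init__(self, n_classes):
            super().__init__()
            self.audio = nn.Sequential(                      # 1 x H x W mel-spectrogram
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.visual = nn.Sequential(                     # 3 x H x W video frame
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.classifier = nn.Linear(32 + 32, n_classes)  # e.g., receiving-container classes

        def forward(self, spectrogram, frame):
            fused = torch.cat([self.audio(spectrogram), self.visual(frame)], dim=1)
            return self.classifier(fused)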

    Efficient Interactive Sound Propagation in Dynamic Environments

    The physical phenomenon of sound is ubiquitous in our everyday life and is an important component of immersion in interactive virtual reality applications. Sound propagation involves modeling how sound is emitted from a source, interacts with the environment, and is received by a listener. Previous techniques for computing interactive sound propagation in dynamic scenes are based on geometric algorithms such as ray tracing. However, the performance and quality of these algorithms are strongly dependent on the number of rays traced. In addition, it is difficult to acquire acoustic material properties, and it is challenging to efficiently compute spatial sound effects from the output of ray-tracing-based sound propagation. These problems lead to increased latency and less plausible sound in dynamic interactive environments. In this dissertation, we propose three approaches that address these challenges. First, we present an approach that utilizes temporal coherence in the sound field to reuse computation from previous simulation time steps. Second, we present a framework for the automatic acquisition of acoustic material properties using visual and audio measurements of real-world environments. Finally, we propose efficient techniques for computing directional spatial sound for sound propagation with low latency using head-related transfer functions (HRTFs). We have evaluated both the performance and subjective impact of these techniques on a variety of complex dynamic indoor and outdoor environments and observe an order-of-magnitude speedup over previous approaches. The accuracy of our approaches has been validated against real-world measurements and previous methods. The proposed techniques enable interactive simulation of sound propagation in complex multi-source dynamic environments.
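
    A minimal sketch of the final spatial-sound step: rendering a propagated mono signal binaurally by convolving it with left and right head-related impulse responses chosen for the arrival direction. The HRIR lookup and buffer handling are assumptions, not the dissertation's low-latency method.

    import numpy as np

    def render_binaural(mono, hrir_left, hrir_right):
        """Convolve a propagated mono signal with direction-specific HRIRs."""
        left = np.convolve(mono, hrir_left)
        right = np.convolve(mono, hrir_right)
        n = max(len(left), len(right))
        out = np.zeros((n, 2))
        out[:len(left), 0] = left    # left-ear channel
        out[:len(right), 1] = right  # right-ear channel
        return out                   # stereo buffer ready for headphone playback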