13 research outputs found

    Sound Synthesis and Evaluation of Interactive Footsteps for Virtual Reality Applications

    Paper Sound Synthesis Adapted to the Motion and Geometry of the Surface

    In this article, we present a method to generate plausible sounds in real time for an animation of crumpling paper. We analyse the geometric animation of the deformed surface to detect sound-producing events and compute the regions that resonate due to the propagation of vibrations through the paper. The resulting sound is synthesized from both pre-recorded samples and procedural generation, taking into account the geometry of the surface and its dynamics. We validate our results by comparing the sound generated by the virtual model against real recordings for a set of characteristic deformations.
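
    The abstract describes detecting sound-producing events from the geometric animation and then driving a mix of recorded grains and procedural synthesis. The sketch below illustrates only the event-detection idea, assuming per-frame vertex positions are available; the acceleration proxy and threshold are illustrative placeholders, not the authors' actual criteria.

```python
import numpy as np

def detect_crumple_events(positions, dt, threshold=5.0):
    """Flag sound-producing frames in a vertex-position animation.

    positions has shape (frames, vertices, 3) and holds the deforming paper
    surface over time.  A sudden change in vertex velocity is used here as a
    crude proxy for a buckling/crumpling event; the paper's actual analysis
    of the surface geometry is considerably more involved.
    """
    velocities = np.diff(positions, axis=0) / dt            # (F-1, V, 3)
    accelerations = np.diff(velocities, axis=0) / dt        # (F-2, V, 3)
    frame_energy = np.linalg.norm(accelerations, axis=2).max(axis=1)

    events = []
    for i, energy in enumerate(frame_energy):
        if energy > threshold:
            # map "how far above threshold" to a rough event amplitude in [0, 1]
            events.append((i + 2, min(1.0, energy / (10.0 * threshold))))
    return events

# Toy example: a flat sheet that abruptly buckles at frame 50.
frames, verts = 100, 64
animation = np.zeros((frames, verts, 3))
animation[50:, :, 2] = 0.05                                 # sudden out-of-plane jump
print(detect_crumple_events(animation, dt=1.0 / 60.0))      # events near frame 50
```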

    Real-Time Physically Based Sound Synthesis and Application in Multimodal Interaction

    An immersive experience in virtual environments requires realistic auditory feedback that is closely coupled with other modalities, such as vision and touch. This is particularly challenging for real-time applications due to their stringent computational requirements. In this dissertation, I present and evaluate effective real-time physically based sound synthesis models that integrate visual and touch data, and apply them to create richly varying multimodal interaction. I first propose an efficient contact sound synthesis technique that accounts for the texture information used for visual rendering and greatly reinforces cross-modal perception. Secondly, I present both empirical and psychoacoustic approaches that formally study the geometry-invariant property of the material model commonly used in real-time sound synthesis. Based on this property, I design a novel example-based material parameter estimation framework that automatically creates synthetic sound effects naturally controlled by complex geometry and dynamics in visual simulation. Lastly, I translate user touch input captured on commodity multi-touch devices into physical performance models that drive both visual and auditory rendering. This novel multimodal interaction is demonstrated in a virtual musical instrument application on both a large tabletop display and mobile tablet devices, and evaluated through pilot studies. Such an application offers capabilities for intuitive and expressive music playing, rapid prototyping of virtual instruments, and active exploration of sound effects determined by various physical parameters.
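
    Real-time physically based contact sound synthesis of this kind typically builds on modal synthesis, where an impact response is a sum of exponentially decaying sinusoids whose frequencies, decay rates, and gains come from a material model. A minimal sketch of that standard formulation follows; the "wooden" mode table is invented for illustration and is not taken from the dissertation.

```python
import numpy as np

def modal_impact(freqs_hz, decay_rates, gains, duration=1.0, sr=44100):
    """Synthesize an impact as a sum of damped sinusoids.

    Each mode i contributes gains[i] * exp(-decay_rates[i] * t) * sin(2*pi*f_i*t),
    the standard modal-synthesis model for rigid-body contact sounds.
    """
    t = np.arange(int(duration * sr)) / sr
    out = np.zeros_like(t)
    for f, d, a in zip(freqs_hz, decay_rates, gains):
        out += a * np.exp(-d * t) * np.sin(2.0 * np.pi * f * t)
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out

# Illustrative mode table: frequencies (Hz), amplitude decay rates (1/s), gains.
impact = modal_impact(
    freqs_hz=[523.0, 1130.0, 2410.0],
    decay_rates=[18.0, 35.0, 60.0],
    gains=[1.0, 0.6, 0.3],
)
print(impact.shape)   # (44100,) -> one second of audio at 44.1 kHz
```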

    Perceptually Driven Interactive Sound Propagation for Virtual Environments

    Sound simulation and rendering can significantly augment a user's sense of presence in virtual environments. Many techniques for sound propagation have been proposed that predict the behavior of sound as it interacts with the environment and is received by the user. At a broad level, propagation algorithms can be classified into reverberation filters, geometric methods, and wave-based methods. In practice, heuristic methods based on reverberation filters are simple to implement and have low computational overhead, while wave-based algorithms are limited to static scenes and involve extensive precomputation. However, relatively little work has been done on the psychoacoustic characterization of different propagation algorithms or on evaluating the relationship between scientific accuracy and perceptual benefit. In this dissertation, we present perceptual evaluations of sound propagation methods and their ability to model complex acoustic effects for virtual environments. Our results indicate that scientifically accurate methods for reverberation and diffraction do result in increased perceptual differentiation. Based on these evaluations, we present two novel hybrid sound propagation methods that combine the accuracy of wave-based methods with the speed of geometric methods for interactive sound propagation in dynamic scenes. Our first algorithm couples modal sound synthesis with geometric sound propagation using wave-based sound radiation to perform mode-aware sound propagation. We introduce diffraction kernels of rigid objects, which encapsulate the sound diffraction behavior of individual objects in free space and are then used to simulate plausible diffraction effects with an interactive path tracing algorithm. Finally, we present a novel perceptually driven metric that can be used to accelerate the computation of late reverberation, enabling plausible simulation of reverberation with low runtime overhead. We highlight the benefits of our novel propagation algorithms in different scenarios.
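
    The reverberation-filter class that the abstract contrasts with geometric and wave-based methods is often realized by shaping noise with an exponential decay set by an RT60 value. The sketch below shows that cheap end of the spectrum under those common assumptions; the parameter values are illustrative and unrelated to the dissertation's perceptual metric.

```python
import numpy as np

def late_reverb_ir(rt60=1.2, sr=44100, seed=0):
    """Build a simple late-reverberation impulse response.

    Late reverberation is commonly approximated as exponentially decaying
    noise whose decay rate is set by the RT60 (time for the level to fall
    by 60 dB).  This is the inexpensive "reverberation filter" approach.
    """
    rng = np.random.default_rng(seed)
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    decay = np.exp(-6.908 * t / rt60)      # 60 dB drop: exp(-ln(1000) * t / rt60)
    return rng.standard_normal(n) * decay

def apply_reverb(dry, ir):
    """Convolve a dry signal with the impulse response (wet signal only)."""
    return np.convolve(dry, ir)[: len(dry)]

dry = np.zeros(44100)
dry[0] = 1.0                                # unit impulse as the dry input
wet = apply_reverb(dry, late_reverb_ir())
print(wet.shape)                            # (44100,)
```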

    Whole brain emulation: a roadmap

    Interactive Sound Propagation using Precomputation and Statistical Approximations

    Acoustic phenomena such as early reflections, diffraction, and reverberation have been shown to improve the user experience in interactive virtual environments and video games. These effects arise from repeated interactions between sound waves and objects in the environment, and in interactive applications they must be simulated within a prescribed time budget. We present two complementary approaches for computing such acoustic effects in real time, with plausible variation in the sound field throughout the scene. The first approach, Precomputed Acoustic Radiance Transfer, precomputes a matrix that accounts for multiple acoustic interactions between all scene objects. The matrix is used at run time to provide sound propagation effects that vary smoothly as sources and listeners move. The second approach couples two techniques, Ambient Reverberance and Aural Proxies, to provide approximate sound propagation effects in real time based only on the portion of the environment immediately visible to the listener. These approaches lie at opposite ends of the space of interactive sound propagation techniques: the first emphasizes accuracy by modeling acoustic interactions between all parts of the scene, while the second emphasizes efficiency by taking only the local environment of the listener into account. These methods have been used to efficiently generate acoustic walkthroughs of architectural models. They have also been integrated into a modern game engine, and can enable realistic, interactive sound propagation on commodity desktop PCs.
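
    The precomputed-matrix idea can be pictured as follows: offline, a global transfer operator maps direct energy arriving at surface samples to the total outgoing energy after repeated inter-reflection, and at run time only cheap matrix-vector products remain. The sketch below is a heavily simplified, frequency- and time-independent stand-in for illustration only; the actual Precomputed Acoustic Radiance Transfer method stores time-dependent acoustic responses, and all data here is random.

```python
import numpy as np

rng = np.random.default_rng(1)
num_patches = 200

# "Precomputed" global transfer operator: maps direct patch energy to
# multi-bounce outgoing patch energy (illustrative random data only).
T = rng.random((num_patches, num_patches)) * 0.01

def runtime_listener_energy(source_to_patch, patch_to_listener):
    """Cheap runtime evaluation: two matrix/vector products per frame."""
    outgoing = T @ source_to_patch          # multi-bounce energy on patches
    return patch_to_listener @ outgoing     # gather toward the listener

src_energy = rng.random(num_patches)          # direct source -> patch energy
listener_gain = rng.random(num_patches) * 0.01  # patch -> listener visibility/gain
print(runtime_listener_energy(src_energy, listener_gain))
```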

    Interactive Sound Propagation for Massive Multi-user and Dynamic Virtual Environments

    Hearing is an important sense, and it is known that rendering sound effects can enhance the level of immersion in virtual environments. Modeling sound waves is a complex problem, requiring vast computing resources to solve accurately. Prior methods are restricted to static scenes or limited acoustic effects. In this thesis, we present methods to improve the quality and performance of interactive geometric sound propagation in dynamic scenes, as well as precomputation algorithms for acoustic propagation in enormous multi-user virtual environments. We present a method for finding edge diffraction propagation paths in arbitrary 3D scenes for dynamic sources and receivers. Using this algorithm, we present a unified framework for interactive simulation of specular reflections, diffuse reflections, diffraction scattering, and reverberation effects. We also define a guidance algorithm for ray tracing that responds to dynamic environments and reorders queries to minimize simulation time. Our approach works well on modern GPUs and can achieve more than an order of magnitude performance improvement over prior methods. Modern multi-user virtual environments support many types of client devices, and current phones and mobile devices may lack the resources to run acoustic simulations. To provide such devices the benefits of sound simulation, we have developed a precomputation algorithm that efficiently computes and stores acoustic data on a server in the cloud. Using novel algorithms, the server can render enhanced spatial audio in scenes spanning several square kilometers for hundreds of clients in real time. Our method brings the benefits of immersive audio to collaborative telephony, video games, and multi-user virtual environments.
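
    A core building block of geometric sound propagation is finding specular reflection paths. A minimal first-order image-source sketch is given below, assuming planar walls described by a point and a unit normal; occlusion tests, higher-order bounces, edge diffraction, and the GPU ray-tracing guidance described in the abstract are all omitted.

```python
import numpy as np

def first_order_specular_paths(source, listener, walls):
    """Image-source sketch of first-order specular reflections.

    Each wall is (point_on_plane, unit_normal).  The source is mirrored
    across the wall; the straight segment from the image source to the
    listener crosses the wall at the reflection point.
    """
    paths = []
    for p0, n in walls:
        n = np.asarray(n, float) / np.linalg.norm(n)
        image = source - 2.0 * np.dot(source - p0, n) * n   # mirrored source
        seg = listener - image
        denom = np.dot(seg, n)
        if abs(denom) < 1e-9:
            continue                                         # segment parallel to wall
        t = np.dot(p0 - image, n) / denom
        if 0.0 < t < 1.0:                                    # crossing lies on the segment
            hit = image + t * seg                            # reflection point on the wall
            length = np.linalg.norm(source - hit) + np.linalg.norm(hit - listener)
            paths.append((hit, length))
    return paths

src = np.array([1.0, 1.0, 1.5])
lst = np.array([4.0, 2.0, 1.5])
floor = (np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]))
print(first_order_specular_paths(src, lst, [floor]))   # one path bouncing off the floor
```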

    Multimodal Learning for Audio and Visual Processing

    The world contains vast amounts of information that can be sensed and captured in a variety of ways and formats. Virtual environments also lend themselves to endless possibilities and diversity of data. Our experiences often draw from these separate but complementary parts, which can be combined to provide a comprehensive representation of events. Multimodal learning focuses on these types of combinations: by fusing multiple modalities, multimodal learning can improve results beyond what individual modes achieve alone. However, many of today's state-of-the-art techniques in computer vision, robotics, and machine learning rely solely or primarily on visual inputs, even when the visual data is obtained from video where corresponding audio is readily available to augment learning. Vision-only approaches can struggle with highly reflective, transparent, or occluded objects and scenes, where audio, used alone or in combination with vision, may improve task performance. To address these challenges, this thesis explores coupling multimodal information to enhance task performance through learning-based methods for audio and visual processing using real and synthetic data. Physically based graphics pipelines can naturally be extended for audio and visual synthetic data generation. To extend the rigid-body sound synthesis pipeline to objects containing a liquid, I used an added-mass operator for fluid-structure coupling as a pre-processing step. My method is fast and practical for use in interactive 3D systems where live sound synthesis is desired. By fusing audio and visual data from real and synthetic videos, we also demonstrate enhanced processing and performance for object classification, tracking, and reconstruction tasks. As has been shown in visual question answering and other related work, multiple modalities can complement one another and outperform single-modality systems. To the best of my knowledge, I introduced the first use of audio-visual neural networks to analyze liquid pouring sequences by classifying their weight, liquid, and receiving container. Prior work often required predefined source weights or visual data. My contribution was to use the sound of a pouring sequence (a liquid being poured into a target container) to train multimodal convolutional neural networks (CNNs) that fuse mel-scaled spectrograms as audio inputs with corresponding visual data from video images. I described the first use of an audio-visual neural network for tracking tabletop-sized objects and enhancing visual object trackers. Like object detection on reflective surfaces, object tracking can run into challenges when objects collide, occlude one another, appear similar, or come close together. By using the impact sounds of objects during collision, my audio-visual object tracking (AVOT) neural network can correct trackers that drift from the objects they were assigned before the collision. Reflective and textureless surfaces are not only difficult to detect and classify, they are also often poorly reconstructed and filled with depth discontinuities and holes. I proposed the first use of an audio-visual method, referred to as "Echoreconstruction", that uses the reflections of sound to aid geometry and audio reconstruction. The mobile phone prototype emits pulsed audio while recording video for RGB-based 3D reconstruction and audio-visual classification. Reflected sound and images from the video are input into our audio (EchoCNN-A) and audio-visual (EchoCNN-AV) convolutional neural networks for surface and sound source detection, depth estimation, and material classification. EchoCNN inferences from these classifications enhance 3D reconstructions of scenes containing open spaces and reflective surfaces through depth filtering, inpainting, and placement of unmixed sound sources in the scene. In addition to enhancing scene reconstructions, I proposed a multimodal single- and multi-frame LSTM autoencoder for 3D reconstruction from audio-visual inputs. Our neural network produces high-quality 3D reconstructions using a voxel representation and is the first audio-visual reconstruction neural network for 3D geometry and material representation. Contributions of this thesis include new neural network designs, new enhancements to real and synthetic audio-visual datasets, and prototypes that demonstrate audio and audio-augmented performance for sound synthesis, inference, and reconstruction.
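
    The fusion of mel-scaled spectrograms with video frames described above is typically done with two CNN branches whose pooled features are concatenated before a classification head. The sketch below is a generic late-fusion network of that kind, not the dissertation's architecture: the input shapes, layer sizes, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AudioVisualFusionNet(nn.Module):
    """Late-fusion audio-visual classifier sketch.

    One CNN branch encodes a mel-scaled spectrogram, another encodes a video
    frame; the pooled features are concatenated and classified.
    """
    def __init__(self, num_classes=5):
        super().__init__()
        self.audio = nn.Sequential(                      # input: (B, 1, 64, 128) mel spectrogram
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # -> (B, 32)
        )
        self.visual = nn.Sequential(                     # input: (B, 3, 224, 224) RGB frame
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),       # -> (B, 32)
        )
        self.head = nn.Linear(64, num_classes)           # fused features -> class logits

    def forward(self, mel, frame):
        fused = torch.cat([self.audio(mel), self.visual(frame)], dim=1)
        return self.head(fused)

net = AudioVisualFusionNet()
logits = net(torch.randn(2, 1, 64, 128), torch.randn(2, 3, 224, 224))
print(logits.shape)   # torch.Size([2, 5])
```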

    Audio-Material Modeling and Reconstruction for Multimodal Interaction

    Interactive virtual environments enable the creation of training simulations, games, and social applications. These virtual environments can create a sense of presence: a sensation that the user is truly in another location. To maintain presence, interactions with virtual objects should engage multiple senses. Furthermore, multisensory input should be consistent; for example, a virtual bowl that visually appears to be plastic should also sound like plastic when dropped on the floor. In this dissertation, I propose methods to improve the perceptual realism of virtual object impact sounds and to ensure consistency between those sounds and the input from other senses. Recreating the impact sound of a real-world object requires an accurate estimate of that object's material parameters. The material parameters that affect impact sound, collectively forming the audio-material, include the damping parameters of a damping model. I propose and evaluate damping models and use them to estimate material damping parameters for real-world objects. I also consider how interaction with virtual objects can be made more consistent across the senses of sight, hearing, and touch. First, I present a method for modeling the damping behavior of impact sounds, using generalized proportional damping both to estimate more expressive material damping parameters from recorded impact sounds and to perform impact sound synthesis. Next, I present a method for estimating material damping parameters in the presence of confounding factors and with no knowledge of the object's shape. To accomplish this, a probabilistic damping model captures various external effects to produce robust damping parameter estimates. Next, I present a method for consistent multimodal interaction with textured surfaces, where texture maps serve as a single unified representation of mesoscopic detail for the purposes of visual rendering, sound synthesis, and rigid-body simulation. Finally, I present a method for geometry and material classification using multimodal audio-visual input. Using this method, a real-world scene can be scanned and virtually reconstructed while accurately modeling both the visual appearance and the audio-material parameters of each object.
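
    Estimating damping parameters from recorded impacts can be illustrated with the classic two-parameter Rayleigh special case, where a mode at angular frequency w decays at rate (alpha + beta * w^2) / 2. The dissertation uses the more expressive generalized proportional damping and a probabilistic model; the sketch below only fits the simple Rayleigh form from per-mode decay rates, and the synthetic numbers are invented for the check.

```python
import numpy as np

def fit_rayleigh_damping(freqs_hz, decay_rates):
    """Least-squares fit of Rayleigh damping parameters (alpha, beta).

    For Rayleigh damping C = alpha*M + beta*K, a mode with angular frequency
    w_i decays at rate d_i = (alpha + beta * w_i**2) / 2.  Given per-mode
    frequencies and measured decay rates (e.g. extracted from a recorded
    impact sound), solve for alpha and beta.
    """
    w = 2.0 * np.pi * np.asarray(freqs_hz, float)
    d = np.asarray(decay_rates, float)
    A = 0.5 * np.column_stack([np.ones_like(w), w ** 2])
    (alpha, beta), *_ = np.linalg.lstsq(A, d, rcond=None)
    return alpha, beta

# Synthetic check: generate decay rates from known parameters and recover them.
true_alpha, true_beta = 4.0, 1e-7
freqs = [400.0, 900.0, 2100.0, 4800.0]
decays = [(true_alpha + true_beta * (2.0 * np.pi * f) ** 2) / 2.0 for f in freqs]
print(fit_rayleigh_damping(freqs, decays))   # approximately (4.0, 1e-07)
```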