438 research outputs found

    Contextual awareness, messaging and communication in nomadic audio environments

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. Includes bibliographical references (p. 119-122). Nitin Sawhney. M.S.

    Novel-View Acoustic Synthesis

    We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space by analyzing the input audio-visual cues. To benchmark this task, we collect two first-of-their-kind large-scale multi-view audio-visual datasets, one synthetic and one real. We show that our model successfully reasons about the spatial cues and synthesizes faithful audio on both datasets. To our knowledge, this work represents the very first formulation, dataset, and approach to solve the novel-view acoustic synthesis task, which has exciting potential applications ranging from AR/VR to art and design. Unlocked by this work, we believe that the future of novel-view synthesis is in multi-modal learning from videos. Comment: Project page: https://vision.cs.utexas.edu/projects/nva

    Scanning Spaces: Paradigms for Spatial Sonification and Synthesis

    In 1962 Karlheinz Stockhausen’s “Concept of Unity in Electronic Music” introduced a connection between the parameters of intensity, duration, pitch, and timbre using an accelerating pulse train. In 1973 John Chowning discovered that complex audio spectra could be synthesized by increasing vibrato rates past 20 Hz. In both cases the notion of acceleration to produce timbre was critical to discovery. Although both composers also utilized sound spatialization in their works, spatial parameters were not unified with their synthesis techniques. This dissertation examines software studies and multimedia works involving the use of spatial and visual data to produce complex sound spectra. The culmination of these experiments, Spatial Modulation Synthesis, is introduced as a novel, mathematical control paradigm for audio-visual synthesis, providing unified control of spatialization, timbre, and visual form using high-speed sound trajectories. The unique visual sonification and spatialization rendering paradigms of this dissertation necessitated the development of an original audio-sample-rate graphics rendering implementation, which, unlike typical multimedia frameworks, provides an exchange of audio-visual data without downsampling or interpolation.

    Taux : a system for evaluating sound feedback in navigational tasks

    This thesis presents the design and development of an evaluation system for generating audio displays that provide feedback to persons performing navigation tasks. It first develops the need for such a system by describing existing wayfinding solutions, investigating new electronic location-based methods that have the potential to change these solutions, and examining research conducted on relevant audio information representation techniques. An evaluation system that supports the manipulation of two basic classes of audio display is then described. Based on prior work on wayfinding with audio display, research questions are developed that investigate the viability of different audio displays. These are used to generate hypotheses and develop an experiment which evaluates four variations of audio display for wayfinding. Questions are also formulated that evaluate a baseline condition that utilizes visual feedback. An experiment which tests these hypotheses on sighted users is then described. Results from the experiment suggest that, among the spatial audio approaches compared, spatial audio combined with spoken hints performs best. The results also suggest that muting a varying audio signal when a subject is on course did not improve performance. The system and method are then refined. A second experiment is conducted with improved displays and an improved experiment methodology. After adding blindfolds for sighted subjects and increasing the difficulty of navigation tasks by reducing the arrival radius, similar comparative results were observed. Overall, the two experiments demonstrate the viability of the prototyping tool for testing and refining multiple different audio display combinations for navigational tasks. The detailed contributions of this work and future research opportunities conclude this thesis.

    Multimodality in VR: A survey

    Virtual reality (VR) is rapidly growing, with the potential to change the way we create and consume content. In VR, users integrate the multimodal sensory information they receive to create a unified perception of the virtual world. In this survey, we review the body of work addressing multimodality in VR, and its role and benefits in user experience, together with different applications that leverage multimodality in many disciplines. These works thus encompass several fields of research, and demonstrate that multimodality plays a fundamental role in VR, enhancing the experience, improving overall performance, and yielding unprecedented abilities in skill and knowledge transfer.

    Assessment of Audio Interfaces for use in Smartphone Based Spatial Learning Systems for the Blind

    Recent advances in indoor positioning and mobile computing promise the development of smartphone-based indoor navigation systems. Currently, the preliminary implementations of such systems use only visual interfaces, meaning that they are inaccessible to blind and low-vision users. According to the World Health Organization, about 39 million people in the world are blind. This creates a need for the development and evaluation of non-visual interfaces for indoor navigation systems that support safe and efficient spatial learning and navigation behavior. This thesis research has empirically evaluated several different approaches through which spatial information about the environment can be conveyed through audio. In the first experiment, blindfolded participants standing at an origin in a lab learned the distance and azimuth of target objects that were specified by four audio modes. The first three modes were perceptual interfaces and did not require cognitive mediation on the part of the user. The fourth mode was a non-perceptual mode where object descriptions were given via spatial language using clockface angles. After learning the targets through the four modes, the participants spatially updated the position of the targets and localized them by walking to each of them from two indirect waypoints. The results also indicate that the hand-motion-triggered mode is better than the head-motion-triggered mode and comparable to auditory snapshot. In the second experiment, blindfolded participants learned target object arrays with two spatial audio modes and a visual mode. In the first mode, head tracking was enabled, whereas in the second mode hand tracking was enabled. In the third mode, serving as a control, the participants were allowed to learn the targets visually. We again compared spatial updating performance with these modes and found no significant performance differences between modes. These results indicate that we can develop 3D audio interfaces on sensor-rich, off-the-shelf smartphone devices, without the need for expensive head-tracking hardware. Finally, a third study evaluated room layout learning performance by blindfolded participants with an Android smartphone. Three perceptual modes and one non-perceptual mode were tested for cognitive map development. As expected, the perceptual interfaces performed significantly better than the non-perceptual, language-based mode in an allocentric pointing judgment and in overall subjective rating. In sum, the perceptual interfaces led to better spatial learning performance and higher user ratings. In addition, there was no significant difference between cognitive maps developed through spatial audio based on tracking the user’s head and those based on tracking the hand. These results have important implications, as they support the development of accessible, perceptually driven interfaces for smartphones.

    Shaping the auditory peripersonal space with motor planning in immersive virtual reality

    Immersive audio technologies require personalized binaural synthesis through headphones to provide perceptually plausible virtual and augmented reality (VR/AR) simulations. We introduce and apply, for the first time in VR contexts, the quantitative measure called premotor reaction time (pmRT) for characterizing sonic interactions between humans and the technology through motor planning. In the proposed basic virtual acoustic scenario, listeners are asked to react to a virtual sound approaching from different directions and stopping at different distances within their peripersonal space (PPS). PPS is highly sensitive to embodied and environmentally situated interactions, anticipating the motor system activation for a prompt preparation for action. Since immersive VR applications benefit from spatial interactions, modeling the PPS around the listeners is crucial to reveal individual behaviors and performances. Our methodology, centered on the pmRT, provides a compact description and approximation of the spatiotemporal PPS processing and boundaries around the head by replicating several well-known neurophysiological phenomena related to PPS, such as auditory asymmetry, front/back calibration and confusion, and ellipsoidal action fields.