19 research outputs found

    What do your footsteps sound like? An investigation on interactive footstep sounds adjustment

    This paper presents an experiment in which participants were asked to adjust, while walking, the spectral content and the amplitude of synthetic footstep sounds in order to match the sounds of their own footsteps. The sounds were interactively generated by means of a shoe-based system capable of tracking footfalls and delivering real-time auditory feedback via headphones. Results allowed identification of the mean value and the range of variation of the spectral centroid and peak level of footstep sounds simulating various combinations of shoe type and ground material. Results showed that the effect of ground material on centroid and peak level depended on the type of shoe. Similarly, the effect of shoe type on the two variables depended on the type of ground material. In particular, participants produced greater amplitudes for hard-sole shoes than for soft-sole shoes in the presence of solid surfaces, while similar amplitudes for both types of shoes were found for aggregate, hybrid, and liquid surfaces. No significant correlations were found between either of the two acoustic features and participants' body size. This result might be explained by the fact that, while adjusting the sounds, participants did not primarily focus on the acoustic rendering of their body. In addition, no significant differences were found between the values of the two acoustic features selected by the experimenters and those adjusted by participants. This result can therefore be considered a measure of the soundness of the design choices made to synthesize the involved footstep sounds for a generic walker. More importantly, this study showed that the relationships between the ground-shoe combinations are unchanged when participants are actively walking. This represents the first active-listening confirmation of this result, which had previously only been shown in passive listening studies. The results of this research can be used to design ecologically valid auditory rendering of foot-floor interactions in virtual environments. This work was supported partly by a grant from the Danish Council for Independent Research awarded to Luca Turchet (Grant No. 12-131985), and partly by a grant from the ESRC awarded to Ana Tajadura-Jiménez (Grant No. ES/K001477/1).
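    The two acoustic features adjusted by participants, spectral centroid and peak level, can be computed directly from a recorded or synthesized footstep. The following is a minimal sketch of such a computation, assuming a mono floating-point signal and using numpy; the function names and the synthetic test signal are illustrative and not taken from the paper.

    # Minimal sketch (not the authors' code): the two acoustic features
    # discussed above -- spectral centroid and peak level -- for a mono
    # footstep recording. Names and values are illustrative.
    import numpy as np

    def spectral_centroid(signal: np.ndarray, sample_rate: float) -> float:
        """Amplitude-weighted mean frequency of the magnitude spectrum, in Hz."""
        magnitude = np.abs(np.fft.rfft(signal))
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
        if magnitude.sum() == 0:
            return 0.0
        return float(np.sum(freqs * magnitude) / np.sum(magnitude))

    def peak_level_db(signal: np.ndarray) -> float:
        """Peak level in dB relative to full scale (signal assumed in [-1, 1])."""
        peak = np.max(np.abs(signal))
        return float(20.0 * np.log10(peak)) if peak > 0 else -np.inf

    # Example with a synthetic 'footstep': a short burst of decaying noise.
    sr = 44100
    t = np.arange(int(0.2 * sr)) / sr
    step = np.random.randn(t.size) * np.exp(-t / 0.03) * 0.5
    print(spectral_centroid(step, sr), peak_level_db(step))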

    Mixed Structural Models for 3D Audio in Virtual Environments

    In the world of ICT, strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D media environments. One of the major challenges that such applications have to address is user-centricity, reflected e.g. in the development of complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of the new technology by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality. Examples of currently active research directions and application areas include 3DTV and the future internet, 3D visual-sound scene coding, transmission and reconstruction, and teleconferencing systems, to name but a few. The concurrent presence of multiple modalities and activities makes multimodal virtual environments potentially flexible and adaptive, allowing users to switch between modalities as needed under continuously changing conditions of use. Augmentation through additional modalities and sensory substitution techniques are compelling ingredients for presenting information non-visually, when the visual bandwidth is overloaded, when data are visually occluded, or when the visual channel is not available to the user (e.g., for visually impaired people). Multimodal systems for the representation of spatial information will largely benefit from the implementation of audio engines that have extensive knowledge of spatial hearing and virtual acoustics. Models for spatial audio can provide accurate dynamic information about the relation between the sound source and the surrounding environment, including the listener and his/her body, which acts as an additional filter. Indeed, this information cannot be substituted by any other modality (i.e., visual or tactile). Nevertheless, today's spatial representation of audio within sonification tends to be simplistic and to offer poor interaction capabilities, since current multimedia systems focus mostly on graphics processing and are integrated with simple stereo or multi-channel surround sound. On a much different level lie binaural rendering approaches based on headphone reproduction, for which possible disadvantages (e.g. invasiveness, non-flat frequency responses) are counterbalanced by a number of desirable features. Indeed, these systems can control and/or eliminate reverberation and other acoustic effects of the real listening space, reduce background noise, and provide adaptable and portable audio displays, all of which are relevant aspects especially in enhanced contexts. Most of the binaural sound rendering techniques currently exploited in research rely on the use of Head-Related Transfer Functions (HRTFs), i.e. peculiar filters that capture the acoustic effects of the human head and ears. HRTFs allow faithful simulation of the audio signal that arrives at the entrance of the ear canal as a function of the sound source's spatial position. HRTF filters are usually provided in the form of acoustic measurements acquired on dummy heads built according to mean anthropometric measurements.
Nevertheless, anthropometric features of the human body play a key role in HRTF shaping: several studies have shown that listening to non-individual binaural sounds results in evident localization errors. On the other hand, individual HRTF measurements on a significant number of subjects are both time- and resource-expensive. Several techniques for synthetic HRTF design have been proposed during the last two decades, and the most promising one relies on structural HRTF models. In this approach, the most important effects involved in spatial sound perception (acoustic delays and shadowing due to head diffraction, reflections on pinna contours and shoulders, resonances inside the ear cavities) are isolated and modeled separately with a corresponding filtering element. HRTF selection and modeling procedures can be driven by physical interpretation: the parameters of each rendering block, or the selection criteria, can be estimated from real and simulated data and related to anthropometric geometries. Effective personal auditory displays represent an innovative breakthrough for a plethora of applications, and the structural approach also allows for effective scalability depending on the available computational resources or bandwidth. Scenes with multiple highly realistic audiovisual objects are easily managed by exploiting the parallelism of increasingly ubiquitous GPUs (Graphics Processing Units). Building individual headphone equalization with perceptually robust inverse filtering techniques represents a fundamental step towards the creation of personal virtual auditory displays (VADs). In this regard, several applications might benefit from these considerations: multi-channel downmix over headphones, personal cinema, spatial audio rendering in mobile devices, computer-game engines, and individual binaural audio standards for movie and music production. This thesis presents a family of approaches that overcome the current limitations of headphone-based 3D audio systems, aiming at building personal auditory displays through structural binaural audio models for immersive sound reproduction. The resulting models allow for an interesting form of content adaptation and personalization, since they include parameters related to the user's anthropometry in addition to those related to the sound sources and the environment. The covered research directions converge in a novel framework for synthetic HRTF design and customization that combines the structural modeling paradigm with other HRTF selection techniques (inspired by non-individualized HRTF selection procedures) and represents the main novel contribution of this thesis: the Mixed Structural Modeling (MSM) approach considers the global HRTF as a combination of structural components, each of which can be chosen to be either synthetic or recorded. In both cases, customization is based on individual anthropometric data, which are used either to fit the model parameters or to select a measured/simulated component within a set of available responses. The definition and experimental validation of the MSM approach addresses several pivotal issues in the acquisition and delivery of binaural sound scenes and in the design guidelines for personalized 3D audio virtual environments, holding the potential for novel forms of customized communication and interaction with sound and music content. The thesis also presents a multimodal interactive system which is used to conduct subjective tests on multi-sensory integration in virtual environments.
Four experimental scenarios are proposed in order to test the capabilities of auditory feedback jointly with tactile or visual modalities. 3D audio feedback related to the user's movements during simple target-following tasks is tested as an applicative example of an audio-visual rehabilitation system. Perception of the direction of footstep sounds interactively generated during walking and provided through headphones highlights how spatial information can clarify the semantic congruence between movement and multimodal feedback. A real-time, physically informed audio-tactile interactive system encodes spatial information in the context of virtual map presentation, with particular attention to orientation and mobility (O&M) learning processes addressed to visually impaired people. Finally, an experiment analyzes the haptic estimation of the size of a virtual 3D object (a stair-step) while the exploration is accompanied by a real-time generated auditory feedback whose parameters vary as a function of the height of the interaction point. The data collected from these experiments suggest that well-designed multimodal feedback, exploiting 3D audio models, can indeed be used to improve performance in virtual reality and learning processes in orientation and complex motor tasks, thanks to the high level of attention, engagement, and presence provided to the user. The research framework, based on the MSM approach, serves as an important evaluation tool with the aim of progressively determining the relevant spatial attributes of sound for each application domain. In this perspective, such studies represent a novelty in the current literature on virtual and augmented reality, especially concerning the use of sonification techniques in several aspects of spatial cognition and internal multisensory representation of the body. This thesis is organized as follows. An overview of spatial hearing and binaural technology through headphones is given in Chapter 1. Chapter 2 is devoted to the Mixed Structural Modeling formalism and philosophy. In Chapter 3, topics in structural modeling for each body component are studied; previous research is reviewed and two new models, for near-field distance dependency and external-ear spectral cues, are presented. Chapter 4 deals with a complete case study of the mixed structural modeling approach and provides insights into the main innovative aspects of this modus operandi. Chapter 5 gives an overview of a number of proposed tools for the analysis and synthesis of HRTFs. System architectural guidelines and constraints are discussed in terms of real-time issues, mobility requirements, and customized audio delivery. In Chapter 6, two case studies investigate the behavioral importance of the spatial attributes of sound and how continuous interaction with virtual environments can benefit from the use of spatial audio algorithms. Chapter 7 describes a set of experiments aimed at assessing the contribution of binaural audio through headphones to learning processes of spatial cognitive maps and to the exploration of virtual objects. Finally, conclusions are drawn and new research horizons for further work are outlined in Chapter 8.
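    As a concrete illustration of the structural decomposition described above, the following is a minimal sketch of a single structural component: a spherical-head model that combines a Woodworth-style interaural time difference with a first-order head-shadow filter, in the spirit of classic structural HRTF models. It is not the thesis implementation; the head radius, filter design, and function names are illustrative assumptions (Python with numpy and scipy).

    # Minimal sketch (assumptions, not the thesis implementation): one structural
    # HRTF component -- a spherical-head model combining a Woodworth-style ITD
    # with a first-order head-shadow filter. Parameter values are illustrative.
    import numpy as np
    from scipy.signal import lfilter

    HEAD_RADIUS = 0.0875     # metres, average adult head (illustrative)
    SPEED_OF_SOUND = 343.0   # m/s

    def head_shadow(signal, incidence_deg, fs):
        """First-order head-shadow filter for one ear.

        incidence_deg: angle between source direction and the ear axis
        (0 deg = source facing the ear, 180 deg = opposite side of the head).
        """
        w0 = SPEED_OF_SOUND / HEAD_RADIUS          # corner frequency (rad/s)
        alpha_min, theta_min = 0.1, 150.0
        alpha = (1 + alpha_min / 2) + (1 - alpha_min / 2) * np.cos(
            np.radians(incidence_deg * 180.0 / theta_min))
        k = fs / w0                                 # bilinear-transform constant
        b = np.array([1 + alpha * k, 1 - alpha * k]) / (1 + k)
        a = np.array([1.0, (1 - k) / (1 + k)])
        return lfilter(b, a, signal)

    def spherical_head_binaural(signal, azimuth_deg, fs):
        """Render a mono numpy signal to (left, right) for a source at
        azimuth_deg (0 deg = front, +90 deg = right)."""
        az = np.radians(abs(azimuth_deg))
        # Woodworth ITD, applied as an extra delay to the contralateral ear.
        itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (az + np.sin(az))
        delay = int(round(itd * fs))
        near = head_shadow(signal, 90.0 - abs(azimuth_deg), fs)
        far = head_shadow(signal, 90.0 + abs(azimuth_deg), fs)
        far = np.concatenate([np.zeros(delay), far])[:signal.size]
        return (far, near) if azimuth_deg >= 0 else (near, far)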

    Presence studies as an evaluation method for user experiences in multimodal virtual environments


    Perceptually Driven Interactive Sound Propagation for Virtual Environments

    Sound simulation and rendering can significantly augment a user's sense of presence in virtual environments. Many techniques for sound propagation have been proposed that predict the behavior of sound as it interacts with the environment and is received by the user. At a broad level, propagation algorithms can be classified into reverberation filters, geometric methods, and wave-based methods. In practice, heuristic methods based on reverberation filters are simple to implement and have a low computational overhead, while wave-based algorithms are limited to static scenes and involve extensive precomputation. However, relatively little work has been done on the psychoacoustic characterization of different propagation algorithms and on evaluating the relationship between scientific accuracy and perceptual benefits. In this dissertation, we present perceptual evaluations of sound propagation methods and their ability to model complex acoustic effects for virtual environments. Our results indicate that scientifically accurate methods for reverberation and diffraction do result in increased perceptual differentiation. Based on these evaluations, we present two novel hybrid sound propagation methods that combine the accuracy of wave-based methods with the speed of geometric methods for interactive sound propagation in dynamic scenes. Our first algorithm couples modal sound synthesis with geometric sound propagation using wave-based sound radiation to perform mode-aware sound propagation. We introduce diffraction kernels of rigid objects, which encapsulate the sound diffraction behaviors of individual objects in free space and are then used to simulate plausible diffraction effects using an interactive path tracing algorithm. Finally, we present a novel perceptually driven metric that can be used to accelerate the computation of late reverberation to enable plausible simulation of reverberation with a low runtime overhead. We highlight the benefits of our novel propagation algorithms in different scenarios. Doctor of Philosophy
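    To ground the claim that reverberation filters are simple to implement with a low computational overhead, the following is a minimal sketch of a classic Schroeder-style artificial reverberator (parallel feedback comb filters followed by a series allpass). It is illustrative only and is not the dissertation's method; the delay lengths and gains are assumptions.

    # Minimal sketch (not the dissertation's method): a Schroeder-style
    # artificial reverberation filter built from parallel feedback comb
    # filters and a series allpass. Delay lengths and gains are illustrative.
    import numpy as np

    def comb(x, delay, feedback):
        """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay]."""
        y = np.zeros(len(x))
        for n in range(len(x)):
            y[n] = x[n] + (feedback * y[n - delay] if n >= delay else 0.0)
        return y

    def allpass(x, delay, gain):
        """Schroeder allpass: y[n] = -gain*x[n] + x[n-delay] + gain*y[n-delay]."""
        y = np.zeros(len(x))
        for n in range(len(x)):
            xd = x[n - delay] if n >= delay else 0.0
            yd = y[n - delay] if n >= delay else 0.0
            y[n] = -gain * x[n] + xd + gain * yd
        return y

    def schroeder_reverb(x, fs):
        comb_delays_ms = [29.7, 37.1, 41.1, 43.7]   # mutually detuned delays
        wet = sum(comb(x, int(fs * d / 1000.0), 0.78) for d in comb_delays_ms)
        wet = allpass(wet, int(fs * 5.0 / 1000.0), 0.7)
        return 0.7 * x + 0.3 * wet                   # dry/wet mix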

    Video Game Acoustics: Perception-Based Sound Design for Interactive Virtual Spaces

    Video game acoustics are the various aspects of sound physics that can be represented in a video game, as well as the perception and interpretation of those sound physics by a player. At its core, the research here aims to identify the many functions and considerations of acoustics in interactive virtual spaces, while also building a theoretical foundation for video game acoustics by gathering relevant research from a wide variety of disciplines into a single video game context. The writing here also functions as an informative resource for video game sound designers and is primarily written for that audience. Through a review of the literature it is found that there is research available across many different disciplines that is relevant to video game acoustics, but none that brings it all together and fully explores acoustics in a video game context. Small discussions related to the topic occur sporadically throughout various fields; however, there are few with any detailed focus and even fewer with video game sound designers as their intended audience. This scattering and dilution of relevant information validates the need for its distillation into a dedicated discussion. The writing here addresses this gap in the literature and in doing so uncovers aspects of video game acoustics that have not previously been given adequate attention. This thesis accomplishes its aims by combining an interdisciplinary background with an emphasis on simplification to suit the creative field of game sound design. A theoretical foundation is built from several different disciplines, including acoustics, auditory perception, acoustic simulation, sound theory, spatial presence, film sound, and of course game sound. A twofold physics/perception approach is used to analyse video game acoustics. The human perception of sound has various strengths and weaknesses, which help to identify the aspects of sound physics that are important to provide to a player as well as aspects that may be ignored for efficiency reasons. The thesis begins by revealing the many considerations and implications of incorporating acoustics into a video game, followed by an exploration of the perceptual functions of acoustics in virtual spaces. Several conceptual frameworks are then offered to address some of the problems discovered in the previous sections. By the end of the thesis it will be shown that the main purpose of video game acoustics is to provide a player with a natural experience of sound. People working in the video game industry may use the research presented here to cultivate an understanding of how humans can interact with video games through sound physics, and why it is important to improve the quality of this interaction. Thesis (Ph.D.) -- University of Adelaide, Elder Conservatorium of Music, 202

    Perceptual evaluation of personal, location-aware spatial audio

    This thesis entails an analysis, synthesis and evaluation of the medium of personal, location-aware spatial audio (PLASA). The PLASA medium is a specialisation of locative audio, the presentation of audio in relation to the listener's position. It also intersects with audio augmented reality, the presentation of a virtual audio reality superimposed on the real world. A PLASA system delivers binaural (personal) spatial audio to mobile listeners, with body-position and head-orientation interactivity, so that simulated sound source positions seem fixed in the world reference frame. PLASA technical requirements were analysed and three system architectures identified, employing mobile, remote or distributed rendering. Knowledge of human spatial hearing was reviewed to ascertain likely perceptual effects of the unique factors of PLASA compared to static spatial audio. Human factors identified were multimodal perception of body-motion interaction and coincident visual stimuli. Technical limitations identified were rendering method, individual binaural rendering, and accuracy and latency of position- and orientation-tracking. An experimental PLASA system was built and evaluated technically, then four perceptual experiments were conducted to investigate task-related perceptual performance. These experiments tested the identified human factors and technical limitations against performance measures related to localisation and navigation tasks, under conditions designed to be ecologically valid to PLASA application scenarios. A final experiment assessed navigation task performance with real sound sources and unmediated spatial hearing for comparison with virtual source performance. Results found that body-motion interaction facilitated correction of front-back confusions. Body-motion and the multimodal stimuli of virtual-audible and real-visible objects supported lower azimuth errors than stationary, mono-modal localisation of the same audio-only stimuli. PLASA users navigated efficiently to stationary virtual sources, despite varied rendering quality and head-turn latencies between 176 ms and 976 ms. Factors of rendering method, individualisation and head-turn latency showed interaction effects, such as greater sensitivity to latency for some rendering methods than others. In general, PLASA task performance levels agreed with expectations from static or technical performance tests, and some results demonstrated similar performance levels to those achieved in the real-source baseline test.
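    The defining property of the PLASA medium, keeping simulated source positions fixed in the world reference frame while the listener moves and turns, reduces at its core to a transform from the tracked listener pose to a head-relative source direction. The following is a minimal 2-D sketch of that transform; the coordinate conventions and function names are assumptions for illustration and are not taken from the thesis.

    # Minimal sketch (illustrative, not the thesis implementation): converting a
    # world-fixed source position into a head-relative azimuth and distance from
    # the tracked listener position and head orientation. 2-D case, heading in
    # degrees clockwise from north (as a compass would report it).
    import math

    def head_relative_azimuth(listener_xy, heading_deg, source_xy):
        """Return (azimuth_deg, distance_m); azimuth 0 = straight ahead,
        positive = to the listener's right."""
        dx = source_xy[0] - listener_xy[0]
        dy = source_xy[1] - listener_xy[1]
        distance = math.hypot(dx, dy)
        bearing = math.degrees(math.atan2(dx, dy))   # world bearing of the source
        azimuth = (bearing - heading_deg + 180.0) % 360.0 - 180.0  # wrap to +/-180
        return azimuth, distance

    # Example: a source 3 m north of the listener, listener facing east (90 deg)
    # -> the source should be rendered about 90 deg to the left.
    print(head_relative_azimuth((0.0, 0.0), 90.0, (0.0, 3.0)))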

    Localization of self-generated synthetic footstep sounds on different walked-upon materials through headphones

    This paper focuses on the localization of footstep sounds interactively generated during walking and provided through headphones. Three distinct experiments were conducted in a laboratory involving a pair of sandals enhanced with pressure sensors and a footstep synthesizer capable of simulating two types of surface materials: solid (e.g., wood) and aggregate (e.g., gravel). Different sound delivery methods (mono, stereo, binaural) as well as several surface materials, in the presence or absence of concurrent contextual auditory information provided as soundscapes, were evaluated in a vertical localization task. Results showed that solid surfaces were localized significantly farther from the walker's feet than the aggregate ones. This effect was independent of the rendering technique used, of the presence of soundscapes, and of merely temporal or spectral attributes of sound. The effect is hypothesized to be due to a semantic conflict between auditory and haptic information, such that the higher the semantic incongruence, the greater the distance of the perceived sound source from the feet. The presented results contribute to the development of further knowledge toward a basis for the design of continuous multimodal feedback in virtual reality applications.
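    The solid-versus-aggregate distinction made by the footstep synthesizer can be illustrated with a minimal sketch in the spirit of physically informed sound synthesis: a single damped resonance for a solid surface versus a stochastic burst of short noise grains for an aggregate one. The structure and parameters below are illustrative assumptions and do not reproduce the authors' synthesizer.

    # Minimal sketch (assumptions only, not the authors' synthesizer): a single
    # damped impact for a solid surface vs. a stochastic burst of micro-impacts
    # for an aggregate one, in the spirit of physically informed sound synthesis.
    import numpy as np

    def solid_step(fs=44100, dur=0.15, freq=120.0, decay=0.02):
        """Single damped resonance excited once (e.g., heel strike on wood)."""
        t = np.arange(int(dur * fs)) / fs
        return np.sin(2 * np.pi * freq * t) * np.exp(-t / decay)

    def aggregate_step(fs=44100, dur=0.4, grain_rate=400.0, decay=0.003):
        """Poisson-like burst of short noise grains (e.g., gravel), whose
        density decays as the foot settles."""
        n = int(dur * fs)
        out = np.zeros(n)
        envelope = np.exp(-np.arange(n) / (0.1 * fs))   # pressure-like decay
        glen = int(decay * fs * 4)
        grain = np.random.randn(glen) * np.exp(-np.arange(glen) / (decay * fs))
        hits = np.random.rand(n) < (grain_rate / fs) * envelope
        for i in np.flatnonzero(hits):
            end = min(n, i + glen)
            out[i:end] += np.random.uniform(0.2, 1.0) * grain[:end - i]
        return out / (np.max(np.abs(out)) + 1e-9)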

    Private sound environments in public space: Use of headphones in public parks and public transit

    The use of headphones is now so commonplace that it is almost second nature for many people to use them. Not only do these people use headphones all the time, but they use them nearly everywhere, including in urban public spaces. In using headphones, people create their own “private sound environments” in public space. This phenomenon merits attention from researchers since the creation of private sound environments may well alter people’s experiences of public space. This study answers five research questions about the use of headphones in parks and on transit: why people use them, when they begin using headphones and when they discontinue using them, what activities they engage in while using headphones, what they listen to, and how using headphones affects their experience. The study was conducted in three New York City parks - Washington, Tompkins, and Madison Square Parks - and on the PATH train that runs between New Jersey and New York City. Four data collection methods were used: focus groups, in-depth interviews, and online and on-site surveys. Findings indicate that the reasons why people use headphones vary depending on how they use them – whether they play audio or wear them without playing audio. People play audio to reminisce and for therapeutic purposes. People wear headphones without audio for insulation in cold weather and to keep their hands free. A majority of respondents begin using headphones when they depart from their homes and discontinue using headphones when they reach their homes or places of employment. While using headphones, people engage in various activities including relaxing, exercising, and observing surroundings. These activities vary depending on whether people are playing audio or not. For the most part, in parks and on transit, the type of audio people play on headphones is music. Respondents reported that the quality of their experiences in parks declines when they listen to audio and improves when they do not. In contrast, the quality of experience on transit improves when they listen to audio and declines when they do not

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010