Search CORE

31 research outputs found

Surround by Sound: A Review of Spatial Audio Recording and Reproduction

Author: Abhayapala Thushara
Chen Hanchi
Samarasinghe Prasanga
Zhang Wen
Publication venue: 'MDPI AG'
Publication date: 01/01/2017
Field of study

In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering is designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener’s two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recent popular area, multi-zone reproduction, is also briefly reviewed in the paper. The paper is concluded with a discussion of the current state of the field and open problemsThe authors acknowledge National Natural Science Foundation of China (NSFC) No. 61671380 and Australian Research Council Discovery Scheme DE 150100363

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

The Australian National University

An investigation into the real-time manipulation and control of three-dimensional sound fields

Author: Wiggins Bruce
Publication venue: University of Derby
Publication date: 01/01/2004
Field of study

This thesis describes a system that can be used for the decoding of a three dimensional audio recording over headphones or two, or more, speakers. A literature review of psychoacoustics and a review (both historical and current) of surround sound systems is carried out. The need for a system which is platform independent is discussed, and the proposal for a system based on an amalgamation of Ambisonics, binaural and transaural reproduction schemes is given. In order for this system to function optimally, each of the three systems rely on providing the listener with the relevant psychoacoustic cues. The conversion from a five speaker ITU array to binaural decode is well documented but pair-wise panning algorithms will not produce the correct lateralisation parameters at the ears of a centrally seated listener. Although Ambisonics has been well researched, no one has, as yet, produced a psychoacoustically optimised decoder for the standard irregular five speaker array as specified by the ITU as the original theory, as proposed by Gerzon and Barton (1992) was produced (known as a Vienna decoder), and example solutions given, before the standard had been decided on. In this work, the original work by Gerzon and Barton (1992) is analysed, and shown to be suboptimal, showing a high/low frequency decoder mismatch due to the method of solving the set of non-linear simultaneous equations. A method, based on the Tabu search algorithm, is applied to the Vienna decoder problem and is shown to provide superior results to those shown by Gerzon and Barton (1992) and is capable of producing multiple solutions to the Vienna decoder problem. During the write up of this report Craven (2003) has shown how 4th order circular harmonics (as used in Ambisonics) can be used to create a frequency independent panning law for the five speaker ITU array, and this report also shows how the Tabu search algorithm can be used to optimise these decoders further. A new method is then demonstrated using the Tabu search algorithm coupled with lateralisation parameters extracted from a binaural simulation of the Ambisonic system to be optimised (as these are the parameters that the Vienna system is approximating). This method can then be altered to take into account head rotations directly which have been shown as an important psychoacoustic parameter in the localisation of a sound source (Spikofski et al., 2001) and is also shown to be useful in differentiating between decoders optimised using the Tabu search form of the Vienna optimisations as no objective measure had been suggested. Optimisations for both Binaural and Transaural reproductions are then discussed so as to maximise the performance of generic HRTF data (i.e. not individualised) using inverse filtering methods, and a technique is shown that minimises the amount of frequency dependant regularisation needed when calculating cross-talk cancellation filters.EPRS

OpenGrey Repository

UDORA - University of Derby Online Research Archive

Binaural to multichannel audio upmix

Author: Jakka Julia
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2005
Field of study

Audion tallennus- ja toistolaitteiden valikoiman kasvaessa on tärkeää, että kaikenlaisilla välineillä tallennettua sekä syntetisoitua audiota voidaan muokata toistettavaksi kaikenlaisilla äänentoistojärjestelmillä. Tässä diplomityössä esitellään menetelmä, jolla binauraalinen audiosignaali voidaan muokata toistettavaksi monikanavaisella kaiutinjärjestelmällä säilyttäen signaalin suuntainformaation. Tällaiselle muokkausmenetelmälle on tarvetta esimerkiksi etäläsnäolosovelluksissa keinona toistaa binauraalinen äänitys monikanavaisella kaiutinjärjestelmällä. Menetelmässä binauraalisesta signaalista estimoidaan ensin äänilähteiden suunnat käyttäen hyväksi korvien välistä aikaeroa. Signaali muokataan monofoniseksi, ja tulosuunnan estimoinnin antama tieto tallennetaan sivuinformaationa. Monofoninen signaali muokataan sen jälkeen halutulle monikanavaiselle kaiutinjärjestelmälle panoroimalla se tallennetun suuntainformaation mukaisesti. Käytännössä menetelmä siis muuntaa korvien välisen aikaeron kanavien väliseksi voimakkuuseroksi. Menetelmässä käytetään ja yhdistellään olemassaolevia tekniikoita tulosuunnan estimoinnille sekä panoroinnille. Menetelmää testattiin vapaamuotoisessa kuuntelukokeessa, sekä lisäämällä ääninäytteisiin binauraalista taustamelua ennen muokkausta ja arvioimalla sen vaikutusta muokatun signaalin laatuun. Menetelmän todettiin toimivan kelvollisesti sekä suuntainformaation säilymisen, että äänen laadun suhteen, ottaen huomioon, että sen kehitystyö on vasta aluillaan.The increasing diversity of popular audio recording and playback systems gives reasons to ensure that recordings made with any equipment, as well as any synthesised audio, can be reproduced for playback with all types of devices. In this thesis, a method is introduced for upmixing binaural audio into a multichannel format while preserving the correct spatial sensation. This type of upmix is required when a binaural recording is desired to be spatially reproduced for playback over a multichannel loudspeaker setup, a scenario typical for e.g. the prospective telepresence appliances. In the upmix method the sound source directions are estimated from the binaural signal by using the interaural time difference. The signal is then downmixed into a monophonic format and the data given by the azimuth estimation is stored as side-information. The monophonic signal is upmixed for an arbitrary multichannel loudspeaker setup by panning it on the basis of the spatial side-information. The method, thus effectively converting interaural time differences into interchannel level differences, employs and conjoins existing techniques for azimuth estimation and discrete panning. The method was tested in an informal listening test, as well as by adding spatial background noise into the samples before upmixing and evaluating its influence on the sound quality of the upmixed samples. The method was found to perform acceptably well in maintaining both the spatiality as well as the sound quality, regarding that much development work remains to be done

Aaltodoc Publication Archive

Enabling technologies for audio augmented reality systems

Author: Gamper Hannes
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/05/2014
Field of study

Audio augmented reality (AAR) refers to technology that embeds computer-generated auditory content into a user's real acoustic environment. An AAR system has specific requirements that set it apart from regular human--computer interfaces: an audio playback system to allow the simultaneous perception of real and virtual sounds; motion tracking to enable interactivity and location-awareness; the design and implementation of auditory display to deliver AAR content; and spatial rendering to display spatialised AAR content. This thesis presents a series of studies on enabling technologies to meet these requirements. A binaural headset with integrated microphones is assumed as the audio playback system, as it allows mobility and precise control over the ear input signals. Here, user position and orientation tracking methods are proposed that rely on speech signals recorded at the binaural headset microphones. To evaluate the proposed methods, the head orientations and positions of three conferees engaged in a discussion were tracked. The binaural microphones improved tracking performance substantially. The proposed methods are applicable to acoustic tracking with other forms of user-worn microphones. Results from a listening test investigating the effect of auditory display parameters on user performance are reported. The parameters studied were derived from the design choices to be made when implementing auditory display. The results indicate that users are able to detect a sound sample among distractors and estimate sample numerosity accurately with both speech and non-speech audio, if the samples are presented with adequate temporal separation. Whether or not samples were separated spatially had no effect on user performance. However, with spatially separated samples, users were able to detect a sample among distractors and simultaneously localise it. The results of this study are applicable to a variety of AAR applications that require conveying sample presence or numerosity. Spatial rendering is commonly implemented by convolving virtual sounds with head-related transfer functions (HRTFs). Here, a framework is proposed that interpolates HRTFs measured at arbitrary directions and distances. The framework employs Delaunay triangulation to group HRTFs into subsets suitable for interpolation and barycentric coordinates as interpolation weights. The proposed interpolation framework allows the realtime rendering of virtual sources in the near-field via HRTFs measured at various distances

Aaltodoc Publication Archive

Optimization and improvements in spatial sound reproduction systems through perceptual considerations

Author: Gutiérrez Parera Pablo
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 07/05/2020
Field of study

[ES] La reproducción de las propiedades espaciales del sonido es una cuestión cada vez más importante en muchas aplicaciones inmersivas emergentes. Ya sea en la reproducción de contenido audiovisual en entornos domésticos o en cines, en sistemas de videoconferencia inmersiva o en sistemas de realidad virtual o aumentada, el sonido espacial es crucial para una sensación de inmersión realista. La audición, más allá de la física del sonido, es un fenómeno perceptual influenciado por procesos cognitivos. El objetivo de esta tesis es contribuir con nuevos métodos y conocimiento a la optimización y simplificación de los sistemas de sonido espacial, desde un enfoque perceptual de la experiencia auditiva. Este trabajo trata en una primera parte algunos aspectos particulares relacionados con la reproducción espacial binaural del sonido, como son la escucha con auriculares y la personalización de la Función de Transferencia Relacionada con la Cabeza (Head Related Transfer Function - HRTF). Se ha realizado un estudio sobre la influencia de los auriculares en la percepción de la impresión espacial y la calidad, con especial atención a los efectos de la ecualización y la consiguiente distorsión no lineal. Con respecto a la individualización de la HRTF se presenta una implementación completa de un sistema de medida de HRTF y se introduce un nuevo método para la medida de HRTF en salas no anecoicas. Además, se han realizado dos experimentos diferentes y complementarios que han dado como resultado dos herramientas que pueden ser utilizadas en procesos de individualización de la HRTF, un modelo paramétrico del módulo de la HRTF y un ajuste por escalado de la Diferencia de Tiempo Interaural (Interaural Time Difference - ITD). En una segunda parte sobre reproducción con altavoces, se han evaluado distintas técnicas como la Síntesis de Campo de Ondas (Wave-Field Synthesis - WFS) o la panoramización por amplitud. Con experimentos perceptuales se han estudiado la capacidad de estos sistemas para producir sensación de distancia y la agudeza espacial con la que podemos percibir las fuentes sonoras si se dividen espectralmente y se reproducen en diferentes posiciones. Las aportaciones de esta investigación pretenden hacer más accesibles estas tecnologías al público en general, dada la demanda de experiencias y dispositivos audiovisuales que proporcionen mayor inmersión.[CA] La reproducció de les propietats espacials del so és una qüestió cada vegada més important en moltes aplicacions immersives emergents. Ja siga en la reproducció de contingut audiovisual en entorns domèstics o en cines, en sistemes de videoconferència immersius o en sistemes de realitat virtual o augmentada, el so espacial és crucial per a una sensació d'immersió realista. L'audició, més enllà de la física del so, és un fenomen perceptual influenciat per processos cognitius. L'objectiu d'aquesta tesi és contribuir a l'optimització i simplificació dels sistemes de so espacial amb nous mètodes i coneixement, des d'un criteri perceptual de l'experiència auditiva. Aquest treball tracta, en una primera part, alguns aspectes particulars relacionats amb la reproducció espacial binaural del so, com són l'audició amb auriculars i la personalització de la Funció de Transferència Relacionada amb el Cap (Head Related Transfer Function - HRTF). S'ha realitzat un estudi relacionat amb la influència dels auriculars en la percepció de la impressió espacial i la qualitat, dedicant especial atenció als efectes de l'equalització i la consegüent distorsió no lineal. Respecte a la individualització de la HRTF, es presenta una implementació completa d'un sistema de mesura de HRTF i s'inclou un nou mètode per a la mesura de HRTF en sales no anecoiques. A mès, s'han realitzat dos experiments diferents i complementaris que han donat com a resultat dues eines que poden ser utilitzades en processos d'individualització de la HRTF, un model paramètric del mòdul de la HRTF i un ajustament per escala de la Diferencià del Temps Interaural (Interaural Time Difference - ITD). En una segona part relacionada amb la reproducció amb altaveus, s'han avaluat distintes tècniques com la Síntesi de Camp d'Ones (Wave-Field Synthesis - WFS) o la panoramització per amplitud. Amb experiments perceptuals, s'ha estudiat la capacitat d'aquests sistemes per a produir una sensació de distància i l'agudesa espacial amb que podem percebre les fonts sonores, si es divideixen espectralment i es reprodueixen en diferents posicions. Les aportacions d'aquesta investigació volen fer més accessibles aquestes tecnologies al públic en general, degut a la demanda d'experiències i dispositius audiovisuals que proporcionen major immersió.[EN] The reproduction of the spatial properties of sound is an increasingly important concern in many emerging immersive applications. Whether it is the reproduction of audiovisual content in home environments or in cinemas, immersive video conferencing systems or virtual or augmented reality systems, spatial sound is crucial for a realistic sense of immersion. Hearing, beyond the physics of sound, is a perceptual phenomenon influenced by cognitive processes. The objective of this thesis is to contribute with new methods and knowledge to the optimization and simplification of spatial sound systems, from a perceptual approach to the hearing experience. This dissertation deals in a first part with some particular aspects related to the binaural spatial reproduction of sound, such as listening with headphones and the customization of the Head Related Transfer Function (HRTF). A study has been carried out on the influence of headphones on the perception of spatial impression and quality, with particular attention to the effects of equalization and subsequent non-linear distortion. With regard to the individualization of the HRTF a complete implementation of a HRTF measurement system is presented, and a new method for the measurement of HRTF in non-anechoic conditions is introduced. In addition, two different and complementary experiments have been carried out resulting in two tools that can be used in HRTF individualization processes, a parametric model of the HRTF magnitude and an Interaural Time Difference (ITD) scaling adjustment. In a second part concerning loudspeaker reproduction, different techniques such as Wave-Field Synthesis (WFS) or amplitude panning have been evaluated. With perceptual experiments it has been studied the capacity of these systems to produce a sensation of distance, and the spatial acuity with which we can perceive the sound sources if they are spectrally split and reproduced in different positions. The contributions of this research are intended to make these technologies more accessible to the general public, given the demand for audiovisual experiences and devices with increasing immersion.Gutiérrez Parera, P. (2020). Optimization and improvements in spatial sound reproduction systems through perceptual considerations [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/142696TESI

Crossref

RiuNet

Tools for urban sound quality assessment

Author: Thomas Pieter
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2014
Field of study

Ghent University Academic Bibliography

High Frequency Reproduction in Binaural Ambisonic Rendering

Author: McKenzie Thomas
Publication venue: University of York
Publication date: 01/12/2019
Field of study

Humans can localise sounds in all directions using three main auditory cues: the differences in time and level between signals arriving at the left and right eardrums (interaural time difference and interaural level difference, respectively), and the spectral characteristics of the signals due to reflections and diffractions off the body and ears. These auditory cues can be recorded for a position in space using the head-related transfer function (HRTF), and binaural synthesis at this position can then be achieved through convolution of a sound signal with the measured HRTF. However, reproducing soundfields with multiple sources, or at multiple locations, requires a highly dense set of HRTFs. Ambisonics is a spatial audio technology that decomposes a soundfield into a weighted set of directional functions, which can be utilised binaurally in order to spatialise audio at any direction using far fewer HRTFs. A limitation of low-order Ambisonic rendering is poor high frequency reproduction, which reduces the accuracy of the resulting binaural synthesis. This thesis presents novel HRTF pre-processing techniques, such that when using the augmented HRTFs in the binaural Ambisonic rendering stage, the high frequency reproduction is a closer approximation of direct HRTF rendering. These techniques include Ambisonic Diffuse-Field Equalisation, to improve spectral reproduction over all directions; Ambisonic Directional Bias Equalisation, to further improve spectral reproduction toward a specific direction; and Ambisonic Interaural Level Difference Optimisation, to improve lateralisation and interaural level difference reproduction. Evaluation of the presented techniques compares binaural Ambisonic rendering to direct HRTF rendering numerically, using perceptually motivated spectral difference calculations, auditory cue estimations and localisation prediction models, and perceptually, using listening tests assessing similarity and plausibility. Results conclude that the individual pre-processing techniques produce modest improvements to the high frequency reproduction of binaural Ambisonic rendering, and that using multiple pre-processing techniques can produce cumulative, and statistically significant, improvements

White Rose E-theses Online

Proceedings of the EAA Joint Symposium on Auralization and Ambisonics 2014

Author
Publication venue: Berlin
Publication date: 07/08/2014
Field of study

In consideration of the remarkable intensity of research in the field of Virtual Acoustics, including different areas such as sound field analysis and synthesis, spatial audio technologies, and room acoustical modeling and auralization, it seemed about time to organize a second international symposium following the model of the first EAA Auralization Symposium initiated in 2009 by the acoustics group of the former Helsinki University of Technology (now Aalto University). Additionally, research communities which are focused on different approaches to sound field synthesis such as Ambisonics or Wave Field Synthesis have, in the meantime, moved closer together by using increasingly consistent theoretical frameworks. Finally, the quality of virtual acoustic environments is often considered as a result of all processing stages mentioned above, increasing the need for discussions on consistent strategies for evaluation. Thus, it seemed appropriate to integrate two of the most relevant communities, i.e. to combine the 2nd International Auralization Symposium with the 5th International Symposium on Ambisonics and Spherical Acoustics. The Symposia on Ambisonics, initiated in 2009 by the Institute of Electronic Music and Acoustics of the University of Music and Performing Arts in Graz, were traditionally dedicated to problems of spherical sound field analysis and re-synthesis, strategies for the exchange of ambisonics-encoded audio material, and – more than other conferences in this area – the artistic application of spatial audio systems. This publication contains the official conference proceedings. It includes 29 manuscripts which have passed a 3-stage peer-review with a board of about 70 international reviewers involved in the process. Each contribution has already been published individually with a unique DOI on the DepositOnce digital repository of TU Berlin. Some conference contributions have been recommended for resubmission to Acta Acustica united with Acustica, to possibly appear in a Special Issue on Virtual Acoustics in late 2014. These are not published in this collection.European Acoustics Associatio

DepositOnce

Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics

Author: Rudzki Tomasz
Publication venue
Publication date: 01/05/2023
Field of study

With the increasing popularity of spatial audio content streaming and interactive binaural audio rendering, it is pertinent to study the quality of the critical components of such systems. This includes low-bitrate compression of Ambisonic scenes and binaural rendering schemes. This thesis presents a group of perceptual experiments focusing on these two elements of the Ambisonic delivery chain. The first group of experiments focused on the quality of low-bitrate compression of Ambisonics. The first study evaluated the perceived timbral quality degradation introduced by the Opus audio codec at different bitrate settings and Ambisonic orders. This experiment was conducted using multi-loudspeaker reproduction as well as binaural rendering. The second study has been dedicated to auditory localisation performance in bitrate-compressed Ambisonic scenes reproduced over loudspeakers and binaurally using generic and individually measured HRTF sets. Finally, the third study extended the evaluated set of codec parameters by testing different channel mappings and various audio stimuli contexts. This study was conducted in VR thanks to a purposely developed listening test framework. The comprehensive evaluation of the Opus codec led to a set of recommendations regarding optimal codec parameters. The second group of experiments focused on the evaluation of different methods for binaural rendering of Ambisonics. The first study in this group focused on the implementation of the established methods for designing Ambisonic-to-binaural filters and subsequent objective and subjective evaluations of these. The second study explored the concept of hybrid binaural rendering combining anechoic filters with reverberant ones. Finally, addressing the problem of non-individual HRTFs used for spatial audio rendering, an XR-based method for acquiring individual HRTFs using a single loudspeaker has been proposed. The conducted perceptual evaluations identified key areas where the Ambisonic delivery chain could be improved to provide a more satisfactory user experience

White Rose E-theses Online

Mixed Structural Models for 3D Audio in Virtual Environments

Author: Geronazzo Michele
Publication venue
Publication date: 31/01/2014
Field of study

In the world of ICT, strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D media environments. One of the major challenges that such applications have to address is user-centricity, reflecting e.g. on developing complexity-hiding services so that people can personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of the new technology by everyone. In order to achieve this, multimodal realistic models that describe our environment are needed, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality. Examples of currently active research directions and application areas include 3DTV and future internet, 3D visual-sound scene coding, transmission and reconstruction and teleconferencing systems, to name but a few. The concurrent presence of multimodal senses and activities make multimodal virtual environments potentially flexible and adaptive, allowing users to switch between modalities as needed during the continuously changing conditions of use situation. Augmentation through additional modalities and sensory substitution techniques are compelling ingredients for presenting information non-visually, when the visual bandwidth is overloaded, when data are visually occluded, or when the visual channel is not available to the user (e.g., for visually impaired people). Multimodal systems for the representation of spatial information will largely benefit from the implementation of audio engines that have extensive knowledge of spatial hearing and virtual acoustics. Models for spatial audio can provide accurate dynamic information about the relation between the sound source and the surrounding environment, including the listener and his/her body which acts as an additional filter. Indeed, this information cannot be substituted by any other modality (i.e., visual or tactile). Nevertheless, today's spatial representation of audio within sonification tends to be simplistic and with poor interaction capabilities, being multimedia systems currently focused on graphics processing mostly, and integrated with simple stereo or multi-channel surround-sound. On a much different level lie binaural rendering approaches based on headphone reproduction, taking into account that possible disadvantages (e.g. invasiveness, non-flat frequency responses) are counterbalanced by a number of desirable features. Indeed, these systems might control and/or eliminate reverberation and other acoustic effects of the real listening space, reduce background noise, and provide adaptable and portable audio displays, which are all relevant aspects especially in enhanced contexts. Most of the binaural sound rendering techniques currently exploited in research rely on the use of Head-Related Transfer Functions (HRTFs), i.e. peculiar filters that capture the acoustic effects of the human head and ears. HRTFs allow loyal simulation of the audio signal that arrives at the entrance of the ear canal as a function of the sound source's spatial position. HRTF filters are usually presented under the form of acoustic signals acquired on dummy heads built according to mean anthropometric measurements. Nevertheless, anthropometric features of the human body have a key role in HRTF shaping: several studies have attested how listening to non-individual binaural sounds results in evident localization errors. On the other hand, individual HRTF measurements on a significant number of subjects result both time- and resource-expensive. Several techniques for synthetic HRTF design have been proposed during the last two decades and the most promising one relies on structural HRTF models. In this revolutionary approach, the most important effects involved in spatial sound perception (acoustic delays and shadowing due to head diffraction, reflections on pinna contours and shoulders, resonances inside the ear cavities) are isolated and modeled separately with a corresponding filtering element. HRTF selection and modeling procedures can be determined by physical interpretation: parameters of each rendering blocks or selection criteria can be estimated from real and simulated data and related to anthropometric geometries. Effective personal auditory displays represent an innovative breakthrough for a plethora of applications and structural approach can also allow for effective scalability depending on the available computational resources or bandwidth. Scenes with multiple highly realistic audiovisual objects are easily managed exploiting parallelism of increasingly ubiquitous GPUs (Graphics Processing Units). Building individual headphone equalization with perceptually robust inverse filtering techniques represents a fundamental step towards the creation of personal virtual auditory displays (VADs). To this regard, several examples might benefit from these considerations: multi-channel downmix over headphones, personal cinema, spatial audio rendering in mobile devices, computer-game engines and individual binaural audio standards for movie and music production. This thesis presents a family of approaches that overcome the current limitations of headphone-based 3D audio systems, aiming at building personal auditory displays through structural binaural audio models for an immersive sound reproduction. The resulting models allow for an interesting form of content adaptation and personalization, since they include parameters related to the user's anthropometry in addition to those related to the sound sources and the environment. The covered research directions converge to a novel framework for synthetic HRTF design and customization that combines the structural modeling paradigm with other HRTF selection techniques (inspired by non-individualized HRTF selection procedures) and represents the main novel contribution of this thesis: the Mixed Structural Modeling (MSM) approach considers the global HRTF as a combination of structural components, which can be chosen to be either synthetic or recorded components. In both cases, customization is based on individual anthropometric data, which are used to either fit the model parameters or to select a measured/simulated component within a set of available responses. The definition and experimental validation of the MSM approach addresses several pivotal issues towards the acquisition and delivery of binaural sound scenes and designing guidelines for personalized 3D audio virtual environments holding the potential of novel forms of customized communication and interaction with sound and music content. The thesis also presents a multimodal interactive system which is used to conduct subjective test on multi-sensory integration in virtual environments. Four experimental scenarios are proposed in order to test the capabilities of auditory feedback jointly to tactile or visual modalities. 3D audio feedback related to user’s movements during simple target following tasks is tested as an applicative example of audio-visual rehabilitation system. Perception of direction of footstep sounds interactively generated during walking and provided through headphones highlights how spatial information can clarify the semantic congruence between movement and multimodal feedback. A real time, physically informed audio-tactile interactive system encodes spatial information in the context of virtual map presentation with particular attention to orientation and mobility (O&M) learning processes addressed to visually impaired people. Finally, an experiment analyzes the haptic estimation of size of a virtual 3D object (a stair-step) whereas the exploration is accompanied by a real-time generated auditory feedback whose parameters vary as a function of the height of the interaction point. The collected data from these experiments suggest that well-designed multimodal feedback, exploiting 3D audio models, can definitely be used to improve performance in virtual reality and learning processes in orientation and complex motor tasks, thanks to the high level of attention, engagement, and presence provided to the user. The research framework, based on the MSM approach, serves as an important evaluation tool with the aim of progressively determining the relevant spatial attributes of sound for each application domain. In this perspective, such studies represent a novelty in the current literature on virtual and augmented reality, especially concerning the use of sonification techniques in several aspects of spatial cognition and internal multisensory representation of the body. This thesis is organized as follows. An overview of spatial hearing and binaural technology through headphones is given in Chapter 1. Chapter 2 is devoted to the Mixed Structural Modeling formalism and philosophy. In Chapter 3, topics in structural modeling for each body component are studied, previous research and two new models, i.e. near-field distance dependency and external-ear spectral cue, are presented. Chapter 4 deals with a complete case study of the mixed structural modeling approach and provides insights about the main innovative aspects of such modus operandi. Chapter 5 gives an overview of number of a number of proposed tools for the analysis and synthesis of HRTFs. System architectural guidelines and constraints are discussed in terms of real-time issues, mobility requirements and customized audio delivery. In Chapter 6, two case studies investigate the behavioral importance of spatial attribute of sound and how continuous interaction with virtual environments can benefit from using spatial audio algorithms. Chapter 7 describes a set of experiments aimed at assessing the contribution of binaural audio through headphones in learning processes of spatial cognitive maps and exploration of virtual objects. Finally, conclusions are drawn and new research horizons for further work are exposed in Chapter 8

Archivio istituzionale della ricerca - Università di Padova