
    Frequency Estimation of the First Pinna Notch in Head-Related Transfer Functions with a Linear Anthropometric Model

    The relation between anthropometric parameters and Head-Related Transfer Function (HRTF) features, especially those due to the pinna, is not yet fully understood. In this paper we apply signal processing techniques to extract the frequencies of the main pinna notches (known as N1, N2, and N3) in the frontal part of the median plane, and we build a model relating them to 13 different anthropometric parameters of the pinna, some of which depend on the elevation angle of the sound source. Results show that while the considered anthropometric parameters cannot approximate either the N2 or the N3 frequency with sufficient accuracy, eight of them are sufficient for modeling the frequency of N1 within a psychoacoustically acceptable margin of error. In particular, distances between the ear canal and the outer helix border are the most important parameters for predicting N1.
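    A minimal sketch of the kind of linear anthropometric model described above: predict the N1 frequency from pinna measurements with ordinary least squares. The dataset, parameter count, and coefficients below are hypothetical placeholders, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 40 subjects x 8 pinna parameters (e.g., distances in mm
# between the ear canal and points on the helix border).
X = rng.uniform(10.0, 60.0, size=(40, 8))
w_true = rng.normal(size=8)
y = 8000.0 + (X @ w_true) * 20.0 + rng.normal(scale=100.0, size=40)  # N1 in Hz

# Fit f_N1 ~ w0 + sum_k w_k * x_k by least squares (intercept via a ones column).
A = np.hstack([np.ones((X.shape[0], 1)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = A @ w
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(f"RMSE of N1 prediction: {rmse:.1f} Hz")
```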

    Manifold learning for spatial audio synthesis (Aprendizado de variedades para a síntese de áudio espacial)

    Advisors: Luiz César Martini, Bruno Sanches Masiero. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    The objective of binaurally rendered spatial audio is to simulate a sound source at arbitrary spatial locations through Head-Related Transfer Functions (HRTFs). HRTFs model the direction-dependent influence of the ears, head, and torso on the incident sound field. When an audio source is filtered through a pair of HRTFs (one for each ear), a listener perceives the sound as though it were reproduced at a specific location in space. Inspired by our successful results building a practical face recognition application aimed at visually impaired people that uses a spatial audio user interface, in this work we have deepened our research to address several scientific aspects of spatial audio. In this context, this thesis explores the incorporation of spatial audio prior knowledge using a novel nonlinear HRTF representation based on manifold learning, which tackles three major challenges of broad interest in the spatial audio community: HRTF personalization, HRTF interpolation, and human sound localization improvement.
    Exploring manifold learning for spatial audio is based on the assumption that the data (i.e., the HRTFs) lie on a low-dimensional manifold. This assumption has also been of interest among researchers in computational neuroscience, who argue that manifolds are crucial for understanding the underlying nonlinear relationships of perception in the brain. For all of our contributions using manifold learning, the construction of a single manifold across subjects through an Inter-subject Graph (ISG) has proven to yield a powerful HRTF representation capable of incorporating prior knowledge of HRTFs and capturing the underlying factors of spatial hearing. Moreover, constructing a single manifold with our ISG offers the advantage of employing information from other individuals to improve the overall performance of the techniques proposed herein. The results show that our ISG-based techniques outperform other linear and nonlinear methods in tackling the spatial audio challenges addressed by this thesis.
    Doctorate in Computer Engineering (Doutor em Engenharia Elétrica). Funding: FAPESP grant 2014/14630-9; CAPES.
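    As a rough illustration of the manifold-learning idea, the sketch below builds one neighborhood graph over HRTFs pooled from several subjects and computes a Laplacian-eigenmaps embedding. The plain k-NN graph used here is a generic stand-in, not the thesis's actual ISG construction, and the data are random placeholders.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

rng = np.random.default_rng(1)

# Hypothetical data: 5 subjects x 50 directions, each HRTF a 64-bin
# log-magnitude spectrum, stacked into one (250, 64) matrix.
H = rng.normal(size=(5 * 50, 64))

# k-NN adjacency with Gaussian weights on Euclidean distances.
D = np.linalg.norm(H[:, None, :] - H[None, :, :], axis=-1)
sigma = np.median(D)                              # bandwidth heuristic
k = 10
W = np.zeros_like(D)
idx = np.argsort(D, axis=1)[:, 1:k + 1]           # k nearest, skipping self
rows = np.repeat(np.arange(H.shape[0]), k)
W[rows, idx.ravel()] = np.exp(-D[rows, idx.ravel()] ** 2 / (2 * sigma ** 2))
W = np.maximum(W, W.T)                            # symmetrize the graph

# Laplacian eigenmaps: smallest non-trivial eigenvectors of the graph Laplacian
# give the low-dimensional manifold coordinates of every HRTF.
L = laplacian(W, normed=True)
vals, vecs = eigh(L)
embedding = vecs[:, 1:4]                          # 3-D embedding, (250, 3)
print(embedding.shape)
```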

    An Immersive Multi-Party Conferencing System for Mobile Devices Using 3D Binaural Audio

    The use of mobile telephony, along with the widespread adoption of smartphones in the consumer market, is gradually displacing traditional telephony. Fixed-line telephone conference calls have been widely employed for carrying out distributed meetings around the world over the last decades. However, the powerful capabilities of modern mobile devices and data networks allow for new conferencing schemes based on immersive communication, one of the fields of major commercial and technical interest within the telecommunications industry today. In this context, adding spatial audio features to conventional conferencing systems is a natural way of creating a realistic communication environment. In fact, the human auditory system takes advantage of spatial audio cues to locate, separate, and understand multiple speakers when they talk simultaneously. As a result, speech intelligibility is significantly improved if the speakers are simulated to be spatially distributed. This paper describes the development of a new immersive multi-party conference call service for mobile devices (smartphones and tablets) that substantially improves the identification and intelligibility of the participants. Headphone-based audio reproduction and binaural sound processing algorithms allow the user to locate the different speakers within a virtual meeting room. Moreover, the use of a large touch screen helps the user identify and remember the participants taking part in the conference, with the possibility of changing their spatial location in an interactive way.
    This work has been partially supported by the government of Spain grant TEC-2009-14414-C03-01 and by the new technologies department of Telefónica.
    Aguilera Martí, E.; López Monfort, J. J.; Cobos Serrano, M.; Macià Pina, L.; Martí Guerola, A. (2012). An Immersive Multi-Party Conferencing System for Mobile Devices Using 3D Binaural Audio. Waves. 4:5-14. http://hdl.handle.net/10251/57918
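    The core rendering step such a system needs, filtering each participant's mono speech with the HRTF pair for an assigned virtual position and mixing the results into one stereo headphone feed, can be sketched as follows. The HRIR arrays and signal lengths are placeholders, not the system's actual filters.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_conference(speech, hrirs):
    """speech: list of mono signals; hrirs: list of (left, right) HRIR pairs."""
    n = max(len(s) for s in speech) + max(len(h[0]) for h in hrirs) - 1
    out = np.zeros((n, 2))
    for s, (h_l, h_r) in zip(speech, hrirs):
        # Binaural placement: convolve the talker with each ear's HRIR and mix.
        out[:len(s) + len(h_l) - 1, 0] += fftconvolve(s, h_l)
        out[:len(s) + len(h_r) - 1, 1] += fftconvolve(s, h_r)
    return out / np.max(np.abs(out))  # normalize to avoid clipping

# Hypothetical usage: three talkers, each with an HRIR pair for a distinct azimuth.
rng = np.random.default_rng(2)
talkers = [rng.normal(size=16000) for _ in range(3)]            # 1 s at 16 kHz
hrirs = [(rng.normal(size=128), rng.normal(size=128)) for _ in range(3)]
stereo = render_conference(talkers, hrirs)
```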

    Mixed Structural Models for 3D Audio in Virtual Environments

    In the world of ICT, strategies for innovation and development are increasingly focusing on applications that require spatial representation and real-time interaction with and within 3D media environments. One of the major challenges such applications must address is user-centricity, reflected e.g. in the development of complexity-hiding services that let people personalize their own delivery of services. In these terms, multimodal interfaces represent a key factor for enabling an inclusive use of new technology by everyone. Achieving this requires realistic multimodal models of our environment, and in particular models that accurately describe the acoustics of the environment and communication through the auditory modality. Examples of currently active research directions and application areas include 3DTV and the future internet, 3D visual-sound scene coding, transmission and reconstruction, and teleconferencing systems, to name but a few. The concurrent presence of multimodal senses and activities makes multimodal virtual environments potentially flexible and adaptive, allowing users to switch between modalities as needed under continuously changing conditions of use. Augmentation through additional modalities and sensory substitution techniques are compelling ingredients for presenting information non-visually: when the visual bandwidth is overloaded, when data are visually occluded, or when the visual channel is not available to the user (e.g., for visually impaired people). Multimodal systems for the representation of spatial information will benefit greatly from audio engines that embed extensive knowledge of spatial hearing and virtual acoustics. Models for spatial audio can provide accurate dynamic information about the relation between the sound source and the surrounding environment, including the listener and his/her body, which acts as an additional filter. Indeed, this information cannot be substituted by any other modality (i.e., visual or tactile). Nevertheless, today's spatial representation of audio within sonification tends to be simplistic and to offer poor interaction capabilities, since current multimedia systems focus mostly on graphics processing and integrate only simple stereo or multi-channel surround sound. On a much different level lie binaural rendering approaches based on headphone reproduction, whose possible disadvantages (e.g., invasiveness, non-flat frequency responses) are counterbalanced by a number of desirable features: such systems can control and/or eliminate reverberation and other acoustic effects of the real listening space, reduce background noise, and provide adaptable and portable audio displays, all relevant aspects especially in enhanced contexts. Most of the binaural sound rendering techniques currently exploited in research rely on the use of Head-Related Transfer Functions (HRTFs), i.e., peculiar filters that capture the acoustic effects of the human head and ears. HRTFs allow faithful simulation of the audio signal arriving at the entrance of the ear canal as a function of the sound source's spatial position. HRTF filters are usually presented in the form of acoustic signals acquired on dummy heads built according to mean anthropometric measurements.
    Nevertheless, anthropometric features of the human body play a key role in HRTF shaping: several studies have shown that listening to non-individual binaural sounds results in evident localization errors. On the other hand, individual HRTF measurements on a significant number of subjects are both time- and resource-expensive. Several techniques for synthetic HRTF design have been proposed during the last two decades, and the most promising one relies on structural HRTF models. In this approach, the most important effects involved in spatial sound perception (acoustic delays and shadowing due to head diffraction, reflections on pinna contours and shoulders, resonances inside the ear cavities) are isolated and modeled separately, each with a corresponding filtering element. HRTF selection and modeling procedures can be driven by physical interpretation: the parameters of each rendering block, or the selection criteria, can be estimated from real and simulated data and related to anthropometric geometries. Effective personal auditory displays represent an innovative breakthrough for a plethora of applications, and the structural approach also allows for effective scalability depending on the available computational resources or bandwidth. Scenes with multiple highly realistic audiovisual objects are easily managed by exploiting the parallelism of increasingly ubiquitous GPUs (Graphics Processing Units). Building individual headphone equalization with perceptually robust inverse filtering techniques represents a fundamental step towards the creation of personal virtual auditory displays (VADs). In this regard, several applications might benefit from these considerations: multi-channel downmix over headphones, personal cinema, spatial audio rendering in mobile devices, computer-game engines, and individual binaural audio standards for movie and music production. This thesis presents a family of approaches that overcome the current limitations of headphone-based 3D audio systems, aiming at building personal auditory displays through structural binaural audio models for immersive sound reproduction. The resulting models allow for an interesting form of content adaptation and personalization, since they include parameters related to the user's anthropometry in addition to those related to the sound sources and the environment. The covered research directions converge to a novel framework for synthetic HRTF design and customization that combines the structural modeling paradigm with other HRTF selection techniques (inspired by non-individualized HRTF selection procedures) and represents the main novel contribution of this thesis: the Mixed Structural Modeling (MSM) approach considers the global HRTF as a combination of structural components, each of which can be either synthetic or recorded. In both cases, customization is based on individual anthropometric data, which are used either to fit the model parameters or to select a measured/simulated component within a set of available responses. The definition and experimental validation of the MSM approach address several pivotal issues in the acquisition and delivery of binaural sound scenes and in the design of guidelines for personalized 3D audio virtual environments, holding the potential for novel forms of customized communication and interaction with sound and music content. The thesis also presents a multimodal interactive system used to conduct subjective tests on multi-sensory integration in virtual environments.
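    To give a flavor of what one structural component looks like, the sketch below implements a simplified spherical-head shadow filter plus a Woodworth interaural delay, in the spirit of the classic first-order head model. The function names, angle convention, and the alpha(theta) mapping are textbook simplifications chosen for illustration, not parameters fitted by this thesis.

```python
import numpy as np
from scipy.signal import bilinear, lfilter

C = 343.0      # speed of sound, m/s
A = 0.0875     # nominal head radius, m
FS = 44100     # sample rate, Hz

def head_shadow(x, theta):
    """First-order head-shadow filter; theta = angle from the ear axis, rad."""
    w0 = C / A
    alpha = 1.0 + np.cos(theta)   # simplified boost (near ear) / cut (far ear)
    # Analog prototype H(s) = (alpha*s/(2*w0) + 1) / (s/(2*w0) + 1), discretized.
    b, a = bilinear([alpha / (2 * w0), 1.0], [1.0 / (2 * w0), 1.0], fs=FS)
    return lfilter(b, a, x)

def itd_delay(x, theta):
    """Woodworth interaural delay, rounded to whole samples."""
    tau = (A / C) * (theta + np.sin(theta))       # seconds
    return np.concatenate([np.zeros(int(round(tau * FS))), x])

# Hypothetical usage: a click from ~60 degrees to the right; the far (left)
# ear gets the shadowed, delayed signal under this crude angle convention.
theta = np.deg2rad(60)
click = np.zeros(1024)
click[0] = 1.0
right = head_shadow(click, theta)                          # ipsilateral ear
left = itd_delay(head_shadow(click, np.pi - theta), theta) # contralateral ear
```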
    Four experimental scenarios are proposed in order to test the capabilities of auditory feedback jointly with tactile or visual modalities. 3D audio feedback related to the user's movements during simple target-following tasks is tested as an applicative example of an audio-visual rehabilitation system. Perception of the direction of footstep sounds, interactively generated during walking and provided through headphones, highlights how spatial information can clarify the semantic congruence between movement and multimodal feedback. A real-time, physically informed audio-tactile interactive system encodes spatial information in the context of virtual map presentation, with particular attention to orientation and mobility (O&M) learning processes addressed to visually impaired people. Finally, an experiment analyzes the haptic estimation of the size of a virtual 3D object (a stair-step) while the exploration is accompanied by real-time generated auditory feedback whose parameters vary as a function of the height of the interaction point. The data collected from these experiments suggest that well-designed multimodal feedback exploiting 3D audio models can definitely be used to improve performance in virtual reality and in learning processes for orientation and complex motor tasks, thanks to the high level of attention, engagement, and presence provided to the user. The research framework, based on the MSM approach, serves as an important evaluation tool with the aim of progressively determining the relevant spatial attributes of sound for each application domain. In this perspective, such studies represent a novelty in the current literature on virtual and augmented reality, especially concerning the use of sonification techniques in several aspects of spatial cognition and internal multisensory representation of the body.
    This thesis is organized as follows. An overview of spatial hearing and binaural technology through headphones is given in Chapter 1. Chapter 2 is devoted to the Mixed Structural Modeling formalism and philosophy. In Chapter 3, topics in structural modeling for each body component are studied; previous research is reviewed and two new models, i.e., near-field distance dependency and external-ear spectral cues, are presented. Chapter 4 deals with a complete case study of the mixed structural modeling approach and provides insights into the main innovative aspects of this modus operandi. Chapter 5 gives an overview of a number of proposed tools for the analysis and synthesis of HRTFs; system architectural guidelines and constraints are discussed in terms of real-time issues, mobility requirements, and customized audio delivery. In Chapter 6, two case studies investigate the behavioral importance of the spatial attributes of sound and how continuous interaction with virtual environments can benefit from spatial audio algorithms. Chapter 7 describes a set of experiments aimed at assessing the contribution of binaural audio through headphones to learning processes of spatial cognitive maps and the exploration of virtual objects. Finally, conclusions are drawn and new research horizons for further work are outlined in Chapter 8.

    The Impact of an Accurate Vertical Localization with HRTFs on Short Explorations of Immersive Virtual Reality Scenarios

    Achieving a full 3D auditory experience with head-related transfer functions (HRTFs) is still one of the main challenges of spatial audio rendering. HRTFs capture the acoustic effects of the individual listener, allowing immersion in virtual reality (VR) applications. This paper investigates the connection between listener sensitivity to vertical localization cues and experienced presence, spatial audio quality, and attention. Two VR experiments with a head-mounted display (HMD) and an animated visual avatar are proposed: (i) a screening test that evaluates the participants' localization performance with HRTFs for a non-visible spatialized audio source, and (ii) a 2-minute free exploration of a VR scene with five audiovisual sources, in both non-spatialized (2D stereo panning) and spatialized (free-field HRTF rendering) listening conditions. The screening test allows a distinction between good and bad localizers. The second experiment shows that no biases are introduced in the quality of the experience (QoE) by the different audio rendering methods; more interestingly, good localizers perceive a lower audio latency and are less involved in the visual aspects.

    Extracting significant features from the HRTF

    Proceedings of the 9th International Conference on Auditory Display (ICAD), Boston, MA, July 7-9, 2003.
    The Head-Related Transfer Function (HRTF) characterizes the auditory cues created by the scattering of sound off a person's anatomy. While it is known that features in the HRTF can be associated with various phenomena, such as head diffraction, head and torso reflections, knee reflections, and pinna resonances and antiresonances, identification of these phenomena is usually qualitative and/or heuristic. The objective of this paper is to decompose the HRTF and extract significant features that are perceptually important for source localization. Some of the significant features that have been identified are the pinna resonances and the notches in the spectrum caused by various parts of the body. We develop signal processing algorithms to decompose the HRTF into components and extract the features corresponding to each component.
    The support of NSF award ITR-0086075 is gratefully acknowledged.
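    One step of such a decomposition, locating spectral notches in an HRTF magnitude response, can be illustrated by peak-picking the negated log spectrum. The synthetic response and the prominence threshold below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 44100
N = 512

# Hypothetical HRTF magnitude: flat response with a -20 dB notch near 8 kHz.
freqs = np.fft.rfftfreq(N, d=1.0 / FS)
mag_db = np.zeros_like(freqs)
mag_db -= 20.0 * np.exp(-0.5 * ((freqs - 8000.0) / 400.0) ** 2)

# Notches appear as prominent peaks of the inverted spectrum.
idx, _ = find_peaks(-mag_db, prominence=6.0)   # keep notches at least 6 dB deep
for i in idx:
    print(f"notch at {freqs[i]:.0f} Hz, depth {-mag_db[i]:.1f} dB")
```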

    Natural sound rendering for headphones: …

    With the strong growth of assistive and personal listening devices, natural sound rendering over headphones is becoming a necessity for prolonged listening in multimedia and virtual reality applications. The aim of natural sound rendering is to recreate sound scenes with spatial and timbral quality as natural as possible, so as to achieve a truly immersive listening experience. However, rendering natural sound over headphones encounters many challenges. This tutorial paper presents signal processing techniques to tackle these challenges and assist human listening.

    A system for room acoustic simulation for one's own voice

    The real-time simulation of room acoustical environments for one's own voice, using generic software, has until very recently been difficult due to the computational load involved: it requires real-time convolution of a person's voice with a potentially large number of long room impulse responses. This thesis presents a room acoustical simulation system with a software-based solution for performing real-time convolution with head tracking, to simulate the effect of room acoustical environments on the sound of one's own voice using binaural technology. In order to gather data for implementing head tracking in the system, human head movements are characterized while reading a text aloud. The rooms simulated with the system are actual rooms, characterized by measuring the room impulse response from the mouth to the ears of the same head (oral-binaural room impulse response, OBRIR). By repeating this process at 2° increments of the yaw angle in the horizontal plane, the rooms are binaurally scanned around a given position to obtain a collection of OBRIRs, which is then used by the software-based convolution system. In the rooms simulated with the system, a person equipped with a near-mouth microphone and near-ear loudspeakers can speak or sing and hear their voice as it would sound in the measured rooms, while physically being in an anechoic room. By continually updating the person's head orientation with head tracking, the corresponding OBRIR is chosen for convolution with their voice. The system described in this thesis achieves the low latency required to simulate nearby reflections, and it can perform convolution with long room impulse responses. The perceptual validity of the system is studied in two experiments involving human participants reading a set text aloud. The system presented in this thesis can be used to design experiments that study various aspects of the auditory perception of the sound of one's own voice in room environments. The system can also be adapted to incorporate a module that enables listening to the sound of one's own voice in commercial applications such as architectural acoustic room simulation software, teleconferencing systems, virtual reality and gaming applications, etc.
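    A minimal sketch of the engine's inner loop as described above: select the stored OBRIR pair nearest to the tracked head yaw on the 2° grid, then convolve the microphone signal block by block with FFT overlap-add. All names, sizes, and the random stand-in data are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

FS, BLOCK = 48000, 256
YAWS = np.arange(0, 360, 2)                       # measurement grid, degrees
rng = np.random.default_rng(3)
OBRIRS = rng.normal(size=(len(YAWS), 2, 4096))    # (yaw, ear, taps), placeholder

def nearest_obrir(yaw_deg):
    """Select the stored OBRIR pair closest to the current head yaw."""
    i = np.argmin(np.abs((YAWS - yaw_deg + 180) % 360 - 180))
    return OBRIRS[i]

def process_block(x, h, tail):
    """Overlap-add FFT convolution of one input block with one ear's OBRIR."""
    n = len(x) + len(h) - 1
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
    y[:len(tail)] += tail                         # add the previous block's tail
    return y[:BLOCK], y[BLOCK:]                   # output block, new tail

# Hypothetical real-time loop: yaw from the head tracker, voice from the mic.
tails = [np.zeros(0), np.zeros(0)]
for _ in range(10):
    yaw = float(rng.uniform(0, 360))              # stand-in for tracker data
    block = rng.normal(size=BLOCK)                # stand-in for mic input
    h = nearest_obrir(yaw)
    out_l, tails[0] = process_block(block, h[0], tails[0])
    out_r, tails[1] = process_block(block, h[1], tails[1])
```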
