Virtual Audio - Three-Dimensional Audio in Virtual Environments
Three-dimensional interactive audio has a variety of potential uses in human-machine interfaces. After lagging seriously behind the visual components, the importance of sound is now becoming increasingly accepted.
This paper mainly discusses the background and techniques needed to implement three-dimensional audio in computer interfaces. A case study of a three-dimensional audio system, implemented by the author, is described in detail. The audio system was moreover integrated with a virtual reality system, and conclusions from user tests and use of the audio system are presented, along with proposals for future work, at the end of the paper.
The thesis begins with a definition of three-dimensional audio and a survey of the human auditory system, giving the reader the necessary background on what three-dimensional audio is and how human auditory perception works.
Artificial Simulation of Audio Spatialisation: Developing a Binaural System
Sound localisation deals with how and why we can locate sound sources
in our spatial environment. Sound spatialisation defines how sound is
distributed in this environment. Several acoustic and psychoacoustic
phenomena are involved in sound localisation and spatialisation. The
importance of these phenomena becomes apparent when endeavouring to
recreate and emulate auditory spatial events using computers.
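As an illustration of the localisation cues this abstract refers to (not part of the work itself), Woodworth's classic spherical-head formula gives a first-order approximation of the interaural time difference (ITD), one of the main acoustic phenomena used for localisation in the horizontal plane. The head radius and speed of sound below are typical textbook values, not figures from the paper:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate the interaural time difference (seconds) for a source
    at a given azimuth, using Woodworth's spherical-head formula:
    ITD = (r / c) * (sin(theta) + theta), for theta in [0, pi/2]."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (math.sin(theta) + theta)

# A source directly to one side (90 degrees) gives the maximum ITD,
# roughly 0.66 ms for an average head radius; a frontal source gives 0.
itd_side = woodworth_itd(90.0)
itd_front = woodworth_itd(0.0)
```

The ITD grows monotonically from front to side, which is why it is such a robust lateralisation cue at low frequencies.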
Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds
In this paper we address the problems of modeling the acoustic space
generated by a full-spectrum sound source and of using the learned model for
the localization and separation of multiple sources that simultaneously emit
sparse-spectrum sounds. We lay theoretical and methodological grounds in order
to introduce the binaural manifold paradigm. We perform an in-depth study of
the latent low-dimensional structure of the high-dimensional interaural
spectral data, based on a corpus recorded with a human-like audiomotor robot
head. A non-linear dimensionality reduction technique is used to show that
these data lie on a two-dimensional (2D) smooth manifold parameterized by the
motor states of the listener, or equivalently, the sound source directions. We
propose a probabilistic piecewise affine mapping model (PPAM) specifically
designed to deal with high-dimensional data exhibiting an intrinsic piecewise
linear structure. We derive a closed-form expectation-maximization (EM)
procedure for estimating the model parameters, followed by Bayes inversion for
obtaining the full posterior density function of a sound source direction. We
extend this solution to deal with missing data and redundancy in real world
spectrograms, and hence for 2D localization of natural sound sources such as
speech. We further generalize the model to the challenging case of multiple
sound sources and we propose a variational EM framework. The associated
algorithm, referred to as variational EM for source separation and localization (VESSL), yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.
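To make the "high-dimensional interaural spectral data" concrete, here is a minimal numpy sketch (not the authors' code) of the kind of per-frequency interaural features such a method starts from: interaural level differences (ILD) and interaural phase differences (IPD) computed from a pair of binaural signals. Vectors of these features, collected over many source directions, form the data whose low-dimensional manifold structure the paper studies:

```python
import numpy as np

def interaural_spectral_features(left, right, n_fft=512, eps=1e-12):
    """Compute per-frequency interaural level differences (ILD, in dB)
    and interaural phase differences (IPD, in radians) from a pair of
    binaural signals."""
    L = np.fft.rfft(left, n=n_fft)
    R = np.fft.rfft(right, n=n_fft)
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    ipd = np.angle(L * np.conj(R))
    return ild, ipd

# Toy example: the right channel is an attenuated, delayed copy of the
# left, so the ILD is a flat ~6 dB across frequency.
rng = np.random.default_rng(0)
x = rng.standard_normal(512)
left = x
right = 0.5 * np.roll(x, 3)   # half the amplitude, 3-sample circular delay
ild, ipd = interaural_spectral_features(left, right)
```

In the paper these feature vectors are far higher-dimensional than the toy pair above, which is exactly why a nonlinear dimensionality reduction step is needed to expose the underlying 2D (direction-parameterized) manifold.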
Movements in Binaural Space: Issues in HRTF Interpolation and Reverberation, with applications to Computer Music
This thesis deals broadly with the topic of Binaural Audio. After reviewing the
literature, a reappraisal of the minimum-phase plus linear delay model for HRTF
representation and interpolation is offered. A rigorous analysis of threshold based
phase unwrapping is also performed. The results and conclusions drawn from these
analyses motivate the development of two novel methods for HRTF representation
and interpolation. Empirical data is used directly in a Phase Truncation method. A
Functional Model for phase is used in the second method based on the
psychoacoustical nature of Interaural Time Differences. Both methods are validated;
most significantly, both perform better than a minimum-phase method in subjective
testing.
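The minimum-phase-plus-linear-delay model reappraised above can be sketched in a few lines of numpy. This is a generic homomorphic (cepstrum-based) minimum-phase reconstruction, not the thesis's own implementation: an HRIR is represented as a minimum-phase filter plus a pure interaural delay, so that interpolation can operate on magnitudes and delays separately:

```python
import numpy as np

def minimum_phase(h, n_fft=None):
    """Reconstruct the minimum-phase version of an impulse response via
    the real cepstrum (standard homomorphic method)."""
    n = n_fft or len(h)
    mag = np.abs(np.fft.fft(h, n))
    # Real cepstrum of the log-magnitude spectrum.
    cep = np.fft.ifft(np.log(np.maximum(mag, 1e-12))).real
    # Fold the cepstrum onto positive quefrencies to impose minimum phase.
    w = np.zeros(n)
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    return np.fft.ifft(np.exp(np.fft.fft(w * cep))).real

# A pure delay is all-pass, so its minimum-phase version collapses the
# energy to time zero while preserving the magnitude spectrum.
delayed = np.zeros(64)
delayed[5] = 1.0
h_min = minimum_phase(delayed)
```

The model's weakness, which motivates the Phase Truncation and Functional Model methods in the thesis, is that real HRTFs are not exactly minimum-phase, so the excess phase discarded here can matter perceptually.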
The accurate, artefact-free dynamic source processing afforded by the above
methods is harnessed in a binaural reverberation model, based on an early reflection
image model and Feedback Delay Network diffuse field, with accurate interaural
coherence. In turn, these flexible environmental processing algorithms are used in
the development of a multi-channel binaural application, which allows the audition
of multi-channel setups in headphones. Both source and listener are dynamic in this
paradigm. A GUI is offered for intuitive use of the application.
HRTF processing is thus re-evaluated and updated after a review of accepted
practice. Novel solutions are presented and validated. Binaural reverberation is
recognised as a crucial tool for convincing artificial spatialisation, and is developed
on similar principles. Emphasis is placed on transparency of development practices,
with the aim of wider dissemination and uptake of binaural technology.
Measurement and modelling of head-related transfer function for spatial audio synthesis
There has been a growing interest in spatial sound generation arising from the development of new communications and media technologies. Binaural spatial sound systems are capable of encoding and rendering sound sources accurately in three-dimensional space using only two recording/playback channels. This is based on the concept of the Head-Related Transfer Function (HRTF), which is a set of acoustic filters from the sound source to a listener's eardrums and contains all the listening cues used by the hearing mechanism for decoding spatial information encoded in binaural signals. The HRTF is usually obtained from acoustic measurements on different persons. In the case of discrete data and sets of measurements corresponding to different human subjects, it is desirable to have a continuous functional representation of the HRTF for efficiently rendering moving sounds in virtual spatial audio systems; further, this representation should be well suited for customization to an individual listener.
In this thesis, modal analysis is applied to examine the HRTF data structure, that is, the wave equation solutions are employed to expand the HRTF with separable basis functions. This leads to a general representation of the HRTF in separated spatial and spectral components, where the spatial basis function modes account for the HRTF spatial variations and the remaining HRTF spectral components provide a new means to examine the human body's scattering behavior. The general model is further developed into continuous functional representations of the HRTF. We use the normalized spatial modes to link near-field and far-field HRTFs directly, which provides a way to obtain the HRTFs at different ranges from measurements conducted at only a single range. The spatially invariant HRTF spectral components are represented continuously using an orthogonal series.
Both spatial and spectral basis functions are well-known functions, so the developed analytical model can easily be used to examine the HRTF data features and their individualization. An important finding of this thesis is that the HRTF decomposition with the spatial basis functions can be well approximated by a finite number of modes, which defines the HRTF spatial dimensionality. The dimensionality determines the least number of HRTF measurements needed in space. We perform high-resolution HRTF measurements on a KEMAR mannequin in a semi-anechoic acoustic chamber. Both the signal processing used to extract HRTFs from the raw measurements and a practical high-resolution spatial sampling scheme are given in this thesis.
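The separation into spatial basis functions and direction-independent spectral weights can be illustrated with a first-order real spherical-harmonic fit. This is a generic sketch of the idea (real SH basis, least-squares coefficient fit on synthetic magnitudes), not the thesis's actual model or data; higher orders follow the same pattern:

```python
import numpy as np

def real_sh_basis_order1(azim, elev):
    """First-order real spherical-harmonic basis evaluated at
    azimuth/elevation (radians). Columns: Y00, and the three
    first-order modes proportional to y, z, x."""
    x = np.cos(elev) * np.cos(azim)
    y = np.cos(elev) * np.sin(azim)
    z = np.sin(elev)
    c0 = 1.0 / np.sqrt(4.0 * np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=-1)

# Fit SH coefficients to direction-dependent magnitudes by least squares:
# the SH modes are the spatial basis functions, the fitted coefficients
# the direction-independent spectral weights (one set per frequency bin).
rng = np.random.default_rng(1)
azim = rng.uniform(-np.pi, np.pi, 200)
elev = rng.uniform(-np.pi / 2, np.pi / 2, 200)
B = real_sh_basis_order1(azim, elev)        # (200, 4) design matrix
true_coeffs = np.array([1.0, 0.3, -0.2, 0.5])
measured = B @ true_coeffs                  # synthetic "HRTF magnitudes"
coeffs, *_ = np.linalg.lstsq(B, measured, rcond=None)
```

The finite spatial dimensionality found in the thesis corresponds to truncating such an expansion at a finite order, which in turn bounds the number of measurement directions needed.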
Spatial Audio and Individualized HRTFs using a Convolutional Neural Network (CNN)
Spatial audio and three-dimensional sound rendering techniques play a pivotal role in immersive audio experiences. Head-Related Transfer Functions (HRTFs) are acoustic filters which represent how sound interacts with an individual's unique head and ear anatomy. The use of HRTFs matched to the subject's anatomical traits is crucial to ensure a personalized and unique
spatial experience. This work proposes the implementation of an HRTF
individualization method based on anthropometric features automatically
extracted from ear images using a Convolutional Neural Network (CNN). Firstly,
a CNN is implemented and tested to assess the performance of machine learning
on positioning landmarks on ear images. The I-BUG dataset, containing ear images with 55 corresponding landmarks, was used to train and test the neural network. Subsequently, 12 relevant landmarks were selected to correspond to 7
specific anthropometric measurements established by the HUTUBS database. These
landmarks serve as a reference for distance computation in pixels in order to
retrieve the anthropometric measurements from the ear images. Once the 7 distances in pixels are extracted from the ear image, they are converted to centimetres using conversion factors, and a best-match method is implemented that computes the Euclidean distance to each set in a database of 116 ears with their corresponding 7 anthropometric measurements provided by the HUTUBS database. The closest anthropometric match can then be identified and the corresponding set of HRTFs obtained for personalized use. The method is evaluated for its validity rather than for the accuracy of its results. The conceptual scope of each stage has been verified and shown to function correctly. The various steps and the available elements in the process are reviewed and challenged in order to define a larger algorithmic pipeline for the desired task.
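The best-match step described above reduces to a nearest-neighbour search in the space of anthropometric measurements. The following sketch shows that search with numpy; the database values are synthetic placeholders, not HUTUBS data, and only the shapes (116 ears, 7 measurements) follow the abstract:

```python
import numpy as np

def best_match_index(query, database):
    """Return the index of the database row with the smallest Euclidean
    distance to the query vector of anthropometric measurements."""
    dists = np.linalg.norm(database - query, axis=1)
    return int(np.argmin(dists))

# Hypothetical stand-in for the HUTUBS measurement sets: 116 ears,
# 7 measurements each (values in cm, synthetic for illustration).
rng = np.random.default_rng(42)
db = rng.uniform(0.5, 7.0, size=(116, 7))

# A query derived from an ear image would be a noisy version of some
# subject's true measurements; here we perturb entry 37 slightly.
query = db[37] + rng.normal(0.0, 0.01, 7)
idx = best_match_index(query, db)
```

Once the index is found, the corresponding subject's measured HRTF set is retrieved and used for rendering, which is the personalization step the abstract describes.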
Manifold Learning for Spatial Audio Synthesis (Aprendizado de variedades para a síntese de áudio espacial)
Advisors: Luiz César Martini, Bruno Sanches Masiero. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. (The Portuguese Resumo duplicates the English abstract and is omitted here.)
Abstract: The objective of binaurally rendered spatial audio is to simulate a sound source in arbitrary spatial locations through the Head-Related Transfer Functions (HRTFs). HRTFs model the direction-dependent influence of ears, head, and torso on the incident sound field. When an audio source is filtered through a pair of HRTFs (one for each ear), a listener is capable of perceiving a sound as though it were reproduced at a specific location in space. Inspired by our successful results building a practical face recognition application aimed at visually impaired people that uses a spatial audio user interface, in this work we have deepened our research to address several scientific aspects of spatial audio. In this context, this thesis explores the incorporation of spatial audio prior knowledge using a novel nonlinear HRTF representation based on manifold learning, which tackles three major challenges of broad interest among the spatial audio community: HRTF personalization, HRTF interpolation, and human sound localization improvement. Exploring manifold learning for spatial audio is based on the assumption that the data (i.e. the HRTFs) lie on a low-dimensional manifold.
This assumption has also been of interest among researchers in computational neuroscience, who argue that manifolds are crucial for understanding the underlying nonlinear relationships of perception in the brain. For all of our contributions using manifold learning, the construction of a single manifold across subjects through an Inter-subject Graph (ISG) has proven to lead to a powerful HRTF representation capable of incorporating prior knowledge of HRTFs and capturing the underlying factors of spatial hearing. Moreover, the use of our ISG to construct a single manifold offers the advantage of employing information from other individuals to improve the overall performance of the techniques herein proposed. The results show that our ISG-based techniques outperform other linear and nonlinear methods in tackling the spatial audio challenges addressed by this thesis.
Doctorate in Electrical Engineering (Engenharia de Computação). Grant 2014/14630-9, FAPESP, CAPE
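The manifold-learning assumption above can be illustrated with a minimal Laplacian-eigenmaps sketch: build a k-nearest-neighbour graph over the samples and embed them with the bottom nontrivial eigenvectors of the graph Laplacian. This is a generic single-subject point cloud, not the thesis's Inter-subject Graph (which additionally links samples across subjects), and the data are synthetic HRTF-like vectors:

```python
import numpy as np

def laplacian_eigenmap(X, n_neighbors=10, n_components=2):
    """Minimal Laplacian-eigenmaps embedding of the rows of X:
    symmetric k-NN graph, unnormalized graph Laplacian, bottom
    nontrivial eigenvectors as coordinates."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    n = X.shape[0]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)               # symmetrize the adjacency
    L = np.diag(W.sum(axis=1)) - W       # unnormalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    # Skip the constant eigenvector (eigenvalue ~ 0).
    return vecs[:, 1:n_components + 1]

# Toy data: points on a smooth 1D curve embedded in 10-D; the embedding
# should recover the low-dimensional parameterization t.
t = np.linspace(0.0, 1.0, 80)
rng = np.random.default_rng(3)
P = rng.standard_normal((2, 10))
X = np.stack([np.cos(2 * t), np.sin(2 * t)], axis=1) @ P
Y = laplacian_eigenmap(X)
```

The first embedding coordinate tracks the curve parameter, which is the analogue of the underlying factors (e.g. source direction, subject anthropometry) that the thesis argues parameterize the HRTF manifold.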