254 research outputs found

    The Effectiveness of Chosen Partial Anthropometric Measurements in Individualizing Head-Related Transfer Functions on Median Plane

    Get PDF
    Producing individualized head-related impulse responses (HRIRs) that perfectly suit a particular listener remains an open problem in HRIR modeling. We modeled the complete magnitude of head-related transfer functions (HRTFs) in the frequency domain via principal components analysis (PCA), using 37 subjects exposed to sound sources on the median plane. We found that a linear combination of only 10 orthonormal basis functions was sufficient to model individual magnitude HRTFs satisfactorily. Our goal was to form multiple linear regressions (MLR) between the weights of the basis functions obtained from PCA and selected partial anthropometric measurements, in order to individualize a particular listener's HRTFs from his or her own anthropometry. We propose a novel individualization method based on MLR of the basis-function weights that employs only 8 of the 27 anthropometric measurements. The experimental results showed that the proposed method, with a mean error of 11.21%, outperformed our previous work on individualizing minimum-phase HRIRs (mean error 22.50%) and magnitude HRTFs on the horizontal plane (mean error 12.17%), as well as similar research. The individualized magnitude HRTFs closely approximated the measured ones with only a slight error. Thus the eight chosen anthropometric measurements proved effective in individualizing magnitude HRTFs, particularly on the median plane.
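    As an illustration of the pipeline outlined in this abstract, the following is a minimal sketch of PCA-plus-MLR individualization on synthetic stand-in data; the array shapes mirror the abstract (37 subjects, 10 basis functions, 8 measurements), but the code is not the authors' implementation.

```python
# Minimal sketch (not the paper's exact pipeline) of individualizing magnitude
# HRTFs via PCA plus multiple linear regression, using synthetic stand-in data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_subjects, n_freq_bins, n_anthro = 37, 128, 8

# Stand-in training data; real data would come from measured HRTFs (e.g. CIPIC).
hrtf_mag_db = rng.normal(size=(n_subjects, n_freq_bins))   # log-magnitude HRTFs
anthro = rng.normal(size=(n_subjects, n_anthro))           # selected measurements

# 1) PCA: represent each magnitude HRTF as a mean plus 10 basis functions.
pca = PCA(n_components=10)
weights = pca.fit_transform(hrtf_mag_db)                   # (subjects, 10)

# 2) MLR: predict the 10 PCA weights from the 8 anthropometric measurements.
mlr = LinearRegression().fit(anthro, weights)

# 3) Individualization: estimate a new listener's magnitude HRTF from anthropometry.
new_anthro = rng.normal(size=(1, n_anthro))
est_mag_db = pca.inverse_transform(mlr.predict(new_anthro))
print(est_mag_db.shape)   # (1, n_freq_bins)
```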

    A New Selection Method of Anthropometric Parameters in Individualizing HRIR

    Get PDF
    A recurring issue in modeling head-related impulse responses (HRIRs) is how to individualize HRIR models so that they suit a particular listener. The objective of this research is to present a robust method for selecting eight anthropometric parameters out of the 27 parameters defined in the CIPIC HRTF Database. The proposed selection method is systematic and scientifically justified, in contrast to choosing the parameters by trial and error. The selected anthropometric parameters of a given listener were used to establish multiple linear regression models for individualizing his or her HRIRs. We modeled the entire set of minimum-phase HRIRs in the horizontal plane of 35 subjects using principal components analysis (PCA); the individual minimum-phase HRIRs could be estimated adequately by a linear combination of ten orthonormal basis functions.
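    The abstract does not spell out the selection criterion itself, so the sketch below shows only one plausible systematic alternative to trial and error, ranking the 27 CIPIC parameters by their correlation with the PCA weights; it is illustrative and not necessarily the authors' method.

```python
# Illustrative ranking of anthropometric parameters by their strongest absolute
# correlation with any PCA weight; stand-in data, not the paper's criterion.
import numpy as np

rng = np.random.default_rng(1)
n_subjects, n_params, n_weights = 35, 27, 10
anthro_all = rng.normal(size=(n_subjects, n_params))    # all 27 CIPIC parameters
pca_weights = rng.normal(size=(n_subjects, n_weights))  # weights of basis functions

scores = np.zeros(n_params)
for p in range(n_params):
    corr = [abs(np.corrcoef(anthro_all[:, p], pca_weights[:, w])[0, 1])
            for w in range(n_weights)]
    scores[p] = max(corr)

selected = np.argsort(scores)[::-1][:8]   # indices of the 8 best-scoring parameters
print(sorted(selected.tolist()))
```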

    Anthropometric Individualization of Head-Related Transfer Functions: Analysis and Modeling

    Get PDF
    Human sound localization helps listeners attend to spatially separated speakers by using interaural level and time differences as well as angle-dependent monaural spectral cues. In a monophonic teleconference, for instance, it is much more difficult to distinguish between different speakers because the binaural cues are missing. Spatially positioning the speakers by means of binaural reproduction methods using head-related transfer functions (HRTFs) enhances speech comprehension. HRTFs are shaped by the torso, head, and ear geometry, as they describe the propagation path of sound from a source to the ear-canal entrance. Because of this geometry dependence, the HRTF is both direction- and subject-dependent. For adequate reproduction, individual HRTFs should be used; however, measuring them is extremely laborious. This thesis therefore proposes approaches to adapt HRTFs using the individual anthropometric dimensions of a user. Since localization at low frequencies is mainly governed by the interaural time difference, two models to adapt this difference are developed and compared with existing models. Furthermore, two approaches to adapt the spectral cues at higher frequencies are studied, improved, and compared. Although localization performance with individualized HRTFs is slightly worse than with individually measured HRTFs, it is still better than with non-individual HRTFs, which is a favorable trade-off given the measurement effort.
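    The thesis's two ITD models are not reproduced in this abstract; as a point of reference, the sketch below shows the classic Woodworth spherical-head ITD formula that anthropometric ITD adaptations typically scale with an individual head radius (the default radius is an assumed value, not taken from the thesis).

```python
# Classic Woodworth rigid-sphere ITD: ITD(theta) = (r / c) * (theta + sin(theta)),
# valid for frontal azimuths; individual models usually fit or scale r per listener.
import numpy as np

def woodworth_itd(azimuth_rad: np.ndarray, head_radius_m: float = 0.0875,
                  speed_of_sound: float = 343.0) -> np.ndarray:
    """ITD in seconds for a rigid sphere; azimuth in radians (0 = straight ahead)."""
    return (head_radius_m / speed_of_sound) * (azimuth_rad + np.sin(azimuth_rad))

azimuths = np.deg2rad([0, 30, 60, 90])
print(woodworth_itd(azimuths) * 1e6)   # ITD in microseconds
```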

    Spatial Audio and Individualized HRTFs using a Convolutional Neural Network (CNN)

    Full text link
    Spatial audio and 3-dimensional sound rendering techniques play a pivotal role in immersive audio experiences. Head-Related Transfer Functions (HRTFs) are acoustic filters that describe how sound interacts with an individual's unique head and ear anatomy. Using HRTFs that match the subject's anatomical traits is crucial for a personalized spatial experience. This work proposes an HRTF individualization method based on anthropometric features automatically extracted from ear images using a Convolutional Neural Network (CNN). First, a CNN is implemented and tested to assess how well machine learning can position landmarks on ear images. The I-BUG dataset, containing ear images annotated with 55 landmarks each, was used to train and test the network. Subsequently, 12 relevant landmarks were selected to correspond to 7 specific anthropometric measurements established by the HUTUBS database; these landmarks serve as references for computing distances in pixels from which the anthropometric measurements are recovered. Once the 7 pixel distances are extracted from the ear image, they are converted to centimetres using conversion factors, and a best-match procedure computes the Euclidean distance to each entry in a database of 116 ears whose 7 anthropometric measurements are provided by the HUTUBS database. The closest anthropometric match is identified and its corresponding set of HRTFs is retrieved for personalized use. The method is evaluated for its validity rather than for the accuracy of its results: the conceptual scope of each stage has been verified to function correctly, and the various steps and available elements in the process are reviewed and challenged in order to define a larger overall algorithm designed for the desired task.
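    The final matching stage can be illustrated with a short sketch: given 7 measurements estimated from the ear image, the nearest entry of a 116-ear anthropometry table is found by Euclidean distance and its HRTF set is reused. The data here are random stand-ins, not the HUTUBS files.

```python
# Minimal sketch of the best-match stage: nearest neighbour by Euclidean distance
# over 7 anthropometric measurements; array contents are illustrative stand-ins.
import numpy as np

def best_match(query_cm: np.ndarray, database_cm: np.ndarray) -> int:
    """Return the row index of the database entry nearest to the query."""
    distances = np.linalg.norm(database_cm - query_cm, axis=1)
    return int(np.argmin(distances))

rng = np.random.default_rng(2)
database = rng.uniform(0.5, 7.0, size=(116, 7))   # 116 ears x 7 measurements (cm)
query = rng.uniform(0.5, 7.0, size=(7,))          # measurements from the ear image
idx = best_match(query, database)
print(f"closest ear index: {idx}")                # -> reuse that subject's HRTFs
```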

    HRTF individualization using deep learning

    Get PDF
    The research presented in this paper focuses on Head-Related Transfer Function (HRTF) individualization using deep learning techniques. HRTF individualization is paramount for accurate binaural rendering, which is used in XR technologies, tools for the visually impaired, and many other applications. The rising availability of public HRTF data currently allows experimentation with different input data formats and various computational models. Accordingly, three research directions are investigated here: (1) extraction of predictors from user data; (2) unsupervised learning of HRTFs based on autoencoder networks; and (3) synthesis of HRTFs from anthropometric data using deep multilayer perceptrons and principal component analysis. While none of the aforementioned investigations has shown outstanding results to date, the knowledge acquired throughout the development and troubleshooting phases highlights areas of improvement which are expected to pave the way to more accurate models for HRTF individualization.
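    As an illustration of direction (2), the sketch below shows a small autoencoder that compresses HRTF magnitude spectra into a low-dimensional code; the layer sizes and the 128-bin spectrum length are assumptions for illustration, not the architecture used in the paper.

```python
# Minimal autoencoder sketch for unsupervised learning of HRTF magnitude spectra.
import torch
import torch.nn as nn

class HRTFAutoencoder(nn.Module):
    def __init__(self, n_bins: int = 128, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bins, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_bins))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = HRTFAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

batch = torch.randn(32, 128)             # stand-in batch of magnitude spectra (dB)
for _ in range(5):                       # a few illustrative training steps
    optimizer.zero_grad()
    loss = loss_fn(model(batch), batch)  # reconstruction error
    loss.backward()
    optimizer.step()
print(float(loss))
```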

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    Get PDF
    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering are designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener's two ears, soundfield recording and reproduction using large numbers of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recently popular area, multi-zone reproduction, is also briefly reviewed. The paper concludes with a discussion of the current state of the field and open problems. The authors acknowledge National Natural Science Foundation of China (NSFC) grant No. 61671380 and Australian Research Council Discovery Scheme DE150100363.

    Optimization and improvements in spatial sound reproduction systems through perceptual considerations

    Full text link
    The reproduction of the spatial properties of sound is an increasingly important concern in many emerging immersive applications. Whether it is the reproduction of audiovisual content in home environments or in cinemas, immersive video-conferencing systems, or virtual- or augmented-reality systems, spatial sound is crucial for a realistic sense of immersion. Hearing, beyond the physics of sound, is a perceptual phenomenon influenced by cognitive processes. The objective of this thesis is to contribute new methods and knowledge to the optimization and simplification of spatial sound systems, from a perceptual approach to the hearing experience. The first part of the dissertation deals with particular aspects of binaural spatial sound reproduction, namely listening with headphones and the customization of the Head-Related Transfer Function (HRTF). A study has been carried out on the influence of headphones on the perception of spatial impression and quality, with particular attention to the effects of equalization and the resulting non-linear distortion. With regard to HRTF individualization, a complete implementation of an HRTF measurement system is presented, and a new method for measuring HRTFs in non-anechoic conditions is introduced. In addition, two different and complementary experiments have resulted in two tools that can be used in HRTF individualization processes: a parametric model of the HRTF magnitude and an Interaural Time Difference (ITD) scaling adjustment. The second part concerns loudspeaker reproduction, where techniques such as Wave-Field Synthesis (WFS) and amplitude panning have been evaluated. Perceptual experiments have studied the capacity of these systems to produce a sensation of distance, and the spatial acuity with which we can perceive sound sources when they are spectrally split and reproduced at different positions. The contributions of this research are intended to make these technologies more accessible to the general public, given the demand for audiovisual experiences and devices offering greater immersion.
    Gutiérrez Parera, P. (2020). Optimization and improvements in spatial sound reproduction systems through perceptual considerations [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/142696
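    Of the loudspeaker techniques evaluated in the second part, amplitude panning is simple enough to illustrate in a few lines; the sketch below applies the classic tangent law for a standard stereo setup and is not taken from the thesis itself.

```python
# Stereo amplitude panning via the tangent law; the +/-30 degree loudspeaker
# base angle is an assumed example value. Positive angles point toward the
# left loudspeaker in this convention.
import numpy as np

def tangent_law_gains(source_deg: float, base_deg: float = 30.0) -> tuple[float, float]:
    """Left/right gains (energy-normalized) for a phantom source at source_deg."""
    phi = np.deg2rad(source_deg)        # desired phantom-source angle
    phi0 = np.deg2rad(base_deg)         # half the loudspeaker span
    ratio = np.tan(phi) / np.tan(phi0)  # tangent law: (gL - gR) / (gL + gR) = ratio
    g_left = (1 + ratio) / 2
    g_right = (1 - ratio) / 2
    norm = np.hypot(g_left, g_right)    # keep constant total energy
    return g_left / norm, g_right / norm

print(tangent_law_gains(0.0))    # centered source -> equal gains
print(tangent_law_gains(15.0))   # source shifted toward the left loudspeaker
```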

    Aprendizado de variedades para a síntese de áudio espacial (Manifold learning for spatial audio synthesis)

    Get PDF
    Advisors: Luiz César Martini, Bruno Sanches Masiero. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    The objective of binaurally rendered spatial audio is to simulate a sound source at arbitrary spatial locations through Head-Related Transfer Functions (HRTFs). HRTFs model the direction-dependent influence of the ears, head, and torso on the incident sound field. When an audio source is filtered through a pair of HRTFs (one for each ear), a listener perceives the sound as though it were reproduced at a specific location in space. Inspired by our successful results in building a practical face-recognition application aimed at visually impaired people that uses a spatial audio user interface, in this work we deepened our research to address several scientific aspects of spatial audio. In this context, this thesis explores the incorporation of spatial audio prior knowledge using a novel nonlinear HRTF representation based on manifold learning, which tackles three major challenges of broad interest in the spatial audio community: HRTF personalization, HRTF interpolation, and the improvement of human sound localization. Applying manifold learning to spatial audio is based on the assumption that the data (i.e., the HRTFs) lie on a low-dimensional manifold. This assumption has also been of interest among researchers in computational neuroscience, who argue that manifolds are crucial for understanding the underlying nonlinear relationships of perception in the brain. For all of our contributions using manifold learning, the construction of a single manifold across subjects through an Inter-subject Graph (ISG) has proven to be a powerful HRTF representation, capable of incorporating prior knowledge of HRTFs and capturing the underlying factors of spatial hearing. Moreover, using our ISG to construct a single manifold offers the advantage of employing information from other individuals to improve the overall performance of the techniques proposed herein. The results show that our ISG-based techniques outperform other linear and nonlinear methods in tackling the spatial audio challenges addressed by this thesis.
    Doctorate in Electrical Engineering (Computer Engineering area), Universidade Estadual de Campinas. Funding: FAPESP grant 2014/14630-9; CAPES.
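    The low-dimensional-manifold assumption can be illustrated with a generic manifold-learning embedding; the sketch below uses Isomap from scikit-learn on stand-in HRTF vectors and does not reproduce the thesis's Inter-subject Graph construction.

```python
# Generic manifold-learning illustration: embed HRTF magnitude vectors with Isomap.
# The data are random stand-ins; a real study would use measured HRTFs.
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(3)
hrtfs = rng.normal(size=(500, 128))   # (samples x frequency bins) magnitude vectors

embedding = Isomap(n_neighbors=10, n_components=3).fit_transform(hrtfs)
print(embedding.shape)   # (500, 3): each HRTF mapped to a 3-D manifold coordinate
```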

    INDIVIDUALIZATION OF HEAD RELATED TRANSFER FUNCTION

    Get PDF
    Head-related transfer functions (HRTFs) are needed to present virtual spatial sound sources via headphones. Since individually measured HRTFs are very costly and time-consuming to obtain, this paper discusses the individualization of a dummy head's HRTFs. Here, the individualization is based on a scalable ellipsoidal head model. Starting from this model, the individualization is split into individualization of the interaural time difference (ITD) and individualization in the spectral domain. The ellipsoidal modeling of the ITD gives quantitatively good results compared with the individual measurements. The second approach, in the spectral domain, scales the transfer function along the frequency axis; an angle-dependent factor is calculated from the head dimensions of the subject. Finally, the scaling results are compared with individual measurements and discussed.
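    The spectral-scaling idea can be sketched as a simple resampling of the frequency axis by an individual factor; how the paper derives the angle-dependent factor from head dimensions is not reproduced here, and the factor below is just an illustrative input.

```python
# Stretch or compress the frequency axis of a magnitude response by a scale factor,
# so spectral features (peaks, notches) shift in frequency for a given listener.
import numpy as np

def scale_frequency_axis(mag: np.ndarray, freqs_hz: np.ndarray,
                         factor: float) -> np.ndarray:
    """Return the magnitude resampled so a feature at f moves to factor * f."""
    # The scaled response at frequency f equals the original response at f / factor.
    return np.interp(freqs_hz / factor, freqs_hz, mag)

freqs = np.linspace(0, 20000, 256)
mag = np.exp(-((freqs - 8000.0) / 1500.0) ** 2)        # toy spectral peak near 8 kHz
scaled = scale_frequency_axis(mag, freqs, factor=1.1)
print(freqs[np.argmax(mag)], freqs[np.argmax(scaled)]) # peak shifts ~8 kHz -> ~8.8 kHz
```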