124 research outputs found

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    Get PDF
    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering is designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener’s two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recent popular area, multi-zone reproduction, is also briefly reviewed in the paper. The paper is concluded with a discussion of the current state of the field and open problemsThe authors acknowledge National Natural Science Foundation of China (NSFC) No. 61671380 and Australian Research Council Discovery Scheme DE 150100363

    Aprendizado de variedades para a síntese de áudio espacial

    Get PDF
    Orientadores: Luiz César Martini, Bruno Sanches MasieroTese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de ComputaçãoResumo: O objetivo do áudio espacial gerado com a técnica binaural é simular uma fonte sonora em localizações espaciais arbitrarias através das Funções de Transferência Relativas à Cabeça (HRTFs) ou também chamadas de Funções de Transferência Anatômicas. As HRTFs modelam a interação entre uma fonte sonora e a antropometria de uma pessoa (e.g., cabeça, torso e orelhas). Se filtrarmos uma fonte de áudio através de um par de HRTFs (uma para cada orelha), o som virtual resultante parece originar-se de uma localização espacial específica. Inspirados em nossos resultados bem sucedidos construindo uma aplicação prática de reconhecimento facial voltada para pessoas com deficiência visual que usa uma interface de usuário baseada em áudio espacial, neste trabalho aprofundamos nossa pesquisa para abordar vários aspectos científicos do áudio espacial. Neste contexto, esta tese analisa como incorporar conhecimentos prévios do áudio espacial usando uma nova representação não-linear das HRTFs baseada no aprendizado de variedades para enfrentar vários desafios de amplo interesse na comunidade do áudio espacial, como a personalização de HRTFs, a interpolação de HRTFs e a melhoria da localização de fontes sonoras. O uso do aprendizado de variedades para áudio espacial baseia-se no pressuposto de que os dados (i.e., as HRTFs) situam-se em uma variedade de baixa dimensão. Esta suposição também tem sido de grande interesse entre pesquisadores em neurociência computacional, que argumentam que as variedades são cruciais para entender as relações não lineares subjacentes à percepção no cérebro. Para todas as nossas contribuições usando o aprendizado de variedades, a construção de uma única variedade entre os sujeitos através de um grafo Inter-sujeito (Inter-subject graph, ISG) revelou-se como uma poderosa representação das HRTFs capaz de incorporar conhecimento prévio destas e capturar seus fatores subjacentes. Além disso, a vantagem de construir uma única variedade usando o nosso ISG e o uso de informações de outros indivíduos para melhorar o desempenho geral das técnicas aqui propostas. Os resultados mostram que nossas técnicas baseadas no ISG superam outros métodos lineares e não-lineares nos desafios de áudio espacial abordados por esta teseAbstract: The objective of binaurally rendered spatial audio is to simulate a sound source in arbitrary spatial locations through the Head-Related Transfer Functions (HRTFs). HRTFs model the direction-dependent influence of ears, head, and torso on the incident sound field. When an audio source is filtered through a pair of HRTFs (one for each ear), a listener is capable of perceiving a sound as though it were reproduced at a specific location in space. Inspired by our successful results building a practical face recognition application aimed at visually impaired people that uses a spatial audio user interface, in this work we have deepened our research to address several scientific aspects of spatial audio. In this context, this thesis explores the incorporation of spatial audio prior knowledge using a novel nonlinear HRTF representation based on manifold learning, which tackles three major challenges of broad interest among the spatial audio community: HRTF personalization, HRTF interpolation, and human sound localization improvement. Exploring manifold learning for spatial audio is based on the assumption that the data (i.e. the HRTFs) lies on a low-dimensional manifold. This assumption has also been of interest among researchers in computational neuroscience, who argue that manifolds are crucial for understanding the underlying nonlinear relationships of perception in the brain. For all of our contributions using manifold learning, the construction of a single manifold across subjects through an Inter-subject Graph (ISG) has proven to lead to a powerful HRTF representation capable of incorporating prior knowledge of HRTFs and capturing the underlying factors of spatial hearing. Moreover, the use of our ISG to construct a single manifold offers the advantage of employing information from other individuals to improve the overall performance of the techniques herein proposed. The results show that our ISG-based techniques outperform other linear and nonlinear methods in tackling the spatial audio challenges addressed by this thesisDoutoradoEngenharia de ComputaçãoDoutor em Engenharia Elétrica2014/14630-9FAPESPCAPE

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

    Get PDF
    International audienc

    Effizientes binaurales Rendering von virtuellen akustischen Realitäten : technische und wahrnehmungsbezogene Konzepte

    Get PDF
    Binaural rendering aims to immerse the listener in a virtual acoustic scene, making it an essential method for spatial audio reproduction in virtual or augmented reality (VR/AR) applications. The growing interest and research in VR/AR solutions yielded many different methods for the binaural rendering of virtual acoustic realities, yet all of them share the fundamental idea that the auditory experience of any sound field can be reproduced by reconstructing its sound pressure at the listener's eardrums. This thesis addresses various state-of-the-art methods for 3 or 6 degrees of freedom (DoF) binaural rendering, technical approaches applied in the context of headphone-based virtual acoustic realities, and recent technical and psychoacoustic research questions in the field of binaural technology. The publications collected in this dissertation focus on technical or perceptual concepts and methods for efficient binaural rendering, which has become increasingly important in research and development due to the rising popularity of mobile consumer VR/AR devices and applications. The thesis is organized into five research topics: Head-Related Transfer Function Processing and Interpolation, Parametric Spatial Audio, Auditory Distance Perception of Nearby Sound Sources, Binaural Rendering of Spherical Microphone Array Data, and Voice Directivity. The results of the studies included in this dissertation extend the current state of research in the respective research topic, answer specific psychoacoustic research questions and thereby yield a better understanding of basic spatial hearing processes, and provide concepts, methods, and design parameters for the future implementation of technically and perceptually efficient binaural rendering.Binaurales Rendering zielt darauf ab, dass der Hörer in eine virtuelle akustische Szene eintaucht, und ist somit eine wesentliche Methode für die räumliche Audiowiedergabe in Anwendungen der virtuellen Realität (VR) oder der erweiterten Realität (AR – aus dem Englischen Augmented Reality). Das wachsende Interesse und die zunehmende Forschung an VR/AR-Lösungen führte zu vielen verschiedenen Methoden für das binaurale Rendering virtueller akustischer Realitäten, die jedoch alle die grundlegende Idee teilen, dass das Hörerlebnis eines beliebigen Schallfeldes durch die Rekonstruktion seines Schalldrucks am Trommelfell des Hörers reproduziert werden kann. Diese Arbeit befasst sich mit verschiedenen modernsten Methoden zur binauralen Wiedergabe mit 3 oder 6 Freiheitsgraden (DoF – aus dem Englischen Degree of Freedom), mit technischen Ansätzen, die im Kontext kopfhörerbasierter virtueller akustischer Realitäten angewandt werden, und mit aktuellen technischen und psychoakustischen Forschungsfragen auf dem Gebiet der Binauraltechnik. Die in dieser Dissertation gesammelten Publikationen befassen sich mit technischen oder wahrnehmungsbezogenen Konzepten und Methoden für effizientes binaurales Rendering, was in der Forschung und Entwicklung aufgrund der zunehmenden Beliebtheit von mobilen Verbraucher-VR/AR-Geräten und -Anwendungen zunehmend an Relevanz gewonnen hat. Die Arbeit ist in fünf Forschungsthemen gegliedert: Verarbeitung und Interpolation von Außenohrübertragungsfunktionen, parametrisches räumliches Audio, auditive Entfernungswahrnehmung ohrnaher Schallquellen, binaurales Rendering von sphärischen Mikrofonarraydaten und Richtcharakteristik der Stimme. Die Ergebnisse der in dieser Dissertation enthaltenen Studien erweitern den aktuellen Forschungsstand im jeweiligen Forschungsfeld, beantworten spezifische psychoakustische Forschungsfragen und führen damit zu einem besseren Verständnis grundlegender räumlicher Hörprozesse, und liefern Konzepte, Methoden und Gestaltungsparameter für die zukünftige Umsetzung eines technisch und wahrnehmungsbezogen effizienten binauralen Renderings.BMBF, 03FH014IX5, Natürliche raumbezogene Darbietung selbsterzeugter Schallereignisse in virtuellen auditiven Umgebungen (NarDasS

    Evaluating the Perceived Quality of Binaural Technology

    Get PDF
    This thesis studies binaural sound reproduction from both a technical and a perceptual perspective, with the aim of improving the headphone listening experience for entertainment media audiences. A detailed review is presented of the relevant binaural technology and of the concepts and methods for evaluating perceived quality. A pilot study assesses the application of state-of-the-art binaural rendering systems to existing broadcast programmes, finding no substantial improvements in quality over conventional stereo signals. A second study gives evidence that realistic binaural simulation can be achieved without personalised acoustic calibration, showing promise for the application of binaural technology. Flexible technical apparatus is presented to allow further investigation of rendering techniques and content production processes. Two web-based studies show that appropriate combination of techniques can lead to improved experience for typical audience members, compared to stereo signals, even without personalised rendering or listener head-tracking. Recent developments in spatial audio applications are then discussed. These have made dynamic client-side binaural rendering with listener head-tracking feasible for mass audiences, but also present technical constraints. To limit distribution bandwidth and computational complexity during rendering, loudspeaker virtualisation is widely used. The effects on perceived quality of these techniques are studied in depth for the first time. A descriptive analysis experiment demonstrates that loudspeaker virtualisation during binaural rendering causes degradations to a range of perceptual characteristics and that these vary across other system conditions. A final experiment makes novel use of the check-all-that-apply method to efficiently characterise the quality of seven spatial audio representations and associated dynamic binaural rendering techniques, using single sound sources and complex dramatic scenes. The perceived quality of these different representations varies significantly across a wide range of characteristics and with programme material. These methods and findings can be used to improve the quality of current binaural technology applications

    Fast Numerical and Machine Learning Algorithms for Spatial Audio Reproduction

    Get PDF
    Audio reproduction technologies have underwent several revolutions from a purely mechanical, to electromagnetic, and into a digital process. These changes have resulted in steady improvements in the objective qualities of sound capture/playback on increasingly portable devices. However, most mobile playback devices remove important spatial-directional components of externalized sound which are natural to the subjective experience of human hearing. Fortunately, the missing spatial-directional parts can be integrated back into audio through a combination of computational methods and physical knowledge of how sound scatters off of the listener's anthropometry in the sound-field. The former employs signal processing techniques for rendering the sound-field. The latter employs approximations of the sound-field through the measurement of so-called Head-Related Impulse Responses/Transfer Functions (HRIRs/HRTFs). This dissertation develops several numerical and machine learning algorithms for accelerating and personalizing spatial audio reproduction in light of available mobile computing power. First, spatial audio synthesis between a sound-source and sound-field requires fast convolution algorithms between the audio-stream and the HRIRs. We introduce a novel sparse decomposition algorithm for HRIRs based on non-negative matrix factorization that allows for faster time-domain convolution than frequency-domain fast-Fourier-transform variants. Second, the full sound-field over the spherical coordinate domain must be efficiently approximated from a finite collection of HRTFs. We develop a joint spatial-frequency covariance model for Gaussian process regression (GPR) and sparse-GPR methods that supports the fast interpolation and data fusion of HRTFs across multiple data-sets. Third, the direct measurement of HRTFs requires specialized equipment that is unsuited for widespread acquisition. We ``bootstrap'' the human ability to localize sound in listening tests with Gaussian process active-learning techniques over graphical user interfaces that allows the listener to infer his/her own HRTFs. Experiments are conducted on publicly available HRTF datasets and human listeners

    Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics

    Get PDF
    With the increasing popularity of spatial audio content streaming and interactive binaural audio rendering, it is pertinent to study the quality of the critical components of such systems. This includes low-bitrate compression of Ambisonic scenes and binaural rendering schemes. This thesis presents a group of perceptual experiments focusing on these two elements of the Ambisonic delivery chain. The first group of experiments focused on the quality of low-bitrate compression of Ambisonics. The first study evaluated the perceived timbral quality degradation introduced by the Opus audio codec at different bitrate settings and Ambisonic orders. This experiment was conducted using multi-loudspeaker reproduction as well as binaural rendering. The second study has been dedicated to auditory localisation performance in bitrate-compressed Ambisonic scenes reproduced over loudspeakers and binaurally using generic and individually measured HRTF sets. Finally, the third study extended the evaluated set of codec parameters by testing different channel mappings and various audio stimuli contexts. This study was conducted in VR thanks to a purposely developed listening test framework. The comprehensive evaluation of the Opus codec led to a set of recommendations regarding optimal codec parameters. The second group of experiments focused on the evaluation of different methods for binaural rendering of Ambisonics. The first study in this group focused on the implementation of the established methods for designing Ambisonic-to-binaural filters and subsequent objective and subjective evaluations of these. The second study explored the concept of hybrid binaural rendering combining anechoic filters with reverberant ones. Finally, addressing the problem of non-individual HRTFs used for spatial audio rendering, an XR-based method for acquiring individual HRTFs using a single loudspeaker has been proposed. The conducted perceptual evaluations identified key areas where the Ambisonic delivery chain could be improved to provide a more satisfactory user experience

    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Get PDF
    Il rendering audio immersivo è il processo di creazione di un’esperienza sonora coinvolgente e realistica nello spazio 3D. Nei sistemi audio immersivi, le funzioni di trasferimento relative alla testa (head-related transfer functions, HRTFs) vengono utilizzate per la sintesi binaurale in cuffia poiché esprimono il modo in cui gli esseri umani localizzano una sorgente sonora. Possono essere introdotti algoritmi di interpolazione delle HRTF per ridurre il numero di punti di misura e per creare un movimento del suono affidabile. La riproduzione binaurale può essere eseguita anche dagli altoparlanti. Tuttavia, il coinvolgimento di due o più gli altoparlanti causa il problema del crosstalk. In questo caso, algoritmi di cancellazione del crosstalk (CTC) sono necessari per eliminare i segnali di interferenza indesiderati. In questa tesi, partendo da un'analisi comparativa di metodi di misura delle HRTF, viene proposto un sistema di rendering binaurale basato sull'interpolazione delle HRTF per applicazioni in tempo reale. Il metodo proposto mostra buone prestazioni rispetto a una tecnica di riferimento. L'algoritmo di interpolazione è anche applicato al rendering audio immersivo tramite altoparlanti, aggiungendo un algoritmo di cancellazione del crosstalk fisso, che considera l'ascoltatore in una posizione fissa. Inoltre, un sistema di cancellazione crosstalk adattivo, che include il tracciamento della testa dell'ascoltatore, è analizzato e implementato in tempo reale. Il CTC adattivo implementa una struttura in sottobande e risultati sperimentali dimostrano che un maggiore numero di bande migliora le prestazioni in termini di errore totale e tasso di convergenza. Il sistema di riproduzione e le caratteristiche dell'ambiente di ascolto possono influenzare le prestazioni a causa della loro risposta in frequenza non ideale. L'equalizzazione viene utilizzata per livellare le varie parti dello spettro di frequenze che compongono un segnale audio al fine di ottenere le caratteristiche sonore desiderate. L'equalizzazione può essere manuale, come nel caso dell'equalizzazione grafica, dove il guadagno di ogni banda di frequenza può essere modificato dall'utente, o automatica, la curva di equalizzazione è calcolata automaticamente dopo la misurazione della risposta impulsiva della stanza. L'equalizzazione della risposta ambientale può essere applicata anche ai sistemi multicanale, che utilizzano due o più altoparlanti e la zona di equalizzazione può essere ampliata misurando le risposte impulsive in diversi punti della zona di ascolto. In questa tesi, GEQ efficienti e un sistema adattativo di equalizzazione d'ambiente. In particolare, sono proposti e approfonditi tre equalizzatori grafici a basso costo computazionale e a fase lineare e quasi lineare. Gli esperimenti confermano l'efficacia degli equalizzatori proposti in termini di accuratezza, complessità computazionale e latenza. Successivamente, una struttura adattativa in sottobande è introdotta per lo sviluppo di un sistema di equalizzazione d'ambiente multicanale. I risultati sperimentali verificano l'efficienza dell'approccio in sottobande rispetto al caso a banda singola. Infine, viene presentata una rete crossover a fase lineare per sistemi multicanale, mostrando ottimi risultati in termini di risposta in ampiezza, bande di transizione, risposta polare e risposta in fase. I sistemi di controllo attivo del rumore (ANC) possono essere progettati per ridurre gli effetti dell'inquinamento acustico e possono essere utilizzati contemporaneamente a un sistema audio immersivo. L'ANC funziona creando un'onda sonora in opposizione di fase rispetto all'onda sonora in arrivo. Il livello sonoro complessivo viene così ridotto grazie all'interferenza distruttiva. Infine, questa tesi presenta un sistema ANC utilizzato per la riduzione del rumore. L’approccio proposto implementa una stima online del percorso secondario e si basa su filtri adattativi in sottobande applicati alla stima del percorso primario che mirano a migliorare le prestazioni dell’intero sistema. La struttura proposta garantisce un tasso di convergenza migliore rispetto all'algoritmo di riferimento.Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, the head-related transfer functions (HRTFs) are used for binaural synthesis over headphones since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced for reducing the number of measurement points and creating a reliable sound movement. Binaural reproduction can be also performed by loudspeakers. However, the involvement of two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to delete unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied for immersive audio rendering over loudspeakers, by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes the tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC implements a subband structure and experimental results prove that a higher number of bands improves the performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect the performance due to their non-ideal frequency response. Audio equalization is used to adjust the balance of different audio frequencies in order to achieve desired sound characteristics. The equalization can be manual, such as in the case of graphic equalization, where the gain of each frequency band can be modified by the user, or automatic, where the equalization curve is automatically calculated after the room impulse response measurement. The room response equalization can be also applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses in different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs), and an adaptive room response equalization system are presented. In particular, three low-complexity linear- and quasi-linear-phase graphic equalizers are proposed and deeply examined. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Successively, a subband adaptive structure is introduced for the development of a multichannel and multiple positions room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network is presented for multichannel systems, showing great results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. The ANC works by creating a sound wave that has an opposite phase with respect to the sound wave of the unwanted noise. The additional sound wave creates destructive interference, which reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction. The proposed approach implements an online secondary path estimation and is based on cross-update adaptive filters applied to the primary path estimation that aim at improving the performance of the whole system. The proposed structure allows for a better convergence rate in comparison with a reference algorithm

    3-D audio using loudspeakers

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1997.Includes bibliographical references (p. 145-153).by William G. Gardner.Ph.D
    corecore