
    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering are designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener's two ears, soundfield recording and reproduction use a large number of microphones and loudspeakers to replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recently popular area, multi-zone reproduction, is also briefly reviewed. The paper concludes with a discussion of the current state of the field and open problems. The authors acknowledge the National Natural Science Foundation of China (NSFC) No. 61671380 and the Australian Research Council Discovery Scheme DE 150100363.

    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, head-related transfer functions (HRTFs) are used for binaural synthesis over headphones, since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced to reduce the number of measurement points and to create reliable sound movement. Binaural reproduction can also be performed over loudspeakers. However, the use of two or more loudspeakers introduces crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to remove the unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied to immersive audio rendering over loudspeakers by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system that includes tracking of the listener's head is analyzed, and a real-time implementation is presented. The adaptive CTC implements a subband structure, and experimental results show that a higher number of bands improves performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may degrade performance because of their non-ideal frequency response.
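A common baseline for the HRTF interpolation mentioned above is to blend the head-related impulse responses (HRIRs) measured at the two nearest azimuths. The Python sketch below assumes a horizontal-plane measurement grid stored in a plain dictionary; the function name `interpolate_hrir` and the data layout are illustrative, not taken from the thesis, and the sketch skips the onset time-alignment that practical interpolators usually perform first.

```python
from typing import Dict, List

def interpolate_hrir(hrirs: Dict[float, List[float]], azimuth: float) -> List[float]:
    """Return an HRIR for `azimuth` (degrees) by linearly blending the two
    nearest measured HRIRs. `hrirs` maps measured azimuth -> impulse response;
    all responses are assumed to have the same length."""
    angles = sorted(hrirs)
    az = azimuth % 360.0
    for i, lo in enumerate(angles):
        hi = angles[(i + 1) % len(angles)]      # next measured angle, wrapping past 360
        span = (hi - lo) % 360.0 or 360.0       # angular gap between the two neighbours
        offset = (az - lo) % 360.0              # distance of the target from `lo`
        if offset <= span:
            w = offset / span                   # weight given to the upper neighbour
            return [(1.0 - w) * a + w * b for a, b in zip(hrirs[lo], hrirs[hi])]
    raise ValueError("no measured HRIRs supplied")

# Hypothetical two-point grid with 2-tap "HRIRs", for illustration only.
grid = {0.0: [1.0, 0.0], 90.0: [0.0, 1.0]}
print(interpolate_hrir(grid, 45.0))  # [0.5, 0.5]
```

Linear blending is cheap enough for real-time use, but it can smear interaural time differences when neighbouring HRIRs have different onset delays.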
Audio equalization is used to adjust the balance of different audio frequencies in order to achieve desired sound characteristics. Equalization can be manual, as in graphic equalization, where the user can modify the gain of each frequency band, or automatic, where the equalization curve is calculated automatically after measuring the room impulse response. Room response equalization can also be applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses at different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs) and an adaptive room response equalization system are presented. In particular, three low-complexity linear-phase and quasi-linear-phase graphic equalizers are proposed and examined in depth. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Subsequently, a subband adaptive structure is introduced for the development of a multichannel, multiple-position room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network is presented for multichannel systems, showing great results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. ANC works by generating a sound wave in phase opposition to the unwanted noise. This additional sound wave creates destructive interference, which reduces the overall sound level. Lastly, this thesis presents an ANC system for noise reduction.
The proposed approach implements online secondary path estimation and is based on cross-update adaptive filters applied to the primary path estimation, which aim to improve the performance of the whole system. The proposed structure achieves a better convergence rate than a reference algorithm.
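The filtered-x LMS (FxLMS) algorithm is the usual starting point for adaptive ANC systems of this kind. The Python sketch below is a single-channel, full-band FxLMS loop under simplifying assumptions that differ from the thesis: the secondary path is assumed known exactly (no online estimation), no cross-update filters are used, and the primary and secondary paths are short hypothetical FIR filters chosen only for illustration.

```python
import math
from typing import List

def fir(buf: List[float], taps: List[float]) -> float:
    """Filter output for one sample; `buf` holds the newest sample first."""
    return sum(b * t for b, t in zip(buf, taps))

def fxlms(x: List[float], primary: List[float], secondary: List[float],
          n_taps: int = 16, mu: float = 0.01) -> List[float]:
    """Run FxLMS on reference `x` and return the residual at the error microphone.
    The secondary-path estimate is assumed equal to `secondary` (known offline)."""
    w = [0.0] * n_taps                                   # adaptive control filter
    x_buf = [0.0] * max(n_taps, len(primary), len(secondary))
    y_buf = [0.0] * len(secondary)                       # past anti-noise samples
    fx_buf = [0.0] * n_taps                              # filtered-reference samples
    residual = []
    for sample in x:
        x_buf = [sample] + x_buf[:-1]
        d = fir(x_buf, primary)                          # noise reaching the error mic
        y_buf = [fir(x_buf, w)] + y_buf[:-1]             # anti-noise drive signal
        e = d - fir(y_buf, secondary)                    # anti-noise modelled as subtracting
        fx_buf = [fir(x_buf, secondary)] + fx_buf[:-1]   # reference filtered by S-hat
        w = [wi + mu * e * fi for wi, fi in zip(w, fx_buf)]  # LMS update with filtered-x
        residual.append(e)
    return residual

# Hypothetical tonal noise and acoustic paths, for illustration only.
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(4000)]
err = fxlms(tone, primary=[0.0, 0.0, 0.8, 0.3], secondary=[0.0, 0.5, 0.2])
print(sum(e * e for e in err[-500:]) / 500)   # residual power after adaptation
```

Filtering the reference through the secondary-path estimate before the LMS update is what keeps the gradient aligned with the error actually measured at the microphone; a subband or cross-update variant would change how `w` is adapted, not this basic signal flow.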

    An audio-visual system for object-based audio: from recording to listening

    Object-based audio is an emerging representation for audio content, in which content is represented in a reproduction-format-agnostic way and can thus be produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system's capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluated.
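To make the idea of text-based object metadata driving a renderer concrete, the Python sketch below parses a hypothetical JSON scene description and computes loudspeaker gains with two-dimensional vector base amplitude panning (VBAP) for a stereo pair at ±30°. The JSON schema, the function names, and the choice of VBAP are assumptions for illustration; the paper's own metadata format and renderer are not reproduced here, and the sketch assumes every object lies between the two loudspeakers.

```python
import json
import math
from typing import Dict, Tuple

def vbap_pair(azimuth_deg: float, spk_deg: Tuple[float, float] = (30.0, -30.0)) -> Tuple[float, float]:
    """2-D VBAP gains (left, right) for a source between two loudspeakers.
    Angles in degrees, positive to the left; gains are power-normalised."""
    p = (math.cos(math.radians(azimuth_deg)), math.sin(math.radians(azimuth_deg)))
    l1 = (math.cos(math.radians(spk_deg[0])), math.sin(math.radians(spk_deg[0])))
    l2 = (math.cos(math.radians(spk_deg[1])), math.sin(math.radians(spk_deg[1])))
    # Solve p = g1*l1 + g2*l2 with Cramer's rule (inverse of the 2x2 base matrix).
    det = l1[0] * l2[1] - l2[0] * l1[1]
    g1 = (p[0] * l2[1] - l2[0] * p[1]) / det
    g2 = (l1[0] * p[1] - p[0] * l1[1]) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

def render_scene(metadata: str) -> Dict[str, Tuple[float, float]]:
    """Map each object in a (hypothetical) JSON scene description to speaker gains."""
    scene = json.loads(metadata)
    gains = {}
    for obj in scene["objects"]:
        g_left, g_right = vbap_pair(obj["azimuth"])
        level = obj.get("gain", 1.0)          # optional per-object level, default unity
        gains[obj["id"]] = (level * g_left, level * g_right)
    return gains

meta = ('{"objects": [{"id": "talker1", "azimuth": 0.0},'
        ' {"id": "talker2", "azimuth": 20.0, "gain": 0.5}]}')
print(render_scene(meta))
```

Because the renderer only reads positions and levels from the metadata, the same scene description could be re-rendered for a different loudspeaker layout by changing `spk_deg`, which is the format-agnostic property the abstract describes.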

    Audio-visual Virtual Reality System for Room Acoustics

    We present an audio-visual Virtual Reality display system for simulated sound fields. In addition to room acoustic simulation by means of phonon tracing and the finite element method, the system includes stereoscopic visualization of the simulation results using a 3D back-projection system, as well as auralization using professional sound equipment. For auralization purposes, we develop a sound field synthesis approach for accurate control of the loudspeaker system.

    Low Frequency Simulations for Ambisonics Auralization of a Car Sound System

    In this paper, a technique is described for obtaining the High Order Ambisonics (HOA) Impulse Responses (IRs) of an automotive infotainment system, relying on Finite Element Method (FEM) simulations performed in COMSOL Multiphysics. The resulting HOA IRs are used to auralize the car sound system, either inside an Ambisonics listening room with a loudspeaker rig or with binaural rendering on a Head Mounted Display (HMD), benefiting from head tracking and personalized Head Related Transfer Functions (HRTFs). This allows subjective tests to be performed before the prototype is built, while preserving the auditory experience with a degree of realism unattainable with the static binaural approach. Measurements performed in a prototype vehicle with a spherical microphone array are compared to the FEM simulations, and good agreement between the numerical and experimental methods has been demonstrated.
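An Ambisonics impulse response is a multichannel IR whose channels follow spherical-harmonic directivity patterns. The Python sketch below shows only the first-order case: encoding a mono signal (for example a single-channel IR) arriving as a plane wave from a given direction. The ACN channel order with SN3D normalisation (the AmbiX convention) is an assumption, since the abstract does not state which convention or order the paper uses; higher orders add further spherical-harmonic channels in the same way.

```python
import math
from typing import List, Tuple

def encode_foa(signal: List[float], azimuth_deg: float,
               elevation_deg: float) -> List[Tuple[float, ...]]:
    """Encode a mono signal as a plane wave into first-order Ambisonics.
    Channel order is ACN (W, Y, Z, X) with SN3D normalisation (AmbiX convention)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    gains = (1.0,                              # W: omnidirectional
             math.sin(az) * math.cos(el),      # Y: left-right figure-of-eight
             math.sin(el),                     # Z: up-down figure-of-eight
             math.cos(az) * math.cos(el))      # X: front-back figure-of-eight
    return [tuple(s * g for g in gains) for s in signal]

# An impulse arriving from straight ahead (0 deg azimuth, 0 deg elevation).
print(encode_foa([1.0, 0.0], 0.0, 0.0))  # [(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.0, 0.0)]
```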

    Investigating the interaction between positions and signals of height-channel loudspeakers in reproducing immersive 3D sound

    Since transmission capacities have increased significantly over the past few years, researchers are now able to transmit larger amounts of data, namely multichannel audio content, in consumer applications. What has not yet been investigated systematically is how to deliver this multichannel content. Specifically, researchers' attention is focused on the quest for a standardized immersive reproduction format that incorporates height loudspeakers, coupled with new high-resolution, three-dimensional (3D) media content, for a comprehensive 3D experience. To better understand and utilize immersive audio reproduction, this research focused on (1) the interaction between the positioning of height loudspeakers and the signals fed to those loudspeakers, (2) the perceptual characteristics associated with height ambiences, and (3) the influence of inverse filtering on perceived sound quality in realistic 3D sound reproduction. The experiment used two layers of loudspeakers: a horizontal layer following the ITU-R BS.775 five-channel loudspeaker configuration, and a height layer of twelve loudspeakers at azimuths of ±30°, ±50°, ±70°, ±90°, ±110° and ±130° and an elevation of 30°. Eight configurations were formed, each selecting four of the twelve height loudspeakers. In the subjective evaluation, listeners compared, ranked and described the eight randomly presented configurations of 4-channel height ambiences. The stimuli were four pieces of nine-channel music (5 channels for the horizontal and 4 for the height loudspeakers). Moreover, a Finite Impulse Response (FIR) inverse filtering approach was applied in order to remove the particular room's acoustic influence. Another group of trained professionals was informally asked to characterize, using descriptors, the multichannel music with height ambiences rendered with inverse filtering.
The experimental results indicate the importance of positioning the loudspeakers with respect to the signals fed to them in delivering a 3D sound field. Furthermore, the perceptual characteristics that listeners associated with multichannel music with height ambiences include envelopment, elevatedness and fullness. Last but not least, applying inverse filtering did not significantly affect subjective preference. This leads the author to believe that the room's influence on the subjective evaluation is less important than appropriate loudspeaker positioning for multichannel music reproduced with height ambiences.
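The abstract does not say how the FIR inverse filter was designed. One standard approach, shown here as an assumption rather than the author's method, is regularized frequency-domain inversion, C(k) = conj(H(k)) / (|H(k)|^2 + beta), followed by an inverse DFT. The Python sketch below applies it to a short hypothetical minimum-phase room response; `n_fft` and `beta` are illustrative values, and a non-minimum-phase response would additionally need a modelling delay.

```python
import cmath
from typing import List

def dft(x: List[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec: List[complex]) -> List[complex]:
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def regularized_inverse(h: List[float], n_fft: int = 64, beta: float = 1e-3) -> List[float]:
    """FIR approximation of 1/H via C = conj(H) / (|H|^2 + beta) on an n_fft-point
    grid; `beta` limits the boost where |H| is small."""
    spec = dft([complex(v) for v in h] + [0j] * (n_fft - len(h)))
    inv = [hk.conjugate() / (abs(hk) ** 2 + beta) for hk in spec]
    return [c.real for c in idft(inv)]   # real because h is real (conjugate symmetry)

def convolve(a: List[float], b: List[float]) -> List[float]:
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

room = [1.0, 0.5]                      # hypothetical minimum-phase room response
eq = regularized_inverse(room)
cascade = convolve(room, eq)           # ideally a unit impulse
print(cascade[0], max(abs(v) for v in cascade[1:]))
```

Larger `beta` values trade inversion accuracy for robustness, which matters when a single inverse filter is expected to work away from the measurement position.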

    A minimax approach for the joint design of acoustic crosstalk cancellation filters

    This paper presents a method for jointly designing immersive audio rendering filters for a single listener using loudspeakers. The crosstalk cancellation filters are assumed to have finite impulse responses and are designed using the minimax criterion. In addition to the traditional Atal-Schroeder crosstalk canceler structure, this paper explores an alternative topology that requires the approximation of only a single filter. In general, the minimax approach provides improved low-frequency performance, leading to better overall separation of the direct-path and cross-path transfer functions than least-squares designs. The performance of the single-filter structure is better than that of the traditional crosstalk cancellation structure.
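For a symmetric listener and loudspeaker layout, the Atal-Schroeder canceller inverts, at each frequency, the 2x2 matrix H = [[H_i, H_c], [H_c, H_i]] of ipsilateral (H_i) and contralateral (H_c) loudspeaker-to-ear transfer functions, giving C = [[H_i, -H_c], [-H_c, H_i]] / (H_i^2 - H_c^2). The Python sketch below computes that ideal inverse at a single frequency bin. It is not the paper's minimax FIR design, which approximates such targets with finite-length filters, and the transfer-function values are hypothetical.

```python
import cmath
from typing import Tuple

Matrix = Tuple[Tuple[complex, complex], Tuple[complex, complex]]

def atal_schroeder(h_i: complex, h_c: complex) -> Matrix:
    """Ideal crosstalk canceller C = H^-1 for H = [[h_i, h_c], [h_c, h_i]]."""
    det = h_i * h_i - h_c * h_c
    if abs(det) < 1e-12:
        # Practical designs regularize here, e.g. at low frequencies where h_i ~ h_c.
        raise ValueError("H is singular at this frequency (h_i^2 == h_c^2)")
    return ((h_i / det, -h_c / det), (-h_c / det, h_i / det))

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2))
                 for r in range(2))

# Hypothetical ear responses at one frequency: the cross path is attenuated and delayed.
h_i, h_c = 1.0 + 0j, 0.5 * cmath.exp(-0.3j)
C = atal_schroeder(h_i, h_c)
H = ((h_i, h_c), (h_c, h_i))
print(matmul(H, C))  # close to the identity: each ear receives only its own channel
```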

    Acoustic heritage and audio creativity: the creative application of sound in the representation, understanding and experience of past environments

    Acoustic heritage is one aspect of archaeoacoustics and refers more specifically to the quantifiable acoustic properties of buildings, sites and landscapes from our architectural and archaeological past, forming an important aspect of our intangible cultural heritage. Auralisation, the audio equivalent of 3D visualisation, enables these acoustic properties, captured through measurement and survey or computer-based modelling, to form the basis of an audio reconstruction and presentation of the studied space. This paper examines the application of auralisation and audio creativity as a means to explore our acoustic heritage, thereby diversifying and enhancing the toolset available to the digital heritage or humanities researcher. The Open Acoustic Impulse Response (OpenAIR) library is an online repository for acoustic impulse response and auralisation data, a significant part of which has been gathered from a broad range of heritage sites. The methodology used to gather this acoustic data is discussed, together with the processes used to generate and calibrate a comparable computer model, and how the resulting data might be analysed and presented. The creative use of this acoustic data is also considered in the context of music production, mixed-media artwork and audio for gaming. More specific to digital heritage is how these data can be used to create new experiences of past environments, as information, interpretation, guide or artwork, and ultimately to help articulate new research questions and explorations of our acoustic heritage.
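At its simplest, auralisation with measured data such as the OpenAIR impulse responses amounts to convolving an anechoic ("dry") recording with the room's impulse response. The Python sketch below shows single-channel direct-form convolution on toy values; loading audio files is omitted, and practical tools typically use FFT-based (often partitioned) convolution and binaural or multichannel IRs rather than this O(N·M) loop.

```python
from typing import List

def auralise(dry: List[float], impulse_response: List[float]) -> List[float]:
    """Direct-form linear convolution of an anechoic signal with a room IR."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, s in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h            # each input sample excites a scaled copy of the IR
    return out

# Toy IR: direct sound plus one reflection arriving 2 samples later at half amplitude.
print(auralise([1.0, 2.0], [1.0, 0.0, 0.5]))  # [1.0, 2.0, 0.5, 1.0]
```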