
    Spatial Sound Rendering – A Survey

    Simulating sound propagation and audio rendering can improve the sense of realism and immersion both in complex acoustic environments and in dynamic virtual scenes. In studies of sound auralization, the focus has always been on room acoustics modeling, but most of the same methods are also applicable to the construction of virtual environments such as those developed for computer gaming, cognitive research, and simulated training scenarios. This paper reviews state-of-the-art techniques based on acoustic principles that apply not only to real rooms but also to 3D virtual environments. The paper also highlights the need to expand the field of immersive sound to web-based browsing environments, because, despite the interest and many benefits, few developments seem to have taken place in this context. Moreover, the paper includes a list of the most effective algorithms used for modelling spatial sound propagation and reports their advantages and disadvantages. Finally, the paper emphasizes the evaluation of these proposed works.
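    The propagation models such surveys catalogue all start from the direct sound path: a delay proportional to source-listener distance and an inverse-distance amplitude decay. A minimal illustrative sketch of that starting point (the function name and parameters are hypothetical, not taken from the survey):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 °C

def direct_path(source, listener, fs=48000):
    """Delay (in samples) and inverse-distance gain of the direct sound path."""
    r = math.dist(source, listener)          # Euclidean distance in metres
    delay_samples = r / SPEED_OF_SOUND * fs  # propagation time -> sample delay
    gain = 1.0 / max(r, 1e-6)                # 1/r law, clamped near the source
    return delay_samples, gain

delay, gain = direct_path((3.43, 0.0, 0.0), (0.0, 0.0, 0.0))
```

    Geometric-acoustics methods such as the image-source model apply the same delay/gain computation to every reflection path, which is why this primitive dominates the cost of room simulation.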

    Advanced automatic mixing tools for music

    This thesis presents research on several independent systems that, when combined, can generate an automatic sound mix out of an unknown set of multi-channel inputs. The research explores the possibility of reproducing the mixing decisions of a skilled audio engineer with minimal or no human interaction, and is restricted to non-time-varying mixes for large room acoustics. It has applications in live music concerts, remote mixing, recording and post-production, as well as live mixing for interactive scenes. Currently, automated mixers are capable of saving a set of static mix scenes that can be loaded for later use, but they lack the ability to adapt to a different room or to a different set of inputs; in other words, they cannot automatically make mixing decisions. The automatic mixer research presented here distinguishes between the engineering and the subjective contributions to mixing. It aims to automate the technical tasks related to audio mixing while freeing the audio engineer to perform the fine-tuning involved in generating an aesthetically pleasing sound mix. Although the system mainly deals with the technical constraints involved in generating an audio mix, it takes advantage of common practices performed by sound engineers whenever possible. The system also makes use of inter-dependent channel information for controlling signal processing tasks while aiming to maintain system stability at all times. A working implementation of the system is described, and a subjective evaluation comparing a human mix with the automatic mix is used to measure the success of the automatic mixing tools.
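    One of the technical tasks such a system automates is loudness balancing across an unknown set of inputs. A deliberately simplified sketch of that idea, equal-RMS gain per channel with peak protection (this is an illustration under stated assumptions, not the thesis's actual algorithm, and all names are hypothetical):

```python
import numpy as np

def auto_mix(channels, target_rms=0.1):
    """Scale each channel to a common RMS level, sum, then peak-normalize
    if the sum would clip. A toy stand-in for automatic level balancing."""
    mix = np.zeros_like(channels[0], dtype=float)
    for ch in channels:
        rms = np.sqrt(np.mean(ch.astype(float) ** 2))
        gain = target_rms / rms if rms > 0 else 0.0
        mix += gain * ch
    peak = np.max(np.abs(mix))
    if peak > 1.0:                # simple peak protection against clipping
        mix /= peak
    return mix

loud = 0.5 * np.sin(np.linspace(0, 20 * np.pi, 4800))
quiet = 0.05 * np.sin(np.linspace(0, 30 * np.pi, 4800))
mix = auto_mix([loud, quiet])
```

    A real system, as the abstract notes, would also use inter-channel information (e.g. masking between sources) rather than treating each channel independently.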

    PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, a scenario referred to as immersive audio schemes. Several acoustic effects are involved in this scenario: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, achieving any of these effects in a real large-scale system requires high computing capacity, which is a considerable limitation for real-time applications. The growth of computational capacity has historically been linked to the number of transistors in a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., expanding parallelism in computing. This is the case of Graphics Processing Units (GPUs), which now contain thousands of computing cores. GPUs were traditionally associated with graphics and image applications, but new releases of the GPU programming environments CUDA and OpenCL have allowed many applications in fields beyond graphics to be computationally accelerated. This thesis aims to demonstrate that GPUs are fully valid tools for audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented using GPUs. This manuscript also analyzes and solves possible limitations of each GPU-based implementation, both from the acoustic and the computational points of view. The following problems are addressed in this document. Most audio applications are based on massive filtering, so the first implementation undertaken is a fundamental operation in audio processing: the convolution.
It was first developed as a computational kernel and afterwards used for an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation successfully manages two different and common situations: buffers that are much larger than the filters and buffers that are much smaller than the filters. Two spatial audio applications that use the GPU as a co-processor were developed from this massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it can synthesize sound sources at spatial positions that are not included in the HRTF database and generate smooth movements of sound sources. Both features were designed after different objective and subjective tests. The performance, in terms of the number of sound sources that can be rendered in real time, was assessed on GPUs with different architectures. A similar performance is measured in a Wave Field Synthesis system (the second spatial audio application) composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce room effects during sound source rendering. A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated from both the localization and the computational performance points of view, taking into account different acoustic environments, and always from a real-time implementation perspective.
Finally, this manuscript also addresses massive multichannel filtering when the filters have an infinite impulse response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters with an all-pass response. The two cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal. Belloch Rodríguez, JA. (2014). Performance improvement of multichannel audio by graphics processing units [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
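The fundamental kernel named above, block-based convolution where buffers may be larger or smaller than the filters, can be illustrated with a plain NumPy overlap-save loop. This is a CPU sketch of the partitioning idea only; it does not reproduce the thesis's CUDA implementation, and the function name is hypothetical:

```python
import numpy as np

def overlap_save(x, h, block=256):
    """Stream x through FIR filter h in fixed-size blocks via the FFT
    (overlap-save); returns the first len(x) samples of x convolved with h."""
    M = len(h)
    N = block + M - 1                      # FFT size covering block + history
    H = np.fft.rfft(h, N)
    tail = np.zeros(M - 1)                 # last M-1 input samples seen so far
    out = []
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        frame = np.concatenate([tail, seg, np.zeros(block - len(seg))])
        y = np.fft.irfft(np.fft.rfft(frame) * H, N)
        out.append(y[M - 1:M - 1 + block])  # discard circularly wrapped samples
        tail = frame[-(M - 1):]
    return np.concatenate(out)[:len(x)]
```

Each block is independent of the others once the input history is attached, which is what makes this kernel map well onto massively parallel hardware.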

    Efficient Algorithms for Immersive Audio Rendering Enhancement

    Immersive audio rendering is the process of creating an engaging and realistic sound experience in 3D space. In immersive audio systems, the head-related transfer functions (HRTFs) are used for binaural synthesis over headphones since they express how humans localize a sound source. HRTF interpolation algorithms can be introduced to reduce the number of measurement points and to create reliable sound movement. Binaural reproduction can also be performed by loudspeakers. However, the involvement of two or more loudspeakers causes the problem of crosstalk. In this case, crosstalk cancellation (CTC) algorithms are needed to remove unwanted interference signals. In this thesis, starting from a comparative analysis of HRTF measurement techniques, a binaural rendering system based on HRTF interpolation is proposed and evaluated for real-time applications. The proposed method shows good performance in comparison with a reference technique. The interpolation algorithm is also applied to immersive audio rendering over loudspeakers by adding a fixed crosstalk cancellation algorithm, which assumes that the listener is in a fixed position. In addition, an adaptive crosstalk cancellation system, which includes tracking of the listener's head, is analyzed and a real-time implementation is presented. The adaptive CTC implements a subband structure, and experimental results prove that a higher number of bands improves the performance in terms of total error and convergence rate. The reproduction system and the characteristics of the listening room may affect the performance due to their non-ideal frequency response.
Audio equalization is used to adjust the balance of different audio frequencies in order to achieve the desired sound characteristics. Equalization can be manual, as in graphic equalization, where the gain of each frequency band can be modified by the user, or automatic, where the equalization curve is calculated automatically after measuring the room impulse response. Room response equalization can also be applied to multichannel systems, which employ two or more loudspeakers, and the equalization zone can be enlarged by measuring the impulse responses at different points of the listening zone. In this thesis, efficient graphic equalizers (GEQs) and an adaptive room response equalization system are presented. In particular, three low-complexity linear- and quasi-linear-phase graphic equalizers are proposed and examined in depth. Experiments confirm the effectiveness of the proposed GEQs in terms of accuracy, computational complexity, and latency. Subsequently, a subband adaptive structure is introduced for the development of a multichannel, multiple-position room response equalizer. Experimental results verify the effectiveness of the subband approach in comparison with the single-band case. Finally, a linear-phase crossover network for multichannel systems is presented, showing very good results in terms of magnitude flatness, cutoff rates, polar diagram, and phase response. Active noise control (ANC) systems can be designed to reduce the effects of noise pollution and can be used simultaneously with an immersive audio system. ANC works by creating a sound wave with a phase opposite to that of the unwanted noise. The additional sound wave creates destructive interference, which reduces the overall sound level. Finally, this thesis presents an ANC system used for noise reduction.
The proposed approach implements an online secondary-path estimation and is based on cross-update adaptive filters applied to the primary-path estimation, which aim at improving the performance of the whole system. The proposed structure achieves a better convergence rate than a reference algorithm.
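The destructive-interference principle behind ANC can be sketched as a single-channel LMS loop that adapts the anti-noise against the disturbance. This sketch assumes an ideal (unity) secondary path and omits the subband, cross-update structure of the proposed system; all names are hypothetical:

```python
import numpy as np

def lms_anti_noise(reference, disturbance, taps=16, mu=0.01):
    """Adapt an FIR filter so its output (the anti-noise) destructively
    cancels the disturbance; the residual is the error-microphone signal.
    The secondary path is taken as ideal (unity) for simplicity."""
    w = np.zeros(taps)
    buf = np.zeros(taps)
    residual = np.empty(len(reference))
    for n in range(len(reference)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        anti_noise = w @ buf                    # loudspeaker output
        residual[n] = disturbance[n] - anti_noise
        w += mu * residual[n] * buf             # LMS: descend on residual**2
    return residual

t = np.arange(8000) / 8000.0
tone = np.sin(2 * np.pi * 100 * t)              # tonal noise at the reference mic
residual = lms_anti_noise(tone, tone)
```

In a practical FxLMS system the reference is filtered through an estimate of the secondary path before the update, which is exactly where online secondary-path estimation, as in the thesis, becomes necessary.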

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals, methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other applications able to operate in real-world environments, such as mobile communication services and smart homes.

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

    International audience.

    Audio for Virtual, Augmented and Mixed Realities: Proceedings of ICSA 2019 ; 5th International Conference on Spatial Audio ; September 26th to 28th, 2019, Ilmenau, Germany

    The ICSA 2019 brings together, in a multidisciplinary way, developers, scientists, users, and content creators of and for spatial audio systems and services, with a special focus on audio for so-called virtual, augmented, and mixed realities. The fields of ICSA 2019 are:
    - Development and scientific investigation of technical systems and services for spatial audio recording, processing, and reproduction
    - Creation of content for reproduction via spatial audio systems and services
    - Use and application of spatial audio systems and content presentation services
    - Media impact of content and spatial audio systems and services from the point of view of media science.
    The ICSA 2019 is organized by the VDT and TU Ilmenau with the support of the Fraunhofer Institute for Digital Media Technology IDMT.