24 research outputs found
PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS
Multichannel acoustic signal processing has undergone major development
in recent years due to the increased complexity of current audio processing
applications. People want to collaborate through communication with the
feeling of being together and sharing the same environment, what is considered
as Immersive Audio Schemes. In this phenomenon, several acoustic
e ects are involved: 3D spatial sound, room compensation, crosstalk cancelation,
sound source localization, among others. However, high computing
capacity is required to achieve any of these e ects in a real large-scale system,
what represents a considerable limitation for real-time applications.
The increase of the computational capacity has been historically linked
to the number of transistors in a chip. However, nowadays the improvements
in the computational capacity are mainly given by increasing the
number of processing units, i.e expanding parallelism in computing. This
is the case of the Graphics Processing Units (GPUs), that own now thousands
of computing cores. GPUs were traditionally related to graphic or image
applications, but new releases in the GPU programming environments,
CUDA or OpenCL, allowed that most applications were computationally
accelerated in elds beyond graphics. This thesis aims to demonstrate
that GPUs are totally valid tools to carry out audio applications that require
high computational resources. To this end, di erent applications in
the eld of audio processing are studied and performed using GPUs. This
manuscript also analyzes and solves possible limitations in each GPU-based
implementation both from the acoustic point of view as from the computational
point of view. In this document, we have addressed the following
problems:
Most of audio applications are based on massive ltering. Thus, the
rst implementation to undertake is a fundamental operation in the audio
processing: the convolution. It has been rst developed as a computational
kernel and afterwards used for an application that combines multiples convolutions
concurrently: generalized crosstalk cancellation and equalization.
The proposed implementation can successfully manage two di erent and
common situations: size of bu ers that are much larger than the size of the
lters and size of bu ers that are much smaller than the size of the lters.
Two spatial audio applications that use the GPU as a co-processor have been developed from the massive multichannel ltering. First application
deals with binaural audio. Its main feature is that this application is able
to synthesize sound sources in spatial positions that are not included in the
database of HRTF and to generate smoothly movements of sound sources.
Both features were designed after di erent tests (objective and subjective).
The performance regarding number of sound source that could be rendered
in real time was assessed on GPUs with di erent GPU architectures. A
similar performance is measured in a Wave Field Synthesis system (second
spatial audio application) that is composed of 96 loudspeakers. The proposed
GPU-based implementation is able to reduce the room e ects during
the sound source rendering.
A well-known approach for sound source localization in noisy and reverberant
environments is also addressed on a multi-GPU system. This
is the case of the Steered Response Power with Phase Transform (SRPPHAT)
algorithm. Since localization accuracy can be improved by using
high-resolution spatial grids and a high number of microphones, accurate
acoustic localization systems require high computational power. The solutions
implemented in this thesis are evaluated both from localization and
from computational performance points of view, taking into account different
acoustic environments, and always from a real-time implementation
perspective.
Finally, This manuscript addresses also massive multichannel ltering
when the lters present an In nite Impulse Response (IIR). Two cases are
analyzed in this manuscript: 1) IIR lters composed of multiple secondorder
sections, and 2) IIR lters that presents an allpass response. Both
cases are used to develop and accelerate two di erent applications: 1) to
execute multiple Equalizations in a WFS system, and 2) to reduce the
dynamic range in an audio signal.Belloch RodrÃguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651TESISPremios Extraordinarios de tesis doctorale
Ambisonics
This open access book provides a concise explanation of the fundamentals and background of the surround sound recording and playback technology Ambisonics. It equips readers with the psychoacoustical, signal processing, acoustical, and mathematical knowledge needed to understand the inner workings of modern processing utilities, special equipment for recording, manipulation, and reproduction in the higher-order Ambisonic format. The book comes with various practical examples based on free software tools and open scientific data for reproducible research. The book’s introductory section offers a perspective on Ambisonics spanning from the origins of coincident recordings in the 1930s to the Ambisonic concepts of the 1970s, as well as classical ways of applying Ambisonics in first-order coincident sound scene recording and reproduction that have been practiced since the 1980s. As, from time to time, the underlying mathematics become quite involved, but should be comprehensive without sacrificing readability, the book includes an extensive mathematical appendix. The book offers readers a deeper understanding of Ambisonic technologies, and will especially benefit scientists, audio-system and audio-recording engineers. In the advanced sections of the book, fundamentals and modern techniques as higher-order Ambisonic decoding, 3D audio effects, and higher-order recording are explained. Those techniques are shown to be suitable to supply audience areas ranging from studio-sized to hundreds of listeners, or headphone-based playback, regardless whether it is live, interactive, or studio-produced 3D audio material
High Frequency Reproduction in Binaural Ambisonic Rendering
Humans can localise sounds in all directions using three main auditory cues: the differences in time and level between signals arriving at the left and right eardrums (interaural time difference and interaural level difference, respectively), and the spectral characteristics of the signals due to reflections and diffractions off the body and ears. These auditory cues can be recorded for a position in space using the head-related transfer function (HRTF), and binaural synthesis at this position can then be achieved through convolution of a sound signal with the measured HRTF. However, reproducing soundfields with multiple sources, or at multiple locations, requires a highly dense set of HRTFs. Ambisonics is a spatial audio technology that decomposes a soundfield into a weighted set of directional functions, which can be utilised binaurally in order to spatialise audio at any direction using far fewer HRTFs. A limitation of low-order Ambisonic rendering is poor high frequency reproduction, which reduces the accuracy of the resulting binaural synthesis.
This thesis presents novel HRTF pre-processing techniques, such that when using the augmented HRTFs in the binaural Ambisonic rendering stage, the high frequency reproduction is a closer approximation of direct HRTF rendering. These techniques include Ambisonic Diffuse-Field Equalisation, to improve spectral reproduction over all directions; Ambisonic Directional Bias Equalisation, to further improve spectral reproduction toward a specific direction; and Ambisonic Interaural Level Difference Optimisation, to improve lateralisation and interaural level difference reproduction. Evaluation of the presented techniques compares binaural Ambisonic rendering to direct HRTF rendering numerically, using perceptually motivated spectral difference calculations, auditory cue estimations and localisation prediction models, and perceptually, using listening tests assessing similarity and plausibility. Results conclude that the individual pre-processing techniques produce modest improvements to the high frequency reproduction of binaural Ambisonic rendering, and that using multiple pre-processing techniques can produce cumulative, and statistically significant, improvements
Proceedings of the EAA Joint Symposium on Auralization and Ambisonics 2014
In consideration of the remarkable intensity of research in the field of Virtual Acoustics, including different areas such as sound field analysis and synthesis, spatial audio technologies, and room acoustical modeling and auralization, it seemed about time to organize a second international symposium following the model of the first EAA Auralization Symposium initiated in 2009 by the acoustics group of the former Helsinki University of Technology (now Aalto University). Additionally, research communities which are focused on different approaches to sound field synthesis such as Ambisonics or Wave Field Synthesis have, in the meantime, moved closer together by using increasingly consistent theoretical frameworks. Finally, the quality of virtual acoustic environments is often considered as a result of all processing stages mentioned above, increasing the need for discussions on consistent strategies for evaluation. Thus, it seemed appropriate to integrate two of the most relevant communities, i.e. to combine the 2nd International Auralization Symposium with the 5th International Symposium on Ambisonics and Spherical Acoustics. The Symposia on Ambisonics, initiated in 2009 by the Institute of Electronic Music and Acoustics of the University of Music and Performing Arts in Graz, were traditionally dedicated to problems of spherical sound field analysis and re-synthesis, strategies for the exchange of ambisonics-encoded audio material, and – more than other conferences in this area – the artistic application of spatial audio systems.
This publication contains the official conference proceedings. It includes 29 manuscripts which have passed a 3-stage peer-review with a board of about 70 international reviewers involved in the process. Each contribution has already been published individually with a unique DOI on the DepositOnce digital repository of TU Berlin. Some conference contributions have been recommended for resubmission to Acta Acustica united with Acustica, to possibly appear in a Special Issue on Virtual Acoustics in late 2014. These are not published in this collection.European Acoustics Associatio
Desarrollo de una aplicación de audio multicanal utilizando el paralelismo de las GPU
En este trabajo se han analizado las prestaciones que ofrece una GPU ante una aplicación de audio multicanal, aplicando dicho análisis a la implementación un Cancelador Crosstalk que funciona en tiempo real y cuyo código es ejecutado sobre una GPU de un computador personal portatil.Belloch RodrÃguez, JA. (2010). Desarrollo de una aplicación de audio multicanal utilizando el paralelismo de las GPU. http://hdl.handle.net/10251/13644Archivo delegad