On the plausibility of simplified acoustic room representations for listener translation in dynamic binaural auralizations
This thesis investigates the effect of simplified acoustic room representations in position-dynamic binaural audio for listener translation. Dynamic binaural synthesis is an audio reproduction method for creating spatial auditory illusions over headphones for virtual, augmented, and mixed reality (VR/AR/MR). It has become a typical demand to explore immersive content in six degrees of freedom (6DOF). Realizing dynamic binaural sound field imitations with high physical accuracy requires high computational effort. However, previous psychoacoustic research indicates that humans have limited sensitivity to the details of the sound field, especially in the late reverberation. This bears the potential to simplify the physics in position-dynamic room auralizations.
For example, concepts based on the perceptual mixing time or on the audibility threshold of early reflections have been proposed, but a thorough psychoacoustic evaluation of them is still pending. First, a setup for position-dynamic binaural room auralization was implemented and evaluated. Essential system parameters, such as the required resolution of the position grid for dynamic adaptation, were examined. Due to the lack of generally established test methods for the perceptual evaluation of spatial auditory illusions under interactive listener translation, this thesis explores different approaches for measuring plausibility. On this foundation, the work examines physical impairments and simplifications of the sound field in position-dynamic binaural auralizations of room acoustics. For the main experiments, sets of binaural room impulse responses (BRIRs) were measured along a line for listener translation in a relatively dry listening laboratory and in a reverberant seminar room of similar size. These sets include scenarios of walking towards a virtual sound source, past it, away from it, or behind it. Two extreme cases of source orientation were considered to take into account the effects of variations in source directivity. The BRIR sets were systematically impaired and simplified to evaluate the perceptual effects. In particular, the concept of the perceptual mixing time and manipulated spatiotemporal patterns of early reflections served as test cases. The results reveal a high potential for simplification but also underline the relevance of accurately imitating prominent early reflections. The findings confirm the concept of the perceptual mixing time for the considered cases of position-dynamic binaural audio.
The observations highlight that common test scenarios for dynamic binaural rendering approaches are not sufficiently critical to draw general conclusions about their suitability. This thesis proposes strategies to address this.
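The perceptual mixing time idea examined in the thesis can be illustrated with a small sketch: after an assumed mixing time, the position-dependent early part of a BRIR is spliced onto a single late-reverberation tail shared by all listener positions, so only the early segment must be exchanged as the listener translates. Function names, the linear crossfade, and the toy data below are illustrative assumptions, not the thesis implementation.

```python
# Illustrative sketch (not the thesis implementation): build a hybrid BRIR by
# keeping the position-dependent direct sound and early reflections, and
# appending a late-reverb tail common to the whole room after the mixing time.

def hybrid_brir(early_brir, shared_tail, mix_sample, fade_len=8):
    """Splice a position-specific early part onto a common late tail.

    early_brir  : impulse response measured at the listener position
    shared_tail : late reverberation shared by all positions
    mix_sample  : sample index of the assumed perceptual mixing time
    fade_len    : length of the linear crossfade around the splice point
    """
    out = list(early_brir[:mix_sample])
    for i in range(fade_len):
        a = i / fade_len  # linear fade-in of the shared tail
        e = early_brir[mix_sample + i] if mix_sample + i < len(early_brir) else 0.0
        out.append((1.0 - a) * e + a * shared_tail[mix_sample + i])
    out.extend(shared_tail[mix_sample + fade_len:])
    return out

# Toy data: a 64-sample "measured" BRIR and a common late tail
early = [1.0] + [0.5 / (n + 1) for n in range(63)]
tail = [0.1] * 64
h = hybrid_brir(early, tail, mix_sample=32)
```

With such a hybrid representation, listener translation only requires swapping (and crossfading) the first `mix_sample` samples per position, which is the kind of simplification whose audibility the experiments probe.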
Measurement of head-related transfer functions: A review
A head-related transfer function (HRTF) describes the acoustic transfer function between a point sound source in the free field and a defined position in the listener's ear canal, and plays an essential role in creating immersive virtual acoustic environments (VAEs) reproduced over headphones or loudspeakers. HRTFs are highly individual and depend on direction and distance (near-field HRTFs). However, the measurement of high-density HRTF datasets is usually time-consuming, especially for human subjects. Over the years, various novel measurement setups and methods have been proposed for the fast acquisition of individual HRTFs while maintaining high measurement accuracy. This review paper provides an overview of various HRTF measurement systems and some insights into trends in individual HRTF measurement.
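As a minimal illustration of how a measured HRTF is used: in its time-domain form (the head-related impulse response, HRIR), convolving a mono signal with the left- and right-ear responses for one direction yields a binaural signal. The tiny HRIRs below are made up for the demo; real ones come from measurement systems like those surveyed in the review.

```python
# Minimal binaural rendering sketch: convolve a mono signal with a pair of
# (made-up) head-related impulse responses to obtain left/right ear signals.

def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

# Toy HRIR pair for a source on the listener's right: the left ear is
# attenuated and delayed by one sample (a caricature of ILD and ITD).
hrir_right = [1.0, 0.3]
hrir_left = [0.0, 0.6, 0.2]

mono = [1.0, 0.0, 0.0, 0.5]
ear_left = convolve(mono, hrir_left)
ear_right = convolve(mono, hrir_right)
```

The level and arrival-time differences between `ear_left` and `ear_right` are what create the spatial illusion; individualizing the HRIRs is what the measurement effort discussed above is for.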
PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS
Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is known as immersive audio schemes. Several acoustic effects are involved in this: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, high computing capacity is required to achieve any of these effects in a real large-scale system, which represents a considerable limitation for real-time applications.
Historically, the increase in computational capacity has been linked to the number of transistors on a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e. from expanding parallelism in computing. This is the case of Graphics Processing Units (GPUs), which now contain thousands of computing cores. GPUs were traditionally associated with graphics and image applications, but new releases of GPU programming environments such as CUDA and OpenCL have allowed many applications to be computationally accelerated in fields beyond graphics. This thesis aims to demonstrate that GPUs are fully valid tools for audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented using GPUs. This manuscript also analyzes and solves possible limitations of each GPU-based implementation, both from the acoustic and from the computational point of view. In this document, we have addressed the following problems:
Most audio applications are based on massive filtering. Thus, the first implementation to undertake is a fundamental operation in audio processing: the convolution. It was first developed as a computational kernel and afterwards used for an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation successfully manages two different and common situations: buffers that are much larger than the filters, and buffers that are much smaller than the filters.
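The two buffer/filter size regimes can be sketched with a plain overlap-add block convolver: each incoming buffer is convolved with the filter, and the convolution tail is carried into the next block, so the same code covers both a buffer much longer than the filter and a buffer much shorter than it. This is a dependency-free time-domain sketch only; the thesis implementation runs on the GPU, typically via frequency-domain kernels.

```python
# Overlap-add sketch of streaming convolution: process the input buffer by
# buffer and carry the tail over to the next call. The identical code covers
# both regimes (buffer >> filter and buffer << filter).

def make_streamer(h):
    tail = [0.0] * (len(h) - 1)  # overlap carried between buffers

    def process(buf):
        nonlocal tail
        y = [0.0] * (len(buf) + len(h) - 1)
        for n, xn in enumerate(buf):      # direct convolution of this buffer
            for k, hk in enumerate(h):
                y[n + k] += xn * hk
        for i, t in enumerate(tail):      # add the tail of the previous buffer
            y[i] += t
        tail = y[len(buf):]               # store the new tail
        return y[:len(buf)]               # emit exactly one buffer of output
    return process

h = [0.5, 0.25, 0.25]                     # toy filter
x = [1.0, 0.0, 0.0, 0.0, 2.0, 0.0]

# Feed as large buffers (buffer > filter)...
stream = make_streamer(h)
out_large = stream(x[:4]) + stream(x[4:])

# ...and as small buffers (buffer < filter); the result must be identical.
stream2 = make_streamer(h)
out_small = sum((stream2(x[i:i + 2]) for i in range(0, 6, 2)), [])
```

Generalized crosstalk cancellation and equalization then amounts to running many such streams concurrently (one per source/loudspeaker path), which is where GPU parallelism pays off.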
Two spatial audio applications that use the GPU as a co-processor have been developed on top of the massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it can synthesize sound sources at spatial positions that are not included in the HRTF database and generate smooth movements of sound sources. Both features were designed after different objective and subjective tests. The performance, in terms of the number of sound sources that can be rendered in real time, was assessed on GPUs with different architectures. A similar performance is measured in a Wave Field Synthesis system (the second spatial audio application) composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce the room effects during sound source rendering.
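Rendering sources at directions missing from the HRTF database is commonly done by interpolating between the nearest measured responses. The two-point linear interpolation below is a generic sketch under that assumption; the interpolation scheme actually used in the thesis may differ.

```python
# Sketch: linear interpolation of HRIRs between the two nearest measured
# azimuths, so a source can be rendered at directions not in the database.
# Assumes the requested azimuth lies inside the measured range.

def interpolate_hrir(database, azimuth):
    """database: {azimuth_degrees: hrir_list}; returns an HRIR for `azimuth`."""
    angles = sorted(database)
    lo = max(a for a in angles if a <= azimuth)
    hi = min(a for a in angles if a >= azimuth)
    if lo == hi:
        return list(database[lo])
    w = (azimuth - lo) / (hi - lo)  # weight of the upper neighbour
    return [(1 - w) * a + w * b for a, b in zip(database[lo], database[hi])]

# Toy database measured every 30 degrees
db = {0: [1.0, 0.0], 30: [0.5, 0.5], 60: [0.0, 1.0]}
h15 = interpolate_hrir(db, 15)  # halfway between the 0- and 30-degree HRIRs
```

Smooth source movement then amounts to re-interpolating (and crossfading the filter outputs) as the azimuth changes over time, which maps naturally onto the GPU's massive filtering.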
A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Since localization accuracy can be improved by using high-resolution spatial grids and a large number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated from both the localization and the computational performance points of view, taking different acoustic environments into account, and always from a real-time implementation perspective.
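SRP-PHAT steers a grid of candidate source positions and sums PHAT-weighted generalized cross-correlations (GCC-PHAT) over microphone pairs. The core building block can be sketched for one pair with a naive DFT; a real implementation uses FFTs (in the thesis, on the GPU), many pairs, and a dense spatial grid.

```python
import cmath

# GCC-PHAT sketch for one microphone pair, the building block of SRP-PHAT.
# A naive O(N^2) DFT keeps the example dependency-free.

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def gcc_phat(x, y):
    """Cross-correlation of x against y with PHAT (phase-only) weighting.

    The index of the peak is the delay of x relative to y (modulo N)."""
    X, Y = dft(x), dft(y)
    cross = [a * b.conjugate() for a, b in zip(X, Y)]
    phat = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    return idft(phat)

# Toy signals: an impulse and the same impulse delayed by 3 samples
N = 64
x1 = [0.0] * N; x1[10] = 1.0
x2 = [0.0] * N; x2[13] = 1.0
cc = gcc_phat(x2, x1)
delay = max(range(N), key=lambda k: cc[k])
```

SRP-PHAT then evaluates, for every candidate grid point, the sum of such correlations at the pairwise delays implied by the geometry; the grid maximum gives the source estimate, and it is this grid-times-pairs workload that motivates the multi-GPU implementation.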
Finally, this manuscript also addresses massive multichannel filtering when the filters present an infinite impulse response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that present an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal.

Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
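The second-order-section structure used for the multichannel equalizers can be sketched as a cascade of biquads in direct form II transposed. The coefficients below are a made-up example (a pure two-sample delay), not a designed EQ, and the GPU parallelization across channels and sections is not shown.

```python
# Sketch of IIR filtering with cascaded second-order sections (biquads) in
# direct form II transposed. The coefficients are illustrative only.

def sos_filter(sos, x):
    """sos: list of (b0, b1, b2, a1, a2) sections, with a0 normalized to 1."""
    y = list(x)
    for b0, b1, b2, a1, a2 in sos:
        s1 = s2 = 0.0                 # per-section state variables
        out = []
        for v in y:
            w = b0 * v + s1
            s1 = b1 * v - a1 * w + s2
            s2 = b2 * v - a2 * w
            out.append(w)
        y = out                       # output of one section feeds the next
    return y

# One toy section realizing a two-sample delay: b = [0, 0, 1], a = [1, 0, 0]
delayed = sos_filter([(0.0, 0.0, 1.0, 0.0, 0.0)], [1.0, 2.0, 3.0, 4.0])
```

Cascading sections keeps each stage numerically well-behaved; in a WFS equalizer, one such cascade would run per loudspeaker channel, which is the parallelism the GPU exploits.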