Spatial sound for computer games and virtual reality
In this chapter, we discuss spatial sound within the context of Virtual Reality and other synthetic environments such as computer games. We review current audio technologies, sound constraints within immersive multi-modal spaces, and future trends. The review takes into consideration the widely varying levels of audio sophistication in the gaming and VR industries, ranging from standard stereo output to Head-Related Transfer Function (HRTF) implementations. The level of sophistication is determined mostly by hardware and system constraints (such as those of mobile devices or network limitations); however, audio practitioners are developing novel and diverse methods to overcome many of these challenges. Whatever approach is employed, the primary objectives are very similar: the enhancement of the virtual scene and the enrichment of the user experience. We discuss how successful various audio technologies are in achieving these objectives, where they fall short, and how they are positioned to overcome these shortfalls in future implementations.
Spatial auditory display for acoustics and music collections
PhD thesis. This thesis explores how audio can be better incorporated into the ways people access information, and does so by developing approaches for creating three-dimensional audio environments with low processing demands. It investigates three research questions.
Mobile devices have limited processing power and memory, which restricts the number of concurrent static or moving sound sources that can be rendered with binaural audio. Is there a more efficient approach that is as perceptually accurate as the traditional method? This thesis concludes that virtual Ambisonics is an efficient and accurate means to render a binaural auditory display consisting of noise signals placed on the horizontal plane without head tracking. Virtual Ambisonics is also more efficient than convolution with HRTFs if more than two sound sources are rendered concurrently, or if source movement or head tracking is implemented.
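The efficiency argument rests on linearity: in virtual Ambisonics every source is first encoded into a small, fixed set of B-format channels, and only those channels are decoded to virtual loudspeakers and convolved with HRIRs, so the convolution cost stops growing with the number of sources. The minimal sketch below assumes horizontal-only first-order B-format with the traditional 1/sqrt(2) weighting on W and a basic decode to a ring of evenly spaced virtual loudspeakers; the function names are hypothetical, the HRIR convolution stage is omitted, and this is not the thesis's implementation.

```python
import math

# Hypothetical helper names; horizontal-only first-order B-format is assumed.


def encode_horizontal_bformat(sample, azimuth):
    """Encode one mono sample at `azimuth` (radians) into (W, X, Y).

    W carries the traditional 1/sqrt(2) weighting; other conventions exist.
    """
    return (sample / math.sqrt(2.0),
            sample * math.cos(azimuth),
            sample * math.sin(azimuth))


def encode_sources(sources):
    """Sum any number of (sample, azimuth) pairs into one (W, X, Y) frame.

    Encoding is linear, so the output is always three channels, however
    many sources are mixed in.
    """
    w = x = y = 0.0
    for sample, azimuth in sources:
        dw, dx, dy = encode_horizontal_bformat(sample, azimuth)
        w += dw
        x += dx
        y += dy
    return w, x, y


def decode_to_ring(w, x, y, speaker_azimuths):
    """Basic decode to N evenly spaced virtual loudspeakers.

    For a single plane wave from azimuth theta, speaker i receives
    (1 + 2*cos(phi_i - theta)) / N times the source sample.
    """
    n = len(speaker_azimuths)
    return [(math.sqrt(2.0) * w + 2.0 * (x * math.cos(phi) + y * math.sin(phi))) / n
            for phi in speaker_azimuths]


# Each virtual-speaker feed would next be convolved with that speaker's
# left/right HRIR pair (omitted here).
speakers = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
feeds = decode_to_ring(*encode_sources([(1.0, 0.0)]), speakers)
```

Because the decode and the HRIR convolution are both linear, the per-speaker HRIRs can also be pre-combined into one filter per B-format channel and ear, which is a common way to keep the number of convolutions fixed regardless of how many sources are encoded.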
Complex acoustics models require significant amounts of memory and processing. If the memory and processor loads of a model are too large for a particular device, that model cannot be interactive in real time. What steps can be taken to make a complex room model interactive by reducing its memory use and computational load? This thesis presents a new reverberation model based on hybrid reverberation that uses a collection of B-format impulse responses (IRs). A new metric for determining the mixing time of a room is developed, and interpolation between early reflections is investigated.
Although hybrid reverberation typically uses a recursive filter such as a feedback delay network (FDN) for the late reverberation, here an average late reverberation tail is instead synthesised for convolution reverberation.
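One way to picture the hybrid scheme: keep the position-specific early part of each IR up to the mixing time, and replace everything after it with one shared tail synthesised from the averaged energy envelope of the measured tails. This is a minimal mono sketch under several assumptions: the mixing time is given as a sample index (the thesis's own mixing-time metric is not reproduced), the envelope is a coarse block average, and the tail is shaped Gaussian noise. All names are hypothetical.

```python
import math
import random

# Hypothetical helpers; the mixing time is assumed to be known in samples.


def energy_envelope(signal, block):
    """Mean energy of each consecutive block of `block` samples."""
    env = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        env.append(sum(s * s for s in chunk) / len(chunk))
    return env


def synthesize_average_tail(irs, mixing_index, block=64, seed=0):
    """Noise tail whose block energies follow the average of the IRs' late parts."""
    envelopes = [energy_envelope(ir[mixing_index:], block) for ir in irs]
    n_blocks = min(len(e) for e in envelopes)
    mean_env = [sum(e[b] for e in envelopes) / len(envelopes) for b in range(n_blocks)]
    rng = random.Random(seed)
    tail = []
    for energy in mean_env:
        gain = math.sqrt(energy)  # scale unit-variance noise to the target energy
        tail.extend(gain * rng.gauss(0.0, 1.0) for _ in range(block))
    return tail


def hybrid_ir(local_ir, shared_tail, mixing_index):
    """Position-specific early part followed by the shared synthetic late tail."""
    return list(local_ir[:mixing_index]) + list(shared_tail)


# Two toy "IRs": distinct early parts, late parts of constant amplitude 0.5.
mix = 4
ir_a = [1.0, 0.3, 0.0, 0.2] + [0.5] * 128
ir_b = [1.0, 0.0, 0.4, 0.1] + [0.5] * 128
tail = synthesize_average_tail([ir_a, ir_b], mix)
ir_at_a = hybrid_ir(ir_a, tail, mix)
```

Averaging energy envelopes, rather than averaging the tails sample by sample, is a deliberate choice in this sketch: diffuse tails measured at different positions are largely uncorrelated, so a plain sample-wise average would cancel much of their energy.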
Commercial interfaces for music search and discovery use little aural information, even though the information being sought is audio. How can audio be used in interfaces for music search and discovery? This thesis reviews 20 interfaces and identifies several recurring themes. These include using a two- or three-dimensional space to explore a music collection, allowing concurrent playback of multiple sources, and tools such as auras to control how much information is presented. A new interface, the amblr, is developed because virtual two-dimensional spaces populated by music have been a common approach, but not yet a perfected one. The amblr is also interpreted as an art installation, which was visited by approximately 1000 people over 5 days. The installation maps the virtual space created by the amblr to a physical space.
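The aura idea can be sketched as a distance rule in the two-dimensional music space: only tracks within a radius of the listener are played, with gain falling toward the edge of the aura and direction taken from the track's position. This is a hypothetical illustration (the linear gain law and the names `aura_mix`, `tracks`, and `radius` are assumptions), not the amblr's actual mapping.

```python
import math

# Hypothetical aura rule: linear gain fall-off inside a circular aura.


def aura_mix(listener, tracks, radius):
    """Return {name: (gain, azimuth)} for the tracks inside the aura.

    `tracks` maps a track name to its (x, y) position in the 2D music space.
    Gain is 1 at the listener and falls linearly to 0 at the aura's edge;
    azimuth is the track's direction from the listener in radians.
    """
    lx, ly = listener
    active = {}
    for name, (tx, ty) in tracks.items():
        dx, dy = tx - lx, ty - ly
        dist = math.hypot(dx, dy)
        if dist < radius:
            active[name] = (1.0 - dist / radius, math.atan2(dy, dx))
    return active


playing = aura_mix((0.0, 0.0), {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (10.0, 0.0)}, 10.0)
```

Shrinking `radius` is one way such a control could reduce how many tracks play at once, in the spirit of the auras described above.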
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post-production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the 'creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of the creative industries, maximum benefit from AI will be derived where
its focus is human-centric, that is, where it is designed to augment, rather than
replace, human creativity.
Object-based reverberation for spatial audio
Object-based audio is gaining momentum as a means for future audio content to be more immersive, interactive, and accessible. Recent standardization developments make recommendations for object formats; however, the capture, production, and reproduction of reverberation remain open issues. In this paper, parametric approaches for capturing, representing, editing, and rendering reverberation over a 3D spatial audio system are reviewed. A framework is proposed for a Reverberant Spatial Audio Object (RSAO), which synthesizes reverberation inside an audio object renderer. An implementation example of an object scheme utilizing the RSAO framework is provided and supported with listening test results, which show that the approach correctly retains the sense of room size compared to a convolved reference, that editing RSAO parameters can alter the perceived room size and source distance, and that format-agnostic rendering can be exploited to alter listener envelopment.
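As a rough illustration of what editing parametric reverberation can mean, the sketch below builds an impulse response from a handful of parameters: a direct-sound gain, early reflections as (delay, gain) pairs, and a late tail of exponentially decaying noise controlled by a decay time. The parameter set is a simplified, hypothetical stand-in, not the RSAO definition; it is mono and ignores directions and frequency dependence. The tail envelope falls by 60 dB over `rt60` seconds, i.e. it follows 10^(-3t/rt60).

```python
import math
import random

# Simplified, hypothetical parameter set (not the RSAO definition): mono,
# no directions, no frequency dependence.


def decay_envelope(t, rt60):
    """Amplitude envelope that has fallen by 60 dB after rt60 seconds."""
    return 10.0 ** (-3.0 * t / rt60)


def parametric_ir(direct_gain, reflections, late_start, late_level, rt60,
                  duration, fs, seed=0):
    """Build an IR from a direct gain, early reflections and a noise tail.

    reflections: list of (delay_seconds, gain) pairs.
    The late tail starts at late_start seconds and decays according to rt60.
    """
    n = int(duration * fs)
    ir = [0.0] * n
    ir[0] = direct_gain
    for delay, gain in reflections:
        idx = int(round(delay * fs))
        if idx < n:
            ir[idx] += gain
    rng = random.Random(seed)
    start = int(round(late_start * fs))
    for i in range(start, n):
        t = (i - start) / fs
        ir[i] += late_level * decay_envelope(t, rt60) * rng.gauss(0.0, 1.0)
    return ir


def energy(x):
    return sum(v * v for v in x)


# Same reflections and noise seed; only the decay-time parameter is edited.
fs = 8000
refl = [(0.005, 0.6), (0.011, 0.4)]
small_room = parametric_ir(1.0, refl, 0.02, 0.1, 0.3, 0.5, fs)
large_room = parametric_ir(1.0, refl, 0.02, 0.1, 1.2, 0.5, fs)
```

Because the IR is regenerated from parameters at render time, changing a single value such as `rt60` changes the reverberation without re-measuring or storing a new response.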
PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS
Multichannel acoustic signal processing has undergone major development in recent years due to the increasing complexity of current audio processing applications. People want to communicate and collaborate with the feeling of being together and sharing the same environment, an experience addressed by so-called Immersive Audio Schemes. Several acoustic effects are involved in this: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, achieving any of these effects in a real large-scale system requires high computing capacity, which is a considerable limitation for real-time applications.
Historically, increases in computational capacity have been linked to the number of transistors on a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., from expanding parallelism in computing. This is the case with Graphics Processing Units (GPUs), which now have thousands of computing cores. GPUs were traditionally associated with graphics and image applications, but new releases of GPU programming environments, such as CUDA and OpenCL, have made it possible to accelerate applications in fields beyond graphics. This thesis aims to demonstrate that GPUs are fully valid tools for audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented on GPUs. This manuscript also analyzes and solves possible limitations of each GPU-based implementation, from both the acoustic and the computational points of view. In this document, we address the following problems:
Most audio applications are based on massive filtering. Thus, the first implementation to undertake is a fundamental operation in audio processing: convolution. It was first developed as a computational kernel and afterwards used in an application that combines multiple concurrent convolutions: generalized crosstalk cancellation and equalization. The proposed implementation can successfully handle two different and common situations: buffers much larger than the filters, and buffers much smaller than the filters.
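The two buffer regimes can be illustrated with streaming overlap-add: each incoming buffer is convolved with the whole filter, and the part of the result that extends past the buffer end is carried into the next call, which works whether the buffer is longer or shorter than the filter. This sketch uses direct time-domain convolution for clarity; real-time and GPU implementations usually rely on FFT-based or partitioned convolution instead. The class and function names are hypothetical.

```python
def convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


class OverlapAddConvolver:
    """Streaming FIR filter for arbitrary buffer sizes (hypothetical name)."""

    def __init__(self, h):
        if not h:
            raise ValueError("filter must not be empty")
        self.h = list(h)
        self.carry = [0.0] * (len(self.h) - 1)  # output still owed to later buffers

    def process(self, block):
        y = convolve(block, self.h)
        for i, c in enumerate(self.carry):  # add the tail left by earlier buffers
            y[i] += c
        n = len(block)
        self.carry = y[n:]
        return y[:n]


def stream(x, h, block_size):
    """Feed x through the convolver in buffers of block_size samples."""
    ola = OverlapAddConvolver(h)
    out = []
    for start in range(0, len(x), block_size):
        out.extend(ola.process(x[start:start + block_size]))
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h_short = [1.0, -1.0, 0.5]                    # buffers longer than the filter
h_long = [0.5, 0.25, 0.125, 0.0625, 0.03125]  # buffers shorter than the filter
streamed_short = stream(x, h_short, 4)
streamed_long = stream(x, h_long, 2)
```

In both cases the streamed output matches the first `len(x)` samples of a one-shot convolution of the whole signal.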
Two spatial audio applications that use the GPU as a co-processor have been developed on top of the massive multichannel filtering. The first application deals with binaural audio. Its main features are that it can synthesize sound sources at spatial positions that are not included in the HRTF database, and that it can generate smooth movements of sound sources. Both features were designed after several objective and subjective tests. Performance, in terms of the number of sound sources that can be rendered in real time, was assessed on GPUs with different architectures. A similar performance assessment is carried out for a Wave Field Synthesis (WFS) system composed of 96 loudspeakers (the second spatial audio application). The proposed GPU-based implementation is able to reduce room effects during sound source rendering.
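Synthesising a source at a position that is not in the HRTF database requires some form of interpolation. The simplest version, linear interpolation between the two nearest measured azimuths on the horizontal plane, is sketched below; it is a generic technique, not necessarily the one used in the thesis, and the database layout (`hrir_db` mapping an azimuth in degrees, in [0, 360), to a left/right HRIR pair of equal length) is an assumption.

```python
def interpolate_hrir(hrir_db, azimuth_deg):
    """Linearly interpolate left/right HRIRs between the nearest measured azimuths.

    hrir_db: {azimuth_degrees: (left_ir, right_ir)} (hypothetical layout).
    Wraps around at 360 degrees.
    """
    measured = sorted(hrir_db)
    az = azimuth_deg % 360.0
    below = [a for a in measured if a <= az]
    above = [a for a in measured if a > az]
    lower = below[-1] if below else measured[-1] - 360.0
    upper = above[0] if above else measured[0] + 360.0
    t = (az - lower) / (upper - lower)  # 0 at the lower neighbour, 1 at the upper
    left_lo, right_lo = hrir_db[lower % 360]
    left_hi, right_hi = hrir_db[upper % 360]
    left = [(1.0 - t) * a + t * b for a, b in zip(left_lo, left_hi)]
    right = [(1.0 - t) * a + t * b for a, b in zip(right_lo, right_hi)]
    return left, right


# Toy two-tap "HRIRs" measured every 90 degrees.
db = {0: ([1.0, 0.0], [1.0, 0.0]),
      90: ([0.0, 1.0], [0.5, 0.5]),
      180: ([1.0, 1.0], [0.0, 0.0]),
      270: ([0.5, 0.5], [0.0, 1.0])}
left_45, right_45 = interpolate_hrir(db, 45.0)
```

Linear interpolation of raw HRIRs can smear the interaural time difference when neighbouring responses have different onset delays, so practical systems often interpolate delay and magnitude separately; updating the interpolated filters gradually is also what allows smooth source movement.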
A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Since localization accuracy can be improved by using high-resolution spatial grids and a large number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated from both the localization and the computational performance points of view, taking into account different acoustic environments, and always from a real-time implementation perspective.
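SRP-PHAT scores each candidate source position by summing, over all microphone pairs, the PHAT-weighted generalized cross-correlation (GCC-PHAT) evaluated at the time difference of arrival that the position implies; the best-scoring grid point is the estimate. The work grows with the number of grid points and of microphone pairs, which is why grid resolution and microphone count drive the computational load. The sketch below is a minimal, single-threaded illustration under assumed conditions: a naive O(N^2) DFT, circular correlation, nearest-sample steering, and free-field propagation at c = 343 m/s. Function names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform (clarity over speed)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part only."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def gcc_phat(x1, x2, eps=1e-12):
    """PHAT-weighted circular cross-correlation.

    Index d of the result peaks when x2 lags x1 by d samples.
    """
    cross = [a.conjugate() * b for a, b in zip(dft(x1), dft(x2))]
    return idft([c / (abs(c) + eps) for c in cross])


def srp_phat(signals, mic_positions, candidates, fs, c=343.0):
    """Steered response power of each candidate point (nearest-sample steering)."""
    n = len(signals[0])
    pairs = [(i, j) for i in range(len(signals)) for j in range(i + 1, len(signals))]
    gcc = {(i, j): gcc_phat(signals[i], signals[j]) for i, j in pairs}
    scores = []
    for point in candidates:
        # Propagation delay from the candidate to each microphone, in samples.
        delays = [round(math.dist(point, m) / c * fs) for m in mic_positions]
        scores.append(sum(gcc[(i, j)][(delays[j] - delays[i]) % n] for i, j in pairs))
    return scores


# Toy scene: circularly delayed white noise reaching three microphones.
fs, n = 1000, 64
rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(n)]
mics = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
true_point = (3.0, 2.0)
signals = []
for m in mics:
    d = round(math.dist(true_point, m) / 343.0 * fs)
    signals.append([source[(t - d) % n] for t in range(n)])
grid = [true_point, (-3.0, 2.0), (0.0, 5.0), (5.0, -1.0)]
scores = srp_phat(signals, mics, grid, fs)
best = grid[scores.index(max(scores))]
```

Every candidate is scored independently, which is the property that makes the grid search a natural fit for parallel hardware.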
Finally, this manuscript also addresses massive multichannel filtering when the filters have an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that present an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal.
Belloch Rodríguez, J. A. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
Premios Extraordinarios de tesis doctorales (Extraordinary Doctoral Thesis Awards)
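The first IIR case in this thesis, cascades of second-order sections, reduces per channel to running a chain of biquads. A minimal single-channel sketch in transposed direct form II follows; the coefficient layout `(b0, b1, b2, a1, a2)` with a0 normalised to 1 and the function name are assumptions, and a GPU version would run many such channels in parallel rather than the sequential loop shown.

```python
def biquad_cascade(x, sections):
    """Filter x through a cascade of second-order sections.

    Each section is (b0, b1, b2, a1, a2), normalised so that a0 == 1,
    and is run in transposed direct form II.
    """
    y = list(x)
    for b0, b1, b2, a1, a2 in sections:
        z1 = z2 = 0.0  # per-section state
        out = []
        for v in y:
            w = b0 * v + z1
            z1 = b1 * v - a1 * w + z2
            z2 = b2 * v - a2 * w
            out.append(w)
        y = out
    return y


# Example: a two-tap averager followed by y[n] = x[n] + 0.5 * y[n-1].
sections = [(0.5, 0.5, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, -0.5, 0.0)]
impulse_response = biquad_cascade([1.0, 0.0, 0.0, 0.0], sections)
```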
Spatial audio in small display screen devices
Our work addresses the problem of (visual) clutter in mobile device interfaces. The solution we propose involves translating a technique for exploiting space in information representation from the graphical to the audio domain. This article presents an illustrative example in the form of a spatialised audio progress bar. In usability tests, participants performed background monitoring tasks significantly more accurately using this spatialised audio progress bar (as compared with a conventional visual one). Moreover, their performance in a simultaneously running, visually demanding foreground task was significantly improved in the eyes-free monitoring condition. These results have important implications for the design of multi-tasking interfaces for mobile devices.
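A spatialised progress bar can be as simple as mapping task progress to a sound's position, for instance sweeping it from the listener's left to right as the task completes. The sketch below assumes that mapping (-90 degrees at 0 % to +90 degrees at 100 %) and uses constant-power stereo panning as a stand-in for full spatialisation; both choices and the function names are hypothetical, not details taken from the study.

```python
import math


def progress_to_azimuth(progress, start_deg=-90.0, end_deg=90.0):
    """Map progress in [0, 1] to an azimuth in degrees; out-of-range values are clamped."""
    p = min(max(progress, 0.0), 1.0)
    return start_deg + p * (end_deg - start_deg)


def constant_power_pan(azimuth_deg):
    """(left, right) gains for an azimuth in [-90, 90]; left**2 + right**2 == 1."""
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


gains_at_half = constant_power_pan(progress_to_azimuth(0.5))
```

Constant-power panning keeps the perceived loudness roughly steady as the sound moves, so the position rather than the level carries the progress information.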
Gestural Control of Wavefield Synthesis
(Abstract to follow)
Audio for Virtual, Augmented and Mixed Realities: Proceedings of ICSA 2019 ; 5th International Conference on Spatial Audio ; September 26th to 28th, 2019, Ilmenau, Germany
ICSA 2019 brings together developers, scientists, users, and content creators of and for spatial audio systems and services in a multidisciplinary setting. A special focus is on audio for so-called virtual, augmented, and mixed realities.
The fields of ICSA 2019 are:
- Development and scientific investigation of technical systems and services for spatial audio recording, processing and reproduction
- Creation of content for reproduction via spatial audio systems and services
- Use and application of spatial audio systems and content presentation services
- Media impact of content and spatial audio systems and services from the point of view of media science
ICSA 2019 is organized by VDT and TU Ilmenau with the support of the Fraunhofer Institute for Digital Media Technology IDMT.