556 research outputs found
Object-based reverberation for spatial audio
Object-based audio is gaining momentum as a means for future audio content to be more immersive, interactive, and accessible. Recent standardization developments make recommendations for object formats; however, the capture, production, and reproduction of reverberation remain an open issue. In this paper, parametric approaches for capturing, representing, editing, and rendering reverberation over a 3D spatial audio system are reviewed. A framework is proposed for a Reverberant Spatial Audio Object (RSAO), which synthesizes reverberation inside an audio object renderer. An implementation example of an object scheme utilizing the RSAO framework is provided and supported with listening test results, showing that: the approach correctly retains the sense of room size compared to a convolved reference; editing RSAO parameters can alter the perceived room size and source distance; and format-agnostic rendering can be exploited to alter listener envelopment.
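The parametric idea can be illustrated with a minimal sketch: the late part of a room response is commonly modelled as exponentially decaying noise governed by a decay-time parameter such as RT60. The function names and the single-band synthesis below are illustrative assumptions, not the actual RSAO parameter set:

```python
import numpy as np

def synth_late_reverb(rt60, length_s, fs=48000, seed=0):
    """Late-reverb tail as exponentially decaying white noise.

    rt60 is the time (s) for a 60 dB decay; the amplitude envelope is
    exp(-6.908 * t / rt60), since 20*log10(exp(-6.908)) is about -60 dB.
    (Single-band sketch; a real renderer would shape the decay per band.)
    """
    n = int(length_s * fs)
    t = np.arange(n) / fs
    env = np.exp(-6.908 * t / rt60)
    rng = np.random.default_rng(seed)
    return env * rng.standard_normal(n)

def render(dry, tail):
    """Apply the synthesized tail to a dry source signal by convolution."""
    return np.convolve(dry, tail)
```

Doubling the rt60 parameter produces a slower-decaying tail, which is consistent with the listening-test finding that editing RSAO parameters alters the perceived room size.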
Spatial sound for computer games and virtual reality
In this chapter, we discuss spatial sound within the context of Virtual Reality and other synthetic environments such as computer games. We review current audio technologies, sound constraints within immersive multi-modal spaces, and future trends. The review takes into consideration the widely varying levels of audio sophistication in the gaming and VR industries, ranging from standard stereo output to Head Related Transfer Function implementation. The level of sophistication is determined mostly by hardware/system constraints (such as mobile devices or network limitations); however, audio practitioners are developing novel and diverse methods to overcome many of these challenges. No matter what approach is employed, the primary objectives are very similar: the enhancement of the virtual scene and the enrichment of the user experience. We discuss how successful various audio technologies are in achieving these objectives, where they fall short, and how they are being developed to overcome these shortfalls in future implementations.
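At the low end of the sophistication range mentioned above, standard stereo output typically positions a source with an amplitude pan law. Below is a minimal sketch of constant-power panning; the azimuth-to-gain mapping is a common convention assumed here, not taken from the chapter:

```python
import math

def constant_power_pan(azimuth_deg):
    """Constant-power pan law: map an azimuth in [-45, 45] degrees to
    (left, right) gains whose squares sum to 1, so perceived loudness
    stays constant as the source moves across the stereo image."""
    p = (azimuth_deg + 45.0) / 90.0      # 0.0 = hard left, 1.0 = hard right
    angle = p * math.pi / 2.0
    return math.cos(angle), math.sin(angle)
```

A centred source (azimuth 0) gets equal gains of about 0.707 on each channel, rather than the 0.5 a naive linear pan would give, avoiding the well-known loudness dip at the centre.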
Parametrization, auralization, and authoring of room acoustics for virtual reality applications
The primary goal of this work has been to develop means to represent acoustic properties of an environment with a set of spatial sound related parameters. These parameters are used for creating virtual environments, where the sounds are expected to be perceived by the user as if they were listened to in a corresponding real space. The virtual world may consist of both visual and audio components. Ideally in such an application, the sound and the visual parts of the virtual scene are in coherence with each other, which should improve the user immersion in the virtual environment.
The second aim was to verify the feasibility of the created sound environment parameter set in practice. A virtual acoustic modeling system was implemented, where any spatial sound scene, defined by using the developed parameters, can be rendered audible in real time. In other words the user can listen to the auralized sound according to the defined sound scene parameters.
Thirdly, the authoring of such parametric sound scene representations was addressed. In this authoring framework, sound scenes and an associated visual scene can be created, then encoded and transmitted in real time to a remotely located renderer. The visual scene counterpart was created as a part of the multimedia scene, acting simultaneously as a user interface for renderer-side interaction.
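As a rough illustration of what such a parametric sound scene representation might look like, the sketch below groups source and room-acoustic parameters into a structure that could be serialized and transmitted to a renderer. All type and field names are hypothetical; the thesis defines its own parameter set:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SoundSource:
    position: Tuple[float, float, float]   # metres, scene coordinates
    gain_db: float = 0.0

@dataclass
class RoomAcoustics:
    rt60_s: float                          # reverberation time (seconds)
    early_reflection_delays_ms: List[float] = field(default_factory=list)
    direct_to_reverb_db: float = 0.0

@dataclass
class SoundScene:
    sources: List[SoundSource]
    room: RoomAcoustics
```

The point of such a representation is exactly what the abstract describes: the renderer receives parameters rather than audio-channel signals, and auralizes the scene locally in real time.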
Bimodal Audiovisual Perception in Interactive Application Systems of Moderate Complexity
The dissertation at hand deals with aspects of quality perception of
interactive audiovisual application systems of moderate complexity, such as
those defined in the MPEG-4 standard. Because in these systems the available
computing power is limited, it is decisive to know which factors influence
the perceived quality. Only then can the available computing power be
distributed in the most effective and efficient way for the simulation and
display of audiovisual 3D scenes. Whereas quality factors for the unimodal
auditory and visual stimuli are well known and respective models of
perception have been successfully devised based on this knowledge, this is
not true for bimodal audiovisual perception. For the latter, it is only
known that some kind of interdependency between auditory and visual
perception does exist. The exact mechanisms of human audiovisual perception
have not been described. It is assumed that interaction with an application
or scene has a major influence upon the perceived overall quality.
The goal of this work was to devise a system capable of performing
subjective audiovisual assessments in the given context in a largely
automated way. By applying the system, first evidence regarding audiovisual
interdependency and influence of interaction upon perception should be
collected. This work therefore comprised three fields of activity:
the creation of a test bench based on the available but (regarding its
audio functionality) somewhat restricted MPEG-4 player; the study of
methods and framework requirements that ensure comparability and
reproducibility of audiovisual assessments and results; and the performance
of a series of coordinated experiments, including the analysis and
interpretation of the collected data. An object-based, modular audio
rendering engine was co-designed and co-implemented, which makes it
possible to perform simple room-acoustic simulations in real time, based
on the MPEG-4 scene description paradigm. Apart from the MPEG-4 player,
the test bench consists of a haptic input device used by test subjects to
enter their quality ratings and a logging tool that records all relevant
events during an assessment session. The collected data can be exported
conveniently for further analysis using appropriate statistical tools.
A thorough analysis of the well-established test methods and
recommendations for unimodal subjective assessments was performed to find
out whether a transfer to the bimodal audiovisual case is easily possible.
It became evident that, due to the limited knowledge about the underlying
perceptual processes, a novel categorization of experiments according to
their goals could be helpful in organizing research in the field.
Furthermore, a number of influencing factors could be identified that
govern bimodal perception in the given context.
By performing the perceptual experiments using the devised system, its
functionality and ease of use were verified. Beyond that, some first
indications of the role of interaction in perceived overall quality were
collected: interaction in the auditory modality reduces a subject's
ability to rate audio quality correctly, whereas visually based
(cross-modal) interaction does not necessarily generate this effect.
A History of Audio Effects
Audio effects are an essential tool that the field of music production relies upon. The ability to intentionally manipulate and modify a piece of sound has opened up considerable opportunities for music making. The evolution of technology has often driven new audio tools and effects, from early architectural acoustics through electromechanical and electronic devices to the digitisation of music production studios. Throughout history, music has constantly borrowed ideas and technological advances from other fields and contributed its own innovations in return. This process, termed transsectorial innovation, fundamentally underpins the technological development of audio effects. The development and evolution of audio effect technology is discussed, highlighting major technical breakthroughs and the impact of the available audio effects.
A PatchMatch-based Dense-field Algorithm for Video Copy-Move Detection and Localization
We propose a new algorithm for the reliable detection and localization of
video copy-move forgeries. Discovering well-crafted video copy-moves may be
very difficult, especially when some uniform background is copied to occlude
foreground objects. To reliably detect both additive and occlusive copy-moves
we use a dense-field approach, with invariant features that guarantee
robustness to several post-processing operations. To limit complexity, a
suitable video-oriented version of PatchMatch is used, with a multiresolution
search strategy, and a focus on volumes of interest. Performance assessment
relies on a new dataset, designed ad hoc, with realistic copy-moves and a wide
variety of challenging situations. Experimental results show the proposed
method to detect and localize video copy-moves with good accuracy even in
adverse conditions.
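The core PatchMatch idea referenced above (random initialization of a dense offset field, followed by propagation of good offsets and a shrinking random search) can be sketched for a single grayscale frame. This is a simplified illustration, not the paper's method: no multiresolution strategy, no invariant features, no video volumes, and the function names are hypothetical. Near-identity matches are excluded, since a patch trivially matches itself; a coherent field of large offsets with near-zero cost is the copy-move cue.

```python
import numpy as np

def patch_ssd(img, y1, x1, y2, x2, p):
    """Sum of squared differences between two p-by-p patches."""
    d = img[y1:y1 + p, x1:x1 + p] - img[y2:y2 + p, x2:x2 + p]
    return float(np.sum(d * d))

def patchmatch(img, p=4, iters=2, search_radius=8, min_dist=8, seed=0):
    """Dense nearest-neighbour offset field for one grayscale image."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    H, W = h - p + 1, w - p + 1

    def valid(y, x, cy, cx):
        # exclude near-identity matches, which are trivially perfect
        return abs(cy - y) + abs(cx - x) >= min_dist

    # random initialization of the target field (ty, tx)
    ty = rng.integers(0, H, size=(H, W))
    tx = rng.integers(0, W, size=(H, W))
    cost = np.full((H, W), np.inf)
    for y in range(H):
        for x in range(W):
            if valid(y, x, ty[y, x], tx[y, x]):
                cost[y, x] = patch_ssd(img, y, x, ty[y, x], tx[y, x], p)

    for it in range(iters):
        step = 1 if it % 2 == 0 else -1          # alternate scan direction
        ys = range(H) if step == 1 else range(H - 1, -1, -1)
        xs = range(W) if step == 1 else range(W - 1, -1, -1)
        for y in ys:
            for x in xs:
                # propagation: adopt a scanned neighbour's offset, shifted by one
                for ny, nx in ((y - step, x), (y, x - step)):
                    if 0 <= ny < H and 0 <= nx < W:
                        cy, cx = ty[ny, nx] + (y - ny), tx[ny, nx] + (x - nx)
                        if 0 <= cy < H and 0 <= cx < W and valid(y, x, cy, cx):
                            c = patch_ssd(img, y, x, cy, cx, p)
                            if c < cost[y, x]:
                                ty[y, x], tx[y, x], cost[y, x] = cy, cx, c
                # random search in a shrinking window around the current best
                r = search_radius
                while r >= 1:
                    cy = int(np.clip(ty[y, x] + rng.integers(-r, r + 1), 0, H - 1))
                    cx = int(np.clip(tx[y, x] + rng.integers(-r, r + 1), 0, W - 1))
                    if valid(y, x, cy, cx):
                        c = patch_ssd(img, y, x, cy, cx, p)
                        if c < cost[y, x]:
                            ty[y, x], tx[y, x], cost[y, x] = cy, cx, c
                    r //= 2
    return ty, tx, cost
```

Propagation is what makes the search practical: a single good match found anywhere inside a copied region spreads along the scan order, so the whole region converges without exhaustive comparison.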
Audio for Virtual, Augmented and Mixed Realities: Proceedings of ICSA 2019 ; 5th International Conference on Spatial Audio ; September 26th to 28th, 2019, Ilmenau, Germany
The ICSA 2019 brings together, in a multidisciplinary setting, developers, scientists, users, and content creators of and for spatial audio systems and services. A special focus is on audio for so-called virtual, augmented, and mixed realities.
The fields of ICSA 2019 are:
- Development and scientific investigation of technical systems and services for spatial audio recording, processing, and reproduction
- Creation of content for reproduction via spatial audio systems and services
- Use and application of spatial audio systems and content presentation services
- Media impact of content and spatial audio systems and services from the point of view of media science
The ICSA 2019 is organized by the VDT and TU Ilmenau with the support of the Fraunhofer Institute for Digital Media Technology IDMT.
Computer Models for Musical Instrument Identification
A particular aspect of the perception of sound is concerned with what is
commonly termed texture or timbre. From a perceptual perspective, timbre is
what allows us to distinguish sounds that have similar pitch and loudness.
Indeed, most people are able to discern a piano tone from a violin tone, or
to distinguish different voices or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, a complete supervised system
taking into account both physical and perceptual aspects of timbre is described.
The approach is composed of three distinct processing layers. Parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of Line Spectral Frequencies as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases.
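The pipeline described (a spectral-envelope representation feeding a generative classifier) can be sketched roughly as below. The code computes LPC coefficients via the autocorrelation method; Line Spectral Frequencies are conventionally derived from this same prediction polynomial. A diagonal Gaussian per class stands in for the generative modelling stage. All names are assumptions for illustration, not the thesis's actual system:

```python
import numpy as np

def lpc(frame, order):
    """Linear-prediction coefficients via the autocorrelation method
    (Levinson-Durbin recursion). The prediction polynomial encodes the
    spectral envelope of the frame."""
    frame = np.asarray(frame, dtype=float)
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a

class GaussianClassifier:
    """One diagonal Gaussian per class over the feature vectors, a
    minimal stand-in for the generative modelling stage."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
        self.var_ = {c: X[y == c].var(axis=0) + 1e-6 for c in self.classes_}
        return self

    def predict(self, X):
        # log-likelihood under each class model, up to a shared constant
        ll = np.stack([
            -0.5 * np.sum(np.log(self.var_[c])
                          + (X - self.mu_[c]) ** 2 / self.var_[c], axis=1)
            for c in self.classes_])
        return self.classes_[np.argmax(ll, axis=0)]
```

In this sketch the LPC coefficients themselves serve as features; the thesis's point is that envelope-derived descriptors of this family separate instruments whose formant structures differ.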