Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes
The increasing popularity of Ambisonics as a spatial audio format for streaming services poses new challenges to existing audio coding techniques. Immersive audio delivered to mobile devices requires an efficient bitrate compression that does not affect the spatial quality of the content. Good localizability of virtual sound sources is one of the key elements that must be preserved. This study was conducted to investigate the localization precision of virtual sound source presentations within Ambisonic scenes encoded with Opus low-bitrate compression at different bitrates and Ambisonic orders (1st, 3rd, and 5th). The test stimuli were reproduced over a 50-channel spherical loudspeaker configuration and binaurally using individually measured and generic Head-Related Transfer Functions (HRTFs). Participants were asked to adjust the position of a virtual acoustic pointer to match the position of a virtual sound source within the bitrate-compressed Ambisonic scene. Results show that auditory localization in low-bitrate compressed Ambisonic scenes is not significantly affected by codec parameters. The key factors influencing localization are the rendering method and Ambisonic order truncation. This suggests that efficient perceptual coding might be successfully used for mobile spatial audio delivery.
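As a rough illustration of why the Ambisonic order matters for bitrate, the number of audio channels in a full-sphere Ambisonic signal of order N is (N + 1)^2. The short sketch below computes this for the orders named in the abstract; the function name is hypothetical and the snippet is not taken from the study.

```python
def ambisonic_channels(order: int) -> int:
    """Channel count of a full-sphere (3D) Ambisonic signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


# Orders evaluated in the study above: 1st, 3rd, and 5th.
for order in (1, 3, 5):
    print(f"order {order}: {ambisonic_channels(order)} channels")
```

The channel count grows quadratically with the order, so a fixed total bitrate leaves fewer bits per channel at higher orders.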
Audiovisual Database with 360 Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior, and QoE Evaluation Research
Research into multi-modal perception, human cognition, behavior, and
attention can benefit from high-fidelity content that may recreate
real-life-like scenes when rendered on head-mounted displays. Moreover, aspects
of audiovisual perception, cognitive processes, and behavior may complement
questionnaire-based Quality of Experience (QoE) evaluation of interactive
virtual environments. Currently, there is a lack of high-quality open-source
audiovisual databases that can be used to evaluate such aspects or systems
capable of reproducing high-quality content. With this paper, we provide a
publicly available audiovisual database consisting of twelve scenes capturing
real-life nature and urban environments with a video resolution of 7680x3840 at
60 frames-per-second and with 4th-order Ambisonics audio. These 360 video
sequences, with an average duration of 60 seconds, represent real-life settings
for systematically evaluating various dimensions of uni-/multi-modal
perception, cognition, behavior, and QoE. The paper provides details of the
scene requirements, recording approach, and scene descriptions. The database
provides high-quality reference material with a balanced focus on auditory and
visual sensory information. The database will be continuously updated with
additional scenes and further metadata such as human ratings and saliency
information. Comment: 6 pages, 2 figures, accepted and presented at the 2022 14th
International Conference on Quality of Multimedia Experience (QoMEX).
Database is publicly accessible at https://qoevave.github.io/database
Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics
With the increasing popularity of spatial audio content streaming and interactive binaural audio rendering, it is pertinent to study the quality of the critical components of such systems. This includes low-bitrate compression of Ambisonic scenes and binaural rendering schemes. This thesis presents a group of perceptual experiments focusing on these two elements of the Ambisonic delivery chain.
The first group of experiments focused on the quality of low-bitrate compression of Ambisonics. The first study evaluated the perceived timbral quality degradation introduced by the Opus audio codec at different bitrate settings and Ambisonic orders. This experiment was conducted using multi-loudspeaker reproduction as well as binaural rendering. The second study has been dedicated to auditory localisation performance in bitrate-compressed Ambisonic scenes reproduced over loudspeakers and binaurally using generic and individually measured HRTF sets. Finally, the third study extended the evaluated set of codec parameters by testing different channel mappings and various audio stimuli contexts. This study was conducted in VR thanks to a purposely developed listening test framework. The comprehensive evaluation of the Opus codec led to a set of recommendations regarding optimal codec parameters.
The second group of experiments focused on the evaluation of different methods for binaural rendering of Ambisonics. The first study in this group focused on the implementation of the established methods for designing Ambisonic-to-binaural filters and subsequent objective and subjective evaluations of these. The second study explored the concept of hybrid binaural rendering combining anechoic filters with reverberant ones. Finally, addressing the problem of non-individual HRTFs used for spatial audio rendering, an XR-based method for acquiring individual HRTFs using a single loudspeaker has been proposed.
The conducted perceptual evaluations identified key areas where the Ambisonic delivery chain could be improved to provide a more satisfactory user experience.
Signal Coding Approaches for Spatial Audio and Unreliable Networks
This dissertation is divided into two parts. The first part is concerned with developing algorithms for the compression of an emerging 3D audio format, while the second part investigates optimization techniques for error-resilient predictive compression systems design. In the first part, advances in the development of compression algorithms for higher order ambisonics (HOA) data are presented. HOA has proven to be the method of choice in virtual reality applications, given its capability in reproducing spatial audio and its rendering flexibility. Recent standardization for HOA compression adopted a framework wherein HOA data are decomposed into principal components that are then encoded by standard audio coding, i.e., frequency domain quantization and entropy coding to exploit psychoacoustic redundancy. A noted shortcoming of this approach is the occasional mismatch in principal components across blocks, and the resulting suboptimal transitions in the data fed to the audio coder. In this dissertation, we propose a framework where singular value decomposition (SVD) is performed after transformation to the frequency domain via the modified discrete cosine transform (MDCT). This framework not only ensures smooth transition across blocks, but also enables frequency-dependent SVD for better energy compaction. Moreover, we introduce a novel noise substitution technique to compensate for suppressed ambient energy in discarded higher order ambisonics channels, which significantly enhances the perceptual quality of the reconstructed HOA signal. In the next step, to reduce the burden of side information, a new encoding architecture is presented, where transform matrices are estimated backward-adaptively. This framework allows a more frequent usage of optimal SVD, thereby approaching the full potential of frequency-domain SVD.
Also, the division of HOA data into predominant and ambient components in current schemes is difficult to perceptually optimize and ignores spatial inter-channel masking effects. To address these issues, a new encoding framework for compression of HOA data is presented, where a null-space basis vector extension technique enables all compression to be performed in the SVD domain, and a jointly computed common masking threshold accounts for effects of spatial masking across components. The second part is concerned with developing optimization techniques for error-resilient predictive compression systems design. Prediction is used in virtually all compression systems, and when such a compressed signal is transmitted over unreliable networks, packet losses can lead to significant error propagation through the prediction loop. Despite this, the conventional design technique completely ignores the effect of packet losses: it estimates the prediction parameters to minimize the mean squared prediction error, and optimizes the quantizer to minimize the reconstruction error at the encoder. While some design techniques have been proposed to accurately estimate and minimize the end-to-end distortion (EED) at the decoder that accounts for packet losses, they operate in a closed loop, which introduces a mismatch between statistics used for design and statistics used in operation, causing a negative impact on convergence and stability of the design procedure.
The first contribution of the dissertation in this part is an effective technique for designing a compression system with a first-order linear predictor that accounts for the instability caused by error propagation due to packet losses, and enjoys stable statistics during design by employing open-loop iterations that on convergence mimic closed-loop operation. End-to-end distortion (EED) estimation, accounting for error propagation and concealment at the decoder, was originally developed for video coding, and enables optimal rate-distortion (RD) decisions at the encoder. However, this approach was limited to the video coder’s simple setting of a single-tap constant-coefficient temporal predictor. This thesis considerably generalizes the framework to account for: i) high-order prediction filters, and ii) filter adaptation to local signal statistics. We demonstrate how this EED estimate can be leveraged, by an encoder with short- and long-term linear prediction, to improve RD decisions and achieve major performance gains. The approach is further extended to estimate EED in speech coders. The error propagation problem is exacerbated in this case, as standard coders not only predict the signal from past frames, but also the parameters (in the line spectral frequency domain) employed for such prediction. Hence, the prediction loop propagates errors in the reconstructed signal as well as errors in the prediction parameters. A recursive algorithm is proposed to estimate, at the encoder, the overall EED, by the subterfuge of parallel tracking of decoder statistics for prediction parameters and signal reconstructions, in their respective domains, which are then combined to obtain the ultimate EED estimate.
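The error propagation through a prediction loop mentioned in the second part of this abstract can be illustrated with a toy example. The sketch below is not the dissertation's method: it assumes a first-order predictor with a fixed coefficient `a`, no quantization, and a decoder that conceals a lost residual as zero. All function names are hypothetical. Under these assumptions, losing the residual at index k leaves a decoder error of a^(n-k) times that residual at every later sample n.

```python
def encode(signal, a):
    """Open-loop first-order prediction: residual e[n] = x[n] - a * x[n-1]."""
    residuals, prev = [], 0.0
    for x in signal:
        residuals.append(x - a * prev)
        prev = x
    return residuals


def decode(residuals, a, lost=frozenset()):
    """Rebuild y[n] = a * y[n-1] + e[n]; residuals at lost indices become 0."""
    out, prev = [], 0.0
    for n, e in enumerate(residuals):
        y = a * prev + (0.0 if n in lost else e)
        out.append(y)
        prev = y
    return out


def propagation_error(signal, a, lost_index):
    """Per-sample difference between loss-free and lossy decoder outputs."""
    residuals = encode(signal, a)
    clean = decode(residuals, a)
    damaged = decode(residuals, a, lost={lost_index})
    return [c - d for c, d in zip(clean, damaged)]


signal = [1.0, 2.0, 0.5, -1.0, 3.0, 2.5]
errors = propagation_error(signal, a=0.9, lost_index=2)
print(errors)
```

A single lost packet therefore affects every subsequent sample, with the error shrinking only as fast as the powers of `a`; a coefficient close to 1 makes the loss linger, which is the effect an end-to-end distortion estimate has to account for.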
Virtual reality supported simulation training for healthcare personnel
Sørlandet sykehus helseforetak suggested a thesis to design a virtual reality (VR) application with
360-degree videos to be used as an educational tool for training psychiatric healthcare personnel in
handling aggressive and suicidal patients. This study aims to find the advantages of using interactive
VR scenarios as a supplement to simulation training by developing an application using human-centred design. The article explores how an educational VR application can be designed using
human-centred design, the advantages and challenges of using it as a tool for learning, and nurses’
satisfaction with the application.
Simulation training is used in healthcare education and training because it allows participants to
actively train in challenging situations while remaining in a safe environment. However, because of
the time investment and organisational constraints related to simulation training, the use of VR as
a possible supplement is currently being researched and explored by Norwegian hospitals.
The present research study utilised a mixed-method approach comprised of the usability testing
method, combining observations and interviews, and a survey to answer the research questions. The
observations and interviews mainly focused on exploring the design, advantages, and challenges of
the application in healthcare training. The survey focused on exploring satisfaction levels. This
resulted in two interactive VR scenarios that train psychiatric healthcare personnel in handling
suicidal patients and threats from aggressive patients. The application was developed using
Premiere Pro, Audition, and Unity, which resulted in an active learning environment where users
directly affect the outcomes of scenarios.
The advantages of the application are related to its affordability, its use as a learning tool for
experiential and affective learning, and its use as a mediator for peer learning. The disadvantages
of the application are related to the immersion of experienced healthcare personnel in the topics,
the time investment for developing such an application, and the fact that it may not
adequately replace physical scenarios. Lastly, the application was satisfying based on the healthcare
workers’ responses.
Future research could explore the research’s advantages and challenges in depth to re-evaluate their
importance in healthcare training or in other fields. For example, future research could directly
test interactive VR simulations in simulation training or test whether it promotes peer learning.
Exploring interactive VR scenarios in different cultures and professions could also result in different
satisfaction levels.
Videos in Context for Telecommunication and Spatial Browsing
The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing a visual representation of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. At the same time, panoramas lie between 3D VEs and 2D images: they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and if this improves quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information of a remote place and the dynamics within, and if this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether there is an impact of display type on reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos in context interface with fully-panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events. The study explored three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more difficult and often more expensive solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.
Sonic Interactions in Virtual Environments
This open access book tackles the design of 3D spatial interactions in an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are: Immersive audio: the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies; Sonic interaction: the human-computer interplay through auditory feedback in VE; VR systems: naturally support multimodal integration, impacting different application domains. Sonic Interactions in Virtual Environments will feature state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of the experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging new field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread in different audio communities, and to increase among the VR communities, researchers, and practitioners the awareness of the importance of sonic elements when designing immersive environments.
Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019
International audienc
Sonic interactions in virtual environments
This book tackles the design of 3D spatial interactions in an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are: Immersive audio: the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies; Sonic interaction: the human-computer interplay through auditory feedback in VE; VR systems: naturally support multimodal integration, impacting different application domains. Sonic Interactions in Virtual Environments will feature state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of the experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging new field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread in different audio communities, and to increase among the VR communities, researchers, and practitioners the awareness of the importance of sonic elements when designing immersive environments.