6 research outputs found

    First-order ambisonic coding with quaternion-based interpolation of PCA rotation matrices

    Conversational applications such as telephony are mostly restricted to mono. With the emergence of VR/XR applications and new products with spatial audio, there is a need to extend traditional voice and audio codecs to enable immersive communication. The present work is motivated by recent activities in 3GPP standardization around the development of a new codec called immersive voice and audio services (IVAS). The IVAS codec will address a wide variety of use cases, e.g. immersive telephony, spatial audio conferencing, and live content sharing. There are two main design goals for IVAS. One goal is the versatility of the codec in terms of input (scene-based, channel-based, object-based audio…) and output (mono, stereo, binaural, various multichannel loudspeaker setups). The second goal is to re-use and extend the enhanced voice services (EVS) mono codec as much as possible. In this work, we focus on the first-order ambisonic (FOA) format, which is a good candidate for the internal representation in an immersive audio codec at low bit rates, due to the flexibility of the underlying sound field decomposition. We propose a new coding method, which can extend existing core codecs such as EVS. The proposed method consists of adaptively pre-processing ambisonic components prior to multi-mono coding by a core codec. The first part of this work investigates the basic multi-mono coding approach for FOA, which is used, for instance, in the Opus codec (in the so-called channel mapping family 2). In this approach, ambisonic components are coded separately with different instances of the (mono) core codec. We present results of a subjective test (MUSHRA), which show that this direct approach is not satisfactory for low-bitrate coding. The signal structure is degraded, which produces many spatial artifacts (e.g. wrong panning, ghost sources…). In the second part of this work, we propose a new method to exploit the correlation of ambisonic components.
    The pre-processing (prior to multi-mono coding) operates in the time domain to allow maximum compatibility with many codecs, especially low-bitrate codecs such as EVS and Opus, and to minimize extra delay. The proposed method applies Principal Components Analysis (PCA) on a 20 ms frame basis. For each frame, eigenvectors are computed and the eigenvector matrix is defined as a 4D rotation matrix. For complex sound scenes (with many audio sources, sudden changes…), rotation parameters may change dramatically between consecutive frames, and audio sources may go from one principal component to another, which may cause discontinuities or other artifacts. Solutions such as the interpolation of eigenvectors (after inter-frame realignment) are not optimal. In the proposed method, we ensure smooth transitions between inter-frame PCA rotations thanks to two complementary methods. The first is a matching algorithm for eigenvectors between the current and the previous frame, which avoids signal inversion and permutation across frames. The second is an interpolation of the 4D rotation matrices in the quaternion domain. We use the Cayley factorization of 4D rotation matrices into a double quaternion for the current and previous frames and apply quaternion spherical linear interpolation (QSLERP) on a subframe basis. The interpolated rotation matrices are then applied to the ambisonic components, and the decorrelated components are coded with a multi-mono approach. We present results of a subjective evaluation (MUSHRA) showing that the proposed method brings significant improvements over the naive multi-mono method, especially in terms of spatial quality.
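    As a rough illustration of the interpolation step described above, the sketch below implements quaternion spherical linear interpolation (SLERP) and evaluates it at a few subframe positions within a frame. In the Cayley factorization, each of the two quaternions of the double-quaternion pair would be interpolated independently in this way; the four-subframe split and the example quaternion values here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def qslerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    q0, q1 = np.asarray(q0, float), np.asarray(q1, float)
    dot = np.dot(q0, q1)
    # Take the shorter arc so we never interpolate "the long way round".
    if dot < 0.0:
        q1, dot = -q1, -dot
    theta = np.arccos(min(dot, 1.0))
    if theta < 1e-6:
        # Nearly parallel quaternions: fall back to linear interpolation.
        q = (1.0 - t) * q0 + t * q1
    else:
        q = (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)
    return q / np.linalg.norm(q)

# Interpolate between the previous- and current-frame rotation on a
# subframe basis (4 subframes per frame is a hypothetical choice).
q_prev = np.array([1.0, 0.0, 0.0, 0.0])                       # identity
q_curr = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8), 0.0, 0.0])
subframe_qs = [qslerp(q_prev, q_curr, t) for t in np.linspace(0.0, 1.0, 4)]
```

    Interpolating on the unit-quaternion sphere keeps every intermediate matrix a valid rotation, which is exactly what plain element-wise interpolation of rotation matrices fails to guarantee.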

    AMBIQUAL – a Full Reference Objective Quality Metric for Ambisonic Spatial Audio

    Streaming spatial audio over networks requires efficient encoding techniques that compress the raw audio content without compromising quality of experience. Streaming service providers such as YouTube need a perceptually relevant objective audio quality metric to monitor users' perceived quality and spatial localization accuracy. In this paper, we introduce a full-reference objective spatial audio quality metric, AMBIQUAL, which assesses both Listening Quality and Localization Accuracy. In our solution, both metrics are derived directly from the B-format Ambisonic audio. The metric extends and adapts the algorithm used in ViSQOLAudio, a full-reference objective metric designed for assessing speech and audio quality. In particular, Listening Quality is derived from the omnidirectional channel, and Localization Accuracy is derived from a weighted sum of similarity from the B-format directional channels. This paper evaluates whether the proposed AMBIQUAL metric can predict two factors, Listening Quality and Localization Accuracy, by comparing its predictions with results from MUSHRA subjective listening tests. In particular, we evaluated the Listening Quality and Localization Accuracy of First- and Third-Order Ambisonic audio compressed with the Opus 1.2 codec at various bitrates (i.e. 32, 128, 256, and 512 kbps, respectively). The sample set for the tests comprised both recorded and synthetic audio clips with a wide range of time-frequency characteristics. To evaluate the Localization Accuracy of compressed audio, a number of fixed and dynamic (moving vertically and horizontally) source positions were selected for the test samples. Results showed a strong correlation (PCC = 0.919, Spearman = 0.882 for Listening Quality; PCC = 0.854, Spearman = 0.842 for Localization Accuracy) between objective quality scores derived from the B-format Ambisonic audio using AMBIQUAL and subjective scores obtained during MUSHRA listening tests. AMBIQUAL displays very promising quality-assessment predictions for spatial audio. Future work will optimise the algorithm to generalise and validate it for Higher-Order Ambisonic formats.
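    The aggregation scheme the abstract describes (Listening Quality from the omnidirectional W channel, Localization Accuracy from a weighted sum of directional-channel similarities) can be sketched as below. The function names, the normalised-correlation similarity, and the uniform weights are stand-ins for illustration; AMBIQUAL itself derives per-channel similarity from ViSQOLAudio's spectro-temporal comparison, not from simple correlation.

```python
import numpy as np

def channel_similarity(ref, test):
    """Toy similarity in [0, 1]: normalised correlation of two channel signals.
    A placeholder for the ViSQOLAudio-style similarity used by AMBIQUAL."""
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    denom = np.linalg.norm(ref) * np.linalg.norm(test)
    return float(abs(np.dot(ref, test)) / denom) if denom else 1.0

def ambiqual_style_scores(ref_bformat, test_bformat, weights=(1.0, 1.0, 1.0)):
    """ref_bformat/test_bformat: arrays of shape (4, n) in W, X, Y, Z order.
    Returns (listening_quality, localization_accuracy) as in the abstract:
    the first from the omnidirectional channel, the second as a weighted
    sum of directional-channel similarities (weights are hypothetical)."""
    listening_quality = channel_similarity(ref_bformat[0], test_bformat[0])
    sims = [channel_similarity(r, t)
            for r, t in zip(ref_bformat[1:], test_bformat[1:])]
    localization_accuracy = float(np.dot(weights, sims) / np.sum(weights))
    return listening_quality, localization_accuracy
```

    An uncompressed scene compared against itself yields the maximum score on both axes, which is the sanity check any full-reference metric should pass.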

    Auditory Localization in Low-Bitrate Compressed Ambisonic Scenes

    The increasing popularity of Ambisonics as a spatial audio format for streaming services poses new challenges to existing audio coding techniques. Immersive audio delivered to mobile devices requires efficient bitrate compression that does not affect the spatial quality of the content. Good localizability of virtual sound sources is one of the key elements that must be preserved. This study investigated the localization precision of virtual sound source presentations within Ambisonic scenes encoded with Opus low-bitrate compression at different bitrates and Ambisonic orders (1st, 3rd, and 5th). The test stimuli were reproduced over a 50-channel spherical loudspeaker configuration and binaurally using individually measured and generic Head-Related Transfer Functions (HRTFs). Participants were asked to adjust the position of a virtual acoustic pointer to match the position of the virtual sound source within the bitrate-compressed Ambisonic scene. Results show that auditory localization in low-bitrate compressed Ambisonic scenes is not significantly affected by codec parameters. The key factors influencing localization are the rendering method and Ambisonic order truncation. This suggests that efficient perceptual coding may be successfully used for mobile spatial audio delivery.
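    The order-truncation finding above hinges on how channel count grows with Ambisonic order: a full 3D representation of order N carries (N + 1)² channels, so the orders tested in the study differ sharply in how much signal the codec must handle. A quick illustration:

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of channels in a full 3D (periphonic) Ambisonic
    representation of the given order: (N + 1) squared."""
    return (order + 1) ** 2

# Orders evaluated in the study: 1st, 3rd, and 5th.
counts = [ambisonic_channel_count(n) for n in (1, 3, 5)]  # → [4, 16, 36]
```

    Truncating a 5th-order scene to 1st order thus discards 32 of 36 channels, which is why order truncation, rather than the codec's bitrate setting, dominates localization precision.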

    Context-Based Evaluation of the Opus Audio Codec for Spatial Audio Content in Virtual Reality

    This paper discusses the evaluation of Opus-compressed Ambisonic audio content through listening tests conducted in a virtual reality environment. The aim of this study was to investigate the effect that Opus compression has on the Basic Audio Quality (BAQ) of Ambisonic audio in different virtual reality contexts: gaming, music, soundscapes, and teleconferencing. The methods used to produce the test content, how the tests were conducted, the results obtained, and their significance are discussed. A key finding was that, in all cases, Ambisonic scenes compressed with Opus at 64 kbps/ch using Channel Mapping Family 3 garnered a median BAQ rating not significantly different from that of uncompressed audio. Channel Mapping Family 3 also demonstrated the least variation in BAQ across the evaluated contexts, although some significant differences were still found between contexts at certain bitrates and Ambisonic orders.
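    To put the 64 kbps/ch figure in perspective, a back-of-the-envelope helper shows the total stream bitrate it implies at different Ambisonic orders. The function name is invented for illustration, and the channel count ignores any optional non-diegetic stereo pair that Opus Ambisonic channel mappings can carry.

```python
def total_bitrate_kbps(order: int, per_channel_kbps: float) -> float:
    """Total stream bitrate when every channel of an order-N Ambisonic
    scene ((N + 1)^2 channels) is coded at the same per-channel rate."""
    n_channels = (order + 1) ** 2
    return n_channels * per_channel_kbps

foa_total = total_bitrate_kbps(1, 64)  # 4 channels  -> 256 kbps
toa_total = total_bitrate_kbps(3, 64)  # 16 channels -> 1024 kbps
```

    The quadratic growth in channel count explains why per-channel rates, rather than a single total bitrate, are the natural knob when comparing codec settings across orders.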

    Improvements in the Perceived Quality of Streaming and Binaural Rendering of Ambisonics

    With the increasing popularity of spatial audio content streaming and interactive binaural audio rendering, it is pertinent to study the quality of the critical components of such systems. This includes low-bitrate compression of Ambisonic scenes and binaural rendering schemes. This thesis presents a group of perceptual experiments focusing on these two elements of the Ambisonic delivery chain. The first group of experiments focused on the quality of low-bitrate compression of Ambisonics. The first study evaluated the perceived timbral quality degradation introduced by the Opus audio codec at different bitrate settings and Ambisonic orders. This experiment was conducted using multi-loudspeaker reproduction as well as binaural rendering. The second study was dedicated to auditory localisation performance in bitrate-compressed Ambisonic scenes reproduced over loudspeakers and binaurally using generic and individually measured HRTF sets. Finally, the third study extended the evaluated set of codec parameters by testing different channel mappings and various audio stimulus contexts. This study was conducted in VR using a purpose-built listening test framework. The comprehensive evaluation of the Opus codec led to a set of recommendations regarding optimal codec parameters. The second group of experiments focused on the evaluation of different methods for binaural rendering of Ambisonics. The first study in this group focused on the implementation of established methods for designing Ambisonic-to-binaural filters and their subsequent objective and subjective evaluation. The second study explored the concept of hybrid binaural rendering, combining anechoic filters with reverberant ones. Finally, addressing the problem of non-individual HRTFs used for spatial audio rendering, an XR-based method for acquiring individual HRTFs using a single loudspeaker was proposed. The conducted perceptual evaluations identified key areas where the Ambisonic delivery chain could be improved to provide a more satisfactory user experience.

    Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019
