764 research outputs found

    2D to 3D ambience upmixing based on perceptual band allocation

    3D multichannel audio systems employ additional elevated loudspeakers in order to provide listeners with a vertical dimension to their auditory experience. Listening tests were conducted to evaluate the feasibility of a novel vertical upmixing technique called “perceptual band allocation (PBA),” which is based on a psychoacoustic principle of vertical sound localization, the “pitch height” effect. The practical feasibility of the method was investigated using 4-channel ambience signals recorded in a reverberant concert hall using the Hamasaki-Square microphone technique. Results showed that the PBA-upmixed 3D stimuli produced 3D listener envelopment (LEV) that was significantly stronger than or similar to that of the 9-channel 3D stimuli, depending on the sound source and the crossover frequency of PBA. They also produced significantly greater 3D LEV than the 7-channel 3D stimuli. In the preference tests, the PBA stimuli were significantly preferred over the original 9-channel stimuli.
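
    The band-splitting idea behind PBA can be illustrated with a minimal sketch. The abstract does not give the filter design or state which band feeds which loudspeaker layer, so everything below is an assumption made for illustration: a first-order complementary crossover stands in for a real crossover filter, the band below the crossover is sent to the main layer, and the band above it is sent to the height layer. The names `one_pole_lowpass` and `pba_upmix` are hypothetical.

```python
import math


def one_pole_lowpass(signal, crossover_hz, sample_rate_hz):
    """First-order low-pass filter, a simple stand-in for a crossover filter."""
    # Smoothing coefficient of a one-pole filter with the given cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate_hz)
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def pba_upmix(signal, crossover_hz, sample_rate_hz):
    """Split one ambience channel into (main_layer, height_layer) feeds.

    Assumption (not stated in the abstract): the low band feeds the main
    layer and the high band feeds the height layer.
    """
    low = one_pole_lowpass(signal, crossover_hz, sample_rate_hz)
    # Complementary high band, so that low + high reconstructs the input.
    high = [x - l for x, l in zip(signal, low)]
    return low, high


# Hypothetical test signal: a 200 Hz tone plus a 5 kHz tone at 48 kHz.
fs = 48000.0
channel = [
    math.sin(2 * math.pi * 200 * n / fs) + math.sin(2 * math.pi * 5000 * n / fs)
    for n in range(480)
]
main_feed, height_feed = pba_upmix(channel, crossover_hz=1000.0, sample_rate_hz=fs)
```

    Because the two bands are complementary, summing the main and height feeds returns the original channel. Applied to each of the four ambience channels, such a split would yield one main and one height feed per channel.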

    Reviews on Technology and Standard of Spatial Audio Coding

    Market demand for more impressive entertainment media has motivated the delivery of three-dimensional (3D) audio content to home consumers through Ultra High Definition TV (UHDTV), the next generation of TV broadcasting, in which spatial audio coding plays a fundamental role. This paper reviews the fundamental concepts of spatial audio coding, covering technology, standards, and applications. The basic principle of object-based audio reproduction is also elaborated and compared with the traditional channel-based system, to give a good understanding of this popular interactive reproduction system, which gives end users the flexibility to render their own preferred audio composition. Keywords: spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audio

    Perceptual Band Allocation (PBA) for the Rendering of Vertical Image Spread with a Vertical 2D Loudspeaker Array

    A series of subjective experiments was conducted to investigate a novel vertical image rendering method named “Perceptual Band Allocation (PBA),” using octave bands of pink noise with a vertical 2D reproduction setup with main and height loudspeaker pairs. The perceived height of each octave band was first measured for the main and height loudspeakers individually. Results suggested a significant difference between monophonic and stereophonic images in the perceived relationship between frequency and height. Six test conditions were then created, aiming for various degrees of vertical image spread, such that each frequency band was mapped to either the main or the height loudspeaker layer based on the results of the localization experiment. Multiple comparison tests were conducted to grade the perceived magnitude of vertical image spread. It was generally found that various degrees of vertical image spread could be rendered using different PBA schemes, but the perceived results did not fully match the results predicted from the localization experiment. Differences between the main and height loudspeaker layers in the spectral weightings of the ear-input signals at certain frequencies were identified as one of the factors that influenced this result.

    Perceptual Optimization of Room-In-Room Reproduction with Spatially Distributed Loudspeakers

    It is often desirable to reproduce a specific room-acoustic scene, e.g. a concert hall in a playback room, in such a way that the listener has a plausible and authentic spatial impression of the original sound source, including the room-acoustical properties. In this study a perceptually motivated approach to spatial audio reproduction is developed. This approach optimizes the spatial and monaural cues of the direct and reverberant sound separately. More specifically, the (monaural) spectral cues responsible for the timbre and the (binaural) interaural cross-correlation (IACC) cues responsible for listener envelopment were optimized in the playback room to restore the auditory impression of the recording room. The direct sound, recorded close to the source, is processed with an auditory-motivated gammatone filterbank such that the spectral cues, ITDs and ILDs are comparable to those of the direct sound in the recording room. Additionally, the reverberant sound, which was recorded at two locations distant from the source, is played back via dipole loudspeakers. Due to the arrangement of the two dipole loudspeakers, only the diffuse sound field in the playback room is excited, so the spectral cues and the IACC of the reverberant sound field can be controlled independently to match the cues that were present in the recording room. As indicated by a preliminary listening test, the optimized reproduction is perceptually similar to the reference signal and is generally preferred over a conventional room-in-room reproduction. Funding: DFG, FOR 1732, Individualisierte Hörakustik: Modelle, Algorithmen und Systeme für die Sicherstellung der akustischen Wahrnehmung für alle in allen Situationen
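
    The IACC cue mentioned above can be made concrete with a small sketch. It uses the common definition of IACC as the maximum absolute value of the normalised cross-correlation between the two ear signals for lags within ±1 ms. The abstract does not say how the study computed IACC, so the lag window, the broadband (unfiltered) evaluation, and the function name `iacc` are assumptions for illustration only.

```python
import math


def iacc(left, right, sample_rate_hz, max_lag_ms=1.0):
    """Maximum absolute normalised cross-correlation within +/- max_lag_ms."""
    n = min(len(left), len(right))
    # Normalisation term: geometric mean of the two signal energies.
    energy = math.sqrt(
        sum(x * x for x in left[:n]) * sum(y * y for y in right[:n])
    )
    if energy == 0.0:
        return 0.0
    max_lag = int(round(max_lag_ms * 1e-3 * sample_rate_hz))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:  # samples outside the window count as zero
                acc += left[i] * right[j]
        best = max(best, abs(acc) / energy)
    return best


# Hypothetical ear signals at 48 kHz: two tones of different frequency.
fs = 48000
tone_a = [math.sin(2 * math.pi * 500 * k / fs) for k in range(960)]
tone_b = [math.sin(2 * math.pi * 1700 * k / fs) for k in range(960)]
coherent = iacc(tone_a, tone_a, fs)    # identical ear signals
decorrelated = iacc(tone_a, tone_b, fs)  # unrelated ear signals
```

    Identical ear signals give a value near 1, while unrelated signals give a value near 0; lower values are generally associated with a more enveloping, diffuse impression.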

    Optimization and improvements in spatial sound reproduction systems through perceptual considerations

    The reproduction of the spatial properties of sound is an increasingly important concern in many emerging immersive applications. Whether in the reproduction of audiovisual content in home environments or cinemas, in immersive video conferencing systems, or in virtual or augmented reality systems, spatial sound is crucial for a realistic sense of immersion. Hearing, beyond the physics of sound, is a perceptual phenomenon influenced by cognitive processes. The objective of this thesis is to contribute new methods and knowledge to the optimization and simplification of spatial sound systems, from a perceptual approach to the hearing experience. The first part of this dissertation deals with particular aspects of binaural spatial sound reproduction, such as listening with headphones and the customization of the Head Related Transfer Function (HRTF). A study was carried out on the influence of headphones on the perception of spatial impression and quality, with particular attention to the effects of equalization and the subsequent non-linear distortion. With regard to HRTF individualization, a complete implementation of an HRTF measurement system is presented, and a new method for measuring HRTFs in non-anechoic conditions is introduced. In addition, two different and complementary experiments resulted in two tools that can be used in HRTF individualization processes: a parametric model of the HRTF magnitude and an Interaural Time Difference (ITD) scaling adjustment. In the second part, concerning loudspeaker reproduction, techniques such as Wave-Field Synthesis (WFS) and amplitude panning were evaluated. Perceptual experiments were used to study the capacity of these systems to produce a sensation of distance, and the spatial acuity with which sound sources can be perceived when they are spectrally split and reproduced at different positions. The contributions of this research are intended to make these technologies more accessible to the general public, given the demand for audiovisual experiences and devices with increasing immersion. Gutiérrez Parera, P. (2020). Optimization and improvements in spatial sound reproduction systems through perceptual considerations [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/142696

    Analysis, modeling and wide-area spatiotemporal control of low-frequency sound reproduction

    This research aims to develop a low-frequency response control methodology capable of delivering a consistent spectral and temporal response over a wide listening area. Low-frequency room acoustics are naturally plagued by room modes, a result of standing waves that form when one or more room dimensions are an integer multiple of half the wavelength. The standing wave pattern is different for each modal frequency, causing a complicated sound field with a highly position-dependent frequency response. Enhanced systems with multiple degrees of freedom (independently controllable sound-radiating sources) are investigated to provide adequate low-frequency response control. The proposed solution, termed a chameleon subwoofer array or CSA, adopts the most advantageous aspects of existing room-mode correction methodologies while emphasizing efficiency and practicality. Multiple degrees of freedom are ideally achieved by employing what is designated a hybrid subwoofer, which provides four orthogonal degrees of freedom within a modest-sized enclosure. The CSA software algorithm integrates both objective and subjective measures to address listener preferences, including the possibility of individual real-time control. CSAs and existing techniques are evaluated within a novel acoustical modeling system (FDTD simulation toolbox) developed to meet the requirements of this research. Extensive virtual development of CSAs has led to experimentation using a prototype hybrid subwoofer. The resulting performance is in line with the simulations: variance across a wide listening area is reduced by over 50% with only four degrees of freedom. A supplemental novel correction algorithm addresses correction issues in select narrow frequency bands. These frequencies are filtered from the signal and replaced using virtual bass, a psychoacoustical effect that gives the impression of low-frequency content, so that all aural information is maintained. Virtual bass is synthesized using an original hybrid approach combining two mainstream synthesis procedures while suppressing each method's inherent weaknesses. This algorithm is demonstrated to improve CSA output efficiency while maintaining acceptable subjective performance.
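
    To make the room-mode problem concrete, the sketch below lists the modal frequencies of an idealised rigid-walled rectangular room using the textbook relation f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). This is not the thesis's FDTD toolbox or its CSA algorithm. The room dimensions, the speed of sound of 343 m/s, and the function name `room_mode_frequencies` are assumptions chosen for illustration.

```python
import math
from itertools import product


def room_mode_frequencies(dims_m, max_hz, speed_of_sound=343.0):
    """Return sorted (frequency_hz, (nx, ny, nz)) pairs up to max_hz.

    Assumes an idealised rectangular room with rigid walls.
    """
    # Highest mode index per axis that can still fall below max_hz.
    limits = [int(2.0 * max_hz * d / speed_of_sound) for d in dims_m]
    modes = []
    for indices in product(*(range(n + 1) for n in limits)):
        if indices == (0, 0, 0):
            continue  # not a mode: no spatial variation at all
        f = 0.5 * speed_of_sound * math.sqrt(
            sum((n / d) ** 2 for n, d in zip(indices, dims_m))
        )
        if f <= max_hz:
            modes.append((f, indices))
    return sorted(modes)


# Hypothetical 5 m x 4 m x 3 m room, modes below 100 Hz.
modes = room_mode_frequencies((5.0, 4.0, 3.0), max_hz=100.0)
for freq, idx in modes[:5]:
    print(f"{freq:6.1f} Hz  mode {idx}")
```

    Even this small room has many modes below 100 Hz, each with its own pressure pattern, which is why the response at any one listening position differs from the response a short distance away.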

    Deep-sound field analysis for upscaling ambisonic signals

    Higher Order Ambisonics (HOA) is a popular technique used in high-quality spatial audio reproduction. Several time- and frequency-domain methods have been proposed in the literature that exploit sparsity, using an overcomplete spherical harmonics dictionary to compute the direction of arrival (DOA) of the source. Spherical harmonic decomposition has also been used to render the spatial sound. However, at lower ambisonic orders the desired sound field can be reproduced only over a small area. Additionally, this technique is limited by low spatial resolution, which can be improved by increasing the number of loudspeakers during spatial sound reproduction. Increasing the number of loudspeakers alone is not a good choice, since it involves solving an underdetermined system of equations. A joint method that upscales the Ambisonics order while simultaneously increasing the number of loudspeakers is a feasible solution to this problem. Deep Neural Networks have hitherto not been investigated in detail in the context of upscaling ambisonics. In this work, a novel Sequential Multi-Stage DNN (SMS-DNN) is developed for upscaling Ambisonic signals. The SMS-DNN consists of sequentially stacked DNNs, each of which upscales the order of the signal by one. This DNN structure is motivated by the fact that the spherical components of the encoded signal are independent of each other. Additionally, for a particular direction (θ, φ) of the sound source, an increase in the spherical harmonic order only appends higher-order spherical harmonic coefficients to the encoder of the previous order, while the lower-order coefficients remain unchanged. Hence the individual DNNs in the SMS-DNN can be trained independently for any upscaling order. Monophonic sound is acquired using a B-format (first-order) ambisonic microphone. These signals are upscaled into order-N HOA encoded plane-wave sounds using the SMS-DNN. The SMS-DNN allows a very large number of layers to be trained, since training is performed in blocks consisting of a fixed number of layers, so each stage can be trained independently. Additionally, the vanishing gradient problem in DNNs with a large number of layers is effectively handled by the proposed SMS-DNN due to its sequential nature. This method does not require prior estimation of the source locations and works in multiple-source scenarios. Experiments on ambisonics upscaling are conducted to evaluate the performance of the proposed method. The SMS-DNN architecture used in the experiments consists of N-1 fully connected feedforward neural networks, each trained separately, where N is the ambisonics order up to which upscaling is to be performed. An input training dataset in which each example is a combination of five randomly located sound sources is also developed for training the SMS-DNN. The output training dataset consists of a higher-order encoding of the same mixture of sounds with the same locations as the input data. Reconstructed sound field analysis and subjective and objective evaluations are conducted on the upscaled Ambisonic sound scenes. Mean squared error analysis of the upscaled higher-order reproduced fields indicates an error of up to -10 dB. As the order of upscaling is increased, the error-free reproduction area (sweet spot) is found to increase. Average error distribution plots are also used to indicate the significance of the proposed method. MUSHRA tests and MOS (subjective evaluation) and PEAQ tests (objective evaluation) are also presented to indicate the perceptual quality of the reproduced sounds compared to benchmark HOA reproduction.
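
    The property that motivates the stacked design, namely that raising the order only appends coefficients, can be checked with a short sketch. It encodes a plane wave with real spherical harmonics up to order 2. The abstract does not name a channel ordering or normalisation, so ACN ordering and SN3D normalisation are assumed here, and `encode_plane_wave` is a hypothetical helper, not part of the SMS-DNN.

```python
import math


def encode_plane_wave(azimuth, elevation, order):
    """Real spherical-harmonic encoding gains for orders 0 to 2.

    Assumes ACN channel ordering and SN3D normalisation; angles in radians.
    Returns (order + 1) ** 2 gains.
    """
    if not 0 <= order <= 2:
        raise ValueError("this sketch supports orders 0, 1 and 2 only")
    ca, sa = math.cos(azimuth), math.sin(azimuth)
    ce, se = math.cos(elevation), math.sin(elevation)
    k = math.sqrt(3.0) / 2.0
    gains = [
        1.0,                                     # ACN 0 (W)
        sa * ce,                                 # ACN 1 (Y)
        se,                                      # ACN 2 (Z)
        ca * ce,                                 # ACN 3 (X)
        k * math.sin(2.0 * azimuth) * ce * ce,   # ACN 4 (V)
        k * sa * math.sin(2.0 * elevation),      # ACN 5 (T)
        0.5 * (3.0 * se * se - 1.0),             # ACN 6 (R)
        k * ca * math.sin(2.0 * elevation),      # ACN 7 (S)
        k * math.cos(2.0 * azimuth) * ce * ce,   # ACN 8 (U)
    ]
    return gains[: (order + 1) ** 2]


# Hypothetical source direction: 30 degrees azimuth, 10 degrees elevation.
az, el = math.radians(30.0), math.radians(10.0)
first_order = encode_plane_wave(az, el, order=1)
second_order = encode_plane_wave(az, el, order=2)
# The first four second-order gains are exactly the first-order gains.
unchanged = second_order[:4] == first_order
```

    Since the lower-order channels are untouched, an upscaling stage from order n to n+1 only has to predict the 2n+3 new channels, which is consistent with training each stage separately.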

    Spatial Acoustic Vector Based Sound Field Reproduction

    Spatial sound field reproduction aims to recreate an immersive sound field over a spatial region. Existing sound-pressure-based approaches focus on accurately approximating the original sound pressure over space and ignore the perceptual accuracy of the reproduced sound field. The acoustic vectors of particle velocity and sound intensity are reported in the literature to be closely linked with human sound localization. Therefore, in this thesis, we explore the spatial distributions of the acoustic vectors and seek to develop algorithms that perceptually reproduce the original sound field over a continuous spatial region based on these vectors. A theory of spatial acoustic vectors is first developed, in which the spatial distributions of particle velocity and sound intensity are derived from sound pressure. To extract the desired sound pressure from a mixed sound field environment, a 3D sound field separation technique is also formulated. Based on this theory, a series of reproduction techniques is proposed to improve perceptual performance. The outcomes resulting from this theory are: (i) derivation of a particle-velocity-assisted 3D sound field reproduction technique that allows for non-uniform loudspeaker geometry with a limited number of loudspeakers, (ii) design of a particle-velocity-based mixed-source sound field translation technique for binaural reproduction that can provide sound field translation with good perceptual experience over a large space, (iii) derivation of an intensity matching technique that can reproduce the desired sound field in a spherical region by controlling the sound intensity on the surface of the region, and (iv) two intensity-based multizone sound field reproduction algorithms that can reproduce the desired sound field over multiple spatial zones. Finally, these techniques are evaluated against conventional approaches through numerical simulations and real-world experiments.
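
    The relation between the quantities named above can be shown with a minimal single-frequency sketch. For complex amplitudes of pressure p and particle velocity v, the time-averaged (active) intensity is I = ½·Re{p·v*}, and for a plane wave the velocity is p/(ρc) along the propagation direction. This covers only that textbook relation, not the spatial distributions derived in the thesis. The air density of 1.2 kg/m³, the speed of sound of 343 m/s, and both function names are assumptions for illustration.

```python
import math


def plane_wave_velocity(pressure, direction, rho=1.2, c=343.0):
    """Particle velocity vector of a plane wave travelling along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    return tuple(pressure / (rho * c) * d / norm for d in direction)


def active_intensity(pressure, velocity):
    """Time-averaged intensity vector, 0.5 * Re{p * conj(v)}, per axis."""
    return tuple(0.5 * (pressure * v.conjugate()).real for v in velocity)


# Hypothetical plane wave: complex pressure amplitude 1+1j Pa, travelling along +x.
p = 1.0 + 1.0j
v = plane_wave_velocity(p, direction=(1.0, 0.0, 0.0))
intensity = active_intensity(p, v)
```

    The intensity vector points along the propagation direction with magnitude |p|²/(2ρc), so it carries both the strength of the wave and the direction of energy flow.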

    Real-time Microphone Array Processing for Sound-field Analysis and Perceptually Motivated Reproduction

    This thesis details real-time implementations of sound-field analysis and perceptually motivated reproduction methods for visualisation and auralisation purposes. For visualisation, various methods for displaying the relative distribution of sound energy as seen from one point in space are investigated and contrasted, including a novel reformulation of the cross-pattern coherence (CroPaC) algorithm that integrates a new side-lobe suppression technique. For auralisation, listening tests were conducted to compare ambisonics reproduction with a novel headphone formulation of the directional audio coding (DirAC) method. The results indicate that the side-lobe-suppressed CroPaC method offers greater spatial selectivity in reverberant conditions than other popular approaches, and that the new DirAC formulation yields higher perceived spatial accuracy than the ambisonics method.

    Perception of Reverberation in Domestic and Automotive Environments

    nrpages: 227; status: published