
    Subjective evaluation of auditory spatial imagery associated with decorrelated subwoofer signals

    Presented at the 8th International Conference on Auditory Display (ICAD), Kyoto, Japan, July 2-5, 2002. Although only a single subwoofer is typically used in two-channel and multichannel stereophonic sound reproduction, using two subwoofers enables manipulation of low-frequency interaural cross-correlation (IACC), and this manipulation is particularly effective in producing variation in auditory spatial imagery. To document this variation objectively, a series of listening experiments was conducted using a set of stimuli generated at five correlation values and presented in two reproduction modes. Both modes used two subwoofers, but in one mode identical signals were applied to both. The results of both exploratory and confirmatory listening experiments showed that the range of variation in both perceived auditory source width (ASW) and perceived auditory source distance (ASD) is reduced when negatively correlated signals are not reproduced at low frequencies. Global dissimilarity judgments were made for this set of ten stimuli in an exploratory study designed to reveal the salient perceptual dimensions of the stimuli. A subsequent confirmatory study employed a two-alternative forced-choice task to determine how identifiably different the stimuli were with respect to the two perceptual attributes revealed in the exploratory study, ASW and ASD. The implications of these findings for loudspeaker-based spatial auditory display are discussed.
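The correlation manipulation described in this abstract can be sketched numerically. The following is a minimal illustration (function names and parameters are my own, not from the paper) of how two noise signals with a prescribed zero-lag correlation coefficient, such as the five values used for the stimuli, can be generated and verified:

```python
import numpy as np

def correlated_pair(rho, n=48000, seed=0):
    """Generate two noise signals whose correlation coefficient is rho.
    Mixing independent noises n1, n2 as x2 = rho*n1 + sqrt(1 - rho^2)*n2
    gives E[x1*x2] / (sd1*sd2) = rho."""
    rng = np.random.default_rng(seed)
    n1, n2 = rng.standard_normal((2, n))
    x1 = n1
    x2 = rho * n1 + np.sqrt(1.0 - rho ** 2) * n2
    return x1, x2

def correlation(x1, x2):
    """Normalized zero-lag cross-correlation of the two signals."""
    return np.corrcoef(x1, x2)[0, 1]

# Example: one hypothetical stimulus condition at rho = -0.5.
left, right = correlated_pair(-0.5)
print(correlation(left, right))
```

Feeding such a pair to two subwoofers (rather than the same signal to both) is what allows the low-frequency IACC to be varied in the experiment described above.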

    Backward Compatible Spatialized Teleconferencing based on Squeezed Recordings

    Commercial teleconferencing systems currently available, although offering sophisticated video of the remote participants, commonly employ only mono or stereo audio playback. However, in teleconferencing applications with multiple participants at multiple sites, spatializing the audio reproduced at each site (using headphones or loudspeakers) to help listeners distinguish between participating speakers can significantly improve the meeting experience (Baldis, 2001; Evans et al., 2000; Ward & Elko, 1999; Kilgore et al., 2003; Wrigley et al., 2009; James & Hawksford, 2008). An example is Vocal Village (Kilgore et al., 2003), which uses online avatars to co-locate remote participants over the Internet in virtual space, with audio spatialized over headphones. This system adds speaker location cues to monaural speech to create a user-manipulable soundfield that matches each avatar's position in the virtual space. Giving participants the freedom to manipulate the acoustic location of other participants in the rendered sound scene has been shown to improve multitasking performance (Wrigley et al., 2009). A system for multiparty teleconferencing first requires a stage for recording speech from multiple participants at each site. These signals then need to be compressed for efficient transmission of the spatial speech. One approach is to use close-talking microphones to record each participant (e.g. lapel microphones) and to encode each speech signal separately prior to transmission (James & Hawksford, 2008). Alternatively, for increased flexibility, a microphone array located at a central point on, say, a meeting table can be used to generate a multichannel recording of the meeting speech. A microphone array approach is adopted in this work; it allows processing of the recordings to identify the relative spatial locations of the sources, as well as multichannel speech enhancement to improve the quality of recordings in noisy environments. For efficient transmission of the recorded signals, the approach also requires a multichannel compression technique suited to spatially recorded speech signals.
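A central microphone array of the kind described above is often processed with delay-and-sum beamforming to emphasise speech arriving from one talker's direction. The sketch below is not the system from this work, only a minimal illustration under simplifying assumptions (far-field plane waves, integer-sample delays, hypothetical function names):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Steer a delay-and-sum beamformer toward a unit direction vector
    by aligning per-microphone propagation delays and averaging.
    signals: (num_mics, num_samples); mic_positions: (num_mics, 3) in metres;
    direction: unit 3-vector; fs: sample rate in Hz; c: speed of sound."""
    delays = mic_positions @ direction / c        # per-mic delay, seconds
    delays -= delays.min()                        # make all delays >= 0
    shifts = np.round(delays * fs).astype(int)    # whole samples to advance
    out = np.zeros(signals.shape[1])
    for sig, s in zip(signals, shifts):
        out += np.roll(sig, -s)                   # align the wavefronts
    return out / len(signals)                     # average aligned channels
```

Signals aligned with the steering direction add coherently, while off-axis sources and diffuse noise are attenuated, which is one simple route to the multichannel speech enhancement mentioned above.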

    Future spatial audio : Subjective evaluation of 3D surround systems

    Current surround systems are being developed to include height channels that provide the listener with a 3D listening experience. It is not well understood what impact the height channels will have on the listening experience and on aspects of multichannel reproduction such as localisation and envelopment, or whether there are new subjective attributes specific to 3D surround systems. In this research, therefore, subjective factors such as localisation and envelopment were investigated, followed by descriptive analysis. For localisation, it was found that for sources panned in the median plane, localisation accuracy was not improved with higher-order ambisonics; however, for sources in the frontal plane, higher-order ambisonics does improve localisation accuracy for elevated sound sources. It was also found that, for a simulation of a number of 2D and 3D surround systems using a decorrelated noise signal to simulate a diffuse soundfield, there was no improvement in envelopment with the addition of height. On the other hand, height was found to improve the perception of envelopment with 3D recorded sound scenes, although for an applause sample with properties similar to the decorrelated noise sample there was no significant difference between 2D and 3D systems. Five attribute scales emerged from the descriptive analysis, and significant differences were found between 2D and 3D systems on the attribute scale "size" for both ambisonics- and VBAP-rendered systems. Also, 3D higher-order ambisonics significantly enhances the perception of presence. A final principal component analysis found two factors that characterised the ambisonic-rendered systems and three factors that characterised the VBAP-rendered sound scenes. This suggests that the derived scales need to be used with a wider range of sound scenes in order to fully validate them.
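For reference, the VBAP rendering compared above computes loudspeaker gains from a vector base (Pulkki's amplitude-panning formulation). A minimal 2D sketch for a single loudspeaker pair, with hypothetical function names, might look like:

```python
import numpy as np

def vbap_2d(source_az, spk_az1, spk_az2):
    """2D vector base amplitude panning for one loudspeaker pair:
    solve L g = p, where L's columns are the loudspeaker unit vectors
    and p points at the virtual source; normalize for constant power.
    Angles in degrees, counter-clockwise from the front."""
    def unit(a):
        a = np.radians(a)
        return np.array([np.cos(a), np.sin(a)])
    L = np.column_stack([unit(spk_az1), unit(spk_az2)])
    g = np.linalg.solve(L, unit(source_az))
    return g / np.linalg.norm(g)

# A source centred between speakers at +-30 degrees gets equal gains.
print(vbap_2d(0.0, 30.0, -30.0))
```

3D VBAP extends the same idea to loudspeaker triplets, which is how elevated sources such as those in the frontal-plane localisation tests above are panned.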

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. While binaural recording and rendering are designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener's two ears, soundfield recording and reproduction using a large number of microphones and loudspeakers replicate an acoustic scene within a region. These two fundamentally different types of technique are discussed in the paper. A recent popular area, multi-zone reproduction, is also briefly reviewed. The paper concludes with a discussion of the current state of the field and open problems. The authors acknowledge National Natural Science Foundation of China (NSFC) grant No. 61671380 and Australian Research Council Discovery Scheme DE 150100363.

    Soundfield representation, reconstruction and perception

    This thesis covers the area of soundfield representation, reconstruction and perception. The complexity and information content of a soundfield present many mathematical and engineering challenges for accurate reconstruction. After an in-depth review of the field of mathematical soundfield representation, an analysis of the numerical and practical constraints on soundfield reconstruction is presented. A review of work in experimental psychoacoustics highlights the variability of spatial sound perception. It is shown that the error and uncertainty in perception are of a magnitude comparable to the accuracy achievable by present soundfield systems. Therefore, the effects of hearing adaptation, sensory bias, sensory conflict, and contextual memory cannot be ignored. If the listening environment is inappropriate or in conflict with the desired perceptual experience, little is gained from more complex soundfield representation or reconstruction. The implications of this result for the delivery of spatial audio are discussed, and some open problems for further exploration and experimentation are detailed.

    An audio-visual system for object-based audio : from recording to listening

    Object-based audio is an emerging representation for audio content, where content is represented in a reproduction-format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audio-visual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system's capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture is evaluated with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group).

    Spatial Acoustic Vector Based Sound Field Reproduction

    Spatial sound field reproduction aims to recreate an immersive sound field over a spatial region. Existing sound-pressure-based approaches to spatial sound field reproduction focus on accurate approximation of the original sound pressure over space, which ignores the perceptual accuracy of the reproduced sound field. In the literature, the acoustic vectors of particle velocity and sound intensity appear closely linked with human perception of sound localization. Therefore, in this thesis, we explore the spatial distributions of the acoustic vectors and seek to develop algorithms to perceptually reproduce the original sound field over a continuous spatial region based on these vectors. A theory of spatial acoustic vectors is first developed, in which the spatial distributions of particle velocity and sound intensity are derived from sound pressure. To extract the desired sound pressure from a mixed sound field environment, a 3D sound field separation technique is also formulated. Based on this theory, a series of reproduction techniques is proposed to improve perceptual performance. The outcomes of this theory are: (i) a particle-velocity-assisted 3D sound field reproduction technique that allows for non-uniform loudspeaker geometry with a limited number of loudspeakers; (ii) a particle-velocity-based mixed-source sound field translation technique for binaural reproduction that provides sound field translation with a good perceptual experience over a large space; (iii) an intensity matching technique that reproduces the desired sound field in a spherical region by controlling the sound intensity on the surface of the region; and (iv) two intensity-based multizone sound field reproduction algorithms that reproduce the desired sound field over multiple spatial zones.
Finally, these techniques are evaluated against conventional approaches through numerical simulations and real-world experiments.
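The intensity matching idea above rests on the active sound intensity, the time average of pressure times particle velocity. A minimal numerical check (symbols and values are illustrative, not taken from the thesis) for the plane-wave case, where v = p / (rho * c):

```python
import numpy as np

RHO = 1.21   # air density, kg/m^3 (nominal value at 20 C)
C = 343.0    # speed of sound, m/s

def active_intensity(p, v):
    """Time-averaged active sound intensity I = <p(t) v(t)> from
    collocated pressure and particle-velocity signals (one component)."""
    return np.mean(p * v)

# Plane-wave check: with v = p / (RHO*C), I should equal p_rms^2 / (RHO*C).
fs, f, A = 48000, 250.0, 1.0
t = np.arange(fs) / fs
p = A * np.sin(2 * np.pi * f * t)     # sinusoidal pressure, amplitude A
v = p / (RHO * C)                      # plane-wave particle velocity
print(active_intensity(p, v), (A ** 2 / 2) / (RHO * C))
```

Matching this quantity (and its direction, in the vector case) on the surface of a region, rather than matching pressure alone, is the core of the intensity-based techniques summarised above.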