589 research outputs found

    Reviews on Technology and Standard of Spatial Audio Coding

    Market demand for more impressive entertainment media has motivated the delivery of three-dimensional (3D) audio content to home consumers through Ultra High Definition TV (UHDTV), the next generation of TV broadcasting, in which spatial audio coding plays a fundamental role. This paper reviews the fundamental concepts of spatial audio coding, including technology, standards, and applications. The basic principle of the object-based audio reproduction system is also elaborated and compared with the traditional channel-based system, to provide a good understanding of this popular interactive audio reproduction system, which gives end users the flexibility to render their own preferred audio composition. Keywords: spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audi

    Psychoacoustic Considerations in Surround Sound with Height

    This paper presents recent research findings in the psychoacoustics of 3D multichannel sound recording and rendering. The addition of height channels in new reproduction formats such as Auro-3D, Dolby Atmos and 22.2, etc. enhances the perceived spatial impression in reproduction. To achieve optimal acoustic recording and signal processing for such formats, it is first important to understand the fundamental principles of how we perceive sounds reproduced from vertically oriented stereophonic loudspeakers. Recent studies by the authors in this field provide insights into how such principles can be applied for practical 3D recording and upmixing. Topics that are discussed in this paper include the interchannel level and time difference relationships in terms of vertically induced interchannel crosstalk, the effectiveness of the precedence effect in the vertical plane, the aspect of tonal coloration resulting from vertical stereophonic reproduction, the effect of vertical microphone spacing on envelopment, the effect of interchannel decorrelation, and the use of spectral cues for extending vertical image spread

    Software Defined Media: Virtualization of Audio-Visual Services

    Internet-native audio-visual services are witnessing rapid development. Among these services, object-based audio-visual services are gaining importance. In 2014, we established the Software Defined Media (SDM) consortium to target new research areas and markets involving object-based digital media and Internet-by-design audio-visual environments. In this paper, we introduce the SDM architecture that virtualizes networked audio-visual services along with the development of smart buildings and smart cities using Internet of Things (IoT) devices and smart building facilities. Moreover, we design the SDM architecture as a layered architecture to promote the development of innovative applications on the basis of rapid advancements in software-defined networking (SDN). Then, we implement a prototype system based on the architecture, present the system at an exhibition, and provide it as an SDM API to application developers at hackathons. Various types of applications are developed using the API at these events. An evaluation of SDM API access shows that the prototype SDM platform effectively provides 3D audio reproducibility and interactiveness for SDM applications. Comment: IEEE International Conference on Communications (ICC2017), Paris, France, 21-25 May 201

    Subjective quality assessment of multichannel audio accompanied with video in representative broadcasting genres

    Immersive broadcasting applications have received a lot of attention in the last years. In this context, the development of advanced HDTV and 3DTV formats is being successfully adopted by the consumer market, having a strong impact in the way that traditional broadcasting contents are displayed to final users. Together with the above advances in video technology, multichannel spatial audio has also experienced a considerable impulse within the audiovisual industry. However, the need for specific production tools and loudspeaker setups corresponding to multiple competing audio formats seems to be an important factor affecting their adoption by the consumer community. Moreover, it is well-known that the perceived audio quality is highly influenced by the reproduction context, where the existing multimodal interaction between audio and video plays a very important role. This paper presents a formal evaluation of the perceived sound quality provided by several spatial audio formats accompanied with video in the context of television broadcasting. Stereo, advanced surround formats and 3D Binaural sound are evaluated considering a set of representative broadcasting contents (sports, movies, music and animation) to assess their impact on the perceptual attributes contemplated within the international recommendations. The Spanish Ministry of Economy and Competitiveness and FEDER supported this work under the projects TEC2012-37945-C02-01/02. Cobos Serrano, M.; López Monfort, JJ.; Navarro Ruiz, JM.; Ramos Peinado, G. (2015). Subjective quality assessment of multichannel audio accompanied with video in representative broadcasting genres. Multimedia Systems. 21(4):363-379. doi:10.1007/s00530-013-0340-2

    The quality of experience of next generation audio: exploring system, context and human influence factors

    PhD Thesis. The next generation of audio reproduction technology has the potential to deliver immersive and personalised experiences to the user; multichannel with-height loudspeaker arrays and binaural techniques offer 3D audio experiences, whereas object-based techniques offer possibilities of adapting content to suit the system, context and user. A fundamental process in the advancement of such technology is perceptual evaluation. It is crucial to understand how listeners perceive new technology in order to drive future developments. This thesis explores the experience provided by next generation audio technology by taking a quality of experience (QoE) approach to evaluation. System, context and human factors all influence QoE and in this thesis three case studies are presented to explore the role of these categories of influence factors (IFs) in the context of next generation audio evaluation. Furthermore, these case studies explore suitable methods and approaches for the evaluation of the QoE of next generation audio with respect to its various IFs. Specific contributions delivered from these individual studies include a subjective comparison between soundbar and discrete surround sound technology, the application of the Open Profiling of Quality method to the field of audio evaluation, an understanding of both how and why environmental noise influences preferred audio object balance, an understanding of how the influence of technical audio quality on overall listening experience is related to a range of psychographic variables and an assessment of the impact of binaural processing on overall listening experience. When considering these studies as a whole, the research presented here contributes the thesis that to effectively evaluate the perceived quality of next generation audio, a QoE mindset should be taken that considers system, context and human IFs. Engineering and Physical Sciences Research Council (EPSRC) and the British Broadcasting Corporation Research & Development department (BBC R&D

    Influence of visual stimuli on perceptual attributes of spatial audio

    Reproduced audio is often accompanied with visuals (i.e. television, virtual reality, gaming, and cinema). However, the audio technology for these systems is often researched and evaluated in isolation from the visual component. Previous research indicates that the auditory and visual modalities are not processed separately. For example, visual stimuli can influence ratings of audio quality and vice versa. This paper presents an experiment to investigate the influence of visual stimuli on a set of attributes relevant to the perception of spatial audio. Eighteen participants took part in a paired comparison listening test where they were asked to judge pairs of stimuli rendered to fourteen-, five-, and two-channel systems using ten perceptual attributes. The stimuli were presented in audio only and audio-visual conditions. The results show a significant and large main effect of the loudspeaker configuration for all the tested attributes other than overall spectral balance and depth of field. The effect of visual stimuli was found to be small and significant for the attributes realism, sense of space, and spatial clarity. These results suggest that evaluations of audio-visual technologies aiming to evoke a sense of realism or presence should consider the influence of both the audio and visual modalities

    Towards predicting immersion in surround sound music reproduction from sound field features

    When evaluating surround sound loudspeaker reproduction, perceptual effects are commonly analyzed in relationship to different loudspeaker configurations. The presented work contributes to this by modeling perceptual effects based on acoustic properties of various reproduction formats. A model of immersion in music listening is derived from the results of an experimental study analyzing the psychological construct of immersive music experience. The proposed approach is evaluated with respect to the relationship between immersion ratings and sound field features obtained from re-recordings of the stimuli using a spherical microphone array at the listening position. Spatial sound field parameters such as inter-aural cross-correlation (IACC), diffuseness and directivity are found to be of particular relevance. Further, immersion is observed to reach a point of saturation with greater numbers of loudspeakers, which is confirmed to be predictable from the physical properties of the sound field. Although effects related to participants and musical pieces outweigh the impact of sound field features, the proposed approach is found to be suitable for predicting population-average ratings, i.e. immersion experienced by an average listener for unknown content. The proposed method could complement existing research on multichannel loudspeaker reproduction by establishing a more generalizable framework independent of particular speaker setups
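
For readers unfamiliar with the inter-aural cross-correlation (IACC) feature named in the abstract above, the following is a minimal sketch of its conventional definition: the peak magnitude of the normalized cross-correlation between the two ear signals within a lag window of about ±1 ms. It is not the authors' implementation. The function name `iacc`, the `max_lag_ms` parameter, and the use of two plain sample lists are assumptions made for illustration; the abstract states only that features were obtained from spherical microphone array re-recordings.

```python
import math


def iacc(left, right, fs, max_lag_ms=1.0):
    """Peak of the normalized cross-correlation between two ear signals.

    left, right: equal-length sequences of samples (hypothetical ear signals).
    fs: sampling rate in Hz.
    max_lag_ms: half-width of the lag window; 1 ms is the conventional choice.
    """
    if len(left) != len(right) or not left:
        raise ValueError("left and right must be non-empty and equal in length")

    # Normalize by the geometric mean of the two signal energies.
    energy = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if energy == 0.0:
        return 0.0

    n = len(left)
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Right channel lags the left channel by `lag` samples.
            acc = sum(left[i] * right[i + lag] for i in range(n - lag))
        else:
            # Left channel lags the right channel by `-lag` samples.
            acc = sum(left[i - lag] * right[i] for i in range(n + lag))
        best = max(best, abs(acc))
    return best / energy
```

Under this definition, identical ear signals give a value of 1, while uncorrelated signals, such as those in a highly diffuse field, give values near 0.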

    An audio-visual system for object-based audio: from recording to listening

    Object-based audio is an emerging representation for audio content, where content is represented in a reproduction format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system’s capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluate
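
The format-agnostic idea in the abstract above can be illustrated with a small sketch in which a single object, described only by an azimuth, is rendered to whatever loudspeaker layout is available. This is not the renderer proposed in the paper. It assumes simple pairwise constant-power panning in the horizontal plane, with azimuths in degrees and positive values to the listener's left; `render_gains` and its parameters are hypothetical names.

```python
import math


def render_gains(object_azimuth_deg, speaker_azimuths_deg):
    """Return one gain per loudspeaker for a single audio object.

    The object is panned between the two loudspeakers adjacent to its
    azimuth using a sine/cosine law, so the squared gains sum to 1.
    """
    n = len(speaker_azimuths_deg)
    if n < 2:
        raise ValueError("at least two loudspeakers are required")

    # Visit the loudspeakers in order of increasing azimuth around the circle.
    order = sorted(range(n), key=lambda i: speaker_azimuths_deg[i] % 360.0)
    az = object_azimuth_deg % 360.0
    gains = [0.0] * n

    for k in range(n):
        a = order[k]
        b = order[(k + 1) % n]
        start = speaker_azimuths_deg[a] % 360.0
        span = (speaker_azimuths_deg[b] - speaker_azimuths_deg[a]) % 360.0
        offset = (az - start) % 360.0
        if span > 0.0 and offset <= span:
            # Fraction of the way from loudspeaker a to loudspeaker b.
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2.0)
            gains[b] = math.sin(frac * math.pi / 2.0)
            return gains

    raise ValueError("loudspeaker azimuths must not all coincide")
```

The same object metadata then yields different gain vectors per layout: `render_gains(0.0, [30.0, -30.0])` splits the object equally across a stereo pair, while `render_gains(0.0, [0.0, 30.0, -30.0, 110.0, -110.0])` sends it entirely to the centre loudspeaker of a five-channel layout. A practical renderer would also handle elevation, object extent, and sparse layouts, none of which this sketch attempts.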

    Qualitative evaluation of media device orchestration for immersive spatial audio reproduction

    The challenge of installing and setting up dedicated spatial audio systems can make it difficult to deliver immersive listening experiences to the general public. However, the proliferation of smart mobile devices and the rise of the Internet of Things mean that there are increasing numbers of connected devices capable of producing audio in the home. "Media device orchestration" (MDO) is the concept of utilizing an ad hoc set of devices to deliver or augment a media experience. In this paper, the concept is evaluated by implementing MDO for augmented spatial audio reproduction using object-based audio with semantic metadata. A thematic analysis of positive and negative listener comments about the system revealed three main categories of response: perceptual, technical, and content-dependent aspects. MDO performed particularly well in terms of immersion/envelopment, but the quality of listening experience was partly dependent on loudspeaker quality and listener position. Suggestions for further development based on these categories are given