18 research outputs found

    3D sound in the telepresence project BEAMING


    Three-dimensional point-cloud room model for room acoustics simulations


    PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is what Immersive Audio Schemes aim to provide. Several acoustic effects are involved: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, achieving any of these effects in a real large-scale system requires high computing capacity, which is a considerable limitation for real-time applications. Increases in computational capacity have historically been linked to the number of transistors on a chip. Nowadays, however, improvements come mainly from increasing the number of processing units, i.e., from expanding parallelism in computing. This is the case for Graphics Processing Units (GPUs), which now have thousands of computing cores. GPUs were traditionally associated with graphics or image applications, but newer GPU programming environments such as CUDA and OpenCL have allowed applications in fields beyond graphics to be computationally accelerated. This thesis aims to demonstrate that GPUs are fully valid tools for audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented on GPUs. This manuscript also analyzes and solves possible limitations in each GPU-based implementation, from both the acoustic and the computational points of view. The following problems are addressed:
    Most audio applications are based on massive filtering. Thus, the first implementation undertaken is a fundamental operation in audio processing: convolution. It was first developed as a computational kernel and afterwards used in an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation can successfully manage two different and common situations: buffers that are much larger than the filters, and buffers that are much smaller than the filters.
    Two spatial audio applications that use the GPU as a co-processor have been developed from the massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it can synthesize sound sources at spatial positions that are not included in the HRTF database and generate smooth movements of sound sources. Both features were designed after different tests (objective and subjective). The performance, in terms of the number of sound sources that could be rendered in real time, was assessed on GPUs with different architectures. A similar performance is measured in a Wave Field Synthesis (WFS) system (the second spatial audio application) composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce room effects during sound source rendering.
    A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated in terms of both localization and computational performance, taking into account different acoustic environments, and always from a real-time implementation perspective.
    Finally, this manuscript also addresses massive multichannel filtering when the filters have an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that have an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal.
    Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
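    The abstract above mentions IIR filters built from multiple second-order sections. As a rough illustration of that structure only, the following is a minimal sequential Python sketch, not the thesis's GPU implementation. The function name `sos_filter` and the coefficient layout `(b0, b1, b2, a1, a2)` are assumptions made for this example, with `a0` taken to be normalized to 1.

```python
def sos_filter(samples, sections):
    """Run samples through a cascade of second-order sections (biquads).

    Each section is a tuple (b0, b1, b2, a1, a2) with a0 assumed to be 1.
    The layout is hypothetical and chosen only for this sketch.
    """
    out = [float(x) for x in samples]
    for b0, b1, b2, a1, a2 in sections:
        z1 = z2 = 0.0  # state of this section (transposed direct form II)
        for i, x in enumerate(out):
            y = b0 * x + z1
            z1 = b1 * x - a1 * y + z2
            z2 = b2 * x - a2 * y
            out[i] = y  # output of this section feeds the next one
    return out


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0]
    one_pole = (1.0, 0.0, 0.0, -0.5, 0.0)  # y[n] = x[n] + 0.5 * y[n-1]
    print(sos_filter(impulse, [one_pole]))
    print(sos_filter(impulse, [one_pole, one_pole]))
```

    Each section carries only two state variables, so many independent channels could in principle be processed side by side; how the thesis maps this onto GPU threads is not stated in the abstract.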

    Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications, which involves the processing of multiple sources, channels, or filters. A general scenario that appears in this context is the immersive reproduction of binaural audio without the use of headphones, which requires the use of a crosstalk canceler. However, generalized crosstalk cancellation and equalization (GCCE) requires high computing capacity, which is a considerable limitation for real-time applications. This paper discusses the design and implementation of all the processing blocks of a multichannel convolution on a GPU for real-time applications. To this end, a very efficient filtering method using specific data structures is proposed, which takes advantage of overlap-save filtering and filter fragmentation. It has been shown that, for a real-time application with 22 inputs and 64 outputs, the system is capable of managing 1408 filters of 2048 coefficients with a latency of less than 6 ms. The proposed GPU implementation can be easily adapted to any acoustic environment, demonstrating the validity of these co-processors for managing intensive multichannel audio applications.
    This work has been partially funded by Spanish Ministerio de Ciencia e Innovacion TEC2009-13741, Generalitat Valenciana PROMETEO 2009/2013 and GV/2010/027, and Universitat Politecnica de Valencia through Programa de Apoyo a la Investigacion y Desarrollo (PAID-05-11).
    Belloch Rodríguez, JA.; Gonzalez, A.; Martínez Zaldívar, FJ.; Vidal Maciá, AM. (2013). Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs. Integrated Computer-Aided Engineering. 20(2):169-182. https://doi.org/10.3233/ICA-130422
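    The abstract above relies on overlap-save filtering; the 1408 filters it quotes correspond to one filter per input-output pair (22 × 64). The following is a minimal single-channel sketch of plain overlap-save in pure Python, checked against direct convolution. It does not reproduce the paper's data structures, filter fragmentation, or GPU kernels, and the names `fft`, `overlap_save`, and `direct_fir` as well as the block size are assumptions made for this example.

```python
import cmath


def fft(x, inverse=False):
    """Unscaled recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def overlap_save(signal, taps, fft_size=8):
    """FIR-filter signal block by block; returns len(signal) output samples."""
    m = len(taps)
    hop = fft_size - m + 1  # new input samples consumed per block
    assert hop > 0 and fft_size & (fft_size - 1) == 0
    H = fft(list(taps) + [0.0] * (fft_size - m))
    padded = [0.0] * (m - 1) + list(signal)  # zero history before the first sample
    out = []
    pos = 0
    while pos < len(signal):
        block = padded[pos:pos + fft_size]
        block += [0.0] * (fft_size - len(block))  # zero-pad the final block
        X = fft(block)
        y = fft([a * b for a, b in zip(X, H)], inverse=True)
        # The first m-1 samples are corrupted by circular wrap-around: discard them.
        out.extend(v.real / fft_size for v in y[m - 1:])
        pos += hop
    return out[:len(signal)]


def direct_fir(signal, taps):
    """Reference time-domain convolution, truncated to len(signal)."""
    return [sum(taps[k] * signal[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(signal))]


if __name__ == "__main__":
    x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5, 4.0, -0.5, 2.5]
    h = [0.5, -0.25, 0.125]
    print(overlap_save(x, h))
    print(direct_fir(x, h))
```

    The filter spectrum `H` is computed once and reused for every block, which is why the method pays off when the same filter is applied to a long stream of input buffers.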

    Generator of Binaural Sounds Through LABVIEW and PURE DATA

    This paper presents the design and implementation of a binaural sound generator built with the LABVIEW graphical platform and the PURE DATA audio processor. Through an interface, the user can manipulate these sounds, that is, vary their characteristics as desired, generating the different types of brain waves thanks to the frequency difference produced between the left and right audio channels. The generator not only produces pure binaural tones, also called binaural beats, but is also applied to virtual reality: an option was developed in the interface that lets the user interact with the sounds through head movement, creating a sense of reality through audio.
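    The abstract above describes tones whose left and right channels differ slightly in frequency. As an illustration of that idea only, and not of the paper's LABVIEW or PURE DATA patches, the following minimal Python sketch generates such a pair of sine tones. The function name `binaural_tone` and its parameters are assumptions made for this example.

```python
import math


def binaural_tone(carrier_hz, beat_hz, duration_s, sample_rate=44100):
    """Return (left, right) sample lists for two slightly detuned sine tones.

    The left channel plays carrier_hz and the right channel plays
    carrier_hz + beat_hz, so the channels differ by beat_hz.
    """
    n = int(duration_s * sample_rate)
    left = [math.sin(2 * math.pi * carrier_hz * i / sample_rate)
            for i in range(n)]
    right = [math.sin(2 * math.pi * (carrier_hz + beat_hz) * i / sample_rate)
             for i in range(n)]
    return left, right


if __name__ == "__main__":
    left, right = binaural_tone(200.0, 10.0, 0.1, sample_rate=8000)
    print(len(left), len(right))
```

    Each channel would be sent to a separate ear over headphones; writing the samples to an audio file or device is left out of the sketch.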

    An audio-visual system for object-based audio : from recording to listening

    Object-based audio is an emerging representation for audio content, where content is represented in a reproduction format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system’s capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluated.

    Subjective quality assessment of multichannel audio accompanied with video in representative broadcasting genres

    Immersive broadcasting applications have received a lot of attention in recent years. In this context, advanced HDTV and 3DTV formats are being successfully adopted by the consumer market, having a strong impact on the way that traditional broadcasting contents are displayed to final users. Together with the above advances in video technology, multichannel spatial audio has also experienced a considerable boost within the audiovisual industry. However, the need for specific production tools and loudspeaker setups corresponding to multiple competing audio formats seems to be an important factor affecting their adoption by the consumer community. Moreover, it is well known that the perceived audio quality is highly influenced by the reproduction context, where the existing multimodal interaction between audio and video plays a very important role. This paper presents a formal evaluation of the perceived sound quality provided by several spatial audio formats accompanied with video in the context of television broadcasting. Stereo, advanced surround formats and 3D Binaural sound are evaluated considering a set of representative broadcasting contents (sports, movies, music and animation) to assess their impact on the perceptual attributes contemplated within the international recommendations.
    The Spanish Ministry of Economy and Competitiveness and FEDER supported this work under the projects TEC2012-37945-C02-01/02.
    Cobos Serrano, M.; López Monfort, JJ.; Navarro Ruiz, JM.; Ramos Peinado, G. (2015). Subjective quality assessment of multichannel audio accompanied with video in representative broadcasting genres. Multimedia Systems. 21(4):363-379. doi:10.1007/s00530-013-0340-2