    PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is known as Immersive Audio Schemes. Several acoustic effects are involved in this phenomenon: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, achieving any of these effects in a real large-scale system requires high computing capacity, which is a considerable limitation for real-time applications. Historically, increases in computational capacity have been linked to the number of transistors on a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., from expanding parallelism in computing. This is the case of Graphics Processing Units (GPUs), which now have thousands of computing cores. GPUs were traditionally associated with graphics or image applications, but new releases of GPU programming environments, such as CUDA and OpenCL, have enabled the computational acceleration of applications in fields well beyond graphics. This thesis aims to demonstrate that GPUs are fully valid tools for carrying out audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented on GPUs. This manuscript also analyzes and solves possible limitations of each GPU-based implementation, from both the acoustic and the computational points of view. In this document, we have addressed the following problems: Most audio applications are based on massive filtering. Thus, the first implementation undertaken is a fundamental operation in audio processing: convolution.
    It was first developed as a computational kernel and afterwards used in an application that combines multiple concurrent convolutions: generalized crosstalk cancellation and equalization. The proposed implementation can successfully handle two different and common situations: buffers that are much larger than the filters, and buffers that are much smaller than the filters. Two spatial audio applications that use the GPU as a co-processor have been developed on top of this massive multichannel filtering. The first application deals with binaural audio. Its main features are the ability to synthesize sound sources at spatial positions that are not included in the HRTF database and to generate smooth movements of sound sources. Both features were designed after different objective and subjective tests. Performance, measured as the number of sound sources that can be rendered in real time, was assessed on GPUs with different architectures. A similar performance evaluation was carried out on a Wave Field Synthesis (WFS) system (the second spatial audio application) composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce room effects during sound source rendering. A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Because localization accuracy can be improved by using high-resolution spatial grids and a large number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated from both the localization and the computational performance points of view, taking into account different acoustic environments and always from a real-time implementation perspective.
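    To illustrate the SRP-PHAT step described above, the sketch below computes GCC-PHAT for every microphone pair. It then scores each point of a candidate grid by summing the pairwise correlations at the delays that point would produce. This is a minimal pure-Python sketch under stated assumptions, not the thesis's multi-GPU code. The following are all our own choices: the radix-2 FFT, integer-sample steering without interpolation, the function names, and the `simulate` helper. That helper is a hypothetical free-field model with whole-sample delays and no attenuation or reverberation. The nested loop over grid points and microphone pairs is the part whose cost grows with grid resolution and microphone count.

```python
import cmath
import math
import random


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT (len(a) must be a power of two)."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse FFT using the conjugation identity."""
    n = len(a)
    return [v.conjugate() / n for v in fft([x.conjugate() for x in a])]


def gcc_phat(x1, x2, n_fft):
    """GCC-PHAT of two signals; entry k holds lag k (negative lags wrap to the end)."""
    X1 = fft(list(x1) + [0.0] * (n_fft - len(x1)))
    X2 = fft(list(x2) + [0.0] * (n_fft - len(x2)))
    spectrum = []
    for a, b in zip(X1, X2):
        g = a * b.conjugate()
        # Phase transform: keep only the phase of each cross-spectrum bin.
        spectrum.append(g / abs(g) if abs(g) > 1e-12 else 0j)
    return [v.real for v in ifft(spectrum)]


def srp_phat(signals, mics, grid, fs, c=343.0):
    """Return (best grid point, its steered response power)."""
    n_fft = 1
    while n_fft < 2 * len(signals[0]):  # zero-pad to avoid circular wrap-around
        n_fft *= 2
    pairs = [(i, j) for i in range(len(mics)) for j in range(i + 1, len(mics))]
    cc = {(i, j): gcc_phat(signals[i], signals[j], n_fft) for i, j in pairs}
    best_point, best_power = None, -math.inf
    for p in grid:
        power = 0.0
        for i, j in pairs:
            # Time difference of arrival a source at p would cause, in seconds.
            tdoa = (math.dist(p, mics[i]) - math.dist(p, mics[j])) / c
            power += cc[(i, j)][round(tdoa * fs) % n_fft]
        if power > best_power:
            best_point, best_power = p, power
    return best_point, best_power


def simulate(src_pos, mics, fs, c, length, seed=0):
    """Hypothetical toy model: white noise reaching each mic after a whole-sample delay."""
    rng = random.Random(seed)
    delays = [round(math.dist(src_pos, m) / c * fs) for m in mics]
    pad = max(delays)
    s = [rng.gauss(0.0, 1.0) for _ in range(length + pad)]
    return [[s[n + pad - d] for n in range(length)] for d in delays]
```

Each grid point is scored independently of the others. That independence is what makes this search easy to spread across many GPU threads.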
    Finally, this manuscript also addresses massive multichannel filtering when the filters have an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters with an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal.
    Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651. Collection: Premios Extraordinarios de tesis doctorales (Outstanding Doctoral Thesis Awards).
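    The first IIR case in the thesis abstract above, filters built from cascaded second-order sections, can be sketched as follows. The sketch assumes a transposed Direct Form II realization and a per-section coefficient layout `(b0, b1, b2, a1, a2)` with `a0 = 1`. Both are our choices, because the abstract does not state them. Each section's recursion is sequential in time. In a sketch like this, the available parallelism therefore lies across channels (one independent cascade per channel, as with multiple WFS equalizers) rather than across samples.

```python
def biquad_cascade(x, sections):
    """Run x through cascaded second-order sections (transposed Direct Form II).

    Each section is (b0, b1, b2, a1, a2), assuming a0 = 1, i.e.
    y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2].
    """
    y = list(x)
    for b0, b1, b2, a1, a2 in sections:
        z1 = z2 = 0.0  # state of this section
        out = []
        for v in y:
            w = b0 * v + z1
            z1 = b1 * v - a1 * w + z2
            z2 = b2 * v - a2 * w
            out.append(w)
        y = out  # output of this section feeds the next one
    return y


def filter_channels(channels, sos_per_channel):
    """Apply an independent SOS cascade to every channel (e.g. one equalizer per loudspeaker)."""
    return [biquad_cascade(x, sos) for x, sos in zip(channels, sos_per_channel)]
```

For example, the single section `(1.0, 0.0, 0.0, -0.5, 0.0)` is the one-pole filter y[n] = x[n] + 0.5 y[n-1]. Its impulse response starts 1, 0.5, 0.25, 0.125.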

    Optimized Fundamental Signal Processing Operations for Energy Minimization on Heterogeneous Mobile Devices

    Numerous signal processing applications are emerging on both mobile and high-performance computing systems. These applications are subject to responsiveness constraints for user interactivity and, at the same time, must be optimized for energy efficiency. The increasingly heterogeneous power-versus-performance profile of modern hardware introduces new opportunities for energy savings as well as new challenges. In this line, recent systems-on-chip (SoCs) that combine low-power multicore processors with a small graphics accelerator (GPU) yield a notable increase in computational capacity while partially retaining the appealing low power consumption of embedded systems. This paper analyzes the potential of these new hardware systems to accelerate applications that involve a large number of floating-point arithmetic operations, mainly in the form of convolutions. To assess performance, a headphone-based spatial audio application has been developed for mobile devices based on a Samsung Exynos 5422 SoC. We discuss different implementations and analyze the trade-offs between performance and energy efficiency for different scenarios and configurations. Our experimental results reveal that we can extend the battery lifetime of a device featuring such an architecture by 238% by properly configuring and leveraging its computational resources.
    This work was supported by the Spanish Ministerio de Economia y Competitividad projects under Grant TIN2014-53495-R and Grant TEC2015-67387-C4-1-R, in part by the University Project UJI-B2016-20, and in part by the Project PROMETEOII/2014/003. The work of J. A. Belloch was supported by the GVA Post-Doctoral Contract under Grant APOSTD/2016/069. This paper was recommended by Associate Editor Y. Ha.
    Belloch Rodríguez, JA.; Badia Contelles, JM.; Igual Peña, FD.; Gonzalez, A.; Quintana Ortí, ES. (2017). Optimized Fundamental Signal Processing Operations for Energy Minimization on Heterogeneous Mobile Devices.
    IEEE Transactions on Circuits and Systems I: Regular Papers. 65(5):1614-1627. https://doi.org/10.1109/TCSI.2017.2761909
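    The headphone-based spatial audio application in the paper above is described as convolution-heavy. The workload can be sketched minimally under one assumption: each virtual source is rendered by convolving its mono signal with a left/right head-related impulse response (HRIR) pair, and the results are summed over sources. The function names and the direct time-domain convolution are our illustration. A real-time engine would more likely use FFT-based block convolution.

```python
def convolve(x, h):
    """Direct linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xv in enumerate(x):
        for k, hv in enumerate(h):
            y[n + k] += xv * hv
    return y


def render_binaural(sources, hrirs):
    """Mix mono sources into a (left, right) headphone signal.

    sources[i] is a mono signal; hrirs[i] = (h_left, h_right) is the HRIR pair
    assumed to be selected for source i's direction.
    """
    n_out = max(len(s) + max(len(hl), len(hr)) - 1
                for s, (hl, hr) in zip(sources, hrirs))
    left, right = [0.0] * n_out, [0.0] * n_out
    for s, (hl, hr) in zip(sources, hrirs):
        for i, v in enumerate(convolve(s, hl)):
            left[i] += v
        for i, v in enumerate(convolve(s, hr)):
            right[i] += v
    return left, right
```

Every (source, ear) convolution is independent of the others. The final sums are the only step that combines them, which is why this kind of workload maps well onto a GPU.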

    On the performance of a GPU-based SoC in a distributed spatial audio system

    Many current system-on-chip (SoC) devices combine low-power multicore processors with a small graphics accelerator (GPU), offering a trade-off between computational capacity and low power consumption. In this context, spatial audio methods such as wave field synthesis (WFS) can benefit from a distributed system composed of several SoCs that collaborate to tackle the high computational cost of rendering virtual sound sources. This paper evaluates important aspects of a distributed WFS implementation that runs over a network of Jetson Nano boards built around embedded GPU-based SoCs: computational performance, energy efficiency, and synchronization. Our results show that the maximum efficiency, 11 sources per watt, is obtained when the WFS system runs the GPU at 691.2 MHz. Synchronization experiments using the Network Time Protocol (NTP) show that the maximum initial delay of 10 ms between nodes does not prevent the system from achieving high spatial sound quality.
    This work has been supported by the Spanish Government through TIN2017-82972-R and ESP2015-68245-C4-1-P, the Valencian Regional Government through PROMETEO/2019/109, and the Universitat Jaume I through UJI-B2019-36.
    Belloch, JA.; Badía, JM.; Larios, DF.; Personal, E.; Ferrer Contreras, M.; Fuster Criado, L.; Lupoiu, M.... (2021). On the performance of a GPU-based SoC in a distributed spatial audio system. The Journal of Supercomputing (Online). 77(7):6920-6935. https://doi.org/10.1007/s11227-020-03577-4
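    The per-loudspeaker work in WFS rendering can be illustrated with a deliberately simplified model. Each loudspeaker plays a copy of the virtual source signal, delayed by the source-to-loudspeaker propagation time and attenuated by `1/sqrt(r)`. This is an assumption for illustration only. A full 2.5D WFS driving function also includes a spectral pre-filter and a directional weighting, both omitted here, and the function names are hypothetical. Every loudspeaker feed is computed independently, so a loop like this can be split across the nodes of a distributed system such as the one in the paper above.

```python
import math


def wfs_delays_gains(src, speakers, fs, c=343.0):
    """Per-loudspeaker (delay in samples, gain) for a virtual point source.

    Simplified model (assumption): delay = r / c and gain = 1 / sqrt(r), with r
    the source-to-loudspeaker distance. The full 2.5D WFS driving function's
    pre-filter and directional weighting are deliberately left out.
    """
    params = []
    for spk in speakers:
        r = math.dist(src, spk)
        params.append((round(r / c * fs), 1.0 / math.sqrt(r)))
    return params


def render_wfs(signal, src, speakers, fs, c=343.0):
    """Return one delayed, weighted copy of the source signal per loudspeaker."""
    params = wfs_delays_gains(src, speakers, fs, c)
    length = len(signal) + max(d for d, _ in params)
    feeds = []
    for delay, gain in params:
        feed = [0.0] * length
        for n, v in enumerate(signal):
            feed[n + delay] = gain * v
        feeds.append(feed)
    return feeds
```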

    Beamforming filtering with real-time constraints on mobile embedded devices

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
    Nowadays, tablets and smartphones are equipped with low-power processors. Some of them, such as the NVIDIA Tegra SoC, also integrate a GPU, so that both the CPU and the GPU have direct access to the same RAM. In another vein, one of the main limitations of microphone array algorithms for audio processing is the high computational cost of reproducing real acoustic environments when real-time signal processing is strictly required. One of these algorithms is the beamforming algorithm, which is used to recover acoustic signals from observations corrupted by noise, reverberation, and other interfering signals. To execute this algorithm in real time, we have employed high-performance libraries such as OpenBLAS, LAPACK, cuBLAS, PLASMA, and MAGMA, along with programming specifically tuned for these mobile devices.
    European Cooperation in Science and Technology (COST).
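    The abstract above does not name the beamforming variant. The sketch below therefore swaps in the simplest one, a delay-and-sum beamformer, only to show the basic idea: align the microphone signals toward a look direction, then average them. It assumes a far-field source, a uniform linear array, and whole-sample delays. The function names are ours.

```python
import math


def ula_steering_delays(n_mics, spacing, angle_deg, fs, c=343.0):
    """Whole-sample steering delays for a far-field source on a uniform linear array.

    Assumes mic m sits at m * spacing and that angle_deg is measured from
    broadside, so for angle_deg > 0 the wavefront reaches mic 0 first.
    Delays are shifted so the smallest one is zero.
    """
    tau = [m * spacing * math.sin(math.radians(angle_deg)) / c for m in range(n_mics)]
    t0 = min(tau)
    return [round((t - t0) * fs) for t in tau]


def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer with integer delays.

    channels[m][n] is mic m's signal; the target reaches mic m delays[m] samples
    after the earliest mic. Advancing each channel by its delay aligns the target,
    so averaging reinforces it while uncorrelated noise partially cancels.
    """
    n_out = len(channels[0]) - max(delays)
    out = []
    for n in range(n_out):
        acc = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(acc / len(channels))
    return out
```

Adaptive beamformers replace the plain average with data-dependent weights, typically obtained by solving linear systems. That is where dense linear-algebra libraries come into play.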

    Accelerating multi-channel filtering of audio signal on ARM processors

    The researchers from Universitat Jaume I are supported by the CICYT projects TIN2014-53495-R and TIN2011-23283 of the Ministerio de Economía y Competitividad and by FEDER. The authors from the Universitat Politècnica de València are supported by projects TEC2015-67387-C4-1-R and PROMETEOII/2014/003. This work was also supported by the European Union FEDER (CAPAP-H5 network TIN2014-53522-REDT).

    GPU Implementation of multichannel adaptive algorithms for local active noise control

    © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    Multichannel active noise control (ANC) systems are commonly based on adaptive signal processing algorithms that require high computational capacity, which constrains their practical implementation. Graphics Processing Units (GPUs) are well known for their potential for highly parallel data processing, so they appear to be a suitable platform for multichannel scenarios. However, making efficient use of parallel computation in adaptive filtering is not straightforward because of the feedback loops involved. This paper compares two GPU implementations of a multichannel feedforward local ANC system working as a real-time prototype. Both GPU implementations are based on the filtered-x least mean square (FxLMS) algorithm: one uses the conventional filtered-x scheme and the other uses the modified filtered-x scheme. Details of the parallelization of the algorithms are given. Finally, experimental results are presented to compare the performance of both multichannel ANC GPU implementations. The results show the usefulness of many-core devices for developing versatile, scalable, and low-cost multichannel ANC systems.
    This work was supported by the European Union ERDF and the Spanish Government under Project TEC2012-38142-C04, and by the Generalitat Valenciana under Project PROMETEO/2009/013. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Thushara D. Abhayapala.
    Lorente Giner, J.; Ferrer Contreras, M.; Diego Antón, MD.; Gonzalez, A. (2014). GPU Implementation of multichannel adaptive algorithms for local active noise control.
    IEEE Transactions on Audio, Speech and Language Processing. 22(11):1624-1635. https://doi.org/10.1109/TASLP.2014.2344852
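    The conventional filtered-x LMS update compared in the paper above can be sketched in its single-channel form. This minimal sketch rests on stated assumptions: one reference, one actuator, and one error microphone; FIR primary and secondary paths; and a perfect secondary-path estimate in the demo. The modified filtered-x scheme is not shown. In the multichannel case, each reference/actuator pair has its own weight vector. Its update sums, over all microphones, the error at each microphone times the reference filtered through the corresponding actuator-to-microphone path. These many independent dot products are what suit a GPU. `run_demo` is a hypothetical test scenario.

```python
import random


def fir_output(h, buf):
    """Dot product of FIR taps with a newest-first history buffer."""
    return sum(hk * bk for hk, bk in zip(h, buf))


def fxlms(x, d, s, s_hat, n_taps, mu):
    """Single-channel feedforward FxLMS (conventional filtered-x form).

    x: reference signal; d: disturbance at the error microphone;
    s: true secondary path (FIR, used only to simulate the acoustics);
    s_hat: secondary-path estimate used to filter the reference.
    Returns the final control weights and the error signal.
    """
    w = [0.0] * n_taps
    x_buf = [0.0] * max(n_taps, len(s_hat))  # newest sample first
    y_buf = [0.0] * len(s)                   # recent anti-noise samples
    xf_buf = [0.0] * n_taps                  # recent filtered-reference samples
    errors = []
    for xn, dn in zip(x, d):
        x_buf = [xn] + x_buf[:-1]
        y_buf = [fir_output(w, x_buf)] + y_buf[:-1]  # controller output
        e = dn - fir_output(s, y_buf)                # residual at the error mic
        xf_buf = [fir_output(s_hat, x_buf)] + xf_buf[:-1]
        # LMS step on e = d - s*(w*x): grad of e**2 w.r.t. w is about -2*e*(s_hat*x).
        w = [wk + mu * e * xk for wk, xk in zip(w, xf_buf)]
        errors.append(e)
    return w, errors


def run_demo(seed=1, n=5000):
    """Hypothetical scenario: white-noise reference, assumed primary path p,
    and a secondary path that is a one-sample delay, known exactly."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    p = [0.0, 0.0, 0.6, 0.3]
    d = [sum(p[k] * x[i - k] for k in range(len(p)) if i >= k) for i in range(n)]
    w, e = fxlms(x, d, s=[0.0, 1.0], s_hat=[0.0, 1.0], n_taps=4, mu=0.01)
    return w, e, d
```

With a one-sample secondary path, the optimal controller is the primary path advanced by one sample. `run_demo()` should therefore drive the weights toward `[0.0, 0.6, 0.3, 0.0]`.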

    Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications, which involve processing multiple sources, channels, or filters. A general scenario in this context is the immersive reproduction of binaural audio without headphones, which requires a crosstalk canceler. However, generalized crosstalk cancellation and equalization (GCCE) requires high computing capacity, which is a considerable limitation for real-time applications. This paper discusses the design and implementation of all the processing blocks of a multichannel convolution on a GPU for real-time applications. To this end, a very efficient filtering method using specific data structures is proposed, which takes advantage of overlap-save filtering and filter fragmentation. It is shown that, for a real-time application with 22 inputs and 64 outputs, the system can manage 1408 filters (22 × 64) of 2048 coefficients each with a latency of less than 6 ms. The proposed GPU implementation can easily be adapted to any acoustic environment, demonstrating the validity of these co-processors for managing intensive multichannel audio applications.
    This work has been partially funded by Spanish Ministerio de Ciencia e Innovacion TEC2009-13741, Generalitat Valenciana PROMETEO 2009/2013 and GV/2010/027, and Universitat Politecnica de Valencia through Programa de Apoyo a la Investigacion y Desarrollo (PAID-05-11).
    Belloch Rodríguez, JA.; Gonzalez, A.; Martínez Zaldívar, FJ.; Vidal Maciá, AM. (2013). Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs. Integrated Computer-Aided Engineering. 20(2):169-182. https://doi.org/10.3233/ICA-130422
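    The GCCE abstract above names two ingredients: overlap-save filtering and filter fragmentation. Both can be sketched in plain Python. For clarity, the circular convolution is computed directly rather than with FFTs (a GPU version would use FFT, pointwise multiply, and inverse FFT). The fragmentation step also recombines the partial outputs in the time domain. Both simplifications and all function names are our assumptions, not the paper's data structures. `partitioned_filter` shows why fragmentation helps when buffers are much shorter than the filter: each fragment is no longer than the buffer, so every overlap-save step stays small.

```python
def circular_convolve(a, b):
    """Circular convolution of equal-length sequences (stand-in for FFT, multiply, IFFT)."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def overlap_save(x, h, block):
    """FIR-filter x with h using overlap-save, `block` new samples per step.

    Each frame = previous len(h) - 1 input samples + `block` new ones. After a
    circular convolution with the zero-padded filter, the first len(h) - 1
    outputs are time-aliased and discarded; the remaining `block` are exact.
    """
    m = len(h)
    h_pad = list(h) + [0.0] * (block - 1)  # frame length is block + m - 1
    history = [0.0] * (m - 1)
    y = []
    for start in range(0, len(x), block):
        chunk = list(x[start:start + block])
        chunk += [0.0] * (block - len(chunk))  # zero-pad a short final block
        frame = history + chunk
        y.extend(circular_convolve(frame, h_pad)[m - 1:])
        history = frame[block:]  # last m - 1 samples overlap into the next frame
    return y[:len(x)]


def partitioned_filter(x, h, block):
    """Filter fragmentation: split h into pieces of at most `block` taps, filter with
    each piece by overlap-save, and add each result delayed by the piece's offset."""
    y = [0.0] * len(x)
    for offset in range(0, len(h), block):
        part = overlap_save(x, h[offset:offset + block], block)
        for i in range(offset, len(x)):
            y[i] += part[i - offset]
    return y
```

Both functions return the first `len(x)` samples of the ordinary linear convolution of `x` and `h`. Only the amount of work per step differs between them.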

    Surround by Sound: A Review of Spatial Audio Recording and Reproduction

    In this article, a systematic overview of various recording and reproduction techniques for spatial audio is presented. Binaural recording and rendering are designed to resemble the human two-ear auditory system and reproduce sounds specifically for a listener’s two ears. Soundfield recording and reproduction, by contrast, use a large number of microphones and loudspeakers to replicate an acoustic scene within a region. These two fundamentally different types of techniques are discussed in the paper. A recently popular area, multi-zone reproduction, is also briefly reviewed. The paper concludes with a discussion of the current state of the field and open problems.
    The authors acknowledge the National Natural Science Foundation of China (NSFC) No. 61671380 and the Australian Research Council Discovery Scheme DE 150100363.

    An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia

    The widespread use of mobile multimedia devices in applications such as gaming, 3D video and audio reproduction, immersive teleconferencing, and virtual and augmented reality demands efficient algorithms and methodologies. All these applications require real-time spatial audio engines capable of handling intensive signal processing operations while facing a number of constraints related to computational cost, latency, and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks and provides high computational capabilities due to its inherently parallel architecture. This paper describes a scalable parallel implementation of a real-time binaural audio engine for GPU-equipped mobile devices. The engine is based on a set of head-related transfer functions (HRTFs) modelled with a parametric parallel structure, which allows efficient synthesis and interpolation while reducing the storage required for HRTF data. Several strategies to optimize the GPU implementation are evaluated on a well-known type of processor found in a wide range of mobile devices. In this context, we analyze both the energy consumption and the real-time capabilities of the system by exploring different GPU and CPU configurations. Moreover, the implementation uses the OpenCL framework, which guarantees code portability.
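    A parametric parallel HRTF structure of the kind described above can be sketched as a sum of independent second-order sections. Each section could run on its own GPU work-item. The specific model here is a hypothetical illustration, not the paper's parameterization: per-direction biquad coefficients `(b0, b1, b2, a1, a2)`, an optional direct gain, and linear interpolation of coefficients between two measured directions. One property supports interpolating the feedback coefficients linearly. The stability region of a second-order denominator, |a2| < 1 and |a1| < 1 + a2, is convex, so blending two stable sections yields a stable section.

```python
def biquad(x, b0, b1, b2, a1, a2):
    """One second-order IIR section (transposed Direct Form II, a0 = 1)."""
    z1 = z2 = 0.0
    y = []
    for v in x:
        w = b0 * v + z1
        z1 = b1 * v - a1 * w + z2
        z2 = b2 * v - a2 * w
        y.append(w)
    return y


def parallel_hrtf(x, sections, direct_gain=0.0):
    """Parallel structure (hypothetical model): direct_gain * x plus the sum of all section outputs."""
    y = [direct_gain * v for v in x]
    for sec in sections:
        for i, v in enumerate(biquad(x, *sec)):
            y[i] += v
    return y


def interpolate_sections(sec_a, sec_b, t):
    """Blend per-section coefficients of two measured directions (0 <= t <= 1)."""
    return [tuple((1 - t) * ca + t * cb for ca, cb in zip(sa, sb))
            for sa, sb in zip(sec_a, sec_b)]
```

Under this model, only a few coefficients need to be stored per direction instead of a full impulse response. This matches the storage reduction the abstract mentions, although the paper's actual parameter count is not given here.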

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016)

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
    The PhD Symposium was a very good opportunity for young researchers to share information and knowledge, present their current research, and discuss topics with other students in order to look for synergies and common research topics. The idea was very successful, and the PhD students' assessment of it was very positive. The symposium also helped achieve one of the major goals of the NESUS Action: to establish an open European research network targeting sustainable solutions for ultrascale computing. The network aims at cross-fertilization among HPC, large-scale distributed systems, and big data management, as well as at training. It helps bring together researchers working across different areas and provides a meeting ground for them to exchange ideas, identify synergies, and pursue common activities. Its research topics include sustainable software solutions (applications and system software stack), data management, energy efficiency, and resilience.
    European Cooperation in Science and Technology (COST).