1,625 research outputs found

Real-time massive convolution for audio applications on GPU

Author: Jose A. Belloch
Alberto Gonzalez
F. J. Martínez-Zaldívar
Antonio M. Vidal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2011

Massive convolution is the basic operation in multichannel acoustic signal processing, a field that has developed considerably in recent years. One reason is the increase in the number of sound sources used in playback applications available to users. Another is the growing need to incorporate new effects and to improve the hearing experience. Massive convolution requires high computing capacity. GPUs offer the possibility of parallelizing these operations, which yields the processing result in much less time and frees up CPU resources. One important aspect is the possibility of overlapping data transfers between CPU and GPU with the computation in order to support real-time applications. Thus, 3D sound scenes could be synthesized from just a peer-to-peer music streaming environment using a simple GPU in the user's computer, while the CPU is used for other tasks. Nowadays, these effects are obtained in theaters or funfairs at very high cost, requiring a large quantity of resources. Our work therefore focuses on two main points: describing an efficient massive convolution implementation and incorporating this task into real-time multichannel sound applications. © 2011 Springer Science+Business Media, LLC.

This work was partially supported by the Spanish Ministerio de Ciencia e Innovacion (Projects TIN2008-06570-C04-02 and TEC2009-13741), Universidad Politecnica de Valencia through PAID-05-09 and Generalitat Valenciana through project PROMETEO/2009/2013.

Belloch Rodríguez, JA.; Gonzalez, A.; Martínez Zaldívar, FJ.; Vidal Maciá, AM. (2011). Real-time massive convolution for audio applications on GPU. Journal of Supercomputing. 58(3):449-457. https://doi.org/10.1007/s11227-011-0610-8

Crossref

RiuNet

PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS

Author: Belloch Rodríguez José Antonio
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 06/10/2014

Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is known as Immersive Audio Schemes. Several acoustic effects are involved in this phenomenon: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, high computing capacity is required to achieve any of these effects in a real large-scale system, which represents a considerable limitation for real-time applications. The increase in computational capacity has historically been linked to the number of transistors in a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., expanding parallelism in computing. This is the case for Graphics Processing Units (GPUs), which now have thousands of computing cores. GPUs were traditionally associated with graphics or image applications, but new releases of the GPU programming environments CUDA and OpenCL have allowed applications in fields beyond graphics to be computationally accelerated. This thesis aims to demonstrate that GPUs are fully valid tools for audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented on GPUs. This manuscript also analyzes and solves possible limitations of each GPU-based implementation, from both the acoustic and the computational points of view. In this document, we have addressed the following problems:
Most audio applications are based on massive filtering. Thus, the first implementation undertaken is a fundamental operation in audio processing: the convolution. It was first developed as a computational kernel and afterwards used in an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation successfully handles two different and common situations: buffers that are much larger than the filters, and buffers that are much smaller than the filters.
Two spatial audio applications that use the GPU as a co-processor have been developed from the massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it can synthesize sound sources at spatial positions that are not included in the HRTF database and generate smooth movements of sound sources. Both features were designed after different tests (objective and subjective). The performance, in terms of the number of sound sources that can be rendered in real time, was assessed on GPUs with different architectures. Performance is measured similarly in a Wave Field Synthesis (WFS) system (the second spatial audio application) composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce room effects during sound source rendering.
A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Since localization accuracy can be improved by using high-resolution spatial grids and a large number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated in terms of both localization and computational performance, taking into account different acoustic environments, and always from a real-time implementation perspective.
Finally, this manuscript also addresses massive multichannel filtering when the filters have an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters with an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal.

Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
TESIS; Premios Extraordinarios de tesis doctorale
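
The first IIR case named in this abstract, filters composed of multiple second-order sections, can be illustrated with a short sequential sketch. This is not the thesis's GPU implementation: the function name `sos_filter` and the coefficient layout `(b0, b1, b2, a1, a2)` are assumptions made for illustration, and each section is computed in the common direct form II transposed structure.

```python
def sos_filter(x, sections):
    """Run x through a cascade of second-order sections (biquads).

    Each section is a tuple (b0, b1, b2, a1, a2) with a0 normalized to 1.
    This layout is an assumption of the sketch, not taken from the thesis.
    """
    y = list(x)
    for b0, b1, b2, a1, a2 in sections:
        z1 = z2 = 0.0                      # per-section state, initially at rest
        stage_out = []
        for s in y:
            out = b0 * s + z1              # direct form II transposed update
            z1 = b1 * s - a1 * out + z2
            z2 = b2 * s - a2 * out
            stage_out.append(out)
        y = stage_out                      # output of one section feeds the next
    return y


# Impulse response of a single one-pole section: y[n] = x[n] + 0.5 * y[n-1]
impulse_response = sos_filter([1.0, 0.0, 0.0, 0.0], [(1.0, 0.0, 0.0, -0.5, 0.0)])
```

Because each output sample depends on earlier outputs of the same section, the loop over samples is inherently sequential. In a multichannel setting, the independent channels are the more natural source of parallelism.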

Crossref

RiuNet

Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs

Author: Belloch Rodríguez José Antonio
Gonzalez Alberto
Martínez Zaldívar Francisco José
Vidal Maciá Antonio Manuel
Publication venue: 'IOS Press'
Publication date: 01/03/2013

Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications, which involves the processing of multiple sources, channels, or filters. A general scenario that appears in this context is the immersive reproduction of binaural audio without the use of headphones, which requires the use of a crosstalk canceler. However, generalized crosstalk cancellation and equalization (GCCE) requires high computing capacity, which is a considerable limitation for real-time applications. This paper discusses the design and implementation of all the processing blocks of a multichannel convolution on a GPU for real-time applications. To this end, a very efficient filtering method using specific data structures is proposed, which takes advantage of overlap-save filtering and filter fragmentation. It has been shown that, for a real-time application with 22 inputs and 64 outputs, the system is capable of managing 1408 filters of 2048 coefficients with a latency of less than 6 ms. The proposed GPU implementation can be easily adapted to any acoustic environment, demonstrating the validity of these co-processors for managing intensive multichannel audio applications.

This work has been partially funded by Spanish Ministerio de Ciencia e Innovacion TEC2009-13741, Generalitat Valenciana PROMETEO 2009/2013 and GV/2010/027, and Universitat Politecnica de Valencia through Programa de Apoyo a la Investigacion y Desarrollo (PAID-05-11).

Belloch Rodríguez, JA.; Gonzalez, A.; Martínez Zaldívar, FJ.; Vidal Maciá, AM. (2013). Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs. Integrated Computer-Aided Engineering. 20(2):169-182. https://doi.org/10.3233/ICA-130422
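
The overlap-save filtering named in this abstract can be sketched in a few lines. This is a plain sequential illustration, not the paper's GPU implementation: the names `direct_fir` and `overlap_save` are hypothetical, and each block is filtered by direct summation where a real-time system would typically use FFTs. The last lines also check the abstract's filter count, which is consistent with one filter per input/output pair (an assumption of this sketch).

```python
def direct_fir(x, h):
    """Reference FIR filtering: linear convolution truncated to len(x) samples."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


def overlap_save(x, h, block):
    """Filter x with h, consuming `block` new input samples per step.

    Each step prepends the last len(h)-1 samples of the previous buffer,
    filters the extended buffer, and discards the first len(h)-1 outputs,
    which lack the input history needed to be valid.
    """
    m = len(h)
    history = [0.0] * (m - 1)              # zeros stand in for samples before x[0]
    out = []
    for start in range(0, len(x), block):
        buf = history + list(x[start:start + block])
        full = direct_fir(buf, h)          # an FFT-based product would go here
        out.extend(full[m - 1:])           # keep only the valid samples
        history = buf[len(buf) - (m - 1):] # saved overlap for the next block
    return out


# Filter count from the abstract: 22 inputs and 64 outputs.
n_inputs, n_outputs = 22, 64
n_filters = n_inputs * n_outputs           # assumes one filter per input/output pair
```

Calling `overlap_save(x, h, block)` with any positive block size should reproduce `direct_fir(x, h)`, which makes the block size a latency knob rather than something that changes the output.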

RiuNet

Development of a multichannel audio application using GPU parallelism (Desarrollo de una aplicación de audio multicanal utilizando el paralelismo de las GPU)

Author: Belloch Rodríguez José Antonio
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 28/11/2011

This work analyzes the performance that a GPU offers for a multichannel audio application, and applies that analysis to the implementation of a crosstalk canceller that runs in real time and whose code is executed on the GPU of a laptop personal computer.

Belloch Rodríguez, JA. (2010). Desarrollo de una aplicación de audio multicanal utilizando el paralelismo de las GPU. http://hdl.handle.net/10251/13644
Archivo delegad

RiuNet

FourierPIM: High-Throughput In-Memory Fast Fourier Transform and Polynomial Multiplication

Author: Boneh Yahav
Gazit Gonen
Kvatinsky Shahar
Leitersdorf Orian
Ronen Ronny
Publication venue: 'Elsevier BV'
Publication date: 05/04/2023

The Discrete Fourier Transform (DFT) is essential for various applications ranging from signal processing to convolution and polynomial multiplication. The groundbreaking Fast Fourier Transform (FFT) algorithm reduces DFT time complexity from the naive O(n^2) to O(n log n), and recent works have sought further acceleration through parallel architectures such as GPUs. Unfortunately, accelerators such as GPUs cannot exploit their full computing capabilities as memory access becomes the bottleneck. Therefore, this paper accelerates the FFT algorithm using digital Processing-in-Memory (PIM) architectures that shift computation into the memory by exploiting physical devices capable of storage and logic (e.g., memristors). We propose an O(log n) in-memory FFT algorithm that can also be performed in parallel across multiple arrays for high-throughput batched execution, supporting both fixed-point and floating-point numbers. Through the convolution theorem, we extend this algorithm to O(log n) polynomial multiplication, a fundamental task for applications such as cryptography. We evaluate FourierPIM on a publicly available cycle-accurate simulator that verifies both correctness and performance, and demonstrate 5-15x throughput and 4-13x energy improvement over the NVIDIA cuFFT library on state-of-the-art GPUs for FFT and polynomial multiplication.
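
The link between the FFT and polynomial multiplication that this abstract relies on (the convolution theorem) can be shown with a small sketch. This is the classic sequential O(n log n) radix-2 algorithm, not the in-memory O(log n) algorithm proposed in the paper. The names `fft` and `poly_multiply` are chosen for illustration, and integer coefficients are assumed so the result can be rounded.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True the conjugate twiddles are used and the caller
    is responsible for dividing the result by len(a).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def poly_multiply(p, q):
    """Multiply integer-coefficient polynomials given lowest degree first."""
    size = len(p) + len(q) - 1
    n = 1
    while n < size:                                   # pad to a power of two
        n *= 2
    fp = fft([complex(c) for c in p] + [0j] * (n - len(p)))
    fq = fft([complex(c) for c in q] + [0j] * (n - len(q)))
    # Pointwise product in the frequency domain equals convolution of coefficients.
    prod = fft([u * v for u, v in zip(fp, fq)], invert=True)
    return [round(v.real / n) for v in prod[:size]]


# (1 + 2x + 3x^2) * (4 + 5x) = 4 + 13x + 22x^2 + 15x^3
product = poly_multiply([1, 2, 3], [4, 5])
```

The zero padding to at least `len(p) + len(q) - 1` points is what turns the FFT's circular convolution into the linear convolution that polynomial multiplication needs.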

arXiv.org e-Print Archive

Directory of Open Access Journals

Application of Multi-core and GPU Architectures on Signal Processing: Case Studies

Author: Alonso-Jordá Pedro
Belloch Jose A.
De Diego María
Ferrer Miguel
García Víctor M.
González Alberto
Lorente Jorge
Martínez Francisco J.
Piñero Gema
Quintana-Orti Enrique S.
Remón Gómez Alfredo
Roger Sandra
Roig Carles
Vidal Antonio M.
Publication venue: Universidad Politécnica de Valencia
Publication date: 01/01/2010

This article reports some of the techniques and developments under way within the INCO2 group. The results reflect the interdisciplinary approach with which we tackle signal processing applications. The chosen case studies show different stages of development: we present completed algorithms that are already used in practical applications, as well as new ideas that may serve as a starting point and are expected to deliver good results in the short and medium term.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

Visual Analysis Algorithms for Embedded Systems

Author: Rizvi Syed Tahir Hussain
Publication venue: country:Italy
Publication date: 17/05/2018

The main contribution of this thesis is the design and development of an optimized framework for running deep neural classifiers on embedded platforms. Deep convolutional networks exhibit unmatched performance in image classification. However, these deep classifiers demand huge computational power and memory storage, which is a problem on embedded devices because of their limited onboard resources. The computational demand of neural networks stems mainly from the convolutional layers. A significant improvement in performance can be obtained by reducing the computational complexity of these layers, making them realizable on embedded platforms. In this thesis, we propose a CUDA (Compute Unified Device Architecture)-based accelerated scheme that runs deep architectures on embedded platforms by exploiting already trained networks. All functions and layers required to replicate the trained neural networks were implemented and accelerated using the concurrent resources of the embedded GPU. The performance of the proposed CUDA-based scheme was significantly improved by performing convolutions in the transform domain. This matrix-multiplication-based convolution was also compared with the traditional approach to analyze the improvement in inference performance. The second part of this thesis focuses on the optimization of the proposed framework. The flow of the CUDA-based framework was optimized using a unified memory scheme and hardware-dependent utilization of computational resources. The proposed flow was evaluated over three different image classification networks on a Jetson TX1 embedded board and an Nvidia Shield K1 tablet. The performance of the proposed GPU-only flow was compared with its sequential and heterogeneous versions. The results showed that the proposed scheme delivered higher performance and enabled real-time image classification on embedded platforms with lower storage requirements.
These results motivated us to tackle useful real-time classification and recognition problems on embedded platforms. Finally, we used the proposed framework to build a neural-network-based automatic license plate recognition (ALPR) system on a mobile platform. This highly precise and computationally demanding system was deployed by simplifying the flow of a trained deep architecture developed for powerful desktop and server environments. A comparative analysis of computational complexity, recognition accuracy and inference performance was performed.
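
The idea of expressing a convolutional layer as a matrix multiplication can be illustrated with a small sketch. The abstract does not say how the thesis unrolls its data, so the patch-unrolling scheme used here (often called im2col) is an assumption, as are the names `im2col` and `conv2d_as_matmul`. The example covers a single channel and a single kernel with no padding or stride.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image into one row of a matrix."""
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    patches = []
    for i in range(rows):
        for j in range(cols):
            patches.append([image[i + di][j + dj]
                            for di in range(kh) for dj in range(kw)])
    return patches, rows, cols


def conv2d_as_matmul(image, kernel):
    """Valid cross-correlation (the usual CNN 'convolution') as patches x kernel."""
    kh, kw = len(kernel), len(kernel[0])
    patches, rows, cols = im2col(image, kh, kw)
    flat_kernel = [v for row in kernel for v in row]
    # Matrix-vector product: one dot product per output pixel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    # Fold the flat result back into the 2D output shape.
    return [flat_out[r * cols:(r + 1) * cols] for r in range(rows)]


feature_map = conv2d_as_matmul([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                               [[1, 0], [0, 1]])
```

With several kernels, the flattened kernels become the columns of a second matrix and the whole layer reduces to one dense matrix product, which is the kind of operation GPUs handle efficiently.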

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Deep Feature-based Face Detection on Mobile Devices

Author: Chellappa Rama
Patel Vishal M.
Sarkar Sayantan
Publication venue
Publication date: 15/02/2016

We propose a deep feature-based face detector for mobile devices to detect the user's face acquired by the front-facing camera. The proposed method is able to detect faces in images containing extreme pose and illumination variations as well as partial faces. The main challenge in developing deep feature-based algorithms for mobile devices is the constrained nature of the mobile platform and the non-availability of CUDA-enabled GPUs on such devices. To meet these challenges, our implementation takes into account the special nature of the images captured by the front-facing camera of mobile devices and exploits the GPUs present in mobile devices without CUDA-based frameworks.
Comment: ISBA 201

arXiv.org e-Print Archive

Crossref