PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS
Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is referred to as Immersive Audio Schemes. Several acoustic effects are involved in this phenomenon: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, high computing capacity is required to achieve any of these effects in a real large-scale system, which represents a considerable limitation for real-time applications.
The increase in computational capacity has historically been linked to the number of transistors on a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., from expanding parallelism in computing. This is the case of Graphics Processing Units (GPUs), which now contain thousands of computing cores. GPUs were traditionally tied to graphics or image applications, but new releases of the GPU programming environments, CUDA and OpenCL, have allowed many applications in fields beyond graphics to be computationally accelerated. This thesis aims to demonstrate that GPUs are fully valid tools for carrying out audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented using GPUs. This manuscript also analyzes and solves possible limitations of each GPU-based implementation, both from the acoustic and from the computational point of view. In this document, we have addressed the following problems:
Most audio applications are based on massive filtering. Thus, the first implementation undertaken is a fundamental operation in audio processing: the convolution. It was first developed as a computational kernel and afterwards used in an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation successfully manages two different and common situations: buffers that are much larger than the filters, and buffers that are much smaller than the filters.
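The two buffer/filter size regimes mentioned above are commonly handled with FFT-based block convolution. As a minimal CPU-side sketch (not the thesis's GPU kernel; names and layout are illustrative, and a filter longer than one tap is assumed), the overlap-save scheme below filters a stream of equal-length buffers with an arbitrary impulse response:

```python
import numpy as np

def fft_convolve_stream(x_blocks, h, block_len):
    """Overlap-save FFT convolution of a stream of equal-length blocks.
    Assumes len(h) > 1."""
    M = len(h)
    N = block_len + M - 1              # FFT size: one block plus filter tail
    H = np.fft.rfft(h, N)
    tail = np.zeros(M - 1)             # last M-1 input samples of previous block
    for x in x_blocks:
        buf = np.concatenate([tail, x])
        y = np.fft.irfft(np.fft.rfft(buf, N) * H, N)
        tail = buf[-(M - 1):]
        yield y[M - 1:M - 1 + block_len]   # discard the time-aliased prefix
```

On a GPU the same structure maps naturally to batched FFTs (one transform per channel, spectral products computed in parallel), which is how massive multichannel filtering typically scales.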
Two spatial audio applications that use the GPU as a co-processor have been developed from the massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it is able to synthesize sound sources at spatial positions not included in the HRTF database and to generate smooth movements of sound sources. Both features were designed after different tests (objective and subjective). The performance, in terms of the number of sound sources that can be rendered in real time, was assessed on GPUs with different architectures. Similar performance is measured in a Wave Field Synthesis system (the second spatial audio application) composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce room effects during sound source rendering.
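One plausible way to render sources at positions absent from the HRTF database (the abstract does not detail the thesis's actual method) is to interpolate between the nearest measured impulse responses. The sketch below linearly blends the two HRIRs bracketing a target azimuth; array layout and names are chosen for illustration, with measured azimuths assumed sorted ascending in [0, 360):

```python
import numpy as np

def interpolate_hrir(azimuths, hrirs, target_az):
    """Linearly interpolate HRIRs (shape: angles x 2 ears x taps)
    between the two measured azimuths bracketing target_az (degrees)."""
    az = np.asarray(azimuths, dtype=float)
    t = target_az % 360.0
    i = np.searchsorted(az, t) % len(az)   # next measured angle (wraps past 360)
    j = (i - 1) % len(az)                  # previous measured angle
    span = (az[i] - az[j]) % 360.0 or 360.0
    w = ((t - az[j]) % 360.0) / span       # weight toward az[i]
    return (1.0 - w) * hrirs[j] + w * hrirs[i]
```

Cross-fading the interpolated responses over successive buffers is one way to obtain the smooth source movements described above.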
A well-known approach for sound source localization in noisy and reverberant environments, the Steered Response Power with Phase Transform (SRP-PHAT) algorithm, is also addressed on a multi-GPU system. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated both from the localization and from the computational performance points of view, taking into account different acoustic environments, and always from a real-time implementation perspective.
Finally, this manuscript also addresses massive multichannel filtering when the filters present an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that present an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal.
Belloch Rodríguez, JA. (2014). Performance improvement of multichannel audio by Graphics Processing Units [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
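For reference, filtering with multiple second-order sections means running the signal through a cascade of biquads. The NumPy sketch below implements a direct-form II transposed cascade in the scipy-style [b0, b1, b2, a0, a1, a2] row layout (a plain CPU illustration, not the thesis's GPU implementation; a0 is assumed to be 1):

```python
import numpy as np

def sos_filter(sos, x):
    """Cascade of second-order sections in direct-form II transposed.
    Each row of sos is [b0, b1, b2, a0, a1, a2] with a0 == 1 assumed."""
    y = np.asarray(x, dtype=float).copy()
    for b0, b1, b2, a0, a1, a2 in sos:
        z1 = z2 = 0.0                  # the two state variables of this section
        out = np.empty_like(y)
        for n, xn in enumerate(y):
            yn = b0 * xn + z1
            z1 = b1 * xn - a1 * yn + z2
            z2 = b2 * xn - a2 * yn
            out[n] = yn
        y = out                        # output of one section feeds the next
    return y
```

The recursion on z1/z2 makes each section sequential in time, which is exactly why parallelizing many such filters across channels (rather than within one filter) suits the GPU.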
On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone array
Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human-machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. Graphics Processing Units (GPUs) are highly parallel programmable co-processors that provide massive computation when the needed operations are properly parallelized. Emerging GPUs offer multiple parallelism levels; however, properly managing their computational resources becomes a very challenging task. In fact, management issues become even more difficult when multiple GPUs are involved, adding one more level of parallelism. In this paper, the performance of an acoustic source localization system using distributed microphones is analyzed over a massive multichannel processing framework in a multi-GPU system. The paper evaluates and points out the influence that the number of microphones and the available computational resources have in the overall system performance. Several acoustic environments are considered to show the impact that noise and reverberation have in the localization accuracy and how the use of massive microphone systems combined with parallelized GPU algorithms can help to mitigate substantially adverse acoustic effects. In this context, the proposed implementation is able to work in real time with high-resolution spatial grids and using up to 48 microphones. 
These results confirm the advantages of suitable GPU architectures in the development of real-time massive acoustic signal processing systems.
This work has been partially funded by the Spanish Ministerio de Economía y Competitividad (TEC2009-13741, TEC2012-38142-C04-01, and TEC2012-37945-C02-02), Generalitat Valenciana PROMETEO 2009/2013, and Universitat Politècnica de València through Programa de Apoyo a la Investigación y Desarrollo (PAID-05-11 and PAID-05-12).
Belloch Rodríguez, JA.; Gonzalez, A.; Vidal Maciá, AM.; Cobos Serrano, M. (2015). On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone array. Expert Systems with Applications. 42(13):5607-5620. https://doi.org/10.1016/j.eswa.2015.02.056
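The SRP-PHAT functional described above can be sketched compactly: whiten each pair-wise cross-spectrum (the PHAT weighting), accumulate the resulting generalized cross-correlations at the delays each candidate grid point would produce, and pick the maximum. The NumPy toy below (sample-rounded delays, no sub-sample interpolation, illustrative names) follows that recipe:

```python
import numpy as np

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Estimate a source position as the grid point maximizing the sum of
    GCC-PHAT values at the pair-wise expected delays (rounded to samples)."""
    M, N = signals.shape
    X = np.fft.rfft(signals, axis=1)
    power = np.zeros(len(grid))
    for i in range(M):
        for j in range(i + 1, M):
            cross = X[i] * np.conj(X[j])
            cross /= np.abs(cross) + 1e-12            # PHAT whitening
            gcc = np.fft.irfft(cross, N)              # circular GCC-PHAT
            # expected lag (samples) of mic i relative to mic j per grid point
            d = (np.linalg.norm(grid - mic_pos[i], axis=1)
                 - np.linalg.norm(grid - mic_pos[j], axis=1)) / c * fs
            power += gcc[np.round(d).astype(int) % N]  # negative lags wrap
    return grid[np.argmax(power)]
```

The cost is the product of microphone pairs and grid points, which is why high-resolution grids with many microphones call for the multi-GPU parallelization studied in the paper.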
Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL
[EN] The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a large number of acoustic applications such as automatic camera steering systems, human-machine interaction, video gaming and audio surveillance. SRP-PHAT implementations need to handle a high number of signals coming from a microphone array and a huge search grid that influences the localization accuracy of the system. In this context, high performance in the localization process can only be achieved by using massively parallel computational resources. Different types of multi-core machines, based either on multiple CPUs or on GPUs, are commonly employed in diverse fields of science to accelerate applications, mainly using OpenMP and CUDA as the respective programming frameworks. This implies developing multiple source codes, which limits portability and application possibilities. In contrast, OpenCL has emerged as an open standard for parallel programming that is nowadays supported by a wide range of architectures. In this work, we evaluate an OpenCL-based implementation of the SRP-PHAT algorithm on two state-of-the-art CPU and GPU platforms. Results demonstrate that OpenCL achieves close-to-CUDA performance on the GPU (considered as an upper bound) and outperforms OpenMP in most of the CPU configurations.
This work has been supported by the postdoctoral fellowship from Generalitat Valenciana APOSTD/2016/069, the Spanish Government through TIN2014-53495-R, TIN2015-65277-R and BIA2016-76957-C3-1-R, and the Universitat Jaume I project UJI-B2016-20.
Badía Contelles, JM.; Belloch Rodríguez, JA.; Cobos Serrano, M.; Igual Peña, FD.; Quintana-Ortí, ES. (2019). Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL. The Journal of Supercomputing. 75(3):1284-1297.
https://doi.org/10.1007/s11227-018-2422-6
Design and implementation of a multi-octave-band audio camera for realtime diagnosis
Noise pollution investigation takes advantage of two common methods of diagnosis: measurement using a Sound Level Meter, and acoustical imaging. The former enables a detailed analysis of the surrounding noise spectrum, whereas the latter is rather used for source localization. The two approaches complement each other, and merging them into a single system working in real time would offer new possibilities for dynamic diagnosis. This paper describes the design of a complete system for this purpose: imaging the acoustic field in real time at different octave bands, with a convenient device. The acoustic field is sampled in time and space using an array of MEMS microphones. This recent technology enables a compact and fully digital design of the system. However, performing real-time imaging with a resource-intensive algorithm on a large amount of measured data poses a technical challenge. This is overcome by executing the whole process on a Graphics Processing Unit, which has recently become an attractive device for parallel computing.
On binaural spatialization and the use of GPGPU for audio processing
3D recordings and audio, namely techniques that aim to create the perception of sound sources placed anywhere in 3-dimensional space, are becoming an interesting resource for composers, live performances and augmented reality. This thesis focuses on binaural spatialization techniques.
We tackle the problem from three different perspectives. The first is the implementation of an engine for audio convolution: a real implementation problem in which we compete with a number of already available systems, trying to achieve better performance. General Purpose computing on Graphics Processing Units (GPGPU) is a promising approach to problems where a high degree of task parallelism is desirable. In this thesis the GPGPU approach is applied to both offline and real-time convolution, with the spatialization of multiple sound sources, one of the critical problems in the field, in mind. Comparisons between this approach and typical CPU implementations are presented, as well as between FFT and time-domain approaches.
The second aspect is the implementation of an augmented reality system conceived as an "off-the-shelf" solution, usable on most home computers without specialized hardware. A system is presented that detects the position of the listener through head tracking and renders a 3D audio environment by binaural spatialization. Head tracking is performed through face-tracking algorithms that use a standard webcam, and the result is presented over headphones, as in other typical binaural applications. With this system, users can choose audio files to play and assign virtual positions to the sources in a Euclidean space, then listen to them as if they were coming from those positions. If users move their head, the signals provided by the system change accordingly in real time, producing the realistic effect of a coherent scene.
The last aspect covered by this work lies within the field of psychoacoustics, a long-term research effort in which we are interested in understanding how binaural audio and recordings are perceived and, in turn, how auralization systems can be efficiently designed. Considerations regarding the quality and realism of such sounds in the context of ASA (Auditory Scene Analysis) are proposed.
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
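As a concrete reference for the log-mel spectra mentioned above as a dominant feature representation, the sketch below computes them from scratch with NumPy (frame length, hop and filter count are arbitrary illustrative defaults; real pipelines typically use a tuned library implementation):

```python
import numpy as np

def log_mel_spectrogram(x, fs, n_fft=512, hop=256, n_mels=40):
    """Log-mel features: windowed STFT power -> triangular mel filterbank -> log."""
    # short-time Fourier transform power spectra, one row per frame
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # mel scale and its inverse; filter centers equally spaced in mel
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv(np.linspace(mel(0.0), mel(fs / 2), n_mels + 2))
    bins = np.fft.rfftfreq(n_fft, 1.0 / fs)
    fb = np.zeros((n_mels, len(bins)))
    for i in range(n_mels):
        lo, ctr, hi = edges[i], edges[i + 1], edges[i + 2]
        fb[i] = np.clip(np.minimum((bins - lo) / (ctr - lo),
                                   (hi - bins) / (hi - ctr)), 0.0, None)
    return np.log(spec @ fb.T + 1e-10)   # small floor avoids log(0)
```

The resulting (frames x mel-bands) matrix is the kind of 2D input that the convolutional and recurrent models reviewed in the article consume.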