6 research outputs found

    Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL

    The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a large number of acoustic applications such as automatic camera steering systems, human-machine interaction, video gaming and audio surveillance. SRP-PHAT implementations must handle a high number of signals coming from a microphone array and a huge search grid that influences the localization accuracy of the system. In this context, high performance in the localization process can only be achieved by using massively parallel computational resources. Different types of multi-core machines based either on multiple CPUs or on GPUs are commonly employed in diverse fields of science for accelerating a number of applications, mainly using OpenMP and CUDA as programming frameworks, respectively. This implies the development of multiple source codes, which limits portability and application possibilities. In contrast, OpenCL has emerged as an open standard for parallel programming that is nowadays supported by a wide range of architectures. In this work, we evaluate OpenCL-based implementations of the SRP-PHAT algorithm on two state-of-the-art CPU and GPU platforms. Results demonstrate that OpenCL achieves close-to-CUDA performance on the GPU (with CUDA considered as the upper bound) and outperforms OpenMP in most of the CPU configurations. This work has been supported by the postdoctoral fellowship from Generalitat Valenciana APOSTD/2016/069, the Spanish Government through TIN2014-53495-R, TIN2015-65277-R and BIA2016-76957-C3-1-R, and the Universidad Jaume I Project UJI-B2016-20. Badía Contelles, JM.; Belloch Rodríguez, JA.; Cobos Serrano, M.; Igual Peña, FD.; Quintana-Ortí, ES. (2019). Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL. The Journal of Supercomputing. 75(3):1284-1297.
https://doi.org/10.1007/s11227-018-2422-6

    Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance

    The rapid development of the Internet of Things (IoT) has brought important changes in the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed for high-throughput computing platforms equipped with expensive multichannel audio interfaces, the IoT paradigm demands more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper evaluates important aspects of the practical deployment of IoT systems for acoustic source localization. Recent systems-on-chip composed of low-power multicore processors, combined with a small graphics accelerator (or GPU), yield a notable increase in the computational capacity needed by intensive signal processing algorithms while partially retaining the appealing low power consumption of embedded systems. Different algorithms and implementations over several state-of-the-art platforms are discussed, analyzing important aspects such as the tradeoffs between performance, energy efficiency, and exploitation of parallelism under real-time constraints. This work was supported in part by the Post-Doctoral Fellowship from Generalitat Valenciana under Grant APOSTD/2016/069, in part by the Spanish Government under Grant TIN2014-53495-R, Grant TIN2015-65277-R, and Grant BIA2016-76957-C3-1-R, and in part by the Universidad Jaume I under Project UJI-B2016-20.

    Performance analysis of a millimeter wave MIMO channel estimation method in an embedded multi-core processor

    The emerging Multi-Processor System-on-Chip (MPSoC) technology, which combines heterogeneous computing with the high performance of field programmable gate arrays (FPGA), is a promising platform for a large number of applications, including wireless communications and vehicular technology. In this specific application context, when multiple-input multiple-output (MIMO) scenarios are considered, the system usually has to manage a large number of communication links among sensors and antennas involving different vehicles and users. Millimeter wave (mmWave) communications are one of the key technology enablers toward achieving high data rates in beyond 5G systems (B5G). Communication at these frequency bands usually involves the use of large antenna arrays, often requiring high computational resources. One of the candidate platforms able to manage a huge number of communications is the Xilinx Zynq UltraScale+ EG Heterogeneous MPSoC, which is composed of a dual-core Cortex-R5, a quad-core ARM Cortex-A53, a graphics processing unit (GPU) and a high-end FPGA. This work analyzes the computational performance required by a recent mmWave MIMO channel estimation algorithm on a platform of this kind. As a first approach, we focus on the performance that can be achieved on the quad-core ARM Cortex-A53. To this end, we use the numerical linear algebra libraries BLAS and LAPACK. The results show that our reference implementation is able to manage a large MIMO communication system with 256 antennas without exhausting platform resources. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Thanks to Grant PID2020-113785RB-100 funded by MCIN/AEI/10.13039/501100011033 and the Ramón y Cajal Grant RYC-2017-22101. The work has also been supported by the Spanish Ministry of Science and Innovation under Grants RTI2018-097045-B-C21, PID2019-106455GB-C21 and PID2020-113656RB-C21, as well as the Regional Government of Madrid through the projects MIMACUHSPACE-CM-UC3M (2022/00024/001) and PEJD-2019-PRE/TIC-16327.

    PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS

    Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is known as Immersive Audio Schemes. Several acoustic effects are involved in this phenomenon: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, high computing capacity is required to achieve any of these effects in a real large-scale system, which represents a considerable limitation for real-time applications. The increase in computational capacity has historically been linked to the number of transistors in a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., expanding parallelism in computing. This is the case of Graphics Processing Units (GPUs), which now have thousands of computing cores. GPUs were traditionally related to graphics or image applications, but new releases of the GPU programming environments, CUDA and OpenCL, have allowed applications in fields beyond graphics to be computationally accelerated. This thesis aims to demonstrate that GPUs are fully valid tools for carrying out audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented using GPUs. This manuscript also analyzes and solves possible limitations in each GPU-based implementation, from both the acoustic and the computational points of view. In this document, we have addressed the following problems: Most audio applications are based on massive filtering. Thus, the first implementation undertaken is a fundamental operation in audio processing: the convolution. It was first developed as a computational kernel and afterwards used in an application that combines multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation can successfully manage two different and common situations: buffers that are much larger than the filters and buffers that are much smaller than the filters. Two spatial audio applications that use the GPU as a co-processor have been developed from the massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it is able to synthesize sound sources in spatial positions that are not included in the HRTF database and to generate smooth movements of sound sources. Both features were designed after different tests (objective and subjective). The performance, in terms of the number of sound sources that could be rendered in real time, was assessed on GPUs with different architectures. A similar performance is measured in a Wave Field Synthesis (WFS) system (the second spatial audio application) that is composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce room effects during sound source rendering. A well-known approach for sound source localization in noisy and reverberant environments is also addressed on a multi-GPU system: the Steered Response Power with Phase Transform (SRP-PHAT) algorithm. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated from both the localization and the computational performance points of view, taking into account different acoustic environments, and always from a real-time implementation perspective. Finally, this manuscript also addresses massive multichannel filtering when the filters have an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that have an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal. Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651

    FPGA-based architectures for acoustic beamforming with microphone arrays : trends, challenges and research opportunities

    Over the past decades, many systems composed of arrays of microphones have been developed to satisfy the quality demanded by acoustic applications. Such microphone arrays are sound acquisition systems composed of multiple microphones used to sample the sound field with spatial diversity. The relatively recent adoption of Field-Programmable Gate Arrays (FPGAs) to manage the audio data samples and to perform signal processing operations such as filtering or beamforming has led to customizable architectures able to satisfy the most demanding computational, power or performance requirements of acoustic applications. The presented work provides an overview of current FPGA-based architectures and how FPGAs are exploited for different acoustic applications. Current trends in the use of this technology, pending challenges and open research opportunities in the use of FPGAs for acoustic applications using microphone arrays are presented and discussed.

    Acceleration Techniques for Sparse Recovery Based Plane-wave Decomposition of a Sound Field

    Plane-wave decomposition by sparse recovery is a reliable and accurate technique that can be used for source localization, beamforming, etc. In this work, we introduce techniques to accelerate plane-wave decomposition by sparse recovery. The method consists of two main algorithms: the spherical Fourier transformation (SFT) and sparse recovery. Of the two, sparse recovery is the more computationally intensive. We implement the SFT on an FPGA and the sparse recovery on a multithreaded computing platform, so that the multithreaded platform can be fully utilized for the sparse recovery. In addition, implementing the SFT on an FPGA helps to flexibly integrate the microphones and improves the portability of the microphone array. For implementing the SFT on an FPGA, we develop a scalable FPGA design model that enables the quick design of the SFT architecture on FPGAs. The model considers the number of microphones, the number of SFT channels and the cost of the FPGA, and provides the design of a resource-optimized and cost-effective FPGA architecture as its output. We then investigate the performance of the sparse recovery algorithm executed on various multithreaded computing platforms (i.e., chip-multiprocessor, multiprocessor, GPU, manycore). Finally, we investigate the influence of modifying the dictionary size on the computational performance and the accuracy of the sparse recovery algorithms. We introduce novel sparse-recovery techniques which use non-uniform dictionaries to improve the performance of the sparse recovery on a parallel architecture.