
    Efficient target-response interpolation for a graphic equalizer

    Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), held in Shanghai, China, 20-25 March 2016. A graphic equalizer is an adjustable filter in which the command gain of each frequency band is practically independent of the gains of the other bands. Designing a graphic equalizer with high precision requires evaluating a target response that interpolates the magnitude response at several frequency points between the command gains. Good accuracy has previously been achieved with polynomial interpolation methods such as cubic Hermite or spline interpolation. However, these methods require large computational resources, which is a limitation in real-time applications. This paper proposes an efficient way of computing the target response without sacrificing approximation accuracy. The new approach, called Linear Interpolation with Constant Segments (LICS), reduces the computing time of the target response by 55% and has an intrinsically parallel structure. The performance of the LICS method is assessed on an ARM Cortex-A7 core, which is commonly used in embedded systems. This work was conducted in spring 2015 while the first author was a visiting postdoctoral researcher at Aalto University. This research has been partly funded by the TIN2014-53495-R and TIN2011-23283 projects of the Ministerio de Economía y Competitividad and FEDER.
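    The abstract does not spell out the LICS scheme itself, but the core operation it speeds up (evaluating a target magnitude response at extra frequency points lying between the command gains) can be illustrated with a minimal NumPy sketch. All band frequencies, gains, and function names below are hypothetical, and plain linear interpolation on a log-frequency axis stands in for the actual LICS rule described in the paper.

        import numpy as np

        # Hypothetical 10-band graphic equalizer: octave-spaced band centers (Hz)
        # and user-set command gains (dB).
        centers = np.array([31.25, 62.5, 125, 250, 500, 1000, 2000, 4000, 8000, 16000])
        gains_db = np.array([3.0, 3.0, 0.0, -2.0, -4.0, -4.0, 0.0, 2.0, 5.0, 5.0])

        def target_response(eval_freqs, centers, gains_db):
            """Target magnitude response (dB) at intermediate frequencies,
            obtained by linear interpolation on a logarithmic frequency axis."""
            return np.interp(np.log10(eval_freqs), np.log10(centers), gains_db)

        # Evaluate one extra point between each pair of adjacent band centers.
        midpoints = np.sqrt(centers[:-1] * centers[1:])
        print(target_response(midpoints, centers, gains_db))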

    An Efficient Implementation of Parallel Parametric HRTF Models for Binaural Sound Synthesis in Mobile Multimedia

    The extended use of mobile multimedia devices in applications such as gaming, 3D video and audio reproduction, immersive teleconferencing, and virtual and augmented reality demands efficient algorithms and methodologies. All of these applications require real-time spatial audio engines capable of dealing with intensive signal processing operations while facing a number of constraints related to computational cost, latency, and energy consumption. Most mobile multimedia devices include a Graphics Processing Unit (GPU) that is primarily used to accelerate video processing tasks and provides high computational capabilities thanks to its inherently parallel architecture. This paper describes a scalable parallel implementation of a real-time binaural audio engine for GPU-equipped mobile devices. The engine is based on a set of head-related transfer functions (HRTFs) modelled with a parametric parallel structure, allowing efficient synthesis and interpolation while reducing the size required for HRTF data storage. Several strategies to optimize the GPU implementation are evaluated on a well-known processor present in a wide range of mobile devices. In this context, we analyze both the energy consumption and the real-time capabilities of the system by exploring different GPU and CPU configuration alternatives. Moreover, the implementation has been developed using the OpenCL framework, guaranteeing the portability of the code.
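    As a point of reference only, the basic operation such an engine performs (rendering a mono source binaurally by filtering it with left- and right-ear head-related impulse responses) can be sketched on the CPU with NumPy/SciPy. The parametric parallel HRTF models and the OpenCL GPU implementation described in the abstract are not reproduced here; the data and function names are hypothetical.

        import numpy as np
        from scipy.signal import fftconvolve

        def binaural_render(mono, hrir_left, hrir_right):
            """Convolve a mono source with the left/right head-related impulse
            responses to obtain a two-channel binaural signal."""
            left = fftconvolve(mono, hrir_left)
            right = fftconvolve(mono, hrir_right)
            return np.stack([left, right])

        fs = 44100                                        # hypothetical sample rate
        mono = np.random.randn(fs)                        # 1 s of test noise
        hrir_l = np.random.randn(256) * np.hanning(256)   # placeholder 256-tap HRIRs
        hrir_r = np.random.randn(256) * np.hanning(256)
        print(binaural_render(mono, hrir_l, hrir_r).shape)  # (2, 44355)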

    Real-time massive convolution for audio applications on GPU

    [EN] Massive convolution is the basic operation in multichannel acoustic signal processing. This field has undergone major development in recent years. One reason is the increase in the number of sound sources used in the playback applications available to users. Another is the growing need to incorporate new effects and to improve the listening experience. Massive convolution requires high computing capacity. GPUs offer the possibility of parallelizing these operations, which allows us to obtain the processing result in a much shorter time and to free up CPU resources. One important aspect lies in the possibility of overlapping the transfer of data between CPU and GPU with the computation, in order to achieve real-time applications. Thus, 3D sound scenes could be synthesized in a peer-to-peer music streaming environment using a simple GPU in a personal computer, while the CPU is used for other tasks. Nowadays, these effects are obtained in theaters or funfairs at very high cost and with a large quantity of resources. Our work therefore focuses on two main points: describing an efficient massive convolution implementation and incorporating this task into real-time multichannel sound applications. © 2011 Springer Science+Business Media, LLC. This work was partially supported by the Spanish Ministerio de Ciencia e Innovación (projects TIN2008-06570-C04-02 and TEC2009-13741), the Universidad Politécnica de Valencia through PAID-05-09, and the Generalitat Valenciana through project PROMETEO/2009/2013. Belloch Rodríguez, JA.; Gonzalez, A.; Martínez Zaldívar, FJ.; Vidal Maciá, AM. (2011). Real-time massive convolution for audio applications on GPU. Journal of Supercomputing. 58(3):449-457. https://doi.org/10.1007/s11227-011-0610-8
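    The abstract does not detail the implementation, but the standard way to make long convolutions affordable in real time is block-wise frequency-domain processing. The following minimal NumPy sketch (overlap-add, single channel, CPU only) illustrates that idea; the GPU kernels, multichannel batching, and overlapped CPU-GPU transfers discussed in the paper are not represented, and all names and sizes are hypothetical.

        import numpy as np

        def overlap_add_convolve(x, h, block_size=1024):
            """FFT-based block convolution (overlap-add): process the input in
            blocks, multiply in the frequency domain, and add the overlapping
            tails. Result matches np.convolve(x, h)."""
            n_fft = block_size + len(h) - 1
            H = np.fft.rfft(h, n_fft)
            y = np.zeros(len(x) + len(h) - 1)
            for start in range(0, len(x), block_size):
                block = x[start:start + block_size]
                seg = np.fft.irfft(np.fft.rfft(block, n_fft) * H, n_fft)
                end = min(start + n_fft, len(y))
                y[start:end] += seg[:end - start]
            return y

        x = np.random.randn(48000)          # hypothetical 1 s input at 48 kHz
        h = np.random.randn(4096)           # hypothetical long room impulse response
        print(np.allclose(overlap_add_convolve(x, h), np.convolve(x, h)))  # True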

    Evaluating the soft error sensitivity of a GPU-based SoC for matrix multiplication

    System-on-Chip (SoC) devices can combine low-power multicore processors with a small graphics accelerator (GPU), offering a trade-off between computational capacity and low power consumption. In this work we use the LLFI-GPU fault injection tool on one of these devices to compare the sensitivity to soft errors of two different CUDA versions of a matrix multiplication benchmark. Specifically, we perform fault injection campaigns on a Jetson TK1 development kit, a board equipped with a SoC that includes an NVIDIA "Kepler" Graphics Processing Unit (GPU). We evaluate the effect of modifying the problem size and the thread-block size on the behaviour of the algorithms. Our results show that the block version of the matrix multiplication benchmark, which leverages the shared memory of the GPU, is not only faster than the element-wise version but also much more resilient to soft errors. We also use the cuda-gdb debugger to analyze the main causes of the crashes produced in the code by soft errors. Our experiments show that most of these errors are due to accesses to invalid positions in the different memories of the GPU, which explains why the block version suffers a higher percentage of this kind of error.
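    The two kernels compared in the paper are CUDA code and are not reproduced in the abstract. Purely as an illustration of the blocking idea behind the shared-memory version, here is a hypothetical NumPy sketch of tiled matrix multiplication; it says nothing about the fault injection campaigns themselves.

        import numpy as np

        def matmul_tiled(A, B, tile=32):
            """Tiled matrix multiplication: accumulate C one tile at a time,
            the same blocking pattern a shared-memory CUDA kernel exploits."""
            n, k = A.shape
            _, m = B.shape
            C = np.zeros((n, m))
            for i in range(0, n, tile):
                for j in range(0, m, tile):
                    for p in range(0, k, tile):
                        C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
            return C

        A, B = np.random.rand(96, 80), np.random.rand(80, 64)
        print(np.allclose(matmul_tiled(A, B), A @ B))  # True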

    Evaluating the computational performance of the Xilinx UltraScale+ EG Heterogeneous MPSoC

    The emergent technology of Multi-Processor System-on-Chip (MPSoC), which combines heterogeneous computing with the high performance of Field Programmable Gate Arrays (FPGAs), is a very interesting platform for a huge number of applications ranging from medical imaging and augmented reality to high-performance computing in space. In this paper we focus on the Xilinx Zynq UltraScale+ EG Heterogeneous MPSoC, which is composed of four different processing elements (PEs): a dual-core Cortex-R5, a quad-core ARM Cortex-A53, a graphics processing unit (GPU), and a high-end FPGA. Making proper use of the heterogeneity and the different levels of parallelism of this platform is a challenging task. This paper evaluates the platform and each of its PEs on fundamental operations in terms of computational performance. To this end, we evaluate image-based applications and a matrix multiplication kernel. For the former, the image-based applications leverage the heterogeneity of the MPSoC and strategically distribute their tasks among both kinds of CPU cores and the FPGA. For the latter, we analyze each PE separately using different matrix multiplication benchmarks in order to assess and compare their performance in terms of MFLOPS. These kinds of operations are carried out, for example, in a large number of space-related applications, where MPSoCs are currently gaining momentum. The results highlight the fact that different PEs can collaborate efficiently to accelerate the computationally demanding tasks of an application. Another important finding is that, leveraging the parallel OpenBLAS library, we achieve up to 12 GFLOPS with the four Cortex-A53 cores of the platform, which is considerable performance for this kind of device.
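    The paper links its matrix multiplication benchmarks directly against OpenBLAS on the Cortex-A53 cluster. As a rough, hypothetical sketch of how such a GFLOPS figure is typically obtained (2n^3 floating-point operations per n x n product divided by the measured time), one could do the following with a BLAS-backed NumPy build standing in for the native benchmark:

        import time
        import numpy as np

        def matmul_gflops(n=1024, reps=5):
            """Time an n x n double-precision matrix product and report GFLOPS,
            counting 2*n^3 floating-point operations per product."""
            A, B = np.random.rand(n, n), np.random.rand(n, n)
            A @ B                                   # warm-up run
            t0 = time.perf_counter()
            for _ in range(reps):
                A @ B
            elapsed = (time.perf_counter() - t0) / reps
            return 2.0 * n**3 / elapsed / 1e9

        print(f"{matmul_gflops():.1f} GFLOPS")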

    Strategies to parallelize a finite element mesh truncation technique on multi-core and many-core architectures

    Achieving maximum parallel performance on multi-core CPUs and many-core GPUs is a challenging task that depends on multiple factors, including, for example, the number and granularity of the computations or the use of the device memories. In this paper, we assess those factors by evaluating and comparing different parallelizations of the same problem on a multiprocessor containing a CPU with 40 cores and four P100 GPUs with Pascal architecture. As a case study, we use the convolutional operation behind a non-standard finite element mesh truncation technique in the context of open-region electromagnetic wave propagation problems. A total of six parallel algorithms implemented using OpenMP and CUDA have been used to carry out the comparison, leveraging the same levels of parallelism on both types of platform. Three of the algorithms are presented for the first time in this paper, including a multi-GPU method, and two others are improved versions of algorithms previously developed by some of the authors. The paper presents a thorough experimental evaluation of the parallel algorithms on a radar cross-section prediction problem. The results show that the performance obtained on the GPU clearly surpasses that obtained on the CPU, even more so when multiple GPUs are used to distribute both data and computations. Speedups close to 30 have been obtained on the CPU, while the multi-GPU version achieves speedups larger than 250. Funding for open access charge: CRUE-Universitat Jaume I.

    Virtual reality exposure for OCD: Is it feasible? (Exposición mediante realidad virtual para el TOC: ¿Es factible?)

    Virtual reality exposure therapy (VRET) is receiving increased attention, especially in the fields of anxiety and eating disorders. This study is the first trial examining the utility of VRET from the perspective of OCD patients. Four women with OCD assessed the sense of presence, emotional engagement, and reality judgment, and the anxiety and disgust levels they experienced in four scenarios, called the Contaminated Virtual Environment (COVE), in which they had to perform several activities. The COVE scenarios were presented on a Full HD 46" TV connected to a laptop and to a Kinect device. The results indicate that the COVE scenarios generated a good sense of presence. Anxiety and disgust levels increased as the virtual contamination increased, and the anxiety produced was related to the emotional engagement and sense of presence.

    Audiovisual Tool for understanding Audio concepts for use in bachelor's degree programmes

    [EN] In the field of audio signal processing, it is difficult to explain concepts such as compression, masking, quantization, and sampling, among others. Furthermore, most of these concepts require the use of audio laboratories and multiple practical sessions that students must carry out. Another issue is that some students are not able to internalize these concepts straightforwardly and require additional practical sessions. In order to address these problems, we have developed an audiovisual tool, designed with Matlab, that can be used by professors and students. This tool allows the audio concepts to be analyzed, tested, and applied to real audio signals. The tool has been successfully tried out by professors in the audio signal processing field, who recommend its use in upcoming academic courses. This research has been partly funded by TIN2014-53495-R, BES-2013-063783, BES-2013-065034, TEC2013-47141-C4-4-R and FPU AP-2012/71274. Antoñanzas Manuel, C.; Gutiérrez Parera, P.; Simarro Haro, MDLA.; Belloch, JA. (2016). Audiovisual Tool for understanding Audio concepts for being used in bachelor's degree programmes. In: 2nd International Conference on Higher Education Advances (HEAd'16). Editorial Universitat Politècnica de València. 495-502. https://doi.org/10.4995/HEAD16.2016.2923
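    The tool itself is written in Matlab and is not included in the abstract. Purely as an illustration of one of the concepts it covers, the following hypothetical Python snippet demonstrates uniform quantization of a test tone and the familiar rule of roughly 6 dB of signal-to-noise ratio per bit:

        import numpy as np

        def uniform_quantize(x, n_bits):
            """Uniformly quantize a signal in [-1, 1) to n_bits and return the
            quantized signal together with the resulting SNR in dB."""
            step = 2.0 / (2 ** n_bits)
            xq = np.clip(np.round(x / step) * step, -1.0, 1.0 - step)
            snr_db = 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))
            return xq, snr_db

        fs = 8000
        t = np.arange(fs) / fs
        tone = 0.9 * np.sin(2 * np.pi * 440 * t)       # 1 s, 440 Hz test tone
        for bits in (4, 8, 12, 16):
            _, snr = uniform_quantize(tone, bits)
            print(f"{bits:2d} bits -> SNR {snr:5.1f} dB")   # roughly 6 dB per bit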

    Accelerating multi-channel filtering of audio signal on ARM processors

    The researchers from Universitat Jaume I are supported by the CICYT projects TIN2014-53495-R and TIN2011-23283 of the Ministerio de Economía y Competitividad and FEDER. The authors from the Universitat Politècnica de València are supported by projects TEC2015-67387-C4-1-R and PROMETEOII/2014/003. This work was also supported by the European Union FEDER (CAPAP-H5 network TIN2014-53522-REDT).

    A pipeline structure for the block QR update in digital signal processing

    [EN] There exist problems in the field of digital signal processing, such as the filtering of acoustic signals, that require processing a large amount of data in real time. The beamforming algorithm, for instance, is a process that can be modeled by a rectangular matrix built from the input signals of an acoustic system and that, thus, changes in real time. Obtaining the output signals requires computing its QR factorization. In this paper, we propose organizing the concurrent computational resources of a given multicore computer in a pipeline structure to perform this factorization as fast as possible. The pipeline has been implemented using both the OpenMP application programming interface and GrPPI, a library interface for designing parallel applications based on parallel patterns. We tackle not only the performance challenge but also the programmability of our idea using parallel programming frameworks. This work was supported by the Spanish Ministry of Economy and Competitiveness under MINECO and FEDER projects TIN2014-53495-R and TEC2015-67387-C4-1-R. Dolz, MF.; Alventosa, FJ.; Alonso-Jordá, P.; Vidal Maciá, AM. (2019). A pipeline structure for the block QR update in digital signal processing. The Journal of Supercomputing. 75(3):1470-1482. https://doi.org/10.1007/s11227-018-2666-1
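    The abstract does not show how the QR factorization of a growing signal matrix is refreshed. A minimal, serial NumPy sketch of the row-update step (appending a block of new snapshots and re-triangularizing the stacked matrix formed by R and the new rows) is given below; the pipeline organization over OpenMP/GrPPI stages described in the paper is not represented, and all sizes and names are hypothetical.

        import numpy as np

        def qr_update_rows(R, new_rows):
            """Update the triangular factor R when a block of new rows is
            appended to the data matrix: stack R on top of the new rows and
            re-triangularize this much smaller matrix."""
            _, R_new = np.linalg.qr(np.vstack([R, new_rows]))
            return R_new

        # Hypothetical sizes: 256 snapshots of 8 sensor signals, then 32 new ones.
        A = np.random.randn(256, 8)
        B = np.random.randn(32, 8)
        _, R = np.linalg.qr(A)
        R_updated = qr_update_rows(R, B)
        _, R_full = np.linalg.qr(np.vstack([A, B]))
        print(np.allclose(np.abs(R_updated), np.abs(R_full)))  # True (up to row signs)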