72 research outputs found
Practical considerations for acoustic source localization in the IoT era: Platforms, energy efficiency, and performance
The rapid development of the Internet of Things (IoT) has posed important changes in the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed taking into account high-throughput computing platforms equipped with expensive multichannel audio interfaces, the IoT paradigm is demanding the use of more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial, and health-care applications. This paper is aimed at evaluating important aspects dealing with the practical deployment of IoT systems for acoustic source localization. Recent systems-on-chip composed of low-power multicore processors, combined with a small graphics accelerator (or GPU), yield a notable increment of the computational capacity needed in intensive signal processing algorithms while partially retaining the appealing low power consumption of embedded systems. Different algorithms and implementations over several state-of-the-art platforms are discussed, analyzing important aspects, such as the tradeoffs between performance, energy efficiency, and exploitation of parallelism by taking into account real-time constraintsThis work was supported in part by the Post-Doctoral Fellowship from Generalitat
Valenciana under Grant APOSTD/2016/069, in part by the Spanish
Government under Grant TIN2014-53495-R, Grant TIN2015-65277-R, and
Grant BIA2016-76957-C3-1-R, and in part by the Universidad Jaume I under
Project UJI-B2016-20.Publicad
Media gateway utilizando um GPU
Mestrado em Engenharia de Computadores e Telemátic
On binaural spatialization and the use of GPGPU for audio processing
3D recordings and audio, namely techniques that aim to create the perception of sound sources placed anywhere in 3 dimensional space, are becoming an interesting resource for composers, live performances and augmented reality. This thesis focuses on binaural spatialization techniques.
We will tackle the problem from three different perspectives. The first one is related to the implementation of an engine for audio convolution, this is a real implementation problem where we will confront with a number of already available systems trying to achieve better results in terms of performances. General Purpose computing on Graphic Processing Units (GPGPU) is a promising approach to problems where a high parallelization of tasks is desirable. In this thesis the GPGPU approach is applied to both offline and real-time convolution having in mind the spatialization of multiple sound sources which is one of the critical problems in the field. Comparisons between this approach and typical CPU implementations are presented as well as between FFT and time domain approaches.
The second aspect is related to the implementation of an augmented reality system having in mind an “off the shelf” system available to most home computers without the need of specialized hardware. A system capable of detecting the position of the listener through a head-tracking system and rendering a 3D audio environment by binaural spatialization is presented. Head tracking is performed through face tracking algorithms that use a standard webcam, and the result is presented over headphones, like in other typical binaural applications. With this system users can choose audio files to play, provide virtual positions for sources in an Euclidean space, and then listen as if they are coming from that position. If users move their head, the signals provided by the system change accordingly in real-time, thus providing the realistic effect of a coherent scene.
The last aspect covered by this work is within the field of psychoacoustic, long term research where we are interested in understanding how binaural audio and recordings are perceived and how then auralization systems can be efficiently designed. Considerations with regard to the quality and the realism of such sounds in the context of ASA (Auditory Scene Analysis) are propose
ON BINAURAL SPATIALIZATION AND THE USE OF GPGPU FOR AUDIO PROCESSING
3D recordings and audio, namely techniques that aim to create the perception of sound sources placed anywhere in 3 dimensional space, are becoming an interesting resource for composers, live performances and augmented reality. This thesis focuses on binaural spatialization techniques.
We will tackle the problem from three different perspectives. The first one is related to the implementation of an engine for audio convolution, this is a real implementation problem where we will confront with a number of already available systems trying to achieve better results in terms of performances.
General Purpose computing on Graphic Processing Units (GPGPU) is a promising approach to problems where a high parallelization of tasks is desirable. In this thesis the GPGPU approach is applied to both offline and real-time convolution having in mind the spatialization of multiple sound sources which is one of the critical problems in the field. Comparisons between this approach and typical CPU implementations are presented as well as between FFT and time domain approaches.
The second aspect is related to the implementation of an augmented reality system having in mind an ``off the shelf'' system available to most home computers without the need of specialized hardware.
A system capable of detecting the position of the listener through a head-tracking system and rendering a 3D audio environment by binaural spatialization is presented. Head tracking is performed through face tracking algorithms that use a standard webcam, and the result is presented over headphones, like in other typical binaural applications. With this system users can choose audio files to play, provide virtual positions for sources in an Euclidean space, and then listen as if they are coming from that position. If users move their head, the signals provided by the system change accordingly in real-time, thus providing the realistic effect of a coherent scene.
The last aspect covered by this work is within the field of psychoacoustic, long term research where we are interested in understanding how binaural audio and recordings are perceived and how then auralization systems can be efficiently designed.
Considerations with regard to the quality and the realism of such sounds in the context of ASA (Auditory Scene Analysis) are proposed
Recommended from our members
Design and Optimization of Mobile Cloud Computing Systems with Networked Virtual Platforms
A Mobile Cloud Computing (MCC) system is a cloud-based system that is accessed by the users through their own mobile devices. MCC systems are emerging as the product of two technology trends: 1) the migration of personal computing from desktop to mobile devices and 2) the growing integration of large-scale computing environments into cloud systems. Designers are developing a variety of new mobile cloud computing systems. Each of these systems is developed with different goals and under the influence of different design constraints, such as high network latency or limited energy supply.
The current MCC systems rely heavily on Computation Offloading, which however incurs new problems such as scalability of the cloud, privacy concerns due to storing personal information on the cloud, and high energy consumption on the cloud data centers. In this dissertation, I address these problems by exploring different options in the distribution of computation across different computing nodes in MCC systems. My thesis is that "the use of design and simulation tools optimized for design space exploration of the MCC systems is the key to optimize the distribution of computation in MCC."
For a quantitative analysis of mobile cloud computing systems through design space exploration, I have developed netShip, the first generation of an innovative design and simulation tool, that offers large scalability and heterogeneity support. With this tool system designers and software programmers can efficiently develop, optimize, and validate large-scale, heterogeneous MCC systems. I have enhanced netShip to support the development of ever-evolving MCC applications with a variety of emerging needs including the fast simulation of new devices, e.g., Internet-of-Things devices, and accelerators, e.g., mobile GPUs. Leveraging netShip, I developed three new MCC systems where I applied three variations of a new computation distributing technique, called Reverse Offloading. By more actively leveraging the computational power on mobile devices, the MCC systems can reduce the total execution times, the burden of concentrated computations on the cloud, and the privacy concerns about storing personal information available in the cloud. This approach also creates opportunities for new services by utilizing the information available on the mobile device instead of accessing the cloud.
Throughout my research I have enabled the design optimization of mobile applications and cloud-computing platforms. In particular, my design tool for MCC systems becomes a vehicle to optimize not only the performance but also the energy dissipation, an aspect of critical importance for any computing system
Evaluating the graphics processing unit for digital audio synthesis and the development of HyperModels
The extraordinary growth in computation in single processors for almost half a century is becoming increasingly difficult to maintain. Future computational growth is expected from parallel processors, as seen in the increasing number of tightly coupled processors inside the conventional modern heterogeneous system. The graphics processing unit (GPU) is a massively parallel processing unit that can be used to accelerate particular digital audio processes; However, digital audio developers are cautious of adopting the GPU into their designs to avoid any complications the GPU architecture may have. For example, linear systems simulated using finite-difference-based physical model synthesis is highly suited for the GPU, but developers will be reluctant to use it without a complete evaluation of the GPU for digital audio. Previously limited by computation, the audio landscape could see future advancement by providing a comprehensive evaluation of the GPU in digital audio and developing a framework for accelerating particular audio processes. This thesis is separated into two parts; Part one evaluates the utility of the GPU as a hardware accelerator for digital audio processing using bespoke performance benchmarking suites. The results suggest that the GPU is appropriate under particular conditions; For example, the sample buffer size dispatched to the GPU must be within 32 to 512 to meet real-time digital audio requirements. However, despite some constraints, the GPU could support linear finite-difference-based physical models with 4X higher resolution than the equivalent CPU version. These results suggest that the GPU is superior to the CPU for high-resolution physical models. Therefore, the second part of this thesis presents the design of the novel HyperModels framework to facilitate the development of real-time linear physical models for interaction and performance. HyperModels uses vector graphics to describe a model's geometry and a domain-specific language (DSL) to define the physics equations that operate in the physical model. An implementation of the HyperModels framework is then objectively evaluated by comparing the performance with manually written CPU and GPU equivalent versions. The automatically generated GPU programs from HyperModels were shown to outperform the CPU versions for resolutions 64x64 and above whilst maintaining similar performance to the manually written GPU versions. To conclude part 2, the expressibility and usability of HyperModels is demonstrated by presenting two instruments built using the framewor
Towards efficient exploitation of GPUs : a methodology for mapping index-digit algorithms
[Resumen]La computación de propósito general en GPUs supuso un gran paso, llevando la
computación de alto rendimiento a los equipos domésticos. Lenguajes de programación de alto nivel como OpenCL y CUDA redujeron en gran medida la complejidad
de programación. Sin embargo, para poder explotar totalmente el poder computacional
de las GPUs, se requieren algoritmos paralelos especializados. La complejidad
en la jerarquía de memoria y su arquitectura masivamente paralela hace que la
programación de GPUs sea una tarea compleja incluso para programadores experimentados.
Debido a la novedad, las librerías de propósito general son escasas y las
versiones paralelas de los algoritmos no siempre están disponibles.
En lugar de centrarnos en la paralelización de algoritmos concretos, en esta tesis
proponemos una metodología general aplicable a la mayoría de los problemas de tipo
divide y vencerás con una estructura de mariposa que puedan formularse a través de
la representación Indice-Dígito. En primer lugar, se analizan los diferentes factores que afectan al rendimiento de la arquitectura de las GPUs. A continuación, estudiamos
varias técnicas de optimización y diseñamos una serie de bloques constructivos
modulares y reutilizables, que se emplean para crear los diferentes algoritmos. Por último, estudiamos el equilibrio óptimo de los recursos, y usando vectores de mapeo
y operadores algebraicos ajustamos los algoritmos para las configuraciones deseadas.
A pesar del enfoque centrado en la exibilidad y la facilidad de programación, las
implementaciones resultantes ofrecen un rendimiento muy competitivo, que llega a superar conocidas librerías recientes.[Resumo] A computación de propósito xeral en GPUs supuxo un gran paso, levando a
computación de alto rendemento aos equipos domésticos. Linguaxes de programación de alto nivel como OpenCL e CUDA reduciron en boa medida a complexidade
da programación. Con todo, para poder aproveitar totalmente o poder computacional
das GPUs, requírense algoritmos paralelos especializados. A complexidade na
xerarquía de memoria e a súa arquitectura masivamente paralela fai que a programación de GPUs sexa unha tarefa complexa mesmo para programadores experimentados.
Debido á novidade, as librarías de propósito xeral son escasas e as versións
paralelas dos algoritmos non sempre están dispoñibles.
En lugar de centrarnos na paralelización de algoritmos concretos, nesta tese propoñemos unha metodoloxía xeral aplicable á maioría dos problemas de tipo divide e
vencerás cunha estrutura de bolboreta que poidan formularse a través da representación Índice-Díxito. En primeiro lugar, analízanse os diferentes factores que afectan
ao rendemento da arquitectura das GPUs. A continuación, estudamos varias técnicas
de optimización e deseñamos unha serie de bloques construtivos modulares e
reutilizables, que se empregan para crear os diferentes algoritmos. Por último, estudamos
o equilibrio óptimo dos recursos, e usando vectores de mapeo e operadores
alxbricos axustamos os algoritmos para as configuracións desexadas. A pesar do enfoque
centrado na exibilidade e a facilidade de programación, as implementacións
resultantes ofrecen un rendemento moi competitivo, que chega a superar coñecidas
librarías recentes.[Abstract]GPU computing supposed a major step forward, bringing high performance computing
to commodity hardware. Feature-rich parallel languages like CUDA and
OpenCL reduced the programming complexity. However, to fully take advantage of
their computing power, specialized parallel algorithms are required. Moreover, the
complex GPU memory hierarchy and highly threaded architecture makes programming
a difficult task even for experienced programmers. Due to the novelty of GPU
programming, common general purpose libraries are scarce and parallel versions of
the algorithms are not always readily available.
Instead of focusing in the parallelization of particular algorithms, in this thesis
we propose a general methodology applicable to most divide-and-conquer problems
with a buttery structure which can be formulated through the Index-Digit
representation. First, we analyze the different performance factors of the GPU architecture.
Next, we study several optimization techniques and design a series of
modular and reusable building blocks, which will be used to create the different
algorithms. Finally, we study the optimal resource balance, and through a mapping
vector representation and operator algebra, we tune the algorithms for the desired
configurations. Despite the focus on programmability and exibility, the resulting
implementations offer very competitive performance, being able to surpass other
well-known state of the art libraries
Digital Signal Processor Based Real-Time Phased Array Radar Backend System and Optimization Algorithms
This dissertation presents an implementation of multifunctional large-scale phased array radar based on the scalable DSP platform.
The challenge of building large-scale phased array radar backend is how to address the compute-intensive operations and high data throughput requirement in both front-end and backend in real-time. In most of the applications, FPGA or VLSI hardware are typically used to solve those difficulties. However, with the help of the fast development of IC industry, using a parallel set of high-performing programmable chips can be an alternative. We present a hybrid high-performance backend system by using DSP as the core computing device and MTCA as the system frame. Thus, the mapping techniques for the front and backend signal processing algorithm based on DSP are discussed in depth.
Beside high-efficiency computing device, the system architecture would be a major factor influencing the reliability and performance of the backend system. The reliability requires the system must incorporate the redundancy both in hardware and software. In this dissertation, we propose a parallel modular system based on MTCA chassis, which can be reliable, scalable, and fault-tolerant.
Finally, we present an example of high performance phased array radar backend, in which there is the number of 220 DSPs, achieving 7000 GFLOPS calculation from 768 channels. This example shows the potential of using the combination of DSP and MTCA as the computing platform for the future multi-functional large-scale phased array radar
- …