GPU-Based One-Dimensional Convolution for Real-Time Spatial Sound Generation
Incorporating spatialized (3D) sound cues in dynamic and interactive videogames and immersive virtual environment applications is beneficial for a number of reasons, ultimately leading to an increase in presence and immersion. Despite these benefits, spatial sound cues are often overlooked in videogames and virtual environments, where emphasis is typically placed on the visual cues. Fundamental to the generation of spatial sound is the one-dimensional convolution operation, which is computationally expensive and therefore does not lend itself to such real-time, dynamic applications. Driven by the gaming industry and the great emphasis placed on the visual sense, consumer computer graphics hardware, and the graphics processing unit (GPU) in particular, has advanced greatly in recent years, even outperforming the computational capacity of CPUs. This has allowed for real-time, interactive, realistic graphics-based applications on typical consumer-level PCs. Given the widespread use and availability of computer graphics hardware and the similarities that exist between the fields of spatial audio and image synthesis, here we describe the development of a GPU-based, one-dimensional convolution algorithm whose efficiency is superior to the conventional CPU-based convolution method. The primary purpose of the developed GPU-based convolution method is the computationally efficient generation of real-time spatial audio for dynamic and interactive videogames and virtual environments.
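As a point of reference for the operation this abstract calls fundamental, the following is a minimal CPU-side sketch of direct one-dimensional convolution, y[n] = sum over k of h[k] * x[n - k]. It is not the paper's GPU algorithm. The function names `convolve` and `spatialize`, and the tiny left/right impulse responses, are hypothetical placeholders chosen for illustration; real spatial audio filters are far longer, which is what makes the operation expensive.

```python
def convolve(signal, kernel):
    """Direct-form 1D convolution: y[n] = sum_k kernel[k] * signal[n - k]."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(kernel):
            out[n + k] += x * h
    return out


def spatialize(mono, ir_left, ir_right):
    """Filter one mono signal with a per-ear impulse response (assumed inputs)."""
    return convolve(mono, ir_left), convolve(mono, ir_right)


# Hypothetical toy data: the right ear hears a delayed, attenuated copy.
mono = [1.0, 0.5, 0.25]
ir_left = [1.0, 0.0]
ir_right = [0.0, 0.5]
left, right = spatialize(mono, ir_left, ir_right)
```

The cost of the double loop grows with the product of the signal length and the filter length, which illustrates why the abstract describes the operation as poorly suited to real-time use on a CPU.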
Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs
Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications, which involves the processing of multiple sources, channels, or filters. A general scenario that appears in this context is the immersive reproduction of binaural audio without the use of headphones, which requires the use of a crosstalk canceler. However, generalized crosstalk cancellation and equalization (GCCE) requires high computing capacity, which is a considerable limitation for real-time applications. This paper discusses the design and implementation of all the processing blocks of a multichannel convolution on a GPU for real-time applications. To this end, a very efficient filtering method using specific data structures is proposed, which takes advantage of overlap-save filtering and filter fragmentation. It has been shown that, for a real-time application with 22 inputs and 64 outputs, the system is capable of managing 1408 filters of 2048 coefficients with a latency of less than 6 ms. The proposed GPU implementation can be easily adapted to any acoustic environment, demonstrating the validity of these co-processors for managing intensive multichannel audio applications. This work has been partially funded by Spanish Ministerio de Ciencia e Innovacion TEC2009-13741, Generalitat Valenciana PROMETEO 2009/2013 and GV/2010/027, and Universitat Politecnica de Valencia through Programa de Apoyo a la Investigacion y Desarrollo (PAID-05-11). Belloch Rodríguez, JA.; Gonzalez, A.; Martínez Zaldívar, FJ.; Vidal Maciá, AM. (2013). Multichannel massive audio processing for a generalized crosstalk cancellation and equalization application using GPUs. Integrated Computer-Aided Engineering. 20(2):169-182. https://doi.org/10.3233/ICA-130422
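The overlap-save filtering named in this abstract can be sketched for a single channel as follows. This is a minimal illustration, not the paper's GPU implementation: the function names are hypothetical, filter fragmentation is not shown, and the per-block circular convolution is computed directly so that the example needs only the standard library (practical implementations typically compute it with FFTs). Note also that the reported filter count is consistent with one filter per input-output pair, since 22 × 64 = 1408.

```python
def circular_convolve(block, kernel):
    """Circular convolution of one block with a kernel no longer than the block."""
    n = len(block)
    return [
        sum(h * block[(i - k) % n] for k, h in enumerate(kernel))
        for i in range(n)
    ]


def overlap_save(signal, kernel, block_size):
    """Block-wise linear convolution using the overlap-save scheme."""
    m = len(kernel)
    step = block_size - m + 1  # new output samples produced per block
    if step < 1:
        raise ValueError("block_size must be at least len(kernel)")
    out_len = len(signal) + m - 1
    # Prepend m - 1 zeros so the first block has a full saved history.
    padded = [0.0] * (m - 1) + list(signal)
    out = []
    start = 0
    while len(out) < out_len:
        block = padded[start:start + block_size]
        block += [0.0] * (block_size - len(block))  # zero-pad the final blocks
        y = circular_convolve(block, kernel)
        out.extend(y[m - 1:])  # the first m - 1 samples are aliased: discard
        start += step  # consecutive blocks overlap by m - 1 samples
    return out[:out_len]


def convolve_direct(signal, kernel):
    """Reference direct-form convolution used to check the block-wise result."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(kernel):
            out[n + k] += x * h
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, -1.0, 0.5]
blockwise = overlap_save(signal, kernel, block_size=4)
reference = convolve_direct(signal, kernel)
```

Each block yields `block_size - len(kernel) + 1` valid samples, so the block size trades latency against per-block efficiency, which is the kind of trade-off a real-time system has to balance.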
PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS
Multichannel acoustic signal processing has undergone major development in recent years due to the increased complexity of current audio processing applications. People want to collaborate through communication with the feeling of being together and sharing the same environment, which is the goal of so-called immersive audio schemes. Several acoustic effects are involved in such schemes: 3D spatial sound, room compensation, crosstalk cancellation, and sound source localization, among others. However, achieving any of these effects in a real large-scale system requires high computing capacity, which is a considerable limitation for real-time applications.
The increase in computational capacity has historically been linked to the number of transistors in a chip. Nowadays, however, improvements in computational capacity come mainly from increasing the number of processing units, i.e., from expanding parallelism in computing. This is the case for Graphics Processing Units (GPUs), which now have thousands of computing cores. GPUs were traditionally associated with graphics or image applications, but newer GPU programming environments, such as CUDA and OpenCL, have allowed applications in fields beyond graphics to be computationally accelerated. This thesis aims to demonstrate that GPUs are fully valid tools for carrying out audio applications that require high computational resources. To this end, different applications in the field of audio processing are studied and implemented using GPUs. This manuscript also analyzes and solves possible limitations in each GPU-based implementation, from both the acoustic and the computational points of view. In this document, we have addressed the following problems:
Most audio applications are based on massive filtering. Thus, the first implementation undertaken is a fundamental operation in audio processing: the convolution. It was first developed as a computational kernel and afterwards used in an application that runs multiple convolutions concurrently: generalized crosstalk cancellation and equalization. The proposed implementation can successfully manage two different and common situations: buffers that are much larger than the filters, and buffers that are much smaller than the filters.
Two spatial audio applications that use the GPU as a co-processor have been developed from the massive multichannel filtering. The first application deals with binaural audio. Its main feature is that it is able to synthesize sound sources in spatial positions that are not included in the HRTF database and to generate smooth movements of sound sources. Both features were designed after different tests (objective and subjective). The performance, in terms of the number of sound sources that could be rendered in real time, was assessed on GPUs with different architectures. A similar performance is measured in a Wave Field Synthesis (WFS) system (the second spatial audio application) that is composed of 96 loudspeakers. The proposed GPU-based implementation is able to reduce the room effects during the sound source rendering.
A well-known approach for sound source localization in noisy and reverberant environments is also addressed on a multi-GPU system: the Steered Response Power with Phase Transform (SRP-PHAT) algorithm. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. The solutions implemented in this thesis are evaluated in terms of both localization and computational performance, taking into account different acoustic environments, and always from a real-time implementation perspective.
Finally, this manuscript also addresses massive multichannel filtering when the filters have an Infinite Impulse Response (IIR). Two cases are analyzed: 1) IIR filters composed of multiple second-order sections, and 2) IIR filters that present an allpass response. Both cases are used to develop and accelerate two different applications: 1) executing multiple equalizations in a WFS system, and 2) reducing the dynamic range of an audio signal. Belloch Rodríguez, JA. (2014). PERFORMANCE IMPROVEMENT OF MULTICHANNEL AUDIO BY GRAPHICS PROCESSING UNITS [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/40651
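The first IIR case mentioned in this abstract, a filter built from multiple second-order sections, can be sketched on the CPU as a cascade of biquads. This is a minimal single-channel illustration under stated assumptions, not the thesis's GPU implementation: the function name `sos_filter`, the coefficient layout `(b0, b1, b2, a1, a2)` with a0 normalized to 1, and the example coefficients are all hypothetical.

```python
def sos_filter(signal, sections):
    """Filter a signal through a cascade of second-order sections (biquads).

    Each section is (b0, b1, b2, a1, a2) with a0 assumed to be 1, and
    implements y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    in transposed direct form II.
    """
    out = list(signal)
    for b0, b1, b2, a1, a2 in sections:
        z1 = z2 = 0.0  # per-section state, reset for every section
        for n, x in enumerate(out):
            y = b0 * x + z1
            z1 = b1 * x - a1 * y + z2
            z2 = b2 * x - a2 * y
            out[n] = y  # this section's output feeds the next section
    return out


# Hypothetical sections: a one-pole smoother followed by a two-tap sum.
sections = [
    (1.0, 0.0, 0.0, -0.5, 0.0),  # y[n] = x[n] + 0.5 * y[n-1]
    (1.0, 1.0, 0.0, 0.0, 0.0),   # y[n] = x[n] + x[n-1]
]
impulse = [1.0, 0.0, 0.0, 0.0]
response = sos_filter(impulse, sections)
```

Because each output sample depends on previous outputs, the recursion within one filter is sequential; in a massive multichannel setting, the natural source of parallelism is the large number of independent filters and channels.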
Exploiting partial reconfiguration through PCIe for a microphone array network emulator
Current Microelectromechanical Systems (MEMS) technology enables the deployment of relatively low-cost wireless sensor networks composed of MEMS microphone arrays for accurate sound source localization. However, evaluating and selecting the most accurate and power-efficient network topology is not trivial when considering dynamic MEMS microphone arrays. Although software simulators are usually considered, they involve computationally intensive tasks that take hours to days to complete. In this paper, we present an FPGA-based platform to emulate a network of microphone arrays. Our platform provides a controlled simulated acoustic environment, able to evaluate the impact of different network configurations such as the number of microphones per array, the network’s topology, or the detection method used. Data fusion techniques, combining the data collected by each node, are used in this platform. The platform is designed to exploit the FPGA’s partial reconfiguration feature to increase the flexibility of the network emulator, as well as to increase performance thanks to the use of the high-bandwidth PCI Express interface. On the one hand, the network emulator gains flexibility by partially reconfiguring the nodes’ architecture at runtime. On the other hand, a set of strategies and heuristics for the proper use of partial reconfiguration accelerates the emulation by exploiting execution parallelism. Several experiments are presented to demonstrate some of the capabilities of our platform and the benefits of using partial reconfiguration.
Microphone array for speaker localization and identification in shared autonomous vehicles
With the current technological transformation in the automotive industry, autonomous vehicles are getting closer to the Society of Automotive Engineers (SAE) automation level 5. This level corresponds to full vehicle automation, where the driving system autonomously monitors and navigates the environment. With SAE level 5, the concept of a Shared Autonomous Vehicle (SAV) will soon become a reality and mainstream. The main purpose of an SAV is to allow unrelated passengers to share an autonomous vehicle without a driver/moderator inside the shared space. However, to ensure their safety and well-being until they reach their final destination, active monitoring of all passengers is required. In this context, this article presents a microphone-based sensor system that is able to localize sound events inside an SAV. The solution is composed of a Micro-Electro-Mechanical System (MEMS) microphone array with a circular geometry, connected to an embedded processing platform that uses Field-Programmable Gate Array (FPGA) technology to run the sound localization algorithms in hardware. This work is supported by: European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project No. 039334; Funding Reference: POCI-01-0247-FEDER-039334].