7 research outputs found

    Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL

    The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a large number of acoustic applications such as automatic camera steering systems, human-machine interaction, video gaming and audio surveillance. SRP-PHAT implementations must handle a large number of signals coming from a microphone array and a huge search grid that influences the localization accuracy of the system. In this context, high performance in the localization process can only be achieved by using massively parallel computational resources. Different types of multi-core machines based either on multiple CPUs or on GPUs are commonly employed in diverse fields of science for accelerating a number of applications, mainly using OpenMP and CUDA as programming frameworks, respectively. This implies developing multiple source codes, which limits portability and application possibilities. In contrast, OpenCL has emerged as an open standard for parallel programming that is nowadays supported by a wide range of architectures. In this work, we evaluate OpenCL-based implementations of the SRP-PHAT algorithm on two state-of-the-art CPU and GPU platforms. Results demonstrate that OpenCL achieves close-to-CUDA performance on the GPU (considered as an upper bound) and outperforms OpenMP in most of the CPU configurations.
    This work has been supported by the postdoctoral fellowship from Generalitat Valenciana APOSTD/2016/069, the Spanish Government through TIN2014-53495-R, TIN2015-65277-R and BIA2016-76957-C3-1-R, and the Universidad Jaume I Project UJI-B2016-20.
    Badía Contelles, JM.; Belloch Rodríguez, JA.; Cobos Serrano, M.; Igual Peña, FD.; Quintana-Ortí, ES. (2019). Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL. The Journal of Supercomputing. 75(3):1284-1297.
    https://doi.org/10.1007/s11227-018-2422-6

    On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone array

    Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human-machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. Graphics Processing Units (GPUs) are highly parallel programmable co-processors that provide massive computation when the needed operations are properly parallelized. Emerging GPUs offer multiple parallelism levels; however, properly managing their computational resources becomes a very challenging task. In fact, management issues become even more difficult when multiple GPUs are involved, adding one more level of parallelism. In this paper, the performance of an acoustic source localization system using distributed microphones is analyzed over a massive multichannel processing framework in a multi-GPU system. The paper evaluates the influence that the number of microphones and the available computational resources have on the overall system performance. Several acoustic environments are considered to show the impact that noise and reverberation have on the localization accuracy and how the use of massive microphone systems combined with parallelized GPU algorithms can substantially mitigate adverse acoustic effects. In this context, the proposed implementation is able to work in real time with high-resolution spatial grids and using up to 48 microphones. These results confirm the advantages of suitable GPU architectures in the development of real-time massive acoustic signal processing systems.
    This work has been partially funded by the Spanish Ministerio de Economia y Competitividad (TEC2009-13741, TEC2012-38142-C04-01, and TEC2012-37945-C02-02), Generalitat Valenciana PROMETEO 2009/2013, and Universitat Politecnica de Valencia through Programa de Apoyo a la Investigacion y Desarrollo (PAID-05-11 and PAID-05-12).
    Belloch Rodríguez, JA.; Gonzalez, A.; Vidal Maciá, AM.; Cobos Serrano, M. (2015). On the performance of multi-GPU-based expert systems for acoustic localization involving massive microphone array. Expert Systems with Applications. 42(13):5607-5620. https://doi.org/10.1016/j.eswa.2015.02.056
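
The grid search described in this abstract (steer to each candidate point, accumulate the power, take the maximum) can be sketched roughly as follows. This is a plain NumPy, single-threaded illustration and not the authors' GPU implementation. It assumes free-field propagation, delays rounded to whole samples, and pairwise GCC-PHAT correlations as the power measure. The names `gcc_phat` and `srp_phat` and all parameters are hypothetical.

```python
import numpy as np

def gcc_phat(x1, x2, n_fft):
    """Cross-correlation of x1 and x2 with PHAT weighting (lag 0 at index 0)."""
    X1 = np.fft.rfft(x1, n_fft)
    X2 = np.fft.rfft(x2, n_fft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # phase transform: discard magnitude, keep phase
    return np.fft.irfft(cross, n_fft)

def srp_phat(signals, mic_pos, grid, fs, c=343.0):
    """Return (best grid point, power per grid point).

    signals: (n_mics, n_samples) array, mic_pos: (n_mics, dim),
    grid: (n_points, dim), fs: sampling rate in Hz, c: speed of sound in m/s.
    """
    n_mics, n_samples = signals.shape
    n_fft = 2 * n_samples  # zero-pad so that lags do not wrap onto each other
    # Distance from every grid point to every microphone: (n_points, n_mics)
    dists = np.linalg.norm(grid[:, None, :] - mic_pos[None, :, :], axis=2)
    power = np.zeros(len(grid))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(signals[i], signals[j], n_fft)
            # Expected time difference of arrival for each grid point, in samples
            lags = np.rint((dists[:, i] - dists[:, j]) * fs / c).astype(int)
            power += cc[lags % n_fft]  # negative lags wrap to the end of cc
    return grid[np.argmax(power)], power
```

The cost grows with the number of microphone pairs times the number of grid points, which is consistent with the abstract's point that finer grids and more microphones demand more computational power. Each grid point is independent of the others, so the accumulation over the grid is the natural place to parallelize.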

    Towards real-time 3D sound sources mapping with linear microphone arrays

    © 2017 IEEE. In this paper, we present a method for real-time 3D sound source mapping using an off-the-shelf robotic perception sensor equipped with a linear microphone array. Conventional approaches to mapping sound sources in 3D scenarios use dedicated 3D microphone arrays, as this type of array provides two degrees of freedom (DOF) observations. Our method addresses the problem of 3D sound source mapping using a linear microphone array, which provides only one DOF observations, making the estimation of the sound source locations more challenging. In the proposed method, multi-hypothesis tracking is combined with a new sound source parametrisation to provide a good initial guess for an online optimisation strategy. A joint optimisation is carried out to estimate 6 DOF sensor poses and 3 DOF landmarks together with the sound source locations. Additionally, a dedicated sensor model is proposed to accurately model the noise of the Direction of Arrival (DOA) observation when using a linear microphone array. Comprehensive simulation and experimental results show the effectiveness of the proposed method. In addition, a real-time implementation of our method has been made available as open source software for the benefit of the community.

    Detection of activity and position of speakers by using deep neural networks and acoustic data augmentation

    The task of Speaker LOCalization (SLOC) has been the focus of numerous works in the research field, where SLOC is performed on pure speech data, requiring the presence of an Oracle Voice Activity Detection (VAD) algorithm. Nevertheless, this ideal working condition is not satisfied in a real-world scenario, where the employed VADs do commit errors. This work addresses this issue with an extensive analysis focusing on the relationship between several data-driven VAD and SLOC models, finally proposing a reliable framework for VAD and SLOC. The effectiveness of the approach discussed here is assessed in a multi-room scenario, which is close to a real-world environment. Furthermore, to the best of the authors' knowledge, only one contribution proposes a single framework for VAD and SLOC in this scenario; however, that solution does not rely on data-driven approaches. This work extends the authors' previous research addressing the VAD and SLOC tasks by proposing numerous advancements to the original neural network architectures. In detail, four different models based on convolutional neural networks (CNNs) are tested for VAD, in order to highlight the advantages of the introduced novelties. In addition, two different CNN models are studied for SLOC. Furthermore, the training of the data-driven models is improved through a specific data augmentation technique. During this procedure, the room impulse responses (RIRs) of two virtual rooms are generated from knowledge of the room size, reverberation time, and microphone and source placement. Finally, the only other framework for simultaneous detection and localization in a multi-room scenario is taken into account for a fair comparison with the proposed method. As a result, the proposed method is more accurate than the baseline framework, and remarkable improvements are especially observed when the data augmentation techniques are applied for both the VAD and SLOC tasks.

    Study on vibration source localization using high-speed vision (高速ビジョンを用いた振動源定位に関する研究)

    Hiroshima University. Doctor of Engineering.

    Local user mapping via multi-modal fusion for social robots

    User detection, recognition and tracking are at the heart of Human-Robot Interaction, and yet, to date, no universal robust method exists for making a robot aware of the people in its surroundings. The presented work aims to import into existing social robotics platforms different techniques, some classical and others novel, for detecting, recognizing and tracking human users. These algorithms are based on a variety of sensors, mainly cameras and depth imaging devices, but also lasers and microphones. The results of these algorithms, which run in parallel, are then merged using probabilistic techniques so as to obtain a modular, expandable and fast architecture. This results in a local user mapping obtained through multi-modal fusion. With this user awareness architecture, user detection, recognition and tracking capabilities can be given easily and quickly to any robot by re-using the modules that match its sensors and its processing performance. The architecture provides all the relevant information about the users around the robot, which can then be used by end-user applications that adapt their behavior to those users. The social robots in which the architecture has been successfully implemented include a car-like mobile robot, an articulated flower and a humanoid assistance robot. Some modules of the architecture are very lightweight but have low reliability; others need more CPU but offer higher confidence. All configurations of modules are possible and fit the range of possible robotics hardware configurations. All the modules are independent and highly configurable, so no code needs to be developed to build a new configuration: the user only writes a ROS launch file, a simple text file that lists all the wanted modules. The architecture has been developed with modularity and speed in mind. It is based on the Robot Operating System (ROS) architecture, a de facto software standard in robotics. The different people detectors comply with a common interface called PeoplePoseList Publisher, while the people recognition algorithms comply with an interface called PeoplePoseList Matcher. The fusion of all these different modules is based on Unscented Kalman Filter techniques. Extensive accuracy and speed benchmarks of the sub-components and of the whole architecture, using both academic datasets and data acquired in our lab, together with end-user application samples, demonstrate the validity and interest of all levels of the architecture.
    Official Doctoral Program in Electrical, Electronic and Automation Engineering. Chair: Fernando Torres Medina. Secretary: María Dolores Blanco Rojas. Committee member: Jorge Manuel Miranda Día