11 research outputs found

    Optimized Fundamental Signal Processing Operations for Energy Minimization on Heterogeneous Mobile Devices

    Numerous signal processing applications are emerging on both mobile and high-performance computing systems. These applications are subject to responsiveness constraints for user interactivity and, at the same time, must be optimized for energy efficiency. The increasingly heterogeneous power-versus-performance profile of modern hardware introduces new opportunities for energy savings as well as challenges. In this vein, recent systems-on-chip (SoCs) composed of low-power multicore processors combined with a small graphics accelerator (GPU) yield a notable increase in computational capacity while partially retaining the appealing low power consumption of embedded systems. This paper analyzes the potential of these new hardware systems to accelerate applications that involve a large number of floating-point arithmetic operations, mainly in the form of convolutions. To assess the performance, a headphone-based spatial audio application for mobile devices based on a Samsung Exynos 5422 SoC has been developed. We discuss different implementations and analyze the tradeoffs between performance and energy efficiency for different scenarios and configurations. Our experimental results reveal that we can extend the battery lifetime of a device featuring such an architecture by 238% by properly configuring and leveraging the computational resources.
    This work was supported by the Spanish Ministerio de Economia y Competitividad projects under Grant TIN2014-53495-R and Grant TEC2015-67387-C4-1-R, in part by the University Project UJI-B2016-20, and in part by the Project PROMETEOII/2014/003. The work of J. A. Belloch was supported by the GVA Post-Doctoral Contract under Grant APOSTD/2016/069. This paper was recommended by Associate Editor Y. Ha.
    Belloch Rodríguez, JA.; Badia Contelles, JM.; Igual Peña, FD.; Gonzalez, A.; Quintana Ortí, ES. (2017). Optimized Fundamental Signal Processing Operations for Energy Minimization on Heterogeneous Mobile Devices. IEEE Transactions on Circuits and Systems I Regular Papers. 65(5):1614-1627. https://doi.org/10.1109/TCSI.2017.2761909

    Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL

    The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known method for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm is used in a large number of acoustic applications such as automatic camera steering systems, human-machine interaction, video gaming and audio surveillance. SRP-PHAT implementations must handle a large number of signals coming from a microphone array and a huge search grid that influences the localization accuracy of the system. In this context, high performance in the localization process can only be achieved by using massively parallel computational resources. Different types of multi-core machines based either on multiple CPUs or on GPUs are commonly employed in diverse fields of science for accelerating a number of applications, mainly using OpenMP and CUDA as programming frameworks, respectively. This implies the development of multiple source codes, which limits portability and the range of possible applications. By contrast, OpenCL has emerged as an open standard for parallel programming that is nowadays supported by a wide range of architectures. In this work, we evaluate an OpenCL-based implementation of the SRP-PHAT algorithm on two state-of-the-art CPU and GPU platforms. Results demonstrate that OpenCL achieves close-to-CUDA performance on the GPU (considered as an upper bound) and outperforms OpenMP in most of the CPU configurations.
    This work has been supported by the postdoctoral fellowship from Generalitat Valenciana APOSTD/2016/069, the Spanish Government through TIN2014-53495-R, TIN2015-65277-R and BIA2016-76957-C3-1-R, and the Universidad Jaume I Project UJI-B2016-20.
    Badía Contelles, JM.; Belloch Rodríguez, JA.; Cobos Serrano, M.; Igual Peña, FD.; Quintana-Ortí, ES. (2019). Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL. The Journal of Supercomputing. 75(3):1284-1297.
    https://doi.org/10.1007/s11227-018-2422-6
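    SRP-PHAT accumulates, for every candidate position in the search grid, the phase-transform-weighted cross-correlation (GCC-PHAT) of each microphone pair. As a rough illustration of that pairwise building block only, the sketch below estimates the delay between two short signals. It is a plain-Python toy with a naive O(n²) DFT and circular delays; the function names `dft` and `gcc_phat_delay` are assumptions for this example and it is not the OpenCL implementation evaluated in the paper.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)); fine for tiny examples."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gcc_phat_delay(a, b):
    """Estimate by how many samples `b` lags `a` (circular delay)."""
    n = len(a)
    spec_a, spec_b = dft(a), dft(b)
    # Cross-power spectrum of the microphone pair.
    cross = [sb * sa.conjugate() for sa, sb in zip(spec_a, spec_b)]
    # PHAT weighting: discard magnitude, keep only the phase.
    eps = 1e-12
    whitened = [c / (abs(c) + eps) for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    # Map peaks in the upper half of the buffer to negative delays.
    return peak if peak <= n // 2 else peak - n


# Two 16-sample signals; `delayed` is `reference` shifted by 3 samples.
reference = [0.0] * 16
reference[2], reference[3] = 1.0, -0.5
delayed = [0.0] * 16
delayed[5], delayed[6] = 1.0, -0.5

lag = gcc_phat_delay(reference, delayed)
```

    A full SRP-PHAT search would repeat this correlation for every microphone pair and sum the values at the delays corresponding to each grid point, which is the part that benefits from massively parallel hardware.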

    Metodología de internacionalización de material docente basada en el uso de Markdown y Pandoc [A methodology for internationalizing teaching material based on Markdown and Pandoc]

    The internationalization of teaching offers great opportunities for the University, but it also poses significant challenges for students and teachers. In particular, effectively creating and maintaining the teaching material for a course taught simultaneously in several languages, with a high degree of coordination among its groups (e.g., a common final exam or common lab assignments for all students), can be a major challenge for teachers. To address this problem, we have designed a specific strategy for creating and managing dual-language teaching material (e.g., English-Spanish), and developed a set of cross-platform tools to put it into practice. The general idea is to keep the content of the document in both languages in a single text file, providing the translation into the other language right after each paragraph and heading, using special delimiters. These dual documents are written in Markdown, a lightweight markup language that, thanks to its simplicity and versatility, is being rapidly adopted by a wide range of professionals, from novelists and journalists to website administrators. From the dual documents written in Markdown, the final document for each language can be generated automatically in the desired format and made available to students. For this task we rely on Pandoc, which converts Markdown documents to a large number of formats, such as PDF, docx (Microsoft Word), EPUB (e-book) or HTML. As part of our project, we have created Pandoc extensions to support dual documents in Markdown and to increase the expressiveness of the language with constructs commonly used in teaching documents.
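    To make the dual-document idea more concrete, the sketch below extracts a single-language version from a text that interleaves both languages. The `::: en` / `::: es` / `:::` delimiters and the function name `split_dual` are assumptions chosen for this example; the abstract does not specify the delimiters or the Pandoc extensions the authors actually use.

```python
import re

# Hypothetical delimiters (not the authors' actual syntax): a block opens
# with "::: en" or "::: es" and closes with a bare ":::" line.
OPEN = re.compile(r"^:::\s*(en|es)\s*$")
CLOSE = re.compile(r"^:::\s*$")


def split_dual(text, lang):
    """Return the document for one language from a dual-language source.

    Lines outside any language block (e.g. shared code or figures) are kept
    for every language; lines inside a block are kept only if it matches.
    """
    out = []
    current = None  # language of the block we are inside, or None
    for line in text.splitlines():
        opened = OPEN.match(line)
        if current is None and opened:
            current = opened.group(1)
        elif current is not None and CLOSE.match(line):
            current = None
        elif current is None or current == lang:
            out.append(line)
    return "\n".join(out)


dual = """\
::: en
# Lab 1
:::
::: es
# Práctica 1
:::

    x = 1

::: en
Run the code above.
:::
::: es
Ejecuta el código anterior.
:::
"""

english = split_dual(dual, "en")
spanish = split_dual(dual, "es")
```

    Each resulting string is ordinary Markdown, so it could be written to a file and converted with a standard Pandoc call such as `pandoc lab.en.md -o lab.en.pdf` (the file names here are hypothetical).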

    Matrix computations on graphics processors and clusters of GPUs

    High-performance computing (HPC) architectures based on graphics accelerators (GPUs) have become in recent years a widespread alternative that combines high performance with a low acquisition cost. Although the programmability of these architectures has improved in recent generations, they still demand considerable effort when the routines to be implemented must be optimized. Linear algebra routines, in turn, appear in a wide range of scientific and engineering applications, and optimizing them is key to achieving high performance in real applications across many fields. The goal of this thesis is to design, implement and evaluate programming strategies that improve the performance of common linear algebra routines on architectures based on a single graphics processor, multiple graphics processors and clusters of GPUs, adopting a high-level approach that eases their development. The experimental results demonstrate the viability of this approach, achieving remarkable speedups on all three types of architecture.

    Robust motion estimation on a low-power multi-core DSP

    Medical imaging has become an essential diagnostic tool in clinical practice; pathologies can now be detected earlier than ever before. Its use is no longer confined to radiology but increasingly extends to computer-based imaging processes prior to surgery. Motion analysis, in particular, plays an important role in analyzing the activities or behaviors of live objects in medicine. This short paper presents several low-cost hardware implementation approaches, targeting the new generation of tablets and smartphones, for estimating motion compensation and segmentation in medical images. These systems have been optimized for breast cancer diagnosis using magnetic resonance imaging technology, which offers several advantages over traditional X-ray mammography, for example, obtaining patient information during a short period. This paper also addresses the challenge of offering a medical tool that runs on widespread portable devices, both tablets and smartphones, to aid in patient diagnostics.

    Energy efficiency optimization of task-parallel codes on asymmetric architectures

    We present a family of policies that, integrated within a runtime task scheduler (Nanox), pursue the goal of improving the energy efficiency of task-parallel executions with no intervention from the programmer. The proposed policies tackle the problem by modifying the core operating frequency via DVFS mechanisms, or by enabling/disabling the mapping of tasks to specific cores at selected execution points, depending on the internal status of the scheduler. Experimental results on an asymmetric SoC (Exynos 5422) and for a specific operation (Cholesky factorization) reveal gains up to 29% in terms of energy efficiency and considerable reductions in average power.
    Depto. de Arquitectura de Computadores y Automática; Fac. de Informática

    Dynamic power budget redistribution under a power cap on multi-application environments

    We present a two-level implementation of an infrastructure that allows performance maximization under a power cap in multi-application environments with minimal user intervention. At the application level, we integrate bar (Power Budget-Aware Runtime Scheduler) into existing task-based runtimes, e.g. OpenMP; bar implements combined software/hardware techniques (thread malleability and DVFS) to maximize the application performance without violating a granted power budget. At a higher level, we introduce barman (Power Budget-Aware Resource Manager), system-wide software able to manage resources globally, gathering the power needs of registered applications and redistributing the available overall power budget across them. The combination and co-operative operation of both pieces of software yields performance and energy efficiency improvements in environments in which power capping is established globally, and also granted asymmetrically to different co-existing applications. This behaviour is demonstrated to be stable under different workloads (a selection of task-based scientific applications and PARSEC benchmarks are tested) and different levels of power capping.
    MCIN; Comunidad de Madrid; Depto. de Arquitectura de Computadores y Automática; Fac. de Informática
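    The abstract does not say which rule is used to redistribute the overall power budget, so the following is only a minimal sketch under an assumed proportional policy: every request is granted in full when the total fits under the cap, and scaled down proportionally otherwise. The function name `redistribute` and the application names are hypothetical.

```python
def redistribute(cap_watts, requests):
    """Split a global power cap across registered applications.

    `requests` maps an application name to the power (in watts) it asks for.
    If the total fits under the cap, every request is granted in full;
    otherwise each application receives a share proportional to its request.
    This proportional rule is an assumption, not the policy used by barman.
    """
    total = sum(requests.values())
    if total <= cap_watts:
        return dict(requests)
    scale = cap_watts / total
    return {app: watts * scale for app, watts in requests.items()}


# Three applications ask for 160 W in total under a 100 W cap.
budgets = redistribute(
    100.0, {"cholesky": 80.0, "blackscholes": 40.0, "lu": 40.0}
)
```

    In a two-level design like the one described, each application-level scheduler would then have to stay within its granted budget, for instance by adjusting its thread count or core frequency.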

    Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding

    The coexistence of parallel applications in shared computing nodes, each one featuring different Quality of Service (QoS) requirements, poses new challenges for improving resource occupation while keeping acceptable rates in terms of QoS. As more application-specific and system-wide metrics are included as QoS dimensions, or under situations in which resource-usage limits are strict, building and serving the most appropriate set of actions (application control knobs and system resource assignment) to concurrent applications in an automatic and optimal fashion becomes mandatory. In this paper, we propose strategies to build and serve this type of knowledge to concurrent applications by leveraging Reinforcement Learning techniques. Taking multi-user video transcoding as a driving example, our experimental results reveal an excellent adaptation of resource and knob management to heterogeneous QoS requests, and increases in the number of concurrently served users of up to 1.24× compared with alternative approaches considering homogeneous QoS requests.
    Depto. de Arquitectura de Computadores y Automática; Fac. de Informática

    Applying game-learning environments to power capping scenarios via reinforcement learning

    Research in deep learning for video game playing has received much attention and provided very relevant results in recent years. Frameworks and libraries have been developed to ease game-playing research leveraging Reinforcement Learning techniques. In this paper, we propose to use two of them (RLlib and Gym) in a very different scenario: learning to apply resource management policies in a multi-core server. Specifically, we leverage the facilities of both frameworks, coupled together, to derive policies for power capping. Using RLlib and Gym makes it possible to implement different resource management policies in a simple and fast way and, as they are based on neural networks, guarantees an efficient solution and allows the use of hardware accelerators for both training and inference. The results demonstrate that game-learning environments provide effective support for casting a completely different scenario, and open new research avenues in the field of resource management using reinforcement learning techniques with minimal development effort.
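    As a hedged illustration of how a power-capping problem might be cast as a game-like environment, the toy class below follows the classic Gym `reset`/`step` convention (observation, reward, done flag, info dictionary) without importing Gym or RLlib. The frequency levels, the linear power model, the reward and the class name `PowerCapEnv` are all assumptions made for this example and are not taken from the paper.

```python
class PowerCapEnv:
    """Toy environment following the classic Gym reset/step convention."""

    LEVELS = [1.0, 1.5, 2.0, 2.5]  # hypothetical core frequencies in GHz
    WATTS_PER_GHZ = 20.0           # hypothetical linear power model

    def __init__(self, cap_watts=45.0):
        self.cap_watts = cap_watts
        self.level = 0

    def reset(self):
        """Start at the lowest frequency and return the observation."""
        self.level = 0
        return self.level

    def step(self, action):
        """Apply an action: 0 = lower frequency, 1 = keep, 2 = raise."""
        last = len(self.LEVELS) - 1
        self.level = min(max(self.level + action - 1, 0), last)
        freq = self.LEVELS[self.level]
        power = freq * self.WATTS_PER_GHZ
        # Reward throughput (proxied by frequency) unless the cap is exceeded.
        reward = freq if power <= self.cap_watts else -1.0
        # Episodes never terminate in this toy, so `done` is always False.
        return self.level, reward, False, {"power": power}


env = PowerCapEnv()
obs = env.reset()
obs, reward, done, info = env.step(2)  # raise the frequency by one level
```

    An agent trained against such an interface would learn to raise the frequency until the cap is about to be exceeded; a real setup would replace the power model with measurements from the server.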

    Integración de los servicios para.TI@UCM en una plataforma de e-learning similar al Campus Virtual [Integrating the para.TI@UCM services into an e-learning platform similar to the Campus Virtual]

    The integration of the para.TI@UCM services at our University leads us to consider new teaching and assessment methodologies in the teaching-learning process. This project is a continuation of project PIMCD UCM 138 (2013), entitled “Uso de los servicios para.TI@UCM para integrar tareas docentes y fomentar el aprendizaje activo y colaborativo de los alumnos”, carried out by this same group of teachers. That project produced a series of tutorials on using Google applications in teaching tasks as tools to foster student learning. Building on the new teaching framework created in PIMCD UCM 138 (2013), in which both the teaching material and the activities proposed to students are developed in the cloud, the aim of this new project is to integrate all the applications needed to carry out teaching activity entirely in the cloud (para.TI@UCM), both Google's own and those developed by third parties. Our goal is to try to create an e-learning platform similar to the Campus Virtual. This requires studying, on the one hand, the functionality offered by the Campus Virtual and, on the other, which of those features are available in the para.TI@UCM resources. The next step would be to consider how the desired features that are missing from para.TI@UCM can be implemented on top of Google applications.