168 research outputs found

    Pipeline template for streaming applications on heterogeneous chips

    We address the problem of supporting the execution of single streaming applications implemented as a pipeline of stages on heterogeneous chips comprising several CPU cores and one on-chip GPU. In this paper, we focus mainly on the API that allows the user to specify the type of parallelism exploited by each pipeline stage running on the multicore CPU, the mapping of the pipeline stages to the devices (GPU or CPU), and the number of active threads. We use a real streaming application as a case study to illustrate the experimental results that can be obtained with this API. With this example, we evaluate how the different parameter values affect the performance and energy efficiency of a heterogeneous on-chip processor (Exynos 5 Octa) that has three different computational cores: a GPU, an ARM Cortex-A15 quad-core, and an ARM Cortex-A7 quad-core. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Proyecto de Excelencia de la Junta de Andalucía P11-TIC-08144

    Solving Large-Scale Markov Decision Processes on Low-Power Heterogeneous Platforms

    Markov Decision Processes (MDPs) provide a framework for a machine to act autonomously and intelligently in environments where the effects of its actions are not deterministic. MDPs have numerous applications. We focus on practical decision-making applications, such as autonomous driving and service robotics, that have to run on mobile platforms with scarce computing and power resources. In our study, we solve MDPs with Value Iteration, a core method of the paradigm for finding optimal sequences of actions, which is well known for its high computational cost. To solve these computationally complex problems efficiently on platforms with stringent power consumption constraints, we rely on high-performance accelerator hardware and parallelised software. We introduce a generalisable approach to implementing practical decision-making applications, such as autonomous driving, on mobile and embedded low-power heterogeneous SoC platforms that integrate an accelerator (GPU) with a multicore CPU. We evaluate three scheduling strategies that enable concurrent execution and efficient use of resources on a variety of SoCs embedding a multicore CPU and an integrated GPU, namely Oracle, Dynamic, and LogFit. We compare these strategies for solving an MDP that models autonomous robot navigation in indoor environments on four representative platforms for mobile decision-making applications, with power use ranging from 4 to 65 Watts. We provide a rigorous analysis of the results to better understand their behaviour depending on the MDP size and the computing platform. Our experimental results show that CPU-GPU heterogeneous strategies considerably reduce the computation time and energy required with respect to a multicore implementation, regardless of the computational platform. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This work was partially supported by the Spanish project TIN 2016-80920-R
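
    As a minimal sketch of the Value Iteration method named in this abstract, the Python code below repeatedly applies the Bellman optimality backup to a toy two-state MDP. The function names, the transition and reward tables, and the discount factor are illustrative assumptions; this is a plain sequential version, not the authors' CPU-GPU implementation.

```python
def q_value(P, R, V, s, a, gamma):
    """Expected return of taking action a in state s, then following values V."""
    return R[a][s] + gamma * sum(p * v for p, v in zip(P[a][s], V))


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """P[a][s][s2] is a transition probability, R[a][s] an expected reward."""
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(max_iters):
        # Bellman optimality backup for every state.
        new_V = [max(q_value(P, R, V, s, a, gamma) for a in range(n_actions))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:  # stop once the values have stopped changing
            break
    # Greedy policy with respect to the final value estimates.
    policy = [max(range(n_actions), key=lambda a: q_value(P, R, V, s, a, gamma))
              for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP: action 0 = "stay", action 1 = "move".
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # stay: remain in the current state
    [[0.2, 0.8], [1.0, 0.0]],  # move: 0 -> 1 with prob. 0.8, 1 -> 0 always
]
R = [
    [0.0, 1.0],  # staying in state 1 pays 1 per step
    [0.0, 0.0],  # moving pays nothing
]

V, policy = value_iteration(P, R)
print(V, policy)
```

    Each sweep updates every state independently of the others, which is the property that makes the method a natural candidate for splitting work between CPU cores and a GPU.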

    Towards the understanding of the meson spectra

    We present a quark-quark interaction for the complete study of the meson spectra, from the light to the heavy sector. We compare the quark model predictions against well-established $q\bar{q}$ experimental data. This allows us to identify discrepancies between quark model results and experiment that may signal physics beyond conventional hadron spectroscopy.

    Assembly Language Lab Exercises Based on Raspberry Pi (Prácticas de ensamblador basadas en Raspberry Pi)

    This work is part of the Educational Innovation Project PIE13-082, "Motivando al alumno de ingeniería mediante la plataforma Raspberry Pi" (Motivating engineering students through the Raspberry Pi platform), whose main goal is to increase the motivation of students taking courses taught by the Departamento de Arquitectura de Computadores (Department of Computer Architecture). The proposed strategy rests on the fact that many engineering students perceive the courses in their degree as remote from their everyday reality, so the courses lose some of their appeal. However, quite a few of these students have bought or plan to buy a Raspberry Pi minicomputer, which offers a great deal of functionality because it is based on a processor and an operating system that are the reference in mobile devices. In this project we propose to harness the interest that students already show in the Raspberry Pi platform and put it to work toward the following teaching objective: making it easier to study the concepts and techniques taught in several of the Department's courses. More specifically, the main objective of this work is to create a set of lab exercises focused on learning assembly language programming, specifically for ARMv6, the architecture of the processor in the platform used for the exercises, as well as low-level handling of interrupts and input/output on that processor. This report is divided into five chapters and four appendices. The first chapter is introductory. The next two focus on programming Linux executables, covering control structures in chapter 2 and subroutines (functions) in chapter 3. The last two chapters cover bare-metal programming, explaining the input/output subsystem (input/output ports and timers) of the Raspberry Pi platform and its low-level handling in chapter 4, and interrupts in chapter 5. The appendices cover side topics that are relevant enough to be included in the report: appendix A explains how the ADDEXC macro works, appendix B gives full details of the auxiliary board, appendix C shows how to speed up the loading of bare-metal programs, and appendix D goes deeper into aspects of the GPIO such as the programmable resistors.

    Adaptive Partition Strategies for Loop Parallelism in Heterogeneous Architectures

    This work describes our contribution to the execution of parallel loops on multi-core/multi-GPU architectures so that the computational load is distributed in a balanced way among all the compute units. This paper explores the possibility of efficiently using multicores in conjunction with multiple GPU accelerators under a parallel task programming paradigm. In particular, we address the challenge of extending a parallel_for template to allow its exploitation on heterogeneous systems. The extension is based on a two-stage pipeline engine that is responsible for partitioning the iteration space into chunks and scheduling them onto the computational resources. Under this engine, we propose a dynamic scheduling strategy coupled with an adaptive partitioning heuristic that resizes chunks to prevent underutilization and load imbalance of CPUs and GPUs. In this paper we introduce the adaptive partitioning heuristic, which is derived from an analytical model that minimizes the load imbalance while maximizing the throughput of the system. Using two benchmarks, we evaluate the overhead introduced by our template extensions and find that it is negligible. We also evaluate the efficiency of our adaptive partitioning strategies and compare them with related work. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. TIN2010-16144, P08-TIC-3500, P11-TIC-0814
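
    To make the idea of resizing chunks concrete, here is a minimal Python sketch of one simple rule: size a device's chunk in proportion to its measured throughput, so that it finishes at about the same time as a reference device. This proportional rule, the function name `proportional_chunk`, and the sample numbers are assumptions chosen for illustration; the abstract does not give the paper's actual heuristic or analytical model.

```python
def proportional_chunk(ref_chunk, ref_throughput, dev_throughput, remaining):
    """Return a chunk size for a device, scaled by relative throughput.

    ref_chunk      -- iterations handed to the reference device (e.g. a GPU)
    ref_throughput -- measured iterations/second of the reference device
    dev_throughput -- measured iterations/second of the device being sized
    remaining      -- iterations still unassigned in the loop
    """
    if ref_throughput <= 0 or dev_throughput <= 0:
        raise ValueError("throughputs must be positive")
    if remaining <= 0:
        return 0  # nothing left to hand out
    size = round(ref_chunk * dev_throughput / ref_throughput)
    # Always make progress, but never exceed what is left.
    return max(1, min(size, remaining))


# Hypothetical measurements: the GPU is 8x faster than one CPU core.
print(proportional_chunk(1024, 4000.0, 500.0, 10_000))
```

    In a dynamic scheduler the throughputs would be re-measured after every chunk, so the sizes adapt as the workload changes.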

    Describing non-QQ candidates

    Despite the apparent simplicity of meson spectroscopy, there are some states that cannot be accommodated in the usual $q\bar{q}$ structure. Among them are exotic states such as the X(1600), the recently measured charm states $D_{sJ}^{*}$ and X(3872), and some of the light scalar mesons. In this work we present a possible description of these states in terms of tetraquarks.

    Postthoracotomy Ipsilateral Shoulder Pain: A Literature Review on Characteristics and Treatment

    Context. Postthoracotomy Ipsilateral Shoulder Pain (ISP) is a common and sometimes intractable pain syndrome. ISP differs from chest wall pain in type, origin, and treatment. Various treatments have been suggested or applied, but none is widely accepted as effective. Objectives. To review the data and collect the existing experience on postthoracotomy ISP and its management, and to suggest future research directions. Methods. We searched the PubMed database, with additional searches on specific topics, and reviewed the results to retrieve relevant articles as the data source for a narrative review. Results. Even in the presence of effective epidural analgesia, ISP is a common cause of severe postthoracotomy pain. The phrenic nerve has an important role in the physiopathology of postthoracotomy ISP. Different treatments have been applied or suggested. Controlling the afferent nociceptive signals conveyed by the phrenic nerve at various levels (from peripheral branches on the diaphragm to its entrance in the cervical spine) could be of therapeutic value. Despite potential concerns about safety, intrapleural or phrenic nerve blocks are tolerated well, at least in a selected group of patients. Conclusion. Further research could focus on selective sensory block with preservation of the motor function of the phrenic nerve. However, the safety and efficacy of temporary loss of phrenic nerve function and of intrapleural local anesthetics should be assessed.

    Reducing overheads of dynamic scheduling on heterogeneous chips

    In recent processor development, we have witnessed the integration of GPUs and CPUs into a single chip. This integration reduces data communication overheads, which enables efficient collaboration of both devices in the execution of parallel workloads. In this work, we focus on the problem of efficiently scheduling chunks of iterations of parallel loops among the computing devices on the chip (the GPU and the CPU cores) in the context of irregular applications. In particular, we analyze the sources of overhead that the host thread experiences when a chunk of iterations is offloaded to the GPU while other threads are concurrently executing other chunks on the CPU cores. We carefully study these overheads on different processor architectures and operating systems, using Barnes Hut as a case study representative of irregular applications. We also propose a set of optimizations that mitigate the overheads arising in the presence of oversubscription and take advantage of the different features of the heterogeneous architectures. Thanks to these optimizations we reduce the Energy-Delay Product (EDP) by 18% and 84% on Intel Ivy Bridge and Haswell architectures, respectively, and by 57% on the Exynos big.LITTLE. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
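
    The Energy-Delay Product used as the metric here is conventionally the energy consumed multiplied by the execution time, so a lower value is better. The short Python sketch below computes it and the relative reduction between two runs. The function names and all measurements are hypothetical and are not results from the paper.

```python
def edp(energy_joules, time_seconds):
    """Energy-Delay Product: energy consumed times execution time (J*s)."""
    return energy_joules * time_seconds


def reduction_percent(baseline, optimized):
    """Relative reduction of a lower-is-better metric, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (1.0 - optimized / baseline)


# Hypothetical measurements for a baseline and an optimized run.
baseline_edp = edp(energy_joules=40.0, time_seconds=2.0)
optimized_edp = edp(energy_joules=30.0, time_seconds=1.6)
print(f"EDP reduction: {reduction_percent(baseline_edp, optimized_edp):.1f}%")
```

    Because energy and time are multiplied, an optimization that shortens the run usually lowers the EDP twice over: once through the delay term and once through the energy saved by finishing sooner.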

    Bohm potential is real and its effects are measurable

    We analyze the effects of Bohm's potential in the realms of Quantum Mechanics and Optics, as well as in the study of other physical phenomena described in terms of classical and quantum wave equations. We approach this subject using theoretical arguments as well as experimental evidence. We find that the effects produced by Bohm's potential are theoretically responsible for the early success of Quantum Mechanics in correctly describing atomic and nuclear phenomena and, more recently, have been confirmed experimentally, for instance in the surprising accelerating behavior of free waves and particles. Comment: 4 pages, no figures, Accepted in Opti
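
    For reference, the Bohm (quantum) potential is usually defined through the polar form of the wave function, as written below. This is the standard textbook definition and not a formula quoted from the abstract; the symbols $R$, $S$, $m$, and $Q$ are the conventional ones and are assumed here.

```latex
\psi(\mathbf{r},t) = R(\mathbf{r},t)\, e^{iS(\mathbf{r},t)/\hbar},
\qquad
Q(\mathbf{r},t) = -\frac{\hbar^{2}}{2m}\,\frac{\nabla^{2} R(\mathbf{r},t)}{R(\mathbf{r},t)}
```

    Here $R$ is the real amplitude and $S$ the real phase of $\psi$. The term $Q$ adds to the classical potential in the resulting Hamilton-Jacobi-type equation, so it can be nonzero even for a free wave whose amplitude has curvature.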

    Propagation of light in linear and quadratic GRIN media: The Bohm potential

    It is shown that field propagation in linear and quadratic gradient-index (GRIN) media obeys the same rules as free propagation, in the sense that a field propagating in free space has a (mathematical) form that may be exported to those particular GRIN media. The Bohm potential is introduced to explain the reason for this behavior: it changes the dynamics by modifying the original potential. The concrete cases of two different initial conditions for each potential are analyzed.