    The AXIOM software layers

    The AXIOM project aims at developing a heterogeneous computing board (SMP-FPGA). This paper explains the software layers developed in the AXIOM project; OmpSs provides an easy way to execute heterogeneous codes on multiple cores. People and objects will soon share the same digital network for information exchange, in an era known as the age of cyber-physical systems. The general expectation is that people and systems will interact in real time. This puts pressure on systems design to support growing demands for computational power while keeping a low power envelope. Additionally, modular scaling and easy programmability are important to ensure that these systems become widespread. This set of expectations imposes scientific and technological challenges that need to be properly addressed. The AXIOM project (Agile, eXtensible, fast I/O Module) will research new hardware/software architectures for cyber-physical systems that meet these expectations. The technical approach aims at solving the fundamental problems that stand in the way of easy programmability of heterogeneous multi-core, multi-board systems. AXIOM proposes the use of the task-based OmpSs programming model, leveraging low-level communication interfaces provided by the hardware. Modular scalability will be made possible by a fast interconnect embedded in each module. To this aim, an innovative ARM- and FPGA-based board will be designed, with enhanced capabilities for interfacing with the physical world. Its effectiveness will be demonstrated in key scenarios such as Smart Video-Surveillance and Smart Living/Home (domotics).
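
    To make the programming model concrete, here is a minimal sketch in the task-based OmpSs style the project builds on (the kernels, names, and sizes are illustrative, not taken from the paper): annotated functions become tasks, and the runtime orders them from the declared data dependencies, so the same source can be scheduled across cores, boards, or FPGA accelerators.

        #include <cstdio>

        // Each call to a taskified function spawns a task; the in/out
        // clauses declare the data it reads and writes (OmpSs array
        // sections), from which the runtime derives task ordering.
        #pragma omp task in(a[0;n]) out(b[0;n])
        void scale(const float *a, float *b, int n, float k) {
            for (int i = 0; i < n; i++) b[i] = k * a[i];
        }

        #pragma omp task in(b[0;n]) inout(sum[0;1])
        void accumulate(const float *b, int n, float *sum) {
            for (int i = 0; i < n; i++) *sum += b[i];
        }

        int main() {
            const int n = 1024;
            static float a[1024], b[1024];
            float sum = 0.0f;
            for (int i = 0; i < n; i++) a[i] = 1.0f;

            scale(a, b, n, 2.0f);    // task 1
            accumulate(b, n, &sum);  // task 2: waits on b from task 1
            #pragma omp taskwait     // join both tasks before reading sum
            std::printf("sum = %f\n", sum);
            return 0;
        }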

    Adaptive structured parallelism

    Algorithmic skeletons abstract commonly used patterns of parallel computation, communication, and interaction. Parallel programs are expressed by interweaving parameterised skeletons, analogously to the way structured sequential programs are developed using well-defined constructs. Skeletons provide top-down design composition and control inheritance throughout the program structure. Based on the algorithmic skeleton concept, structured parallelism provides a high-level parallel programming technique that allows the conceptual description of parallel programs while fostering platform independence and algorithm abstraction. By decoupling the algorithm specification from machine-dependent structural considerations, structured parallelism allows programmers to code programs regardless of how the computation and communication will be executed on the target platform.

    Meanwhile, large non-dedicated multiprocessing systems have long posed a challenge to known distributed-systems programming techniques, as a result of the inherent heterogeneity and dynamism of their resources. Scant research has been devoted to using the structural information provided by skeletons to adaptively improve program performance based on resource utilisation. This thesis presents a methodology for improving skeletal parallel programming in heterogeneous distributed systems by introducing adaptivity through resource awareness. Since we hypothesise that a skeletal program should be able to adapt to dynamic resource conditions over time using its structural forecasting information, we have developed ASPara: Adaptive Structured Parallelism. ASPara is a generic methodology for incorporating structural information into a parallel program at compilation time, which helps it adapt at execution time.
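
    As a concrete illustration of the skeleton concept (a generic sketch, not ASPara's actual interface), a map skeleton lets the programmer supply only the elemental function, while all machine-dependent decisions about distributing the work stay inside the skeleton:

        #include <algorithm>
        #include <execution>
        #include <vector>

        // A "map" skeleton: apply f independently to every element.
        // The algorithm is specified once; the parallel execution policy
        // stands in for the skeleton's machine-dependent implementation.
        template <typename T, typename F>
        std::vector<T> map_skeleton(const std::vector<T>& in, F f) {
            std::vector<T> out(in.size());
            std::transform(std::execution::par,
                           in.begin(), in.end(), out.begin(), f);
            return out;
        }

        int main() {
            std::vector<double> xs(1 << 20, 2.0);
            auto ys = map_skeleton(xs, [](double x) { return x * x; });
            return ys[0] == 4.0 ? 0 : 1;
        }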

    Performance analysis and tuning in multicore environments

    Performance analysis is the task of monitoring the behavior of a program's execution. The main goal is to find the adjustments that can be made in order to improve performance. To obtain that improvement it is necessary to find the different causes of overhead. We are now in the multicore era, but there is a gap between the levels of development of the two main divisions of multicore technology (hardware and software). When we talk about multicore we are also talking about shared-memory systems; this master's thesis discusses the issues involved in the performance analysis and tuning of applications running specifically on a shared-memory system. We go one step further and take performance analysis to another level by analyzing the structure and patterns of applications. We also present some tools specifically aimed at the performance analysis of multithreaded OpenMP applications. Finally, we present the results of experiments performed with a set of OpenMP scientific applications.
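
    To make the kind of overhead these tools look for concrete, here is a minimal hand-instrumented sketch (the loop is hypothetical, not one of the thesis experiments): timing a worksharing loop per thread exposes load imbalance, one common cause of overhead in shared-memory codes.

        #include <cstdio>
        #include <omp.h>

        int main() {
            const int n = 1 << 24;
            double sum = 0.0;

            #pragma omp parallel
            {
                double t0 = omp_get_wtime();
                // nowait: skip the implicit barrier so each thread's
                // time reflects its own work, not waiting for others.
                #pragma omp for reduction(+ : sum) schedule(static) nowait
                for (int i = 0; i < n; i++) sum += 1.0 / (i + 1.0);
                double t1 = omp_get_wtime();
                // A wide spread across threads indicates load imbalance
                // or contention worth tuning (schedule, data layout).
                std::printf("thread %d: %.3f s\n",
                            omp_get_thread_num(), t1 - t0);
            }
            std::printf("sum = %f\n", sum);  // valid after the barrier
            return 0;
        }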

    TABARNAC: Tools for Analyzing Behavior of Applications Running on NUMA Architecture

    In modern parallel architectures, memory accesses represent a common bottleneck. Thus, optimizing the way applications access memory is an important way to improve performance and energy consumption. Memory accesses matter even more on NUMA machines, as the access time to data depends on its location in memory. Many efforts have been made to develop adaptive tools that improve memory accesses at runtime by optimizing the mapping of data and threads to NUMA nodes. However, these tools are not able to change the memory access pattern of the original application; therefore a code written without considering memory performance might not benefit from them. Moreover, automatic mapping tools take time to converge towards the best mapping, losing optimization opportunities. A deeper understanding of the memory behavior can help optimize it, removing the need for runtime analysis.

    In this paper, we present TABARNAC, a tool for analyzing the memory behavior of parallel applications with a focus on NUMA architectures. TABARNAC provides a new visualization of the memory access behavior, focusing on the distribution of accesses by thread and by data structure. Such visualization allows the developer to easily understand why performance issues occur and how to fix them. Using TABARNAC, we explain why some applications do not benefit from data and thread mapping. Moreover, we propose several code modifications that improve the memory access behavior of several parallel applications.
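
    As an example of the kind of code modification that can help on NUMA (a generic first-touch sketch assuming Linux's default page placement, not one of the paper's specific fixes): initializing data with the same thread-to-index mapping later used for computation places each memory page on the node of the thread that will access it.

        int main() {
            const long n = 1L << 25;
            // new[] leaves doubles uninitialized, so no page is touched
            // yet (a std::vector would zero-fill from a single thread,
            // placing every page on one NUMA node).
            double *a = new double[n];
            double *b = new double[n];

            // Parallel first-touch: on Linux, each page lands on the
            // NUMA node of the thread that first writes it.
            #pragma omp parallel for schedule(static)
            for (long i = 0; i < n; i++) { a[i] = 1.0; b[i] = 0.0; }

            // Same static schedule: each thread works mostly on pages
            // resident on its own node, avoiding remote accesses.
            #pragma omp parallel for schedule(static)
            for (long i = 0; i < n; i++) b[i] += 2.0 * a[i];

            int ok = (b[0] == 2.0);
            delete[] a;
            delete[] b;
            return ok ? 0 : 1;
        }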