4 research outputs found

    StarPU : un support exécutif unifié pour les architectures multicoeurs hétérogènes (StarPU: a unified runtime system for heterogeneous multicore architectures)

    National audience. Alongside multicore processors, which are now ubiquitous, the use of specialized architectures such as graphics processors or the Cell is a strong trend in high-performance computing. Reaching the theoretical performance of these architectures is a difficult goal. Although considerable effort has already been devoted to accelerators, using all computing resources simultaneously remains a real challenge. We therefore designed StarPU, an original runtime system that provides a unified execution model in order to exploit the full computing power while removing the difficulties of data management. StarPU also makes it easy to design portable and efficient scheduling strategies. We implemented several scheduling strategies that can be selected transparently at run time. This allowed us to study the impact of scheduling on several linear algebra algorithms. Beyond a substantial reduction in execution times, StarPU achieves super-linear speedups thanks to its ability to take real advantage of the specific features of heterogeneous machines.

    ACOTES project: Advanced compiler technologies for embedded streaming

    Streaming applications are built of data-driven computational components that consume and produce unbounded data streams. Streaming-oriented systems have become dominant in a wide range of domains, including embedded applications and DSPs. However, programming efficiently for streaming architectures is a challenging task: the computation must be carefully partitioned and mapped to processes in a way that best matches the underlying streaming architecture, taking into account the distributed resources (memory, processing, real-time requirements) and communication overheads (processing and delay). These challenges have led to a number of suggested solutions, whose goal is to improve the programmer’s productivity in developing applications that process massive streams of data on programmable, parallel embedded architectures. StreamIt is one such example. Another more recent approach is that developed by the ACOTES project (Advanced Compiler Technologies for Embedded Streaming). The ACOTES approach for streaming applications consists of compiler-assisted mapping of streaming tasks to highly parallel systems in order to maximize cost-effectiveness, both in terms of energy and in terms of design effort. The analysis and transformation techniques automate large parts of the partitioning and mapping process, based on the properties of the application domain, on quantitative information about the target systems, and on programmer directives. This paper presents the outcomes of the ACOTES project, a 3-year collaborative work of industrial (NXP, ST, IBM, Silicon Hive, NOKIA) and academic (UPC, INRIA, MINES ParisTech) partners, and advocates the use of the Advanced Compiler Technologies that we developed to support Embedded Streaming. Peer reviewed. Postprint (published version).

    StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines

    Multicore machines equipped with accelerators are becoming increasingly popular. The TOP500-leading RoadRunner machine is probably the most famous example of a parallel computer mixing IBM Cell Broadband Engines and AMD Opteron processors. Other architectures, featuring GPU accelerators, are expected to appear in the near future. To fully tap into the potential of these hybrid machines, pure offloading approaches, in which the main core of the application runs on regular processors and offloads specific parts to accelerators, are not sufficient. The real challenge is to build systems where the application is permanently spread across the entire machine, that is, where parallel tasks are dynamically scheduled over the full set of available processing units. To face this challenge, we propose a new runtime system capable of scheduling tasks over heterogeneous, accelerator-based machines. Our system features a software virtual shared memory that provides a weak consistency model. The system keeps track of data copies within the accelerators' embedded memories and features a data-prefetching engine. Such facilities, together with a database of self-tuned per-task performance models, can be used to greatly improve the quality of scheduling policies in this context. We demonstrate the relevance of our approach by benchmarking various parallel numerical kernel implementations over our runtime system. We obtain significant speedups and a very high efficiency on various typical workloads over multicore machines equipped with multiple accelerators.

    Mapping and synchronizing streaming applications on Cell processors

    International audience. Developing streaming applications on heterogeneous multi-processor architectures like the Cell is difficult. Currently, application developers need to know about hardware details to deal with issues like scheduling, memory management and communication/synchronization. Worse, with multiple alternatives for communication available, developers spend significant time picking the most appropriate one. A poor choice often results in bad performance. With Cell-Space, we shield users from hardware details without compromising performance. Its runtime is based on an evaluation of the different communication primitives. In Cell-Space, developers specify a streaming application as a data flow graph of interacting components. Both task- and data-parallelism are easily expressed, and advanced features such as dynamic reconfiguration are fully supported. Beneath a simple interface we include a slew of optimizations not present in other Cell runtime environments. We demonstrate the impact of these optimizations and show that Cell-Space applications can efficiently exploit the resources offered by the Cell.