20,641 research outputs found

    A Domain Specific Approach to High Performance Heterogeneous Computing

    Full text link
    Users of heterogeneous computing systems face two problems: firstly, in understanding the trade-off relationships between the observable characteristics of their applications, such as latency and quality of the result, and secondly, how to exploit knowledge of these characteristics to allocate work to distributed computing platforms efficiently. A domain specific approach addresses both of these problems. By considering a subset of operations or functions, models of the observable characteristics or domain metrics may be formulated in advance, and populated at run-time for task instances. These metric models can then be used to express the allocation of work as a constrained integer program, which can be solved using heuristics, machine learning or Mixed Integer Linear Programming (MILP) frameworks. These claims are illustrated using the example domain of derivatives pricing in computational finance, with the domain metrics of workload latency or makespan and pricing accuracy. For a large, varied workload of 128 Black-Scholes and Heston model-based option pricing tasks, running upon a diverse array of 16 Multicore CPUs, GPUs and FPGAs platforms, predictions made by models of both the makespan and accuracy are generally within 10% of the run-time performance. When these models are used as inputs to machine learning and MILP-based workload allocation approaches, a latency improvement of up to 24 and 270 times over the heuristic approach is seen.Comment: 14 pages, preprint draft, minor revisio

    Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes

    Get PDF
    The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this paper, we study the benefits and limits of replacing the highly specialized internal scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and StarPU. The tasks graph of the factorization step is made available to the two runtimes, providing them the opportunity to process and optimize its traversal in order to maximize the algorithm efficiency for the targeted hardware platform. A comparative study of the performance of the PaStiX solver on top of its native internal scheduler, PaRSEC, and StarPU frameworks, on different execution environments, is performed. The analysis highlights that these generic task-based runtimes achieve comparable results to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer.Comment: Heterogeneity in Computing Workshop (2014

    Internet of robotic things : converging sensing/actuating, hypoconnectivity, artificial intelligence and IoT Platforms

    Get PDF
    The Internet of Things (IoT) concept is evolving rapidly and influencing newdevelopments in various application domains, such as the Internet of MobileThings (IoMT), Autonomous Internet of Things (A-IoT), Autonomous Systemof Things (ASoT), Internet of Autonomous Things (IoAT), Internetof Things Clouds (IoT-C) and the Internet of Robotic Things (IoRT) etc.that are progressing/advancing by using IoT technology. The IoT influencerepresents new development and deployment challenges in different areassuch as seamless platform integration, context based cognitive network integration,new mobile sensor/actuator network paradigms, things identification(addressing, naming in IoT) and dynamic things discoverability and manyothers. The IoRT represents new convergence challenges and their need to be addressed, in one side the programmability and the communication ofmultiple heterogeneous mobile/autonomous/robotic things for cooperating,their coordination, configuration, exchange of information, security, safetyand protection. Developments in IoT heterogeneous parallel processing/communication and dynamic systems based on parallelism and concurrencyrequire new ideas for integrating the intelligent “devices”, collaborativerobots (COBOTS), into IoT applications. Dynamic maintainability, selfhealing,self-repair of resources, changing resource state, (re-) configurationand context based IoT systems for service implementation and integrationwith IoT network service composition are of paramount importance whennew “cognitive devices” are becoming active participants in IoT applications.This chapter aims to be an overview of the IoRT concept, technologies,architectures and applications and to provide a comprehensive coverage offuture challenges, developments and applications

    Pipelining the Fast Multipole Method over a Runtime System

    Get PDF
    Fast Multipole Methods (FMM) are a fundamental operation for the simulation of many physical problems. The high performance design of such methods usually requires to carefully tune the algorithm for both the targeted physics and the hardware. In this paper, we propose a new approach that achieves high performance across architectures. Our method consists of expressing the FMM algorithm as a task flow and employing a state-of-the-art runtime system, StarPU, in order to process the tasks on the different processing units. We carefully design the task flow, the mathematical operators, their Central Processing Unit (CPU) and Graphics Processing Unit (GPU) implementations, as well as scheduling schemes. We compute potentials and forces of 200 million particles in 48.7 seconds on a homogeneous 160 cores SGI Altix UV 100 and of 38 million particles in 13.34 seconds on a heterogeneous 12 cores Intel Nehalem processor enhanced with 3 Nvidia M2090 Fermi GPUs.Comment: No. RR-7981 (2012

    Toward sustainable data centers: a comprehensive energy management strategy

    Get PDF
    Data centers are major contributors to the emission of carbon dioxide to the atmosphere, and this contribution is expected to increase in the following years. This has encouraged the development of techniques to reduce the energy consumption and the environmental footprint of data centers. Whereas some of these techniques have succeeded to reduce the energy consumption of the hardware equipment of data centers (including IT, cooling, and power supply systems), we claim that sustainable data centers will be only possible if the problem is faced by means of a holistic approach that includes not only the aforementioned techniques but also intelligent and unifying solutions that enable a synergistic and energy-aware management of data centers. In this paper, we propose a comprehensive strategy to reduce the carbon footprint of data centers that uses the energy as a driver of their management procedures. In addition, we present a holistic management architecture for sustainable data centers that implements the aforementioned strategy, and we propose design guidelines to accomplish each step of the proposed strategy, referring to related achievements and enumerating the main challenges that must be still solved.Peer ReviewedPostprint (author's final draft
    • …
    corecore