106 research outputs found

    Power-aware Manhattan routing on chip multiprocessors

    Get PDF
    Nous nous intéressons au routage des communications dans un processeur multi-cœur (CMP). Le but est de trouver un routage valide, c'est-à-dire un routage dans lequel la quantité de données routée entre deux cœurs voisins ne dépasse pas la bande passante maximale, et tel que la puissance dissipée dans les communications est minimale. Nous nous positionnons au niveau système : nous supposons que des applications, sous forme de graphes de tâches, s'exécutent sur le CMP, chaque tâche étant déjà assignée à un cœur. Nous avons donc un ensemble de communications à router entre les cœurs. Nous utilisons un modèle classique, dans lequel la puissance dissipée par un lien de communication est la somme d'une partie statique et d'une partie dynamique, cette dernière dépendant de la fréquence du lien. Cette fréquence est ajustable et proportionnelle à la bande passante. La politique la plus utilisée est le routage XY : chaque communication est en- voyée horizontalement, puis verticalement. Cependant si nous nous autorisons à utiliser les chemins de Manhattan entre la source et la destination, la puissance dissipée peut être considérablement réduite. De plus, il est parfois possible de trouver une solution, alors qu'il n'en existait pas avec un routage XY. Dans ce papier, nous comparons le routage XY et le routage via des chemins de Manhattan, aussi bien d'un point de vue théorique que d'un point de vue pratique. Nous considérons deux variantes du routage par chemins de Manhattan : dans un routage à chemin unique, un seul chemin peut être utilisé pour chaque communication, tandis que le routage à chemin multiples nous permet d'éclater une communication et de lui faire emprunter plusieurs routes. Nous établissons la NP-complétude du problème consistant à trouver un routage Manhattan qui minimise la puissance dissipée, exhibons la borne supérieure minimale du ratio entre la puissance dissipée par un routage XY et celle dissipée par un routage Manhattan, et pour terminer, nous effectuons des simulations pour étudier les performances de nos heuristiques de routage Manhattan

    Scheduling techniques to improve the worst-case execution time of real-time parallel applications on heterogeneous platforms

    Get PDF
    The key to providing high performance and energy-efficient execution for hard real-time applications is the time predictable and efficient usage of heterogeneous multiprocessors. However, schedulability analysis of parallel applications executed on unrelated heterogeneous multiprocessors is challenging and has not been investigated adequately by earlier works. The unrelated model is suitable to represent many of the multiprocessor platforms available today because a task (i.e., sequential code) may exhibit a different work-case-execution-time (WCET) on each type of processor on an unrelated heterogeneous multiprocessors platform. A parallel application can be realistically modeled as a directed acyclic graph (DAG), where the nodes are sequential tasks and the edges are dependencies among the tasks. This thesis considers a sporadic DAG model which is used broadly to analyze and verify the real-time requirements of parallel applications. A global work-conserving scheduler can efficiently utilize an unrelated platform by executing the tasks of a DAG on different processor types. However, it is challenging to compute an upper bound on the worst-case schedule length of the DAG, called makespan, which is used to verify whether the deadline of a DAG is met or not. There are two main challenges. First, because of the heterogeneity of the processors, the WCET for each task of the DAG depends on which processor the task is executing on during actual runtime. Second, timing anomalies are the main obstacle to compute the makespan even for the simpler case when all the processors are of the same type, i.e., homogeneous multiprocessors. To that end, this thesis addresses the following problem: How we can schedule multiple sporadic DAGs on unrelated multiprocessors such that all the DAGs meet their deadlines. Initially, the thesis focuses on homogeneous multiprocessors that is a special case of unrelated multiprocessors to understand and tackle the main challenge of timing anomalies. A novel timing-anomaly-free scheduler is proposed which can be used to compute the makespan of a DAG just by simulating the execution of the tasks based on this proposed scheduler. A set of representative task-based parallel OpenMP applications from the BOTS benchmark suite are modeled as DAGs to investigate the timing behavior of real-world applications. A simulation framework is developed to evaluate the proposed method. Furthermore, the thesis targets unrelated multiprocessors and proposes a global scheduler to execute the tasks of a single DAG to an unrelated multiprocessors platform. Based on the proposed scheduler, methods to compute the makespan of a single DAG are introduced. A set of representative parallel applications from the BOTS benchmark suite are modeled as DAGs that execute on unrelated multiprocessors. Furthermore, synthetic DAGs are generated to examine additional structures of parallel applications and various platform capabilities. A simulation framework that simulates the execution of the tasks of a DAG on an unrelated multiprocessor platform is introduced to assess the effectiveness of the proposed makespan computations. Finally, based on the makespan computation of a single DAG this thesis presents the design and schedulability analysis of global and federated scheduling of sporadic DAGs that execute on unrelated multiprocessors

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF

    On the design of multimedia architectures : proceedings of a one-day workshop, Eindhoven, December 18, 2003

    Get PDF

    Real-Time Application Mapping for Many-Cores Using a Limited Migrative Model

    Get PDF
    Many-core platforms are an emerging technology in the real-time embedded domain. These devices offer various options for power savings, cost reductions and contribute to the overall system flexibility, however, issues such as unpredictability, scalability and analysis pessimism are serious challenges to their integration into the aforementioned area. The focus of this work is on many-core platforms using a limited migrative model (LMM). LMM is an approach based on the fundamental concepts of the multi-kernel paradigm, which is a promising step towards scalable and predictable many-cores. In this work, we formulate the problem of real-time application mapping on a many-core platform using LMM, and propose a three-stage method to solve it. An extended version of the existing analysis is used to assure that derived mappings (i) guarantee the fulfilment of timing constraints posed on worst-case communication delays of individual applications, and (ii) provide an environment to perform load balancing for e.g. energy/thermal management, fault tolerance and/or performance reasons

    Assessing the performance of energy-aware mappings

    Get PDF
    International audienceWe aim at mapping streaming applications that can be modeled by a series-parallel graph onto a 2-dimensional tiled chip multiprocessor (CMP) architecture. The objective of the mapping is to minimize the energy consumption, using dynamic voltage and frequency scaling (DVFS) techniques, while maintaining a given level of performance, reflected by the rate of processing the data streams. This mapping problem turns out to be NP-hard, and several heuristics are proposed. We assess their performance through comprehensive simulations using the StreamIt workflow suite and randomly generated series-parallel graphs, and various CMP grid sizes

    High performance constraint satisfaction problem solving: State-recomputation versus state-copying.

    Get PDF
    Constraint Satisfaction Problems (CSPs) in Artificial Intelligence have been an important focus of research and have been a useful model for various applications such as scheduling, image processing and machine vision. CSPs are mathematical problems that try to search values for variables according to constraints. There are many approaches for searching solutions of non-binary CSPs. Traditionally, most CSP methods rely on a single processor. With the increasing popularization of multiple processors, parallel search methods are becoming alternatives to speed up the search process. Parallel search is a subfield of artificial intelligence in which the constraint satisfaction problem is centralized whereas the search processes are distributed among the different processors. In this thesis we present a forward checking algorithm solving non-binary CSPs by distributing different branches to different processors via message passing interface and execute it on a high performance distributed system called SHARCNET. However, the problem is how to efficiently communicate the state of the search among processors. Two communication models, namely, state-recomputation and state-copying via message passing, are implemented and evaluated. This thesis investigates the behaviour of communication from one process to another. The experimental results demonstrate that the state-recomputation model with tighter constraints obtains a better performance than the state-copying model, but when constraints become looser, the state-copying model is a better choice.Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Y364. Source: Masters Abstracts International, Volume: 44-01, page: 0417. Thesis (M.Sc.)--University of Windsor (Canada), 2005

    Computing Accurate Performance Bounds for Best Effort Networks-on-Chip

    Get PDF
    Real-time (RT) communication support is a critical requirement for many complex embedded applications which are currently targeted to Network-on-chip (NoC) platforms. In this paper, we present novel methods to efficiently calculate worst- case bandwidth and latency bounds for RT traffic streams on wormhole-switched NoCs with arbitrary topology. The proposed methods apply to best-effort NoC architectures, with no extra hardware dedicated to RT traffic support. By applying our methods to several realistic NoC designs, we show substantial improvements (more than 30% in bandwidth and 50% in latency, on average) in bound tightness with respect to existing approaches

    COSMIC: A Model for Multiprocessor Performance Analysis

    Get PDF
    COSMIC, the Combined Ordering Scheme Model with Isolated Components, describes the execution of specific algorithms on multiprocessors and facilitates analysis of their performance. Building upon previous modeling efforts such as Petri nets, COSMIC structures the modeling of a system along several issues including computational and overhead costs due to sequencing of operations, synchronization between operations, and contention for limited resources. This structuring allows us to isolate the performance impact associated with each issue. Finally, studying the performance of a system while executing a specific algorithm gives insight into its performance under realistic operating conditions. The model also allows us to study realistically sized algorithms with ease, especially when they are regularly structured. During the analysis of a system modeled by COSMIC, a set timed Petri nets is produced. These Petri nets are then analyzed to determine measures of the systems performance. To facilitate the specification, manipulation, and analysis of large timed Petri nets, a set of tools has been developed. These tools take advantage of several special properties of the timed Petri nets that greatly reduce the computational resources required to calculate the required measures. From this analysis, performance measures show not only total performance, but also present a breakdown of these results into several specific categories
    • …
    corecore