11,359 research outputs found

    Improving the scalability of parallel N-body applications with an event driven constraint based execution model

    Full text link
    The scalability and efficiency of graph applications are significantly constrained by conventional systems and their supporting programming models. Technology trends like multicore, manycore, and heterogeneous system architectures are introducing further challenges and possibilities for emerging application domains such as graph applications. This paper explores the space of effective parallel execution of ephemeral graphs that are dynamically generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The workloads are expressed using the semantics of an Exascale computing execution model called ParalleX. For comparison, results using conventional execution model semantics are also presented. We find improved load balancing during runtime and automatic parallelism discovery improving efficiency using the advanced semantics for Exascale computing.Comment: 11 figure

    Mapping and Scheduling of Directed Acyclic Graphs on An FPFA Tile

    Get PDF
    An architecture for a hand-held multimedia device requires components that are energy-efficient, flexible, and provide high performance. In the CHAMELEON [4] project we develop a coarse grained reconfigurable device for DSP-like algorithms, the so-called Field Programmable Function Array (FPFA). The FPFA devices are reminiscent to FPGAs, but with a matrix of Processing Parts (PP) instead of CLBs. The design of the FPFA focuses on: (1) Keeping each PP small to maximize the number of PPs that can fit on a chip; (2) providing sufficient flexibility; (3) Low energy consumption; (4) Exploiting the maximum amount of parallelism; (5) A strong support tool for FPFA-based applications. The challenge in providing compiler support for the FPFA-based design stems from the flexibility of the FPFA structure. If we do not use the characteristics of the FPFA structure properly, the advantages of an FPFA may become its disadvantages. The GECKO1project focuses on this problem. In this paper, we present a mapping and scheduling scheme for applications running on one FPFA tile. Applications are written in C and C code is translated to a Directed Acyclic Graphs (DAG) [4]. This scheme can map a DAG directly onto the reconfigurable PPs of an FPFA tile. It tries to achieve low power consumption by exploiting locality of reference and high performance by exploiting maximum parallelism

    Verification and Synthesis of Symmetric Uni-Rings for Leads-To Properties

    Full text link
    This paper investigates the verification and synthesis of parameterized protocols that satisfy leadsto properties RQR \leadsto Q on symmetric unidirectional rings (a.k.a. uni-rings) of deterministic and constant-space processes under no fairness and interleaving semantics, where RR and QQ are global state predicates. First, we show that verifying RQR \leadsto Q for parameterized protocols on symmetric uni-rings is undecidable, even for deterministic and constant-space processes, and conjunctive state predicates. Then, we show that surprisingly synthesizing symmetric uni-ring protocols that satisfy RQR \leadsto Q is actually decidable. We identify necessary and sufficient conditions for the decidability of synthesis based on which we devise a sound and complete polynomial-time algorithm that takes the predicates RR and QQ, and automatically generates a parameterized protocol that satisfies RQR \leadsto Q for unbounded (but finite) ring sizes. Moreover, we present some decidability results for cases where leadsto is required from multiple distinct RR predicates to different QQ predicates. To demonstrate the practicality of our synthesis method, we synthesize some parameterized protocols, including agreement and parity protocols

    Mapping Applications to an FPFA Tile

    Get PDF
    This paper introduces a transformational design method which can be used to map code written in a high level source language, like C, to a coarse grain reconfigurable architecture. The source code is first translated into a control data flow graph (CDFG), which is minimized using a set of behaviour preserving transformations, such as dependency analysis, common subexpression elimination, etc. After applying graph clustering, scheduling and allocation transformations on this minimized graph, it can be mapped onto the target architecture
    corecore