11,359 research outputs found
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing.Comment: 11 figure
Mapping and Scheduling of Directed Acyclic Graphs on An FPFA Tile
An architecture for a hand-held multimedia device requires components that are energy-efficient, flexible, and provide high performance. In the CHAMELEON [4] project we develop a coarse grained reconfigurable device for DSP-like algorithms, the so-called Field Programmable Function Array (FPFA). The FPFA devices are reminiscent to FPGAs, but with a matrix of Processing Parts (PP) instead of CLBs. The design of the FPFA focuses on: (1) Keeping each PP small to maximize the number of PPs that can fit on a chip; (2) providing sufficient flexibility; (3) Low energy consumption; (4) Exploiting the maximum amount of parallelism; (5) A strong support tool for FPFA-based applications. The challenge in providing compiler support for the FPFA-based design stems from the flexibility of the FPFA structure. If we do not use the characteristics of the FPFA structure properly, the advantages of an FPFA may become its disadvantages. The GECKO1project focuses on this problem. In this paper, we present a mapping and scheduling scheme for applications running on one FPFA tile. Applications are written in C and C code is translated to a Directed Acyclic Graphs (DAG) [4]. This scheme can map a DAG directly onto the reconfigurable PPs of an FPFA tile. It tries to achieve low power consumption by exploiting locality of reference and high performance by exploiting maximum parallelism
Verification and Synthesis of Symmetric Uni-Rings for Leads-To Properties
This paper investigates the verification and synthesis of parameterized
protocols that satisfy leadsto properties on symmetric
unidirectional rings (a.k.a. uni-rings) of deterministic and constant-space
processes under no fairness and interleaving semantics, where and are
global state predicates. First, we show that verifying for
parameterized protocols on symmetric uni-rings is undecidable, even for
deterministic and constant-space processes, and conjunctive state predicates.
Then, we show that surprisingly synthesizing symmetric uni-ring protocols that
satisfy is actually decidable. We identify necessary and
sufficient conditions for the decidability of synthesis based on which we
devise a sound and complete polynomial-time algorithm that takes the predicates
and , and automatically generates a parameterized protocol that
satisfies for unbounded (but finite) ring sizes. Moreover, we
present some decidability results for cases where leadsto is required from
multiple distinct predicates to different predicates. To demonstrate
the practicality of our synthesis method, we synthesize some parameterized
protocols, including agreement and parity protocols
Mapping Applications to an FPFA Tile
This paper introduces a transformational design method which can be used to map code written in a high level source language, like C, to a coarse grain reconfigurable architecture. The source code is first translated into a control data flow graph (CDFG), which is minimized using a set of behaviour preserving transformations, such as dependency analysis, common subexpression elimination, etc. After applying graph clustering, scheduling and allocation transformations on this minimized graph, it can be mapped onto the target architecture
- …