15,676 research outputs found
Recommended from our members
Fine-grain loop scheduling for MIMD machines
Previous algorithms for parallelizing loops on MIMD machines have been based on assigning one or more loop iterations to each processor, introducing synchronization as required. These methods exploit only iteration level parallelism, and ignore the parallelism that may exist at a lower level.In order to exploit parallelism both within and across iterations, our algorithm analyzes and schedules the loop at the statement level. The loop schedule reflects the expected communication and synchronization costs of the target machine. We provide test results that show that this algorithm can produce good speedup of loops on an MIMD machine
Analyzing Conflict Freedom For Multi-threaded Programs With Time Annotations
Avoiding access conflicts is a major challenge in the design of
multi-threaded programs. In the context of real-time systems, the absence of
conflicts can be guaranteed by ensuring that no two potentially conflicting
accesses are ever scheduled concurrently.In this paper, we analyze programs
that carry time annotations specifying the time for executing each statement.
We propose a technique for verifying that a multi-threaded program with time
annotations is free of access conflicts. In particular, we generate constraints
that reflect the possible schedules for executing the program and the required
properties. We then invoke an SMT solver in order to verify that no execution
gives rise to concurrent conflicting accesses. Otherwise, we obtain a trace
that exhibits the access conflict.Comment: http://journal.ub.tu-berlin.de/eceasst/article/view/97
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
CPU-FPGA heterogeneous architectures are attracting ever-increasing attention
in an attempt to advance computational capabilities and energy efficiency in
today's datacenters. These architectures provide programmers with the ability
to reprogram the FPGAs for flexible acceleration of many workloads.
Nonetheless, this advantage is often overshadowed by the poor programmability
of FPGAs whose programming is conventionally a RTL design practice. Although
recent advances in high-level synthesis (HLS) significantly improve the FPGA
programmability, it still leaves programmers facing the challenge of
identifying the optimal design configuration in a tremendous design space.
This paper aims to address this challenge and pave the path from software
programs towards high-quality FPGA accelerators. Specifically, we first propose
the composable, parallel and pipeline (CPP) microarchitecture as a template of
accelerator designs. Such a well-defined template is able to support efficient
accelerator designs for a broad class of computation kernels, and more
importantly, drastically reduce the design space. Also, we introduce an
analytical model to capture the performance and resource trade-offs among
different design configurations of the CPP microarchitecture, which lays the
foundation for fast design space exploration. On top of the CPP
microarchitecture and its analytical model, we develop the AutoAccel framework
to make the entire accelerator generation automated. AutoAccel accepts a
software program as an input and performs a series of code transformations
based on the result of the analytical-model-based design space exploration to
construct the desired CPP microarchitecture. Our experiments show that the
AutoAccel-generated accelerators outperform their corresponding software
implementations by an average of 72x for a broad class of computation kernels
A Low-Overhead Script Language for Tiny Networked Embedded Systems
With sensor networks starting to get mainstream acceptance, programmability is of increasing importance.
Customers and field engineers will need to reprogram existing deployments and software developers
will need to test and debug software in network testbeds. Script languages, which are a popular
mechanism for reprogramming in general-purpose computing, have not been considered for wireless sensor
networks because of the perceived overhead of interpreting a script language on tiny sensor nodes.
In this paper we show that a structured script language is both feasible and efficient for programming
tiny sensor nodes. We present a structured script language, SCript, and develop an interpreter for the
language. To reduce program distribution energy the SCript interpreter stores a tokenized representation
of the scripts which is distributed through the wireless network. The ROM and RAM footprint of the
interpreter is similar to that of existing virtual machines for sensor networks. We show that the interpretation
overhead of our language is on par with that of existing virtual machines. Thus script languages,
previously considered as too expensive for tiny sensor nodes, are a viable alternative to virtual machines
Introducing Molly: Distributed Memory Parallelization with LLVM
Programming for distributed memory machines has always been a tedious task,
but necessary because compilers have not been sufficiently able to optimize for
such machines themselves. Molly is an extension to the LLVM compiler toolchain
that is able to distribute and reorganize workload and data if the program is
organized in statically determined loop control-flows. These are represented as
polyhedral integer-point sets that allow program transformations applied on
them. Memory distribution and layout can be declared by the programmer as
needed and the necessary asynchronous MPI communication is generated
automatically. The primary motivation is to run Lattice QCD simulations on IBM
Blue Gene/Q supercomputers, but since the implementation is not yet completed,
this paper shows the capabilities on Conway's Game of Life
- …