15,676 research outputs found

    The Fortran parallel transformer and its programming environment

    Get PDF

    Analyzing Conflict Freedom For Multi-threaded Programs With Time Annotations

    Get PDF
    Avoiding access conflicts is a major challenge in the design of multi-threaded programs. In the context of real-time systems, the absence of conflicts can be guaranteed by ensuring that no two potentially conflicting accesses are ever scheduled concurrently.In this paper, we analyze programs that carry time annotations specifying the time for executing each statement. We propose a technique for verifying that a multi-threaded program with time annotations is free of access conflicts. In particular, we generate constraints that reflect the possible schedules for executing the program and the required properties. We then invoke an SMT solver in order to verify that no execution gives rise to concurrent conflicting accesses. Otherwise, we obtain a trace that exhibits the access conflict.Comment: http://journal.ub.tu-berlin.de/eceasst/article/view/97

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

    A Low-Overhead Script Language for Tiny Networked Embedded Systems

    Get PDF
    With sensor networks starting to get mainstream acceptance, programmability is of increasing importance. Customers and field engineers will need to reprogram existing deployments and software developers will need to test and debug software in network testbeds. Script languages, which are a popular mechanism for reprogramming in general-purpose computing, have not been considered for wireless sensor networks because of the perceived overhead of interpreting a script language on tiny sensor nodes. In this paper we show that a structured script language is both feasible and efficient for programming tiny sensor nodes. We present a structured script language, SCript, and develop an interpreter for the language. To reduce program distribution energy the SCript interpreter stores a tokenized representation of the scripts which is distributed through the wireless network. The ROM and RAM footprint of the interpreter is similar to that of existing virtual machines for sensor networks. We show that the interpretation overhead of our language is on par with that of existing virtual machines. Thus script languages, previously considered as too expensive for tiny sensor nodes, are a viable alternative to virtual machines

    Introducing Molly: Distributed Memory Parallelization with LLVM

    Get PDF
    Programming for distributed memory machines has always been a tedious task, but necessary because compilers have not been sufficiently able to optimize for such machines themselves. Molly is an extension to the LLVM compiler toolchain that is able to distribute and reorganize workload and data if the program is organized in statically determined loop control-flows. These are represented as polyhedral integer-point sets that allow program transformations applied on them. Memory distribution and layout can be declared by the programmer as needed and the necessary asynchronous MPI communication is generated automatically. The primary motivation is to run Lattice QCD simulations on IBM Blue Gene/Q supercomputers, but since the implementation is not yet completed, this paper shows the capabilities on Conway's Game of Life
    corecore