Search CORE

15,676 research outputs found

Recommended from our members

Fine-grain loop scheduling for MIMD machines

Author: Brownhill Carrie J.
Kim Ki-chang
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 02/10/1990
Field of study

Previous algorithms for parallelizing loops on MIMD machines have been based on assigning one or more loop iterations to each processor, introducing synchronization as required. These methods exploit only iteration level parallelism, and ignore the parallelism that may exist at a lower level.In order to exploit parallelism both within and across iterations, our algorithm analyzes and schedules the loop at the statement level. The loop schedule reflects the expected communication and synchronization costs of the target machine. We provide test results that show that this algorithm can produce good speedup of loops on an MIMD machine

eScholarship - University of California

The Fortran parallel transformer and its programming environment

Author: D'Hollander Erik
WANG Q
ZHANG FB
Publication venue: 'Elsevier BV'
Publication date: 01/01/1998
Field of study

Analyzing Conflict Freedom For Multi-threaded Programs With Time Annotations

Author: Gabriele Taentzer
Jaco Van De Pol
Jingshu Chen
Jingshu Chen
Julia Padberg
Marie Duflot
Marie Duflot
Marieke Huisman
Stephan Merz
Stephan Merz
Tiziana Margaria
Publication venue
Publication date: 18/11/2014
Field of study

Avoiding access conflicts is a major challenge in the design of multi-threaded programs. In the context of real-time systems, the absence of conflicts can be guaranteed by ensuring that no two potentially conflicting accesses are ever scheduled concurrently.In this paper, we analyze programs that carry time annotations specifying the time for executing each statement. We propose a technique for verifying that a multi-threaded program with time annotations is free of access conflicts. In particular, we generate constraints that reflect the possible schedules for executing the program and the required properties. We then invoke an SMT solver in order to verify that no execution gives rise to concurrent conflicting accesses. Otherwise, we obtain a trace that exhibits the access conflict.Comment: http://journal.ub.tu-berlin.de/eceasst/article/view/97

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

Author: Cong Jason
Wei Peng
Yu Cody Hao
Zhang Peng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/07/2018
Field of study

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

arXiv.org e-Print Archive

Scipedia

A Low-Overhead Script Language for Tiny Networked Embedded Systems

Author: Dunkels Adam
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/2006
Field of study

With sensor networks starting to get mainstream acceptance, programmability is of increasing importance. Customers and field engineers will need to reprogram existing deployments and software developers will need to test and debug software in network testbeds. Script languages, which are a popular mechanism for reprogramming in general-purpose computing, have not been considered for wireless sensor networks because of the perceived overhead of interpreting a script language on tiny sensor nodes. In this paper we show that a structured script language is both feasible and efficient for programming tiny sensor nodes. We present a structured script language, SCript, and develop an interpreter for the language. To reduce program distribution energy the SCript interpreter stores a tokenized representation of the scripts which is distributed through the wireless network. The ROM and RAM footprint of the interpreter is similar to that of existing virtual machines for sensor networks. We show that the interpretation overhead of our language is on par with that of existing virtual machines. Thus script languages, previously considered as too expensive for tiny sensor nodes, are a viable alternative to virtual machines

CiteSeerX

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Introducing Molly: Distributed Memory Parallelization with LLVM

Author: Kruse Michael
Publication venue
Publication date: 01/01/2013
Field of study

Programming for distributed memory machines has always been a tedious task, but necessary because compilers have not been sufficiently able to optimize for such machines themselves. Molly is an extension to the LLVM compiler toolchain that is able to distribute and reorganize workload and data if the program is organized in statically determined loop control-flows. These are represented as polyhedral integer-point sets that allow program transformations applied on them. Memory distribution and layout can be declared by the programmer as needed and the necessary asynchronous MPI communication is generated automatically. The primary motivation is to run Lattice QCD simulations on IBM Blue Gene/Q supercomputers, but since the implementation is not yet completed, this paper shows the capabilities on Conway's Game of Life

arXiv.org e-Print Archive

HAL-CentraleSupelec

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Rennes 1