91,052 research outputs found
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
CPU-FPGA heterogeneous architectures are attracting ever-increasing attention
in an attempt to advance computational capabilities and energy efficiency in
today's datacenters. These architectures provide programmers with the ability
to reprogram the FPGAs for flexible acceleration of many workloads.
Nonetheless, this advantage is often overshadowed by the poor programmability
of FPGAs whose programming is conventionally a RTL design practice. Although
recent advances in high-level synthesis (HLS) significantly improve the FPGA
programmability, it still leaves programmers facing the challenge of
identifying the optimal design configuration in a tremendous design space.
This paper aims to address this challenge and pave the path from software
programs towards high-quality FPGA accelerators. Specifically, we first propose
the composable, parallel and pipeline (CPP) microarchitecture as a template of
accelerator designs. Such a well-defined template is able to support efficient
accelerator designs for a broad class of computation kernels, and more
importantly, drastically reduce the design space. Also, we introduce an
analytical model to capture the performance and resource trade-offs among
different design configurations of the CPP microarchitecture, which lays the
foundation for fast design space exploration. On top of the CPP
microarchitecture and its analytical model, we develop the AutoAccel framework
to make the entire accelerator generation automated. AutoAccel accepts a
software program as an input and performs a series of code transformations
based on the result of the analytical-model-based design space exploration to
construct the desired CPP microarchitecture. Our experiments show that the
AutoAccel-generated accelerators outperform their corresponding software
implementations by an average of 72x for a broad class of computation kernels
Platform independent profiling of a QCD code
The supercomputing platforms available for high performance computing based
research evolve at a great rate. However, this rapid development of novel
technologies requires constant adaptations and optimizations of the existing
codes for each new machine architecture. In such context, minimizing time of
efficiently porting the code on a new platform is of crucial importance. A
possible solution for this common challenge is to use simulations of the
application that can assist in detecting performance bottlenecks. Due to
prohibitive costs of classical cycle-accurate simulators, coarse-grain
simulations are more suitable for large parallel and distributed systems. We
present a procedure of implementing the profiling for openQCD code [1] through
simulation, which will enable the global reduction of the cost of profiling and
optimizing this code commonly used in the lattice QCD community. Our approach
is based on well-known SimGrid simulator [2], which allows for fast and
accurate performance predictions of HPC codes. Additionally, accurate
estimations of the program behavior on some future machines, not yet accessible
to us, are anticipated
A Platform-independent Programming Environment for Robot Control
The development of robot control programs is a complex task. Many robots are
different in their electrical and mechanical structure which is also reflected
in the software. Specific robot software environments support the program
development, but are mainly text-based and usually applied by experts in the
field with profound knowledge of the target robot. This paper presents a
graphical programming environment which aims to ease the development of robot
control programs. In contrast to existing graphical robot programming
environments, our approach focuses on the composition of parallel action
sequences. The developed environment allows to schedule independent robot
actions on parallel execution lines and provides mechanism to avoid
side-effects of parallel actions. The developed environment is
platform-independent and based on the model-driven paradigm. The feasibility of
our approach is shown by the application of the sequencer to a simulated
service robot and a robot for educational purpose
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
Decomposition tool for Event-B
Two methods have been identified for Event-B model decomposition: shared variable and shared event. The purpose of this paper is to introduce the two approaches and the respective tool support in the Rodin platform. Besides alleviating the complexity for large systems and respective proofs, decomposition allows team development in parallel over the same model which is very attractive in the industrial environment
- …