1,382 research outputs found
Type-driven automated program transformations and cost modelling for optimising streaming programs on FPGAs
In this paper we present a novel approach to program optimisation based on compiler-based type-driven program transformations and a fast and accurate cost/performance model for the target architecture. We target streaming programs for the problem domain of scientific computing, such as numerical weather prediction. We present our theoretical framework for type-driven program transformation, our target high-level language and intermediate representation languages and the cost model and demonstrate the effectiveness of our approach by comparison with a commercial toolchain
Building an Application-specific Memory Hierarchy on FPGA
The high potential performance of FPGAs cannot be exploited if a design suffers a memory bottleneck. Therefore, a memory hierarchy is needed to reuse data in on-chip memories and minimize the number of accesses to off-chip memory
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
CPU-FPGA heterogeneous architectures are attracting ever-increasing attention
in an attempt to advance computational capabilities and energy efficiency in
today's datacenters. These architectures provide programmers with the ability
to reprogram the FPGAs for flexible acceleration of many workloads.
Nonetheless, this advantage is often overshadowed by the poor programmability
of FPGAs whose programming is conventionally a RTL design practice. Although
recent advances in high-level synthesis (HLS) significantly improve the FPGA
programmability, it still leaves programmers facing the challenge of
identifying the optimal design configuration in a tremendous design space.
This paper aims to address this challenge and pave the path from software
programs towards high-quality FPGA accelerators. Specifically, we first propose
the composable, parallel and pipeline (CPP) microarchitecture as a template of
accelerator designs. Such a well-defined template is able to support efficient
accelerator designs for a broad class of computation kernels, and more
importantly, drastically reduce the design space. Also, we introduce an
analytical model to capture the performance and resource trade-offs among
different design configurations of the CPP microarchitecture, which lays the
foundation for fast design space exploration. On top of the CPP
microarchitecture and its analytical model, we develop the AutoAccel framework
to make the entire accelerator generation automated. AutoAccel accepts a
software program as an input and performs a series of code transformations
based on the result of the analytical-model-based design space exploration to
construct the desired CPP microarchitecture. Our experiments show that the
AutoAccel-generated accelerators outperform their corresponding software
implementations by an average of 72x for a broad class of computation kernels
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications
Heterogeneous High-Performance Computing
(HPC) platforms present a significant programming challenge,
especially because the key users of HPC resources are scientists,
not parallel programmers. We contend that compiler technology
has to evolve to automatically create the best program variant
by transforming a given original program. We have developed a
novel methodology based on type transformations for generating
correct-by-construction design variants, and an associated
light-weight cost model for evaluating these variants for
implementation on FPGAs. In this paper we present a key
enabler of our approach, the cost model. We discuss how we
are able to quickly derive accurate estimates of performance
and resource-utilization from the design’s representation in our
intermediate language. We show results confirming the accuracy
of our cost model by testing it on three different scientific
kernels. We conclude with a case-study that compares a solution
generated by our framework with one from a conventional
high-level synthesis tool, showing better performance and
power-efficiency using our cost model based approach
- …