Rewriting History: Repurposing Domain-Specific CGRAs
Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices
promising both the flexibility of FPGAs and the performance of ASICs. However,
with restricted domains comes a danger: designing chips that cannot accelerate
enough current and future software to justify the hardware cost. We introduce
FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to
operations they do not natively support.
FlexC uses dataflow rewriting, replacing unsupported regions of code with
equivalent operations that are supported by the CGRA. We use equality
saturation, a technique enabling efficient exploration of a large space of
rewrite rules, to effectively search through the program-space for supported
programs. We applied FlexC to over 2,000 loop kernels, compiling to four
different research CGRAs and 300 generated CGRAs and demonstrate a 2.2×
increase in the number of loop kernels accelerated, leading to a 3× speedup
compared to an Arm A5 CPU on kernels that would otherwise be unsupported by the
accelerator.
Compiling Geometric Algebra Computations into Reconfigurable Hardware Accelerators
Geometric Algebra (GA), a generalization of quaternions and complex numbers, is a very
powerful framework for intuitively expressing and manipulating the complex
geometric relationships common to engineering problems.
However, actual processing of GA expressions is very compute intensive, and
acceleration is generally required for practical use. GPUs and FPGAs offer
such acceleration, while requiring only low power per operation.
In this paper, we present key components of a proof-of-concept compile flow
combining symbolic and hardware optimization techniques to
automatically generate hardware accelerators from the abstract GA descriptions that are suitable for high-performance embedded computing.
The PARSE Programming Paradigm. Part I: Software Development Methodology. Part II: Software Development Support Tools
The programming methodology of PARSE (parallel software environment), a software environment being developed for reconfigurable non-shared memory parallel computers, is described. This environment will consist of an integrated collection of language interfaces, automatic and semi-automatic debugging and analysis tools, and an operating system, all of which are made more flexible by the use of a knowledge-based implementation for the tools that make up PARSE. The programming paradigm lets the user choose freely among three basic approaches/abstractions for programming a parallel machine: logic-based descriptive, sequential-control procedural, and parallel-control procedural programming. All of these result in efficient parallel execution. The current work discusses the methodology underlying PARSE, whereas the companion paper, “The PARSE Programming Paradigm — II: Software Development Support Tools,” details each of the component tools.
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
FOS: A Modular FPGA Operating System for Dynamic Workloads
With FPGAs now being deployed in the cloud and at the edge, there is a need
for scalable design methods which can incorporate the heterogeneity present in
the hardware and software components of FPGA systems. Moreover, these FPGA
systems need to be maintainable and adaptable to changing workloads while
improving accessibility for the application developers. However, current FPGA
systems fail to achieve modularity and support for multi-tenancy due to
dependencies between system components and lack of standardised abstraction
layers. To solve this, we introduce a modular FPGA operating system -- FOS,
which adopts a modular FPGA development flow to allow each system component to
be changed and be agnostic to the heterogeneity of EDA tool versions, hardware
and software layers. Further, to maximise utilisation dynamically and
transparently to users, FOS employs resource-elastic scheduling to
arbitrate the FPGA resources in both the time and spatial domains for any type of
accelerator. Our evaluation on different FPGA boards shows that FOS can
provide performance improvements in both single-tenant and multi-tenant
environments while substantially reducing the development time and, at the same
time, improving flexibility.