1,259 research outputs found
Coarse-grained reconfigurable array architectures
Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code
Pixie: A heterogeneous Virtual Coarse-Grained Reconfigurable Array for high performance image processing applications
Coarse-Grained Reconfigurable Arrays (CGRAs) enable ease of programmability
and result in low development costs. They enable the ease of use specifically
in reconfigurable computing applications. The smaller cost of compilation and
reduced reconfiguration overhead enables them to become attractive platforms
for accelerating high-performance computing applications such as image
processing. The CGRAs are ASICs and therefore, expensive to produce. However,
Field Programmable Gate Arrays (FPGAs) are relatively cheaper for low volume
products but they are not so easily programmable. We combine best of both
worlds by implementing a Virtual Coarse-Grained Reconfigurable Array (VCGRA) on
FPGA. VCGRAs are a trade off between FPGA with large routing overheads and
ASICs. In this perspective we present a novel heterogeneous Virtual
Coarse-Grained Reconfigurable Array (VCGRA) called "Pixie" which is suitable
for implementing high performance image processing applications. The proposed
VCGRA contains generic processing elements and virtual channels that are
described using the Hardware Description Language VHDL. Both elements have been
optimized by using the parameterized configuration tool flow and result in a
resource reduction of 24% for each processing elements and 82% for each virtual
channels respectively.Comment: Presented at 3rd International Workshop on Overlay Architectures for
FPGAs (OLAF 2017) arXiv:1704.0880
Generic Connectivity-Based CGRA Mapping via Integer Linear Programming
Coarse-grained reconfigurable architectures (CGRAs) are programmable logic
devices with large coarse-grained ALU-like logic blocks, and multi-bit
datapath-style routing. CGRAs often have relatively restricted data routing
networks, so they attract CAD mapping tools that use exact methods, such as
Integer Linear Programming (ILP). However, tools that target general
architectures must use large constraint systems to fully describe an
architecture's flexibility, resulting in lengthy run-times. In this paper, we
propose to derive connectivity information from an otherwise generic device
model, and use this to create simpler ILPs, which we combine in an iterative
schedule and retain most of the exactness of a fully-generic ILP approach. This
new approach has a speed-up geometric mean of 5.88x when considering benchmarks
that do not hit a time-limit of 7.5 hours on the fully-generic ILP, and 37.6x
otherwise. This was measured using the set of benchmarks used to originally
evaluate the fully-generic approach and several more benchmarks representing
computation tasks, over three different CGRA architectures. All run-times of
the new approach are less than 20 minutes, with 90th percentile time of 410
seconds. The proposed mapping techniques are integrated into, and evaluated
using the open-source CGRA-ME architecture modelling and exploration framework.Comment: 8 pages of content; 8 figures; 3 tables; to appear in FCCM 2019; Uses
the CGRA-ME framework at http://cgra-me.ece.utoronto.ca
Are coarse-grained overlays ready for general purpose application acceleration on FPGAs?
Combining processors with hardware accelerators has become a norm with systems-on-chip (SoCs) ever present in modern compute devices. Heterogeneous programmable system on chip platforms sometimes referred to as hybrid FPGAs, tightly couple general purpose processors with high performance reconfigurable fabrics, providing a more flexible alternative. We can now think of a software application with hardware accelerated portions that are reconfigured at runtime. While such ideas have been explored in the past, modern hybrid FPGAs are the first commercial platforms to enable this move to a more software oriented view, where reconfiguration enables hardware resources to be shared by multiple tasks in a bigger application. However, while the rapidly increasing logic density and more capable hard resources found in modern hybrid FPGA devices should make them widely deployable, they remain constrained within specialist application domains. This is due to both design productivity issues and a lack of suitable hardware abstraction to eliminate the need for working with platform-specific details, as server and desktop virtualization has done in a more general sense. To allow mainstream adoption of FPGA based accelerators in general purpose computing, there is a need to virtualize FPGAs and make them more accessible to application developers who are accustomed to software API abstractions and fast development cycles. In this paper, we discuss the role of overlay architectures in enabling general purpose FPGA application acceleration
Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
Rewriting History: Repurposing Domain-Specific CGRAs
Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices
promising both the flexibility of FPGAs and the performance of ASICs. However,
with restricted domains comes a danger: designing chips that cannot accelerate
enough current and future software to justify the hardware cost. We introduce
FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to
operations they do not natively support.
FlexC uses dataflow rewriting, replacing unsupported regions of code with
equivalent operations that are supported by the CGRA. We use equality
saturation, a technique enabling efficient exploration of a large space of
rewrite rules, to effectively search through the program-space for supported
programs. We applied FlexC to over 2,000 loop kernels, compiling to four
different research CGRAs and 300 generated CGRAs and demonstrate a 2.2
increase in the number of loop kernels accelerated leading to 3 speedup
compared to an Arm A5 CPU on kernels that would otherwise be unsupported by the
accelerator
- …