1,682 research outputs found
Lessons Learned from Designing the Montium - a Coarse-Grained Reconfigurable Processing Tile
In this paper we describe in retrospective the main results of a four year project, called Chameleon. As part of this project we developed a coarse-grained reconfigurable core for DSP algorithms in wirelessdevices denoted MONTIUM. After presenting the main achievements within this project we present the lessons learned from this project
The Chameleon project in retrospective
In this paper we describe in retrospective the main results of a four year project, called Chameleon. As part of this project we developed a coarse-grained reconfigurable core for DSP algorithms in wireless devices denoted MONTIUM. After presenting the main achievements within this project we present the lessons learned from this project
Compiler and Architecture Design for Coarse-Grained Programmable Accelerators
abstract: The holy grail of computer hardware across all market segments has been to sustain performance improvement at the same pace as silicon technology scales. As the technology scales and the size of transistors shrinks, the power consumption and energy usage per transistor decrease. On the other hand, the transistor density increases significantly by technology scaling. Due to technology factors, the reduction in power consumption per transistor is not sufficient to offset the increase in power consumption per unit area. Therefore, to improve performance, increasing energy-efficiency must be addressed at all design levels from circuit level to application and algorithm levels.
At architectural level, one promising approach is to populate the system with hardware accelerators each optimized for a specific task. One drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low as they perform one specific function. Using software programmable accelerators is an alternative approach to achieve high energy-efficiency and programmability. Due to intrinsic characteristics of software accelerators, they can exploit both instruction level parallelism and data level parallelism.
Coarse-Grained Reconfigurable Architecture (CGRA) is a software programmable accelerator consists of a number of word-level functional units. Motivated by promising characteristics of software programmable accelerators, the potentials of CGRAs in future computing platforms is studied and an end-to-end CGRA research framework is developed. This framework consists of three different aspects: CGRA architectural design, integration in a computing system, and CGRA compiler. First, the design and implementation of a CGRA and its instruction set is presented. This design is then modeled in a cycle accurate system simulator. The simulation platform enables us to investigate several problems associated with a CGRA when it is deployed as an accelerator in a computing system. Next, the problem of mapping a compute intensive region of a program to CGRAs is formulated. From this formulation, several efficient algorithms are developed which effectively utilize CGRA scarce resources very well to minimize the running time of input applications. Finally, these mapping algorithms are integrated in a compiler framework to construct a compiler for CGRADissertation/ThesisDoctoral Dissertation Computer Science 201
The Chameleon Architecture for Streaming DSP Applications
We focus on architectures for streaming DSP applications such as wireless baseband processing and image processing. We aim at a single generic architecture that is capable of dealing with different DSP applications. This architecture has to be energy efficient and fault tolerant. We introduce a heterogeneous tiled architecture and present the details of a domain-specific reconfigurable tile processor called Montium. This reconfigurable processor has a small footprint (1.8 mm in a 130 nm process), is power efficient and exploits the locality of reference principle. Reconfiguring the device is very fast, for example, loading the coefficients for a 200 tap FIR filter is done within 80 clock cycles. The tiles on the tiled architecture are connected to a Network-on-Chip (NoC) via a network interface (NI). Two NoCs have been developed: a packet-switched and a circuit-switched version. Both provide two types of services: guaranteed throughput (GT) and best effort (BE). For both NoCs estimates of power consumption are presented. The NI synchronizes data transfers, configures and starts/stops the tile processor. For dynamically mapping applications onto the tiled architecture, we introduce a run-time mapping tool
Rewriting History: Repurposing Domain-Specific CGRAs
Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices
promising both the flexibility of FPGAs and the performance of ASICs. However,
with restricted domains comes a danger: designing chips that cannot accelerate
enough current and future software to justify the hardware cost. We introduce
FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to
operations they do not natively support.
FlexC uses dataflow rewriting, replacing unsupported regions of code with
equivalent operations that are supported by the CGRA. We use equality
saturation, a technique enabling efficient exploration of a large space of
rewrite rules, to effectively search through the program-space for supported
programs. We applied FlexC to over 2,000 loop kernels, compiling to four
different research CGRAs and 300 generated CGRAs and demonstrate a 2.2
increase in the number of loop kernels accelerated leading to 3 speedup
compared to an Arm A5 CPU on kernels that would otherwise be unsupported by the
accelerator
- …