Mapping of Regular Nested Loop Programs to Coarse-grained Reconfigurable Arrays – Constraints and Methodology
Apart from academic designs, more and more commercial coarse-grained reconfigurable arrays have been developed recently. Computationally intensive applications from the areas of video and wireless communication seek to exploit the computational power of such massively parallel SoCs. Conventionally, DSP processors are used in the digital signal processing domain, so the existing compilation techniques are closely related to approaches from the DSP world. These approaches employ several loop transformations, such as pipelining or temporal partitioning, but they cannot exploit the full parallelism of a given algorithm or the computational potential of a typical 2-dimensional array. In this paper, (i) we present an overview of the constraints that have to be considered when mapping applications to coarse-grained reconfigurable arrays, (ii) we present our design methodology for mapping regular algorithms onto massively parallel arrays, which is characterized by loop parallelization in the polytope model, and (iii), in a first case study, we adapt our design methodology to target reconfigurable arrays. The case study shows that the presented regular mapping methodology can lead to highly efficient implementations that take the constraints of the architecture into account.
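The polytope-model parallelization mentioned in (ii) can be illustrated with a small sketch: the iterations of a regular two-level loop nest form a polytope of integer points, and an affine space-time mapping assigns each point a processing element and a time step. The loop bounds and the mapping p = i, t = i + j below are illustrative assumptions, not the paper's actual schedule.

```python
# Hedged sketch: space-time mapping of a 2-D regular loop nest in the
# polytope model. Bounds and the affine mapping are illustrative
# assumptions, not taken from the paper.

N = 4  # illustrative problem size

# Iteration domain of the nest: {(i, j) | 0 <= i, j < N}
domain = [(i, j) for i in range(N) for j in range(N)]

def space_time(point):
    """Affine mapping: processor p = i (allocation), time t = i + j (schedule)."""
    i, j = point
    return (i, i + j)

# Group iterations by time step: each wavefront t contains iterations
# that can execute in parallel on distinct PEs of the array.
wavefronts = {}
for point in domain:
    p, t = space_time(point)
    wavefronts.setdefault(t, []).append((p, point))

for t in sorted(wavefronts):
    print(f"t={t}: " + ", ".join(f"PE{p} runs {pt}" for p, pt in wavefronts[t]))
```

The mapping is valid only if no two iterations of the same wavefront compete for one PE; the assertion-friendly structure above makes that property easy to check for a given schedule.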
DESIGNING COST-EFFECTIVE COARSE-GRAINED RECONFIGURABLE ARCHITECTURE
Application-specific optimization of embedded systems has become inevitable as market demand forces designers to meet ever tighter constraints on cost, performance, and power. On the other hand, the flexibility of a system is also important to accommodate the short time-to-market requirements of embedded systems. To reconcile these competing demands, the coarse-grained reconfigurable architecture (CGRA) has emerged as a suitable solution. A typical CGRA requires many processing elements (PEs) and a configuration cache for reconfiguration of its PE array. However, such a structure consumes significant area and power. Therefore, designing a cost-effective CGRA has been a serious concern for the reliability of CGRA-based embedded systems.
In an effort to provide such a cost-effective design, the first half of this work focuses on reducing power in the configuration cache. For power saving in the configuration cache, a low-power reconfiguration technique is presented based on reusable context pipelining, achieved by merging the concept of context reuse into context pipelining. In addition, we propose dynamic context compression, which keeps only the required bits of each context word enabled while the redundant bits are disabled. Finally, we provide dynamic context management, which reduces power consumption in the configuration cache by controlling the read/write operations of the redundant context words.
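The idea behind dynamic context compression can be sketched as follows: a context word consists of fields, and a word that does not need a given field (e.g., an immediate) stores only its enabled fields plus a small mask recording which fields are present. The field names and widths below are illustrative assumptions, not the dissertation's actual context-word format.

```python
# Hedged sketch of dynamic context compression: only the required fields
# of a context word are stored, together with a mask of enabled fields.
# Field names and widths are illustrative assumptions.

FIELDS = [("opcode", 4), ("src0", 3), ("src1", 3), ("dest", 3), ("imm", 8)]

def compress(context):
    """Pack only enabled (non-None) fields; return (mask, bits, width)."""
    mask, bits, width = 0, 0, 0
    for idx, (name, w) in enumerate(FIELDS):
        value = context.get(name)
        if value is not None:              # field required by this word
            mask |= 1 << idx
            bits = (bits << w) | (value & ((1 << w) - 1))
            width += w
    return mask, bits, width

def decompress(mask, bits, width):
    """Restore enabled fields; redundant fields stay disabled (None)."""
    context = {}
    for idx, (name, w) in reversed(list(enumerate(FIELDS))):
        if mask & (1 << idx):
            context[name] = bits & ((1 << w) - 1)
            bits >>= w
        else:
            context[name] = None
    return context

# An ALU operation that needs no immediate: the 8-bit imm field is
# redundant and is not stored, shrinking the payload from 21 to 13 bits.
word = {"opcode": 5, "src0": 1, "src1": 2, "dest": 3, "imm": None}
mask, bits, width = compress(word)
print(width)                               # stored payload width in bits
assert decompress(mask, bits, width) == word
```

The hardware analogue would read or write only the enabled portion of each cache line, which is where the power saving comes from.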
In the second part of this dissertation, we focus on designing a cost-effective PE array to reduce area and power. For area and power saving in a PE array, we devise a cost-effective array fabric that addresses a novel rearrangement of processing elements and their interconnection designs to reduce area and power consumption. In addition, hierarchical reconfigurable computing arrays are proposed, consisting of two reconfigurable computing blocks with two types of communication structure. The two computing blocks share critical resources, and this sharing structure provides an efficient communication interface between them while reducing overall area.
Based on the proposed design approaches, a CGRA combining the multiple design schemes is presented to verify the synergy of the integrated approach. Experimental results show that the integrated approach reduces area by 23.07% and power by up to 72% compared with the conventional CGRA.