3 research outputs found

    Mapping of Regular Nested Loop Programs to Coarse-grained Reconfigurable Arrays – Constraints and Methodology

    Besides academic designs, more and more commercial coarse-grained reconfigurable arrays have recently been developed. Computationally intensive applications from the video and wireless communication domains seek to exploit the computational power of such massively parallel SoCs. Conventionally, DSP processors are used in the digital signal processing domain, so the existing compilation techniques are closely related to approaches from the DSP world. These approaches employ several loop transformations, such as pipelining or temporal partitioning, but they cannot exploit the full parallelism of a given algorithm or the computational potential of a typical 2-dimensional array. In this paper, (i) we give an overview of the constraints that have to be considered when mapping applications to coarse-grained reconfigurable arrays, (ii) we present our design methodology for mapping regular algorithms onto massively parallel arrays, which is characterized by loop parallelization in the polytope model, and (iii), in a first case study, we adapt our design methodology to target reconfigurable arrays. The case study shows that the presented regular mapping methodology can lead to highly efficient implementations while taking the constraints of the architecture into account.
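    To make the polytope-model mapping step concrete, the sketch below is a minimal, illustrative example (not the paper's actual tool flow): it enumerates the iteration points of a small two-level loop nest as integer points of a rectangular polytope and applies an assumed affine space-time mapping, in which one matrix row acts as the schedule (time step) and the other as the processor allocation (PE index). The loop bounds and the mapping matrix are assumptions made only for this illustration.

# Minimal sketch (not the paper's tool flow): mapping a 2-deep regular loop nest
# onto a processor array with an affine space-time mapping in the polytope model.
# The loop bounds N, M and the mapping matrix T below are illustrative assumptions.

import numpy as np

N, M = 4, 3  # illustrative loop bounds: for i in 0..N-1, for j in 0..M-1

# Iteration domain: all integer points of the rectangular polytope {(i, j)}.
iterations = [(i, j) for i in range(N) for j in range(M)]

# Affine space-time mapping T * (i, j)^T:
#   first row  -> schedule   (time step at which the iteration executes)
#   second row -> allocation (index of the PE that executes it)
T = np.array([[1, 1],   # time  t = i + j  (valid for uniform dependences (1,0) and (0,1))
              [0, 1]])  # space p = j      (iterations with equal j share one PE)

for i, j in iterations:
    t, p = T @ np.array([i, j])
    print(f"iteration ({i},{j}) -> PE {p}, time step {t}")

    In practice, the schedule row must give every data dependence a positive time distance, and partitioning transformations are needed whenever the iteration space exceeds the number of physically available PEs.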

    DESIGNING COST-EFFECTIVE COARSE-GRAINED RECONFIGURABLE ARCHITECTURE

    Application-specific optimization of embedded systems has become inevitable as designers must meet ever tighter constraints on cost, performance, and power. On the other hand, the flexibility of a system is also important to accommodate the short time-to-market requirements of embedded systems. To reconcile these conflicting demands, the coarse-grained reconfigurable architecture (CGRA) has emerged as a suitable solution. A typical CGRA requires many processing elements (PEs) and a configuration cache for reconfiguration of its PE array. However, such a structure consumes significant area and power. Therefore, designing cost-effective CGRAs has been a serious concern for the reliability of CGRA-based embedded systems. As an effort to provide such a cost-effective design, the first half of this work focuses on reducing power in the configuration cache. For power saving in the configuration cache, a low-power reconfiguration technique is presented based on reusable context pipelining, achieved by merging the concept of context reuse into context pipelining. In addition, we propose dynamic context compression, in which only the required bits of a context word are enabled and the redundant bits are disabled. Finally, we provide dynamic context management, which reduces power consumption in the configuration cache by controlling the read/write operations on redundant context words.
    In the second part of this dissertation, we focus on designing a cost-effective PE array to reduce area and power. For area and power saving in the PE array, we devise a cost-effective array fabric that addresses a novel rearrangement of processing elements and their interconnection design to reduce area and power consumption. In addition, hierarchical reconfigurable computing arrays are proposed, consisting of two reconfigurable computing blocks with two types of communication structure. The two computing blocks share critical resources, and this sharing structure provides an efficient communication interface between them while reducing overall area. Based on the proposed design approaches, a CGRA combining the multiple design schemes is presented to verify the synergy of the integrated approach. Experimental results show that the integrated approach reduces area by 23.07% and power by up to 72% when compared with a conventional CGRA.
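    To illustrate the idea behind dynamic context compression, the following sketch is a minimal example with an assumed context-word field layout (the field names and widths are not taken from the dissertation): only the fields that are enabled for a given configuration are packed into the stored payload together with a presence mask, and disabled fields are restored as don't-care zeros on decompression.

# Minimal sketch (assumed field layout, not the dissertation's encoding):
# dynamic context compression stores only the context fields a PE actually needs,
# plus a presence mask, instead of the full-width context word.

from dataclasses import dataclass

# Illustrative context fields and their widths in bits (assumptions for this example).
FIELDS = [("opcode", 4), ("src0", 3), ("src1", 3), ("dst", 3), ("route", 8), ("const", 16)]

@dataclass
class CompressedContext:
    mask: int          # bit i set => field i is present in the payload
    payload: int       # enabled fields packed back-to-back, low bits first
    payload_bits: int  # number of payload bits actually stored

def compress(context: dict) -> CompressedContext:
    """Pack only the fields given in `context`; missing fields are treated as disabled."""
    mask, payload, offset = 0, 0, 0
    for idx, (name, width) in enumerate(FIELDS):
        if name in context:
            mask |= 1 << idx
            payload |= (context[name] & ((1 << width) - 1)) << offset
            offset += width
    return CompressedContext(mask, payload, offset)

def decompress(c: CompressedContext) -> dict:
    """Recover enabled fields; disabled fields default to 0 (their don't-care value)."""
    context, offset = {}, 0
    for idx, (name, width) in enumerate(FIELDS):
        if c.mask & (1 << idx):
            context[name] = (c.payload >> offset) & ((1 << width) - 1)
            offset += width
        else:
            context[name] = 0
    return context

# Example: an ALU operation that needs no routing or constant field.
ctx = {"opcode": 0b1010, "src0": 1, "src1": 2, "dst": 3}
c = compress(ctx)
full_bits = sum(width for _, width in FIELDS)  # 37 bits for the full context word
print(f"stored {len(FIELDS)}-bit mask + {c.payload_bits} payload bits vs {full_bits}-bit full context")
print(decompress(c) == {**{name: 0 for name, _ in FIELDS}, **ctx})  # round-trip check -> True

    The saving comes from the observation that many operations leave most context fields disabled, so the configuration cache only has to store and transfer the presence mask plus the enabled fields instead of the full-width context word.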