639 research outputs found

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    A Reconfigurable Tile-Based Architecture to Compute FFT and FIR Functions in the Context of Software-Defined Radio

    Get PDF
    Software-defined radio (SDR) is the term used for flexible radio systems that can deal with multiple standards. For an efficient implementation, such systems require appropriate reconfigurable architectures. This paper targets the efficient implementation of the most computationally intensive kernels of two significantly different standards, viz. Bluetooth and HiperLAN/2, on the same reconfigurable hardware. These kernels are FIR filtering and FFT. The designed architecture is based on a two-dimensional arrangement of 17 tiles. Each tile contains a multiplier, an adder, local memory and multiplexers allowing flexible communication with the neighboring tiles. The tile-base data path is complemented with a global controller and various memories. The design has been implemented in SystemC and simulated extensively to prove equivalence with a reference all-software design. It has also been synthesized and turns out to outperform significantly other reconfigurable designs with respect to speed and area

    Are coarse-grained overlays ready for general purpose application acceleration on FPGAs?

    Get PDF
    Combining processors with hardware accelerators has become a norm with systems-on-chip (SoCs) ever present in modern compute devices. Heterogeneous programmable system on chip platforms sometimes referred to as hybrid FPGAs, tightly couple general purpose processors with high performance reconfigurable fabrics, providing a more flexible alternative. We can now think of a software application with hardware accelerated portions that are reconfigured at runtime. While such ideas have been explored in the past, modern hybrid FPGAs are the first commercial platforms to enable this move to a more software oriented view, where reconfiguration enables hardware resources to be shared by multiple tasks in a bigger application. However, while the rapidly increasing logic density and more capable hard resources found in modern hybrid FPGA devices should make them widely deployable, they remain constrained within specialist application domains. This is due to both design productivity issues and a lack of suitable hardware abstraction to eliminate the need for working with platform-specific details, as server and desktop virtualization has done in a more general sense. To allow mainstream adoption of FPGA based accelerators in general purpose computing, there is a need to virtualize FPGAs and make them more accessible to application developers who are accustomed to software API abstractions and fast development cycles. In this paper, we discuss the role of overlay architectures in enabling general purpose FPGA application acceleration

    Automatic synthesis of reconfigurable instruction set accelerators

    Get PDF

    A polymorphic hardware platform

    Get PDF
    In the domain of spatial computing, it appears that platforms based on either reconfigurable datapath units or on hybrid microprocessor/logic cell organizations are in the ascendancy as they appear to offer the most efficient means of providing resources across the greatest range of hardware designs. This paper encompasses an initial exploration of an alternative organization. It looks at the effect of using a very fine-grained approach based on a largely undifferentiated logic cell that can be configured to operate as a state element, logic or interconnect - or combinations of all three. A vertical layout style hides the overheads imposed by reconfigurability to an extent where very fine-grained organizations become a viable option. It is demonstrated that the technique can be used to develop building blocks for both synchronous and asynchronous circuits, supporting the development of hybrid architectures such as globally asynchronous, locally synchronous

    Memory hierarchy and data communication in heterogeneous reconfigurable SoCs

    Get PDF
    The miniaturization race in the hardware industry aiming at continuous increasing of transistor density on a die does not bring respective application performance improvements any more. One of the most promising alternatives is to exploit a heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide well organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end-user, in order to support feasible high-level programmability of the system. This thesis work explores the heterogeneous reconfigurable hardware architectures and presents possible solutions to cope the problem of memory organization and data structure. By the example of the MORPHEUS heterogeneous platform, the discussion follows the complete design cycle, starting from decision making and justification, until hardware realization. Particular emphasis is made on the methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by means of separating computation from communication, providing reconfigurable engines with computation and configuration data, and unification of heterogeneous computational devices using local storage buffers. It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented
    corecore