40 research outputs found

    Energy-Aware Exploration of Coarse Grained Reconfigurable Processors

    In recent years, Coarse Grained Reconfigurable Architectures (CGRAs) have emerged as a viable option in embedded systems. In this paper we present a power breakdown analysis of such a CGRA. We also present an energy-aware exploration of one of the most important, but often neglected, parts of processor architectures: the interconnect. The results show that the choice of interconnection topology has a significant influence on resulting quality metrics, such as performance and energy efficiency. KEYWORDS: Energy-Aware design, Low Power, Processor Architecture, Interconnect aware

    A Customized Cross-Bar for Data-Shuffling in Domain-Specific SIMD Processors

    Abstract. Shuffle operations are among the most common operations in SIMD-based embedded system architectures. In this paper we study different families of shuffle operations that frequently occur in embedded applications running on SIMD architectures. These shuffle operations are used to drive the design of a custom shuffler for domain-specific SIMD processors. The energy efficiency of various crossbar-based custom shufflers is analyzed and compared with that of the widely used full crossbar. We show that by customizing the crossbar to implement the specific shuffle operations required in the target application domain, we can reduce the energy consumption of shuffle operations by up to 80%. We also illustrate the tradeoffs between flexibility and energy efficiency of custom shufflers and show that customization offers reasonable benefits without compromising the flexibility required for the target application domain.

    Crisp: A template for reconfigurable instruction set processors

    Abstract. A template for reconfigurable instruction set processors is described. This template defines a design space that enables the exploration of processors potentially suitable for flexible, power- and cost-efficient implementations of embedded multimedia applications, such as video compression in a handheld device. The template is based on a VLIW processor with a reconfigurable instruction set. In the future this template will be used for design space exploration, compiler retargeting and automatic hardware synthesis. Several existing reconfigurable and non-reconfigurable processors were mapped onto the template to assess its expressiveness.

    Software pipelining for coarse-grained reconfigurable instruction set processors

    This paper shows that software pipelining can be an effective technique for code generation for coarse-grained reconfigurable instruction set processors. The paper describes a technique, based on adding an operation assignment phase to software pipelining, that performs reconfigurable instruction generation and instruction scheduling in a combined algorithm. Although typical compilers for reconfigurable processors perform these steps separately, results show that the combination enables successful use of the reconfigurable resources. The assignment algorithm is the key to using software pipelining on the reconfigurable processor. The technique presented is also able to exploit spatial computation inside the reconfigurable functional unit, in which the output of a processing element is directly connected to the input of another processing element without the need for an intermediate register. Results show that it is possible to reduce the cycle count by using this spatial computation.

    L0 buffer energy optimization through scheduling and exploration

    Clustered L0 buffers are an interesting alternative to reduce energy consumption in the instruction memory hierarchy of embedded VLIW processors. Currently, the synthesis of L0 clusters is performed as a hardware optimization, where the compiler generates a schedule and, based on the given schedule, L0 clusters are generated. However, the cluster synthesis is sensitive to the given schedule. This offers an interesting design space in which to explore the effects on clustering of altering the schedule to increase energy efficiency. In this paper we present a preliminary study indicating the potential offered by scheduling for L0 clusters in terms of L0 buffer energy reduction. A list scheduler is extended to recognize the L0 clusters, and operations are assigned to clusters based on a few simple heuristics. An iterative methodology is employed to explore the design space. The simulation results indicate that up to 10% of L0 buffer energy can potentially be reduced by scheduling for L0 clusters with a simple heuristic.