1,317 research outputs found

    High-level synthesis of VLSI circuits

    Get PDF

    High-Level Synthesis for Embedded Systems

    Get PDF

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    FPGA Implementation of Data Flow Graphs for Digital Signal Processing Applications

    Get PDF
    A rapid growth in digital signal processing applications has increased the requirement for high-speed digital systems. Multiprocessor systems are the best choice for these applications. A prior sequence of operations should be applied to the operations that described the nature of these applications before hardware implementation is produced. These operations should be scheduled and hardware allocated. This paper proposes a new scheduling technique for digital signal processing (DSP) applications has been represented by data flow graphs (DFGs). In addition, hardware allocation is implemented in the form of embedded system. A proposed scheduling technique also achieves the optimal scheduling of a DFG at design time. The optimality criteria considered in this algorithm are the maximum throughput within the available hardware resources. The maximum throughput is achieved by arranging the DFG nodes according to their inter-related data dependencies. Then, two nodes can be clustered into one compound task to reduce the overall execution time by minimizing the number of tasks to be executed that minimizing the number of cycles to execute them. Then each task is presented in form of instruction to be executed in the hardware system. A hardware system is composed of one or multiple homogenous pipelined processing elements and it is designed to meet the maximum-rate schedule.  Two implementations are proposed of the system architecture according to the number of the processing elements, namely:  the serial system and the parallel system. The serial system comprises one processing element where all tasks are processed sequentially, whilst the parallel system has four processing elements to execute tasks concurrently. These systems consist mainly of seven units: central shared memory, state table, multiway function unit buffer, execution array, processing element/s, instruction buffer and the address generation unit. The hardware components were built on an FPGA chip using Verilog HDL. In synthesis results, the parallel system has better system performance by 25.5% than the serial system. While the serial system requires smaller area size, which described by the number of slice registers and the number of the slice lookup tables (LUTs) than the parallel one. The relationship between the number of instructions that are executed in both systems, and the system area and the system performance that presented by system frequency, are studied. By increasing memories size in both systems, the system performance isn’t affected as in a serial system, and it is slightly decreased as the parallel system by 1.5% to 4.5%. In terms of the systems area, both serial system area and parallel system area are increased and in some cases are doubled. The proposed scheduling technique is shown to outperform the retaining technique, which we have chosen to compare with.  The serial system has better performance by 19.3% higher system frequency than a retiming technique. And the parallel system also outperforms the retaining technique by 51.2% higher system frequency in synthesis results

    The hArtes Tool Chain

    Get PDF
    This chapter describes the different design steps needed to go from legacy code to a transformed application that can be efficiently mapped on the hArtes platform

    Multiple objective optimisation of data and control paths in a behavioural silicon compiler

    No full text
    The objective of this research was to implement an 'intelligent' silicon compiler that provides the ability to automatically explore the design space and optimise a design, given as a behavioural description, with respect to multiple objectives. The objective has been met by the implementation of the MOODS Silicon Compiler. The user submits goals or objectives to the system which automatically finds near optimal solutions. As objectives may be conflicting, trade-offs between synthesis tasks are essential and consequently their simultaneous execution must occur. Tasks are decomposed into behaviour preserving transformations which, due to their completeness, can be applied in any sequence to a multi-level representation of the design. An accurate evaluation of the design is ensured by feeding up technology dependent information to a cost function. The cost function guides the simulated annealing algorithm in applying transformations to iteratively optimise the design. The simulated annealing algorithm provides an abstractness from the transformations and designer's objectives. This abstractness avoids the construction of tailored heuristics which pre-program trade-offs into a system. Pre-programmed trade-offs are used in most systems by assuming a particular shape to the trade-off curve and are inappropriate as trade-offs are technology dependent. The lack of pre-programmed trade-offs in the MOODS system allows it to adapt to changes in technology or library cells. The choice of cells and their subsequent sharing are based on the user's criteria expressed in the cost function, rather than being pre-programmed into the system. The results show that implementations created by MOODS are better than or equal to those achieved by other systems. Comparisons with other systems highlighted the importance of specifying all of a design's data as the lack of data misrepresents the design leading to misleading comparisons. The MOODS synthesis system includes an efficient method for automated design space exploration where a varied set of near optimal implementations can be produced from a single behavioural specification. Design space exploration is an important aspect of designing by high-level synthesis and in the development of synthesis systems. It allows the designer to obtain a perspicuous characterization of a design's design space allowing him to investigate alternative designs
    corecore