2,323 research outputs found

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    VLSI signal processing through bit-serial architectures and silicon compilation

    Get PDF

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    Serial-data computation in VLSI

    Get PDF

    Simulated annealing based datapath synthesis

    Get PDF

    Extending systems-on-chip to the third dimension : performance, cost and technological tradeoffs.

    Get PDF
    Because of the today's market demand for high-performance, high-density portable hand-held applications, electronic system design technology has shifted the focus from 2-D planar SoC single-chip solutions to different alternative options as tiled silicon and single-level embedded modules as well as 3-D integration. Among the various choices, finding an optimal solution for system implementation dealt usually with cost, performance and other technological trade-off analysis at the system conceptual level. It has been identified that the decisions made within the first 20% of the total design cycle time will ultimately result up to 80% of the final product cost. In this paper, we discuss appropriate and realistic metric for performance and cost trade-off analysis both at system conceptual level (up-front in the design phase) and at implementation phase for verification in the three-dimensional integration. In order to validate the methodology, two ubiquitous electronic systems are analyzed under various implementation schemes and discuss the pros and cons of each of them

    High Level Synthesis of Neural Network Chips

    Get PDF
    This thesis investigates the development of a silicon compiler dedicated to generate Application-Specific Neural Network Chips (ASNNCs) from a high level C-based behavioural specification language. The aim is to fully integrate the silicon compiler with the ESPRIT II Pygmalion neural programming environment. The integration of these two tools permits the translation of a neural network application specified in nC, the Pygmalion's C-based neural programming language, into either binary (for simulation) or silicon (for execution in hardware). Several applications benefit from this approach, in particular the ones that require real-time execution, for which a true neural computer is required. This research comprises two major parts: extension of the Pygmalion neural programming environment, to support automatic generation of neural network chips from the nC specification language; and implementation of the high level synthesis part of the neural silicon compiler. The extension of the neural programming environment has been developed to adapt the nC language to hardware constraints, and to provide the environment with a simulation tool to test in advance the performance of the neural chips. Firstly, new hardware-specific requisites have been incorporated to nC. However, special attention has been taken to avoid transforming nC into a hardware-oriented language, since the system assumes minimum (or even no) knowledge of VLSI design from the application developer. Secondly, a simulator for neural network hardware has been developed, which assesses how well the generated circuit will perform the neural computation. Lastly, a hardware library of neural network models associated with a target VLSI architecture has been built. The development of the neural silicon compiler focuses on the high level synthesis part of the process. The goal of the silicon compiler is to take nC as the input language and automatically translate it into one or more identical integrated circuits, which are specified in VHDL (the IEEE standard hardware description language) at the register transfer level. The development of the high level synthesis comprises four major parts: firstly, compilation and software-like optimisations of nC; secondly, transformation of the compiled code into a graph-based internal representation, which has been designed to be the basis for the hardware synthesis; thirdly, further transformations and hardware-like optimisations on the internal representation; and finally, creation of the neural chip's data path and control unit that implement the behaviour specified in nC. Special attention has been devoted to the creation of optimised hardware structures for the ASNNCs employing both phases of neural computing on-chip: recall and learning. This is achieved through the data path and control synthesis algorithms, which adopt a heuristic approach that targets the generated hardware structure of the neural chip in a specific VLSI architecture, namely the Generic Neuron. The viability, concerning the effective use of silicon area versus speed, has been evaluated through the automatic generation of a VHDL description for the neural chip employing the Back Propagation neural network model. This description is compared with the one created manually by a hardware designer
    corecore