53 research outputs found

    Asynchronous techniques for system-on-chip design

    Get PDF
    SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed

    Energy-Efficient Digital Circuit Design using Threshold Logic Gates

    Get PDF
    abstract: Improving energy efficiency has always been the prime objective of the custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques, that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices and their full potential and promises are many years away from being practical. The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation. Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR. Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths. Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Asynchronous Techniques for System-on-Chip Design

    Full text link

    Design of an asynchronous processor

    Get PDF

    High Level Synthesis of Neural Network Chips

    Get PDF
    This thesis investigates the development of a silicon compiler dedicated to generate Application-Specific Neural Network Chips (ASNNCs) from a high level C-based behavioural specification language. The aim is to fully integrate the silicon compiler with the ESPRIT II Pygmalion neural programming environment. The integration of these two tools permits the translation of a neural network application specified in nC, the Pygmalion's C-based neural programming language, into either binary (for simulation) or silicon (for execution in hardware). Several applications benefit from this approach, in particular the ones that require real-time execution, for which a true neural computer is required. This research comprises two major parts: extension of the Pygmalion neural programming environment, to support automatic generation of neural network chips from the nC specification language; and implementation of the high level synthesis part of the neural silicon compiler. The extension of the neural programming environment has been developed to adapt the nC language to hardware constraints, and to provide the environment with a simulation tool to test in advance the performance of the neural chips. Firstly, new hardware-specific requisites have been incorporated to nC. However, special attention has been taken to avoid transforming nC into a hardware-oriented language, since the system assumes minimum (or even no) knowledge of VLSI design from the application developer. Secondly, a simulator for neural network hardware has been developed, which assesses how well the generated circuit will perform the neural computation. Lastly, a hardware library of neural network models associated with a target VLSI architecture has been built. The development of the neural silicon compiler focuses on the high level synthesis part of the process. The goal of the silicon compiler is to take nC as the input language and automatically translate it into one or more identical integrated circuits, which are specified in VHDL (the IEEE standard hardware description language) at the register transfer level. The development of the high level synthesis comprises four major parts: firstly, compilation and software-like optimisations of nC; secondly, transformation of the compiled code into a graph-based internal representation, which has been designed to be the basis for the hardware synthesis; thirdly, further transformations and hardware-like optimisations on the internal representation; and finally, creation of the neural chip's data path and control unit that implement the behaviour specified in nC. Special attention has been devoted to the creation of optimised hardware structures for the ASNNCs employing both phases of neural computing on-chip: recall and learning. This is achieved through the data path and control synthesis algorithms, which adopt a heuristic approach that targets the generated hardware structure of the neural chip in a specific VLSI architecture, namely the Generic Neuron. The viability, concerning the effective use of silicon area versus speed, has been evaluated through the automatic generation of a VHDL description for the neural chip employing the Back Propagation neural network model. This description is compared with the one created manually by a hardware designer

    Low power digital signal processing

    Get PDF

    A Low-Power DSP Architecture for a Fully Implantable Cochlear Implant System-on-a-Chip.

    Full text link
    The National Science Foundation Wireless Integrated Microsystems (WIMS) Engineering Research Center at the University of Michigan developed Systems-on-a-Chip to achieve biomedical implant and environmental monitoring functionality in low-milliwatt power consumption and 1-2 cm3 volume. The focus of this work is implantable electronics for cochlear implants (CIs), surgically implanted devices that utilize existing nerve connections between the brain and inner-ear in cases where degradation of the sensory hair cells in the cochlea has occurred. In the absence of functioning hair cells, a CI processes sound information and stimulates the nderlying nerve cells with currents from implanted electrodes, enabling the patient to understand speech. As the brain of the WIMS CI, the WIMS microcontroller unit (MCU) delivers the communication, signal processing, and storage capabilities required to satisfy the aggressive goals set forth. The 16-bit MCU implements a custom instruction set architecture focusing on power-efficient execution by providing separate data and address register windows, multi-word arithmetic, eight addressing modes, and interrupt and subroutine support. Along with 32KB of on-chip SRAM, a low-power 512-byte scratchpad memory is utilized by the WIMS custom compiler to obtain an average of 18% energy savings across benchmarks. A synthesizable dynamic frequency scaling circuit allows the chip to select a precision on-chip LC or ring oscillator, and perform clock scaling to minimize power dissipation; it provides glitch-free, software-controlled frequency shifting in 100ns, and dissipates only 480μW. A highly flexible and expandable 16-channel Continuous Interleaved Sampling Digital Signal Processor (DSP) is included as an MCU peripheral component. Modes are included to process data, stimulate through electrodes, and allow experimental stimulation or processing. The entire WIMS MCU occupies 9.18mm2 and consumes only 1.79mW from 1.2V in DSP mode. This is the lowest reported consumption for a cochlear DSP. Design methodologies were analyzed and a new top-down design flow is presented that encourages hardware and software co-design as well as cross-domain verification early in the design process. An O(n) technique for energy-per-instruction estimations both pre- and post-silicon is presented that achieves less than 4% error across benchmarks. This dissertation advances low-power system design while providing an improvement in hearing recovery devices.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91488/1/emarsman_1.pd

    High-level synthesis of dataflow programs for heterogeneous platforms:design flow tools and design space exploration

    Get PDF
    The growing complexity of digital signal processing applications implemented in programmable logic and embedded processors make a compelling case the use of high-level methodologies for their design and implementation. Past research has shown that for complex systems, raising the level of abstraction does not necessarily come at a cost in terms of performance or resource requirements. As a matter of fact, high-level synthesis tools supporting such a high abstraction often rival and on occasion improve low-level design. In spite of these successes, high-level synthesis still relies on programs being written with the target and often the synthesis process, in mind. In other words, imperative languages such as C or C++, most used languages for high-level synthesis, are either modified or a constrained subset is used to make parallelism explicit. In addition, a proper behavioral description that permits the unification for hardware and software design is still an elusive goal for heterogeneous platforms. A promising behavioral description capable of expressing both sequential and parallel application is RVC-CAL. RVC-CAL is a dataflow programming language that permits design abstraction, modularity, and portability. The objective of this thesis is to provide a high-level synthesis solution for RVC-CAL dataflow programs and provide an RVC-CAL design flow for heterogeneous platforms. The main contributions of this thesis are: a high-level synthesis infrastructure that supports the full specification of RVC-CAL, an action selection strategy for supporting parallel read and writes of list of tokens in hardware synthesis, a dynamic fine-grain profiling for synthesized dataflow programs, an iterative design space exploration framework that permits the performance estimation, analysis, and optimization of heterogeneous platforms, and finally a clock gating strategy that reduces the dynamic power consumption. Experimental results on all stages of the provided design flow, demonstrate the capabilities of the tools for high-level synthesis, software hardware Co-Design, design space exploration, and power optimization for reconfigurable hardware. Consequently, this work proves the viability of complex systems design and implementation using dataflow programming, not only for system-level simulation but real heterogeneous implementations
    • …
    corecore