43 research outputs found

    Computational structures for application specific VLSI processors

    Get PDF

    Null convention logic circuits for asynchronous computer architecture

    Get PDF
    For most of its history, computer architecture has been able to benefit from a rapid scaling in semiconductor technology, resulting in continuous improvements to CPU design. During that period, synchronous logic has dominated because of its inherent ease of design and abundant tools. However, with the scaling of semiconductor processes into deep sub-micron and then to nano-scale dimensions, computer architecture is hitting a number of roadblocks such as high power and increased process variability. Asynchronous techniques can potentially offer many advantages compared to conventional synchronous design, including average case vs. worse case performance, robustness in the face of process and operating point variability and the ready availability of high performance, fine grained pipeline architectures. Of the many alternative approaches to asynchronous design, Null Convention Logic (NCL) has the advantage that its quasi delay-insensitive behavior makes it relatively easy to set up complex circuits without the need for exhaustive timing analysis. This thesis examines the characteristics of an NCL based asynchronous RISC-V CPU and analyses the problems with applying NCL to CPU design. While a number of university and industry groups have previously developed small 8-bit microprocessor architectures using NCL techniques, it is still unclear whether these offer any real advantages over conventional synchronous design. A key objective of this work has been to analyse the impact of larger word widths and more complex architectures on NCL CPU implementations. The research commenced by re-evaluating existing techniques for implementing NCL on programmable devices such as FPGAs. The little work that has been undertaken previously on FPGA implementations of asynchronous logic has been inconclusive and seems to indicate that asynchronous systems cannot be easily implemented in these devices. However, most of this work related to an alternative technique called bundled data, which is not well suited to FPGA implementation because of the difficulty in controlling and matching delays in a 'bundle' of signals. On the other hand, this thesis clearly shows that such applications are not only possible with NCL, but there are some distinct advantages in being able to prototype complex asynchronous systems in a field-programmable technology such as the FPGA. A large part of the value of NCL derives from its architectural level behavior, inherent pipelining, and optimization opportunities such as the merging of register and combina- tional logic functions. In this work, a number of NCL multiplier architectures have been analyzed to reveal the performance trade-offs between various non-pipelined, 1D and 2D organizations. Two-dimensional pipelining can easily be applied to regular architectures such as array multipliers in a way that is both high performance and area-efficient. It was found that the performance of 2D pipelining for small networks such as multipliers is around 260% faster than the equivalent non-pipelined design. However, the design uses 265% more transistors so the methodology is mainly of benefit where performance is strongly favored over area. A pipelined 32bit x 32bit signed Baugh-Wooley multiplier with Wallace-Tree Carry Save Adders (CSA), which is representative of a real design used for CPUs and DSPs, was used to further explore this concept as it is faster and has fewer pipeline stages compared to the normal array multiplier using Ripple-Carry adders (RCA). It was found that 1D pipelining with ripple-carry chains is an efficient implementation option but becomes less so for larger multipliers, due to the completion logic for which the delay time depends largely on the number of bits involved in the completion network. The average-case performance of ripple-carry adders was explored using random input vectors and it was observed that it offers little advantage on the smaller multiplier blocks, but this particular timing characteristic of asynchronous design styles be- comes increasingly more important as word size grows. Finally, this research has resulted in the development of the first 32-Bit asynchronous RISC-V CPU core. Called the Redback RISC, the architecture is a structure of pipeline rings composed of computational oscillations linked with flow completeness relationships. It has been written using NELL, a commercial description/synthesis tool that outputs standard Verilog. The Redback has been analysed and compared to two approximately equivalent industry standard 32-Bit synchronous RISC-V cores (PicoRV32 and Rocket) that are already fabricated and used in industry. While the NCL implementation is larger than both commercial cores it has similar performance and lower power compared to the PicoRV32. The implementation results were also compared against an existing NCL design tool flow (UNCLE), which showed how much the results of these implementation strategies differ. The Redback RISC has achieved similar level of throughput and 43% better power and 34% better energy compared to one of the synchronous cores with the same benchmark test and test condition such as input sup- ply voltage. However, it was shown that area is the biggest drawback for NCL CPU design. The core is roughly 2.5× larger than synchronous designs. On the other hand its area is still 2.9× smaller than previous designs using UNCLE tools. The area penalty is largely due to the unavoidable translation into a dual-rail topology when using the standard NCL cell library

    Minimizing and exploiting leakage in VLSI

    Get PDF
    Power consumption of VLSI (Very Large Scale Integrated) circuits has been growing at an alarmingly rapid rate. This increase in power consumption, coupled with the increasing demand for portable/hand-held electronics, has made power consumption a dominant concern in the design of VLSI circuits today. Traditionally dynamic (switching) power has dominated the total power consumption of VLSI circuits. However, due to process scaling trends, leakage power has now become a major component of the total power consumption in VLSI circuits. This dissertation explores techniques to reduce leakage, as well as techniques to exploit leakage currents through the use of sub-threshold circuits. This dissertation consists of two studies. In the first study, techniques to reduce leakage are presented. These include a low leakage ASIC design methodology that uses high VT sleep transistors selectively, a methodology that combines input vector control and circuit modification, and a scheme to find the optimum reverse body bias voltage to minimize leakage. As the minimum feature size of VLSI fabrication processes continues to shrink with each successive process generation (along with the value of supply voltage and therefore the threshold voltage of the devices), leakage currents increase exponentially. Leakage currents are hence seen as a necessary evil in traditional VLSI design methodologies. We present an approach to turn this problem into an opportunity. In the second study in this dissertation, we attempt to exploit leakage currents to perform computation. We use sub-threshold digital circuits and come up with ways to get around some of the pitfalls associated with sub-threshold circuit design. These include a technique that uses body biasing adaptively to compensate for Process, Voltage and Temperature (PVT) variations, a design approach that uses asynchronous micro-pipelined Network of Programmable Logic Arrays (NPLAs) to help improve the throughput of sub-threshold designs, and a method to find the optimum supply voltage that minimizes energy consumption in a circuit

    Energy Efficient Design for Deep Sub-micron CMOS VLSIs

    Get PDF
    Over the past decade, low power, energy efficient VLSI design has been the focal point of active research and development. The rapid technology scaling, the growing integration capacity, and the mounting active and leakage power dissipation are contributing to the growing complexity of modern VLSI design. Careful power planning on all design levels is required. This dissertation tackles the low-power, low-energy challenges in deep sub-micron technologies on the architecture and circuit levels. Voltage scaling is one of the most efficient ways for reducing power and energy. For ultra-low voltage operation, a new circuit technique which allows bulk CMOS circuits to work in the sub-0. 5V supply territory is presented. The threshold voltage of the slow PMOS transistor is controlled dynamically to get a lower threshold voltage during the active mode. Due to the reduced threshold voltage, switching speed becomes faster while active leakage current is increased. A technique to dynamically manage active leakage current is presented. Energy reduction resulting from using the proposed structure is demonstrated through simulations of different circuits with different levels of complexity. As technology scales, the mounting leakage current and degraded noise immunity impact performance especially that of high performance dynamic circuits. Dual threshold technology shows a good potential for leakage reduction while meeting performance goals. A model for optimally selecting threshold voltages and transistor sizes in wide fan-in dynamic circuits is presented. On the circuit level, a novel circuit level technique which handles the trade-off between noise immunity and energy dissipation for wide fan-in dynamic circuits is presented. Energy efficiency of the proposed wide fan-in dynamic circuit is further enhanced through efficient low voltage operation. Another direct consequence of technology scaling is the growing impact of interconnect parasitics and process variations on performance. Traditionally, worst case process, parasitics, and environmental conditions are considered. Designing for worst case guarantees a fail-safe operation but requires a large delay and voltage margins. This large margin can be recovered if the design can adapt to the actual silicon conditions. Dynamic voltage scaling is considered a key enabler in reducing such margin. An on-chip process identifier to recover the margin required due to process variations is described. The proposed architecture adjusts supply voltage using a hybrid between the one-time voltage setting and the continuous monitoring modes of operation. The interconnect impact on delay is minimized through a novel adaptive voltage scaling architecture. The proposed system recovers the large delay and voltage margins required by conventional systems by closely tracking the actual critical path at anytime. By tracking the actual critical path, the proposed system is robust and more energy efficient compared to both the conventional open-loop and closed-loop systems

    A compact high-energy particle detector for low-cost deep space missions

    Get PDF
    Over the last few decades particle physics has led to many new discoveries, laying the foundation for modern science. However, there are still many unanswered questions which the next generation of particle detectors could address, potentially expanding our knowledge and understanding of the Universe. Owing to recent technological advancements, electronic sensors are now able to acquire measurements previously unobtainable, creating opportunities for new deep-space high-energy particle missions. Consequently, a new compact instrument was developed capable of detecting gamma rays, neutrons and charged particles. This instrument combines the latest in FPGA System-on-Chip technology as the central processor and a 3x3 array of silicon photomultipliers coupled with an organic plastic scintillator as the detector. Using modern digital pulse shape discrimination and signal processing techniques, the scintillator and photomultiplier combination has been shown to accurately discriminate between the di_erent particle types and provide information such as total energy and incident direction. The instrument demonstrated the ability to capture 30,000 particle events per second across 9 channels - around 15 times that of the U.S. based CLAS detector. Furthermore, the input signals are simultaneously sampled at a maximum rate of 5 GSPS across all channels with 14-bit resolution. Future developments will include FPGA-implemented digital signal processing as well as hardware design for small satellite based deep-space missions that can overcome radiation vulnerability

    The 1991 3rd NASA Symposium on VLSI Design

    Get PDF
    Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2

    Hardware Considerations for Signal Processing Systems: A Step Toward the Unconventional.

    Full text link
    As we progress into the future, signal processing algorithms are becoming more computationally intensive and power hungry while the desire for mobile products and low power devices is also increasing. An integrated ASIC solution is one of the primary ways chip developers can improve performance and add functionality while keeping the power budget low. This work discusses ASIC hardware for both conventional and unconventional signal processing systems, and how integration, error resilience, emerging devices, and new algorithms can be leveraged by signal processing systems to further improve performance and enable new applications. Specifically this work presents three case studies: 1) a conventional and highly parallel mix signal cross-correlator ASIC for a weather satellite performing real-time synthetic aperture imaging, 2) an unconventional native stochastic computing architecture enabled by memristors, and 3) two unconventional sparse neural network ASICs for feature extraction and object classification. As improvements from technology scaling alone slow down, and the demand for energy efficient mobile electronics increases, such optimization techniques at the device, circuit, and system level will become more critical to advance signal processing capabilities in the future.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116685/1/knagphil_1.pd

    The 1992 4th NASA SERC Symposium on VLSI Design

    Get PDF
    Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design

    LOW PHASE NOISE CMOS PLL FREQUENCY SYNTHESIZER DESIGN AND ANALYSIS

    Get PDF
    The phase-locked loop (PLL) frequency synthesizer is a critical device of wireless transceivers. It works as a local oscillator (LO) for frequency translation and channel selection in the transceivers but suffers phase noise including reference spurs. In this dissertation for lowing phase noise and power consumption, efforts are placed on the new design of PLL components: VCOs, charge pumps and sigma delta modulators. Based on the analysis of the VCO phase noise generation mechanism and improving on the literature results, a design-oriented phase noise model for a complementary cross-coupled LC VCO is provided. The model reveals the relationship between the phase noise performance and circuit design parameters. Using this phase noise model, an optimized 2GHz low phase noise CMOS LC VCO is designed, simulated and fabricated. The theoretical analysis results are confirmed by the simulation and experimental results. With this VCO phase noise model, we also design a low phase noise, low gain wideband VCO with the typical VCO gain around 100MHz/V. Improving upon literature results, a complete quantitative analysis of reference spur is given in this dissertation. This leads to a design of a charge pump by using a negative feedback circuit and replica bias to reduce the current mismatch which causes the reference spur. In addition, low-impedance charge/discharge paths are provided to overcome the charge pump current glitches which also cause PLL spurs. With a large bit-width high order sigma delta modulator, the fractional-N PLL has fine frequency resolution and fast locking time. Based on an analysis of sigma delta modulator models introduced in this dissertation, a 3rd-order MASH 1-1-1 digital sigma delta modulator is designed. Pipelining techniques and true single phase clock (TSPC) techniques are used for saving power and area. Included is the design of a fully integrated 2.4GHz §¢ fractional-N CMOS PLL frequency synthesizer. It takes advantage of a sigma delta modulator to get a very fine frequency resolution and a relatively large loop bandwidth. This frequency synthesizer is a 4th-order charge pump PLL with 26MHz reference frequency. The loop bandwidth is about 150KHz, while the whole PLL phase noise is about -120dBc/Hz at 1MHz frequency offset
    corecore