56 research outputs found

    Reconfigurable Shift Switching Parallel Comparators

    Get PDF
    We present novel asynchronous VLSI comparator schemes which are based on recently proposed reconfigurable shift switch logic and the traditional (precharged) CMOS domino logic. The schemes always produce a semaphore as a by-product of the process to indicate the end of domino process, which requires no additional delay and a minimal number of additional devices. For a large percentage of inputs the computations are much faster than traditional synchronous comparators due to the full utilization of the inherent speed of the circuits. Also the schemes are simple, area compact and stable

    Design and Analysis of a Novel Low Complexity and Low Power Ping Lock Arbiter by using EGDI based CMOS Technique

    Get PDF
    Network-on-chip (NoC) provides solution to overcome the complications of the on-chip interconnect architecture in multi-core systems. It mainly consists of router, links and network interface. An essential component of on-chip router is an arbiter that significantly impacts the performance of the router. The arbiter should provide fast and fair arbitration when it is placed in Critical Path Delay (CPD) systems. The main aim of this research work is to design a novel arbiter for an effective network scheduler in complex real time applications. At the same time resource allocation and power consumption should be very low. Previously, a novel gate level Ping Lock Arbiter (PLA) is designed to overcome the limited fair arbitration in Improved Ping Pong Arbiter (IPPA) with less delay. But the chip size and power consumption are very high. To overcome this problem, an Effective Gate Diffusion Input (EGDI) logic based CMOS scheme is used to design a novel Compact Ping Lock Arbiter (CPLA).  The proposed CPLA is compared with the existing PLA based on static CMOS scheme. The comparison between the conventional and proposed arbiter is carried out to analyze the area, delay and power by using Tanner Tool 14.1 with 250nm and 45nm. The results show that the proposed NPLA achieves low power and consumes less than the existing ping lock arbiter

    Null convention logic circuits for asynchronous computer architecture

    Get PDF
    For most of its history, computer architecture has been able to benefit from a rapid scaling in semiconductor technology, resulting in continuous improvements to CPU design. During that period, synchronous logic has dominated because of its inherent ease of design and abundant tools. However, with the scaling of semiconductor processes into deep sub-micron and then to nano-scale dimensions, computer architecture is hitting a number of roadblocks such as high power and increased process variability. Asynchronous techniques can potentially offer many advantages compared to conventional synchronous design, including average case vs. worse case performance, robustness in the face of process and operating point variability and the ready availability of high performance, fine grained pipeline architectures. Of the many alternative approaches to asynchronous design, Null Convention Logic (NCL) has the advantage that its quasi delay-insensitive behavior makes it relatively easy to set up complex circuits without the need for exhaustive timing analysis. This thesis examines the characteristics of an NCL based asynchronous RISC-V CPU and analyses the problems with applying NCL to CPU design. While a number of university and industry groups have previously developed small 8-bit microprocessor architectures using NCL techniques, it is still unclear whether these offer any real advantages over conventional synchronous design. A key objective of this work has been to analyse the impact of larger word widths and more complex architectures on NCL CPU implementations. The research commenced by re-evaluating existing techniques for implementing NCL on programmable devices such as FPGAs. The little work that has been undertaken previously on FPGA implementations of asynchronous logic has been inconclusive and seems to indicate that asynchronous systems cannot be easily implemented in these devices. However, most of this work related to an alternative technique called bundled data, which is not well suited to FPGA implementation because of the difficulty in controlling and matching delays in a 'bundle' of signals. On the other hand, this thesis clearly shows that such applications are not only possible with NCL, but there are some distinct advantages in being able to prototype complex asynchronous systems in a field-programmable technology such as the FPGA. A large part of the value of NCL derives from its architectural level behavior, inherent pipelining, and optimization opportunities such as the merging of register and combina- tional logic functions. In this work, a number of NCL multiplier architectures have been analyzed to reveal the performance trade-offs between various non-pipelined, 1D and 2D organizations. Two-dimensional pipelining can easily be applied to regular architectures such as array multipliers in a way that is both high performance and area-efficient. It was found that the performance of 2D pipelining for small networks such as multipliers is around 260% faster than the equivalent non-pipelined design. However, the design uses 265% more transistors so the methodology is mainly of benefit where performance is strongly favored over area. A pipelined 32bit x 32bit signed Baugh-Wooley multiplier with Wallace-Tree Carry Save Adders (CSA), which is representative of a real design used for CPUs and DSPs, was used to further explore this concept as it is faster and has fewer pipeline stages compared to the normal array multiplier using Ripple-Carry adders (RCA). It was found that 1D pipelining with ripple-carry chains is an efficient implementation option but becomes less so for larger multipliers, due to the completion logic for which the delay time depends largely on the number of bits involved in the completion network. The average-case performance of ripple-carry adders was explored using random input vectors and it was observed that it offers little advantage on the smaller multiplier blocks, but this particular timing characteristic of asynchronous design styles be- comes increasingly more important as word size grows. Finally, this research has resulted in the development of the first 32-Bit asynchronous RISC-V CPU core. Called the Redback RISC, the architecture is a structure of pipeline rings composed of computational oscillations linked with flow completeness relationships. It has been written using NELL, a commercial description/synthesis tool that outputs standard Verilog. The Redback has been analysed and compared to two approximately equivalent industry standard 32-Bit synchronous RISC-V cores (PicoRV32 and Rocket) that are already fabricated and used in industry. While the NCL implementation is larger than both commercial cores it has similar performance and lower power compared to the PicoRV32. The implementation results were also compared against an existing NCL design tool flow (UNCLE), which showed how much the results of these implementation strategies differ. The Redback RISC has achieved similar level of throughput and 43% better power and 34% better energy compared to one of the synchronous cores with the same benchmark test and test condition such as input sup- ply voltage. However, it was shown that area is the biggest drawback for NCL CPU design. The core is roughly 2.5× larger than synchronous designs. On the other hand its area is still 2.9× smaller than previous designs using UNCLE tools. The area penalty is largely due to the unavoidable translation into a dual-rail topology when using the standard NCL cell library

    A Constant Delay Logic Style - An Alternative Way of Logic Design

    Get PDF
    High performance, energy efficient logic style has always been a popular research topic in the field of very large scale integrated (VLSI) circuits because of the continuous demands of ever increasing circuit operating frequency. The invention of the dynamic logic in the 80s is one of the answers to this request as it allows designers to implement high performance circuit block, i.e., arithmetic logic unit (ALU), at an operating frequency that traditional static and pass transistor CMOS logic styles are difficult to achieve. However, the performance enhancement comes with several costs, including reduced noise margin,charge-sharing noise, and higher power dissipation due to higher data activity. Furthermore, dynamic logic has gradually lost its performance advantage over static logic due to the increased self-loading ratio in deep-submicron technology (65nm and below) because of the additional NMOS CLK footer transistor. Because of dynamic logic's limitations and diminished speed reward, a slowly rising need has emerged in the past decade to explore new logic style that goes beyond dynamic logic. In this thesis a constant delay (CD) logic style is proposed. The constant delay characteristic of this logic style regardless of the logic expression makes it suitable in implementing complicated logic expression such as addition. Moreover, CD logic exhibits a unique characteristic where the output is pre-evaluated before the inputs from the preceding stage is ready. This feature enables performance advantage over static and dynamic logic styles in a single cycle, multi-stage circuit block. Several design considerations including appropriate timing window width adjustment to reduce power consumption and maintain sufficient noise margin to ensure robust operations are discussed and analyzed. Using 65nm general purpose CMOS technology, the proposed logic demonstrates an average speed up of 94% and 56% over static and dynamic logic respectively in five different logic expressions. Post layout simulation results of 8-bit ripple carry adders conclude that CD-based design is 39% and 23% faster than the static and dynamic-based adders respectively. For ultra-high speed applications, CD-based design exhibits improved energy, power-delay product, and energy-delay product efficiency compared to static and dynamic counterparts

    Design, analysis and optimization of a dynamically reconfi gurable regenerative comparator for ultra-low power 6-bit TC-ADCs in 90nm CMOS technology

    Get PDF
    In this work the threshold conïŹgurable regenerative comparator on which TC-ADCs are based is optimized to further reduce the power consumption for use in battery-less biomedical sensor applications.\nMoreover, the effect of device mismatches on the offset, gain and linearity errors of the ADC is analyzed by means of Monte Carlo simulations.\nThis optimized comparator reduces the power consumption from 13uW to 3uW, while maintaining the same full scale rang

    The 1991 3rd NASA Symposium on VLSI Design

    Get PDF
    Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2

    The 1992 4th NASA SERC Symposium on VLSI Design

    Get PDF
    Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design

    NASA SERC 1990 Symposium on VLSI Design

    Get PDF
    This document contains papers presented at the first annual NASA Symposium on VLSI Design. NASA's involvement in this event demonstrates a need for research and development in high performance computing. High performance computing addresses problems faced by the scientific and industrial communities. High performance computing is needed in: (1) real-time manipulation of large data sets; (2) advanced systems control of spacecraft; (3) digital data transmission, error correction, and image compression; and (4) expert system control of spacecraft. Clearly, a valuable technology in meeting these needs is Very Large Scale Integration (VLSI). This conference addresses the following issues in VLSI design: (1) system architectures; (2) electronics; (3) algorithms; and (4) CAD tools

    Timing model derivation : static analysis of hardware description languages

    Get PDF
    Safety-critical hard real-time systems are subject to strict timing constraints. In order to derive guarantees on the timing behavior, the worst-case execution time (WCET) of each task comprising the system has to be known. The aiT tool has been developed for computing safe upper bounds on the WCET of a task. Its computation is mainly based on abstract interpretation of timing models of the processor and its periphery. These models are currently hand-crafted by human experts, which is a time-consuming and error-prone process. Modern processors are automatically synthesized from formal hardware specifications. Besides the processor’s functional behavior, also timing aspects are included in these descriptions. A methodology to derive sound timing models using hardware specifications is described within this thesis. To ease the process of timing model derivation, the methodology is embedded into a sound framework. A key part of this framework are static analyses on hardware specifications. This thesis presents an analysis framework that is build on the theory of abstract interpretation allowing use of classical program analyses on hardware description languages. Its suitability to automate parts of the derivation methodology is shown by different analyses. Practical experiments demonstrate the applicability of the approach to derive timing models. Also the soundness of the analyses and the analyses’ results is proved.Sicherheitskritische Echtzeitsysteme unterliegen strikten Zeitanforderungen. Um ihr Zeitverhalten zu garantieren mĂŒssen die AusfĂŒhrungszeiten der einzelnen Programme, die das System bilden, bekannt sein. Um sichere obere Schranken fĂŒr die AusfĂŒhrungszeit von Programmen zu berechnen wurde aiT entwickelt. Die Berechnung basiert auf abstrakter Interpretation von Zeitmodellen des Prozessors und seiner Peripherie. Diese Modelle werden hĂ€ndisch in einem zeitaufwendigen und fehleranfĂ€lligen Prozess von Experten entwickelt. Moderne Prozessoren werden automatisch aus formalen Spezifikationen erzeugt. Neben dem funktionalen Verhalten beschreiben diese auch das Zeitverhalten des Prozessors. In dieser Arbeit wird eine Methodik zur sicheren Ableitung von Zeitmodellen aus der Hardwarespezifikation beschrieben. Um den Ableitungsprozess zu vereinfachen ist diese Methodik in eine automatisierte Umgebung eingebettet. Ein Hauptbestandteil dieses Systems sind statische Analysen auf Hardwarebeschreibungen. Diese Arbeit stellt eine Analyse-Umgebung vor, die auf der Theorie der abstrakten Interpretation aufbaut und den Einsatz von klassischen Programmanalysen auf Hardwarebeschreibungssprachen erlaubt. Die Eignung des Systems, Teile der Ableitungsmethodik zu automatisieren, wird anhand einiger Analysen gezeigt. Experimentelle Ergebnisse zeigen die Anwendbarkeit der Methodik zur Ableitung von Zeitmodellen. Die Korrektheit der Analysen und der Analyse-Ergebnisse wird ebenfalls bewiesen

    Subject index volumes 1–92

    Get PDF
    • 

    corecore