
    A 100-MIPS GaAs asynchronous microprocessor

    The authors describe how they ported an asynchronous microprocessor, previously implemented in CMOS, to gallium arsenide (GaAs) using a technology-independent asynchronous design technique. They introduce new circuits, including a sense amplifier, a completion-detection circuit, and a general circuit structure for operators specified by production rules. The authors used and tested these circuits in a variety of designs.
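    The abstract mentions operators specified by production rules. As a minimal behavioural sketch (a hypothetical model, not the authors' GaAs circuit structure), a two-input Muller C-element can be written as two production rules, `a ∧ b → c↑` and `¬a ∧ ¬b → c↓`. When neither guard holds, the operator keeps its previous output, which is what makes it state-holding. The names `evaluate`, `C_ELEMENT`, and `simulate` are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# A production rule: (guard over the input values, output value assigned when the guard holds).
Rule = Tuple[Callable[[dict], bool], bool]


def evaluate(rules: List[Rule], inputs: dict, current: bool) -> bool:
    """Fire the first rule whose guard holds; otherwise hold the current output."""
    for guard, value in rules:
        if guard(inputs):
            return value
    return current  # no guard true: the operator keeps its state


# Hypothetical production rules for a two-input Muller C-element with output c.
C_ELEMENT: List[Rule] = [
    (lambda s: bool(s["a"] and s["b"]), True),            # a & b   -> c+
    (lambda s: bool(not s["a"] and not s["b"]), False),   # ~a & ~b -> c-
]


def simulate(trace: List[Tuple[int, int]], initial: bool = False) -> List[bool]:
    """Apply a sequence of (a, b) input vectors and record the output after each one."""
    c = initial
    outputs = []
    for a, b in trace:
        c = evaluate(C_ELEMENT, {"a": a, "b": b}, c)
        outputs.append(c)
    return outputs
```

    For example, `simulate([(1, 0), (1, 1), (0, 1), (0, 0)])` yields `[False, True, True, False]`: the output rises only once both inputs are high and falls only once both are low.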

    Design and implementation of Asynchronous SRAM

    Master's thesis (Master of Engineering)

    Covering conditions and algorithms for the synthesis of speed-independent circuits

    Journal article. Abstract: This paper presents theory and algorithms for the synthesis of standard C-implementations of speed-independent circuits. These implementations are block-level circuits that may consist of atomic gates performing complex functions in order to ensure hazard freedom. First, we present Boolean covering conditions that guarantee that the standard C-implementations operate correctly. Then, we present two algorithms that produce optimal solutions to the covering problem. The first algorithm is always applicable but does not complete on large circuits. The second algorithm, motivated by our observation that our covering problem can often be solved with a single cube, finds the optimal single-cube solution when such a solution exists. When applicable, the second algorithm is dramatically more efficient than the first, more general algorithm. We present results for benchmark specifications which indicate that our single-cube algorithm is applicable to most benchmark circuits and reduces run times by over an order of magnitude. The block-level circuits generated by our algorithms are a good starting point for tools that perform technology mapping to obtain gate-level speed-independent circuits.
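    The second algorithm rests on the observation that the covering problem can often be solved with a single cube. The sketch below is not the paper's algorithm; it only illustrates the underlying feasibility check under simplified assumptions: states are bit strings, `required` lists states the cube must cover, and `forbidden` lists states it must not cover. Any cube covering all required states contains their supercube (a literal is kept only where every required state agrees), so if the supercube covers a forbidden state, no single-cube solution exists. The greedy literal removal at the end is a hypothetical heuristic and is not guaranteed to give the minimum number of literals.

```python
from typing import Iterable, List, Optional


def supercube(states: List[str]) -> str:
    """Smallest cube containing every state: keep a literal only where all states agree."""
    if not states:
        raise ValueError("need at least one required state")
    return "".join(bits[0] if all(b == bits[0] for b in bits) else "-"
                   for bits in zip(*states))


def covers(cube: str, state: str) -> bool:
    """A cube covers a state if every non-don't-care literal matches it."""
    return all(c == "-" or c == s for c, s in zip(cube, state))


def single_cube_cover(required: List[str], forbidden: Iterable[str]) -> Optional[str]:
    """Return one cube covering all required states and no forbidden state, or None."""
    forbidden = list(forbidden)
    cube = supercube(required)
    if any(covers(cube, f) for f in forbidden):
        return None  # every cube covering `required` contains this supercube
    # Hypothetical greedy step: drop each literal if the enlarged cube stays safe.
    for i, lit in enumerate(supercube(required)):
        if lit == "-":
            continue
        trial = cube[:i] + "-" + cube[i + 1:]
        if not any(covers(trial, f) for f in forbidden):
            cube = trial
    return cube
```

    For example, `single_cube_cover(["110", "111"], ["010", "000"])` returns `"1--"`, while `single_cube_cover(["01", "10"], ["00"])` returns `None` because the supercube `"--"` covers the forbidden state.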

    Asynchronous memory design

    by Vincent Wing-Yun Sit. Thesis submitted in June 1997. Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 1-4 (3rd gp.)). Abstract also in Chinese.
    Table of contents:
    List of Figures
    List of Tables
    Acknowledgements
    Abstract
    Chapter 1. Introduction
        1.1 Asynchronous Design
            1.1.1 Potential Advantages
            1.1.2 Design Methodologies
            1.1.3 System Characteristics
        1.2 Asynchronous Memory
            1.2.1 Motivation
            1.2.2 Definition
        1.3 Proposed Memory Design
            1.3.1 Control Interface
            1.3.2 Overview
            1.3.3 Handshake Control Protocol
    Chapter 2. Theory
        2.1 Variable Bit Line Load
            2.1.1 Definition
            2.1.2 Advantage
        2.2 Current Sensing Completion Detection
            2.2.1 Block Diagram
            2.2.2 General LSD Current Sensor
            2.2.3 CMOS LSD Current Sensor
        2.3 Voltage Sensing Completion Detection
            2.3.1 Data Reading in Memory Circuit
            2.3.2 Block Diagram
        2.4 Multiple Delays Completion Generation
            2.4.1 Advantage
            2.4.2 Block Diagram
    Chapter 3. Implementation
        3.1 1M-Bit SRAM Framework
            3.1.1 Introduction
            3.1.2 Framework
        3.2 Control Circuit
            3.2.1 Control Signals
                3.2.1.1 External Control Signals
                3.2.1.2 Internal Control Signals
            3.2.2 Read / Write State Transition Graphs
            3.2.3 Implementation
        3.3 Bit Line Segmentation
            3.3.1 Four Regions Segmentation
            3.3.2 Operation
            3.3.3 Memory Cell
        3.4 Current Sensing Completion Detection
            3.4.1 One Bit Data Bus
            3.4.2 Eight Bits Data Bus
        3.5 Voltage Sensing Completion Detection
            3.5.1 One Bit Data Bus
            3.5.2 Eight Bits Data Bus
        3.6 Multiple Delays Completion Generation
    Chapter 4. Simulation
        4.1 Simulation Environment
            4.1.1 Simulation Parameters
            4.1.2 Memory Timing Specifications
            4.1.3 Bit Line Load Determination
        4.2 Benchmark Simulation
            4.2.1 Circuit Schematic
            4.2.2 Results
        4.3 Current Sensing Completion Detection
            4.3.1 Circuit Schematic
            4.3.2 Sense Amplifier Current Characteristics
            4.3.3 Results
            4.3.4 Observations
        4.4 Voltage Sensing Completion Detection
            4.4.1 Circuit Schematic
            4.4.2 Results
        4.5 Multiple Delays Completion Generation
            4.5.1 Circuit Schematic
            4.5.2 Results
    Chapter 5. Testing
        5.1 Test Chip Design
            5.1.1 Block Diagram
            5.1.2 Schematic
            5.1.3 Layout
        5.2 HSPICE Post-Layout Simulation Results
            5.2.1 Graphical Results
            5.2.2 Voltage Sensing Completion Detection
            5.2.3 Multiple Delays Completion Generation
        5.3 Measurements
            5.3.1 Logic Results
                5.3.1.1 Method
                5.3.1.2 Results
            5.3.2 Timing Results
                5.3.2.1 Method
                5.3.2.2 Graphical Results
                5.3.2.3 Voltage Sensing Completion Detection
                5.3.2.4 Multiple Delays Completion Generation
    Chapter 6. Discussion
        6.1 Current Sensing Completion Detection
            6.1.1 Comments and Conclusion
            6.1.2 Suggestion
        6.2 Voltage Sensing Completion Detection
            6.2.1 Results Comparison
                6.2.1.1 General
                6.2.1.2 Bit Line Load
                6.2.1.3 Bit Line Segmentation
            6.2.2 Resource Consumption
                6.2.2.1 Area
                6.2.2.2 Power
            6.2.3 Comments and Conclusion
        6.3 Multiple Delay Completion Generation
            6.3.1 Results Comparison
                6.3.1.1 General
                6.3.1.2 Bit Line Load
                6.3.1.3 Bit Line Segmentation
            6.3.2 Resource Consumption
                6.3.2.1 Area
                6.3.2.2 Power
            6.3.3 Comments and Conclusion
        6.4 General Comments
            6.4.1 Comparison of the Three Techniques
            6.4.2 Bit Line Segmentation
        6.5 Application
        6.6 Further Developments
            6.6.1 Interface with Two-Phase HCP
            6.6.2 Data Bus Expansion
            6.6.3 Speed Optimization
            6.6.4 Modified Write Completion Method
    Chapter 7. Conclusion
        7.1 Problem Definition
        7.2 Implementation
        7.3 Evaluation
        7.4 Comments and Suggestions
    Chapter 8. References
    Chapter 9. Appendix
        9.1 HSPICE Simulation Parameters
            9.1.1 Typical Simulation Condition
            9.1.2 Fast Simulation Condition
            9.1.3 Slow Simulation Condition
        9.2 SRAM Cell Layout and Netlist
        9.3 Test Chip Specifications
            9.3.1 General Specifications
            9.3.2 Pin Assignment
            9.3.3 Timing Diagrams and Specifications
            9.3.4 Schematics and Layouts
                9.3.4.1 Standard Memory Components
                9.3.4.2 DVSCD and MDCG Components
            9.3.5 Microphotograph

    An asynchronous Forth microprocessor

    Ping-Ki Tsang. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 87-95). Abstracts in English and Chinese.
    Table of contents:
    Abstract
    Acknowledgments
    Chapter 1. Introduction
        1.1 Motivation and Aims
        1.2 Contributions
        1.3 Overview of the Thesis
    Chapter 2. Asynchronous Logic
        2.1 Motivation
        2.2 Timing Models
            2.2.1 Fundamental-Mode Model
            2.2.2 Delay-Insensitive Model
            2.2.3 QDI and Speed-Independent Models
        2.3 Asynchronous Signalling Protocols
            2.3.1 2-Phase Handshaking Protocol
            2.3.2 4-Phase Handshaking Protocol
        2.4 Data Representations
            2.4.1 Dual-Rail Coded Data
            2.4.2 Bundled Data
        2.5 Previous Asynchronous Processors
        2.6 Summary
    Chapter 3. The MSL16 Architecture
        3.1 RISC Machines
        3.2 Stack Machines
        3.3 Forth and Its Applications
        3.4 MSL16
            3.4.1 Architecture
            3.4.2 Instruction Set
            3.4.3 The Datapath
            3.4.4 Interrupts and Exceptions
            3.4.5 Implementing Forth Primitives
            3.4.6 Code Density Estimation
        3.5 Summary
    Chapter 4. Design Methodology
        4.1 Basic Notation
        4.2 Specification of MSL16A
        4.3 Decomposition into Concurrent Processes
        4.4 Separation of Control and Datapath
        4.5 Handshaking Expansion
            4.5.1 4-Phase Handshaking Protocol
        4.6 Production-Rule Expansion
        4.7 Summary
    Chapter 5. Implementation
        5.1 C-Element
        5.2 Mutual Exclusion Elements
        5.3 Caltech Asynchronous Synthesis Tools
        5.4 Stack Design
            5.4.1 Eager Stack Control
            5.4.2 Lazy Stack Control
            5.4.3 Eager/Lazy Stack Datapath
            5.4.4 Pointer Stack Control
            5.4.5 Pointer Stack Datapath
        5.5 ALU Design
            5.5.1 The Addition Operation
            5.5.2 Zero-Checker
        5.6 Memory Interface and Tri-State Buffers
        5.7 MSL16A
        5.8 Summary
    Chapter 6. Results
        6.1 FPGA-Based Implementation of MSL16
        6.2 MSL16A
            6.2.1 A Comparison of 3 Stack Designs
            6.2.2 Evaluation of the ALU
            6.2.3 Evaluation of MSL16A
        6.3 Summary
    Chapter 7. Conclusions
        7.1 Future Work
    Bibliography
    Publications
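    The outline lists 2-phase and 4-phase handshaking protocols (sections 2.3 and 4.5.1). As a minimal sketch of the 4-phase (return-to-zero) variant, assuming a single request/acknowledge channel and hypothetical event names such as `req+` and `ack-`, the checker below accepts only complete `req+, ack+, req-, ack-` cycles, each of which returns both wires to the idle low state. In a 2-phase protocol, by contrast, each transfer takes only two transitions.

```python
from typing import List, Tuple

# Hypothetical event names for one 4-phase (return-to-zero) handshake cycle.
FOUR_PHASE_CYCLE = ["req+", "ack+", "req-", "ack-"]


def is_four_phase(events: List[str]) -> bool:
    """Check that events repeat req+ -> ack+ -> req- -> ack- and end on a cycle boundary."""
    if len(events) % len(FOUR_PHASE_CYCLE) != 0:
        return False  # a partial cycle leaves the wires away from their idle (0, 0) state
    return all(e == FOUR_PHASE_CYCLE[i % len(FOUR_PHASE_CYCLE)] for i, e in enumerate(events))


def wire_levels(events: List[str]) -> List[Tuple[int, int]]:
    """Track the (req, ack) wire levels after each event, starting from idle (0, 0)."""
    req, ack = 0, 0
    levels = []
    for e in events:
        level = 1 if e.endswith("+") else 0
        if e.startswith("req"):
            req = level
        else:
            ack = level
        levels.append((req, ack))
    return levels
```

    For one complete transfer, `wire_levels(["req+", "ack+", "req-", "ack-"])` gives `[(1, 0), (1, 1), (0, 1), (0, 0)]`, ending back at the idle state.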

    Null convention logic circuits for asynchronous computer architecture

    For most of its history, computer architecture has been able to benefit from rapid scaling in semiconductor technology, resulting in continuous improvements to CPU design. During that period, synchronous logic has dominated because of its inherent ease of design and abundant tools. However, with the scaling of semiconductor processes into deep sub-micron and then nano-scale dimensions, computer architecture is hitting a number of roadblocks, such as high power and increased process variability. Asynchronous techniques can potentially offer many advantages over conventional synchronous design, including average-case rather than worst-case performance, robustness in the face of process and operating-point variability, and the ready availability of high-performance, fine-grained pipeline architectures. Of the many alternative approaches to asynchronous design, Null Convention Logic (NCL) has the advantage that its quasi-delay-insensitive behavior makes it relatively easy to build complex circuits without the need for exhaustive timing analysis. This thesis examines the characteristics of an NCL-based asynchronous RISC-V CPU and analyses the problems with applying NCL to CPU design. While a number of university and industry groups have previously developed small 8-bit microprocessor architectures using NCL techniques, it is still unclear whether these offer any real advantages over conventional synchronous design. A key objective of this work has been to analyse the impact of larger word widths and more complex architectures on NCL CPU implementations. The research commenced by re-evaluating existing techniques for implementing NCL on programmable devices such as FPGAs. The little previous work on FPGA implementations of asynchronous logic has been inconclusive and seems to indicate that asynchronous systems cannot easily be implemented in these devices.
However, most of this work related to an alternative technique called bundled data, which is not well suited to FPGA implementation because of the difficulty of controlling and matching delays in a 'bundle' of signals. By contrast, this thesis shows that such applications are not only possible with NCL, but that there are distinct advantages in being able to prototype complex asynchronous systems in a field-programmable technology such as the FPGA. A large part of the value of NCL derives from its architectural-level behavior, inherent pipelining, and optimization opportunities such as the merging of register and combinational logic functions. In this work, a number of NCL multiplier architectures have been analyzed to reveal the performance trade-offs between various non-pipelined, 1D and 2D organizations. Two-dimensional pipelining can easily be applied to regular architectures such as array multipliers in a way that is both high-performance and area-efficient. For small networks such as multipliers, 2D pipelining was found to be around 260% faster than the equivalent non-pipelined design. However, the design uses 265% more transistors, so the methodology is mainly of benefit where performance is strongly favored over area. A pipelined 32-bit × 32-bit signed Baugh-Wooley multiplier with Wallace-tree carry-save adders (CSA), which is representative of real designs used in CPUs and DSPs, was used to explore this concept further, as it is faster and has fewer pipeline stages than the normal array multiplier using ripple-carry adders (RCA). It was found that 1D pipelining with ripple-carry chains is an efficient implementation option but becomes less so for larger multipliers, because of the completion logic, whose delay depends largely on the number of bits involved in the completion network.
The average-case performance of ripple-carry adders was explored using random input vectors. It was observed that this offers little advantage for the smaller multiplier blocks, but that this timing characteristic of asynchronous design styles becomes increasingly important as word size grows. Finally, this research has resulted in the development of the first 32-bit asynchronous RISC-V CPU core. Called the Redback RISC, the architecture is a structure of pipeline rings composed of computational oscillations linked with flow-completeness relationships. It has been written using NELL, a commercial description/synthesis tool that outputs standard Verilog. The Redback has been analysed and compared with two approximately equivalent industry-standard 32-bit synchronous RISC-V cores (PicoRV32 and Rocket) that are already fabricated and used in industry. While the NCL implementation is larger than both commercial cores, it has similar performance and lower power than the PicoRV32. The implementation results were also compared against an existing NCL design tool flow (UNCLE), showing how much the results of these implementation strategies differ. The Redback RISC achieved a similar level of throughput with 43% lower power and 34% lower energy than one of the synchronous cores, under the same benchmark test and test conditions, such as supply voltage. However, area was shown to be the biggest drawback of NCL CPU design: the core is roughly 2.5× larger than the synchronous designs, although its area is still 2.9× smaller than previous designs produced with the UNCLE tools. The area penalty is largely due to the unavoidable translation into a dual-rail topology when using the standard NCL cell library.
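    The abstract credits NCL's quasi-delay-insensitive behavior for its ease of design. That behavior rests on threshold gates with hysteresis and on dual-rail DATA/NULL encoding. The sketch below models a THmn gate behaviourally: the output asserts once at least m of its n inputs are asserted and deasserts only when all inputs have returned to 0. The rail assignment in `DATA0`/`DATA1` is an assumption (conventions vary), and the model ignores weighted gates and transistor-level detail.

```python
from typing import Optional, Sequence, Tuple

# Dual-rail encoding as (rail0, rail1); this rail assignment is an assumption.
NULL = (0, 0)    # spacer wavefront between successive DATA wavefronts
DATA0 = (1, 0)
DATA1 = (0, 1)
# (1, 1) is an illegal code.


class ThresholdGate:
    """Behavioural THmn gate with hysteresis."""

    def __init__(self, m: int, n: int):
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m, self.n, self.out = m, n, 0

    def update(self, inputs: Sequence[int]) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        asserted = sum(inputs)
        if asserted >= self.m:
            self.out = 1      # threshold reached: assert
        elif asserted == 0:
            self.out = 0      # all inputs NULL: deassert
        # otherwise hysteresis: keep the previous output
        return self.out


def decode(pair: Tuple[int, int]) -> Optional[int]:
    """Map a dual-rail pair to 0 or 1, or None for NULL; reject the illegal (1, 1) code."""
    if pair == (1, 1):
        raise ValueError("illegal dual-rail code")
    return None if pair == NULL else pair[1]
```

    A TH23 gate (`ThresholdGate(2, 3)`) fed `[1, 0, 0]`, `[1, 1, 0]`, `[1, 0, 0]`, `[0, 0, 0]` outputs 0, 1, 1, 0; the third step shows hysteresis holding the output until the full NULL wavefront arrives.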
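    The abstract attributes the declining efficiency of 1D pipelining in larger multipliers to completion logic whose delay depends on the number of bits. A minimal sketch of why: each dual-rail bit signals completion through an OR of its two rails, and the per-bit signals are then combined by a tree of gates with limited fan-in, so the tree depth grows as ⌈log_k n⌉ for fan-in k. The default fan-in of 4 is an assumption, and `word_complete` models only the DATA wavefront; an NCL completion tree built from THnn gates with hysteresis also detects the return to NULL.

```python
from typing import List, Tuple


def bit_complete(pair: Tuple[int, int]) -> int:
    """A dual-rail bit carries DATA when either rail is high (an OR, i.e. TH12, gate)."""
    return 1 if pair[0] or pair[1] else 0


def tree_levels(n_bits: int, fan_in: int = 4) -> int:
    """Gate levels needed to combine n_bits completion signals with fan_in-input gates."""
    if n_bits < 1 or fan_in < 2:
        raise ValueError("need n_bits >= 1 and fan_in >= 2")
    levels, signals = 0, n_bits
    while signals > 1:
        signals = -(-signals // fan_in)  # ceiling division: gates in the next level
        levels += 1
    return levels


def word_complete(word: List[Tuple[int, int]]) -> bool:
    """Behavioural output of the tree for the DATA wavefront: every bit must carry DATA."""
    return all(bit_complete(p) for p in word)
```

    With the assumed fan-in of 4, `tree_levels(8)` is 2 and `tree_levels(64)` is 3, so completion delay grows with word width even before wiring effects are considered.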
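    The abstract reports that average-case ripple-carry behaviour matters more as word size grows. A Monte Carlo sketch shows why, assuming an asynchronous ripple-carry adder finishes when its longest carry-propagation chain settles: a chain starts at a bit that generates a carry (a AND b) and extends through following bits that propagate it (a XOR b). For random operands the longest chain is usually far shorter than the n-bit worst case and grows roughly logarithmically in n, a classic result. The function names, seed, and trial count are illustrative choices, not taken from the thesis.

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Length of the longest carry chain when adding two n-bit unsigned numbers."""
    longest = current = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:
            current = 1                  # generate: carry-out fixed locally, new chain starts
        elif (ai ^ bi) and current:
            current += 1                 # propagate an incoming carry one more bit
        else:
            current = 0                  # kill (both 0), or propagate with no carry to pass on
        longest = max(longest, current)
    return longest


def average_chain(n: int, trials: int = 2000, seed: int = 1) -> float:
    """Estimate the mean longest carry chain for uniformly random n-bit operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / trials
```

    Comparing `average_chain(8)` with `average_chain(64)` shows the average growing far more slowly than the eightfold increase in worst-case chain length, which is the gap an asynchronous adder with completion detection can exploit.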