9 research outputs found

    An asynchronous low-power 80C51 microcontroller

    High-level asynchronous system design using the ACK framework

    Journal Article. Designing asynchronous circuits is becoming easier as a number of design styles are making the transition from research projects to real, usable tools. However, designing asynchronous "systems" is still a difficult problem. We define asynchronous systems to be medium to large digital systems whose descriptions include both datapath and control, that may involve non-trivial interface requirements, and whose control is too large to be synthesized in one large controller. ACK is a framework for designing high performance asynchronous systems of this type. In ACK we advocate an approach that begins with procedural level descriptions of control and datapath and results in a hybrid system that mixes a variety of hardware implementation styles including burst-mode AFSMs, macromodule circuits, and programmable control. We present our views on what makes asynchronous high level system design different from lower level circuit design, motivate our ACK approach, and demonstrate it using an example system design.

    Test and Testability of Asynchronous Circuits

    The ever-increasing transistor shrinkage and higher clock frequencies are causing serious clock distribution, power management, and reliability issues. Asynchronous design is predicted to have a significant role in tackling these challenges because of its distributed control mechanism and on-demand, rather than continuous, switching activity. Null Convention Logic (NCL) is a robust and low-power asynchronous paradigm that introduces new challenges to test and testability algorithms because 1) the lack of deterministic timing in NCL complicates the management of test timing, 2) all NCL gates are state-holding and even simple combinational circuits show sequential behaviour, and 3) stuck-at faults on gate internal feedback (GIF) of NCL gates do not always cause an incorrect output and therefore are undetectable by automatic test pattern generation (ATPG) algorithms. Existing test methods for NCL use clocked hardware to control the timing of test. Such test hardware could introduce metastability issues into otherwise highly robust NCL devices. Also, existing test techniques for NCL handle the high statefulness of NCL circuits by excessive incorporation of test hardware, which imposes additional area, propagation delay and power consumption. This work first proposes a clockless self-timed ATPG that detects all faults on the gate inputs and a share of the GIF faults with no added design for test (DFT). Then, the efficacy of quiescent current (IDDQ) test for detecting GIF faults undetectable by a DFT-less ATPG is investigated. Finally, asynchronous test hardware, including test points, a scan cell, and an interleaved scan architecture, is proposed for NCL-based circuits. To the best of our knowledge, this is the first work that develops clockless, self-timed test techniques for NCL while minimising the need for DFT, and also the first work conducted on IDDQ test of NCL. 
The proposed methods are applied to multiple NCL circuits with up to 2,633 NCL gates (10,000 CMOS Boolean gates), in 180 nm and 45 nm technologies, and show an average fault coverage of 88.98% for ATPG alone, 98.52% including IDDQ test, and 99.28% when incorporating test hardware. Given that this fault coverage includes detection of GIF faults, our work has 13% higher fault coverage than previous work. Also, because our proposed clockless test hardware eliminates the need for double-latching, it reduces the average area and delay overhead of previous studies by 32% and 50%, respectively.

    Design of asynchronous microprocessor for power proportionality

    PhD Thesis. Microprocessors continue to get exponentially cheaper for end users following Moore’s law, while the costs involved in their design keep growing, also at an exponential rate. The reason is the ever increasing complexity of processors, which modern EDA tools struggle to keep up with. This makes further scaling for performance subject to a high risk in the reliability of the system. To keep this risk low, yet improve the performance, CPU designers try to optimise various parts of the processor. Instruction Set Architecture (ISA) is a significant part of the whole processor design flow, whose optimal design for a particular combination of available hardware resources and software requirements is crucial for building processors with high performance and efficient energy utilisation. This is a challenging task involving a lot of heuristics and high-level design decisions. Another issue impacting CPU reliability is continuous scaling for power consumption. Over the last decades CPU designers have been mainly focused on improving performance, but “keeping energy and power consumption in mind”. The consequence of this was the development of energy-efficient systems, where energy was considered a resource whose consumption should be optimised. As CMOS technology progressed, with feature size decreasing and power delivered to circuit components becoming less stable, the energy resource turned from an optimisation criterion into a constraint, sometimes a critical one. At this point power proportionality becomes one of the most important aspects in system design. Developing methods and techniques which will address the problem of designing a power-proportional microprocessor, capable of adapting at runtime to varying operating conditions (such as low or even unstable voltage levels) and application requirements, is one of today’s grand challenges. 
In this thesis this challenge is addressed by proposing a new design flow for the development of an ISA for microprocessors, which can be altered to suit a particular hardware platform or a specific operating mode. This flow uses an expressive and powerful formalism for the specification of processor instruction sets called the Conditional Partial Order Graph (CPOG). The CPOG model captures large sets of behavioural scenarios for a microarchitectural level in a computationally efficient form amenable to formal transformations for synthesis, verification and automated derivation of asynchronous hardware for the CPU microcontrol. The feasibility of the methodology, novel design flow and a number of optimisation techniques was proven in a full size asynchronous Intel 8051 microprocessor and its demonstrator silicon. The chip showed the ability to work in a wide range of operating voltage and environmental conditions. Depending on application requirements and power budget our ASIC supports several operating modes: one optimised for energy consumption and the other one for performance. This was achieved by extending a traditional datapath structure with an auxiliary control layer for adaptable and fault tolerant operation. These and other optimisations resulted in a reconfigurable and adaptable implementation, which was proven by measurements, analysis and evaluation of the chip.

    Design and performance optimization of asynchronous networks-on-chip

    As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems. In the past two decades, networks-on-chip (NoC’s) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoC’s, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combining the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoC’s, therefore, enable a modular and extensible system composition for an ‘object-oriented’ design style. The thesis aims to significantly advance the state of the art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoC’s are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions. First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. 
Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called ‘monitoring network’, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories – a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains. Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow. Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoC’s for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel. 
In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement.

    A High-Throughput, Low-Power Asynchronous Mesh-of-Trees Interconnection Network for the Explicit Multi-Threading (XMT) Parallel Architecture

    This thesis presents an asynchronous (clockless) Mesh-of-Trees network that consumes less power and area than the synchronous Mesh-of-Trees network, while maintaining high throughput and low latency. Two new asynchronous designs are proposed for the fundamental pipelined components of the network (routing and arbitration), which are optimized for power, area, latency and throughput. Mixed-timing interfaces are added to create a mixed-timing network which provides communication between synchronous and asynchronous domains. Two issues top the agenda of CPU design in the emerging many-core era: programmers' productivity and power consumption. Through its reliance on the richest available theory of parallel algorithms, the eXplicit Multi-Threading (XMT) parallel architecture addresses programmers' productivity. The motivation for this work is to provide an effective interconnection network for the XMT architecture in terms of both performance and power consumption. Performance of the XMT processor with the mixed-timing network is measured for several applications.

    The MANGO clockless network-on-chip: Concepts and implementation

    Asynchronous circuit design - A tutorial

    The Fifth NASA Symposium on VLSI Design

    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design.