2,357 research outputs found

    DPA on quasi delay insensitive asynchronous circuits: formalization and improvement

    Full text link
    The purpose of this paper is to formally specify a flow devoted to the design of Differential Power Analysis (DPA) resistant QDI asynchronous circuits. The paper first proposes a formal modeling of the electrical signature of QDI asynchronous circuits. The DPA is then applied to the formal model in order to identify the source of leakage of this type of circuits. Finally, a complete design flow is specified to minimize the information leakage. The relevancy and efficiency of the approach is demonstrated using the design of an AES crypto-processor.Comment: Submitted on behalf of EDAA (http://www.edaa.com/

    Asynchronous techniques for system-on-chip design

    Get PDF
    SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed

    Indicating Asynchronous Array Multipliers

    Full text link
    Multiplication is an important arithmetic operation that is frequently encountered in microprocessing and digital signal processing applications, and multiplication is physically realized using a multiplier. This paper discusses the physical implementation of many indicating asynchronous array multipliers, which are inherently elastic and modular and are robust to timing, process and parametric variations. We consider the physical realization of many indicating asynchronous array multipliers using a 32/28nm CMOS technology. The weak-indication array multipliers comprise strong-indication or weak-indication full adders, and strong-indication 2-input AND functions to realize the partial products. The multipliers were synthesized in a semi-custom ASIC design style using standard library cells including a custom-designed 2-input C-element. 4x4 and 8x8 multiplication operations were considered for the physical implementations. The 4-phase return-to-zero (RTZ) and the 4-phase return-to-one (RTO) handshake protocols were utilized for data communication, and the delay-insensitive dual-rail code was used for data encoding. Among several weak-indication array multipliers, a weak-indication array multiplier utilizing a biased weak-indication full adder and the strong-indication 2-input AND function is found to have reduced cycle time and power-cycle time product with respect to RTZ and RTO handshaking for 4x4 and 8x8 multiplications. Further, the 4-phase RTO handshaking is found to be preferable to the 4-phase RTZ handshaking for achieving enhanced optimizations of the design metrics.Comment: arXiv admin note: text overlap with arXiv:1903.0943

    Area/latency optimized early output asynchronous full adders and relative-timed ripple carry adders

    Get PDF
    This article presents two area/latency optimized gate level asynchronous full adder designs which correspond to early output logic. The proposed full adders are constructed using the delay-insensitive dual-rail code and adhere to the four-phase return-to-zero handshaking. For an asynchronous ripple carry adder (RCA) constructed using the proposed early output full adders, the relative-timing assumption becomes necessary and the inherent advantages of the relative-timed RCA are: (1) computation with valid inputs, i.e., forward latency is data-dependent, and (2) computation with spacer inputs involves a bare minimum constant reverse latency of just one full adder delay, thus resulting in the optimal cycle time. With respect to different 32-bit RCA implementations, and in comparison with the optimized strong-indication, weak-indication, and early output full adder designs, one of the proposed early output full adders achieves respective reductions in latency by 67.8, 12.3 and 6.1 %, while the other proposed early output full adder achieves corresponding reductions in area by 32.6, 24.6 and 6.9 %, with practically no power penalty. Further, the proposed early output full adders based asynchronous RCAs enable minimum reductions in cycle time by 83.4, 15, and 8.8 % when considering carry-propagation over the entire RCA width of 32-bits, and maximum reductions in cycle time by 97.5, 27.4, and 22.4 % for the consideration of a typical carry chain length of 4 full adder stages, when compared to the least of the cycle time estimates of various strong-indication, weak-indication, and early output asynchronous RCAs of similar size. All the asynchronous full adders and RCAs were realized using standard cells in a semi-custom design fashion based on a 32/28 nm CMOS process technology

    Asynchronous Circuit Stacking for Simplified Power Management

    Get PDF
    As digital integrated circuits (ICs) continue to increase in complexity, new challenges arise for designers. Complex ICs are often designed by incorporating multiple power domains therefore requiring multiple voltage converters to produce the corresponding supply voltages. These converters not only take substantial on-chip layout area and/or off-chip space, but also aggregate the power loss during the voltage conversions that must occur fast enough to maintain the necessary power supplies. This dissertation work presents an asynchronous Multi-Threshold NULL Convention Logic (MTNCL) “stacked” circuit architecture that alleviates this problem by reducing the number of voltage converters needed to supply the voltage the ICs operate at. By stacking multiple MTNCL circuits between power and ground, supplying a multiple of VDD to the entire stack and incorporating simple control mechanisms, the dynamic range fluctuation problem can be mitigated. A 130nm Bulk CMOS process and a 32nm Silicon-on-Insulator (SOI) CMOS process are used to evaluate the theoretical effect of stacking different circuitry while running different workloads. Post parasitic physical implementations are then carried out in the 32nm SOI process for demonstrating the feasibility and analyzing the advantages of the proposed MTNCL stacking architecture

    Testing of Asynchronous NULL Conventional Logic (NCL) Circuits

    Get PDF
    Due to the absence of a global clock and presence of more state holding elements that synchronize the control and data paths, conventional automatic test pattern generation (ATPG) algorithms would fail when applied to asynchronous circuits, leading to poor fault coverage. This paper focuses on design for test (DFT) techniques aimed at making asynchronous NCL designs testable using existing DFT CAD tools with reasonable gate overhead, by enhancing controllability of feedback nets and observability for fault sites that are flagged unobservable. The proposed approach performs scan and test points insertion on NCL designs using custom ATPG library. The approach has been automated, which is essential for large systems; and are fully compatible with industry standard tools

    A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

    Full text link
    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure
    corecore