11 research outputs found

    DPA on quasi delay insensitive asynchronous circuits: formalization and improvement

    The purpose of this paper is to formally specify a flow devoted to the design of Differential Power Analysis (DPA) resistant QDI asynchronous circuits. The paper first proposes a formal model of the electrical signature of QDI asynchronous circuits. DPA is then applied to the formal model in order to identify the sources of leakage in this type of circuit. Finally, a complete design flow is specified to minimize the information leakage. The relevance and efficiency of the approach are demonstrated using the design of an AES crypto-processor. Comment: Submitted on behalf of EDAA (http://www.edaa.com/

    Modeling an Asynchronous Circuit Dedicated to the Protection Against Physical Attacks

    Asynchronous circuits have several advantages for security applications, in particular their good resistance to attacks. In this paper, we report on experiments with modeling, at various abstraction levels, a patented asynchronous circuit for detecting physical attacks, such as cutting wires or producing short-circuits. Comment: In Proceedings MARS 2020, arXiv:2004.1240

    COMPARATIVE EVALUATION OF QUASI-DELAY-INSENSITIVE ASYNCHRONOUS ADDERS CORRESPONDING TO RETURN-TO-ZERO AND RETURN-TO-ONE HANDSHAKING

    This article makes a comparative evaluation of quasi-delay-insensitive (QDI) asynchronous adders, realized using the delay-insensitive dual-rail code, which adhere to 4-phase return-to-zero (RTZ) and 4-phase return-to-one (RTO) handshake protocols. The QDI adders realized correspond to the following adder architectures: i) ripple carry adder, ii) carry lookahead adder, and iii) carry select adder. The QDI adders correspond to three different timing regimes, viz. strong-indication, weak-indication and early output. They are physically implemented using a 32/28nm CMOS process. The comparative evaluation shows that, overall, the QDI adders which correspond to the 4-phase RTO handshake protocol are better than their counterparts which correspond to the 4-phase RTZ handshake protocol in terms of latency, area, and average power dissipation.
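
    The dual-rail code and the two 4-phase handshake protocols named in the abstract above can be illustrated with a minimal sketch. It is not taken from the article. It assumes the commonly used convention in which each bit travels on a wire pair (d1, d0), the RTZ spacer is (0, 0), the RTO spacer is (1, 1), and RTO codewords are the bitwise complement of the RTZ ones. All names (`Protocol`, `encode_bit`, `decode_pair`, `four_phase_sequence`) are hypothetical.

```python
from enum import Enum


class Protocol(Enum):
    RTZ = "return-to-zero"
    RTO = "return-to-one"


def spacer(protocol):
    """Wire pair (d1, d0) that separates two data items."""
    return (0, 0) if protocol is Protocol.RTZ else (1, 1)


def encode_bit(bit, protocol):
    """Dual-rail codeword (d1, d0) for one bit.

    Assumed convention: under RTZ, 1 -> (1, 0) and 0 -> (0, 1);
    under RTO, each wire is complemented.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    rtz = (1, 0) if bit == 1 else (0, 1)
    if protocol is Protocol.RTZ:
        return rtz
    return tuple(1 - wire for wire in rtz)


def decode_pair(pair, protocol):
    """Return the bit, None for a spacer, or raise on an illegal codeword."""
    if pair == spacer(protocol):
        return None
    for bit in (0, 1):
        if encode_bit(bit, protocol) == pair:
            return bit
    raise ValueError(f"illegal codeword {pair} for {protocol.name}")


def four_phase_sequence(bits, protocol):
    """Wire-pair sequence for a bit stream: every data item is followed by a spacer."""
    sequence = []
    for bit in bits:
        sequence.append(encode_bit(bit, protocol))
        sequence.append(spacer(protocol))
    return sequence


if __name__ == "__main__":
    for proto in Protocol:
        print(proto.name, four_phase_sequence([1, 0], proto))
```

    Under these assumptions, exactly one wire of the pair changes between a spacer and a valid data item, which is what lets a receiver detect data arrival without a clock. The sketch models only the encoding, not the acknowledge signal or any timing regime.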

    Circuiti asincroni: dai principi fondamentali all'implementazione

    Most circuits on the market today are synchronous. In recent years, however, this technology has had to face significant problems related to power consumption and to the growing difficulty of managing the clock in ever smaller and denser circuits. To overcome these problems, which require technically complex and costly solutions, manufacturers are turning their attention to the asynchronous approach, which, being clockless, promises to reduce power consumption and speed up circuits. The lack of experience, tools, and adequate motivation, however, makes a complete migration from one paradigm to the other very difficult. The technology that seems destined to take hold in this context is therefore the hybrid Globally Asynchronous, Locally Synchronous approach. Major manufacturers are engaged in research in this field, which is still evolving rapidly. This work is divided into two parts: in the first we offer a general overview of the fundamentals of asynchronous technology, and in the second we look at design examples that represent the current state of the art.

    Null convention logic circuits for asynchronous computer architecture

    For most of its history, computer architecture has been able to benefit from a rapid scaling in semiconductor technology, resulting in continuous improvements to CPU design. During that period, synchronous logic has dominated because of its inherent ease of design and abundant tools. However, with the scaling of semiconductor processes into deep sub-micron and then to nano-scale dimensions, computer architecture is hitting a number of roadblocks such as high power and increased process variability. Asynchronous techniques can potentially offer many advantages compared to conventional synchronous design, including average-case vs. worst-case performance, robustness in the face of process and operating point variability and the ready availability of high performance, fine-grained pipeline architectures. Of the many alternative approaches to asynchronous design, Null Convention Logic (NCL) has the advantage that its quasi delay-insensitive behavior makes it relatively easy to set up complex circuits without the need for exhaustive timing analysis. This thesis examines the characteristics of an NCL based asynchronous RISC-V CPU and analyses the problems with applying NCL to CPU design. While a number of university and industry groups have previously developed small 8-bit microprocessor architectures using NCL techniques, it is still unclear whether these offer any real advantages over conventional synchronous design. A key objective of this work has been to analyse the impact of larger word widths and more complex architectures on NCL CPU implementations. The research commenced by re-evaluating existing techniques for implementing NCL on programmable devices such as FPGAs. The little work that has been undertaken previously on FPGA implementations of asynchronous logic has been inconclusive and seems to indicate that asynchronous systems cannot be easily implemented in these devices.
However, most of this work related to an alternative technique called bundled data, which is not well suited to FPGA implementation because of the difficulty in controlling and matching delays in a 'bundle' of signals. On the other hand, this thesis clearly shows that such applications are not only possible with NCL, but there are some distinct advantages in being able to prototype complex asynchronous systems in a field-programmable technology such as the FPGA. A large part of the value of NCL derives from its architectural level behavior, inherent pipelining, and optimization opportunities such as the merging of register and combinational logic functions. In this work, a number of NCL multiplier architectures have been analyzed to reveal the performance trade-offs between various non-pipelined, 1D and 2D organizations. Two-dimensional pipelining can easily be applied to regular architectures such as array multipliers in a way that is both high performance and area-efficient. It was found that the performance of 2D pipelining for small networks such as multipliers is around 260% faster than the equivalent non-pipelined design. However, the design uses 265% more transistors so the methodology is mainly of benefit where performance is strongly favored over area. A pipelined 32-bit x 32-bit signed Baugh-Wooley multiplier with Wallace-Tree Carry Save Adders (CSA), which is representative of a real design used for CPUs and DSPs, was used to further explore this concept as it is faster and has fewer pipeline stages compared to the normal array multiplier using Ripple-Carry adders (RCA). It was found that 1D pipelining with ripple-carry chains is an efficient implementation option but becomes less so for larger multipliers, due to the completion logic for which the delay time depends largely on the number of bits involved in the completion network.
The average-case performance of ripple-carry adders was explored using random input vectors and it was observed that it offers little advantage on the smaller multiplier blocks, but this particular timing characteristic of asynchronous design styles becomes increasingly important as word size grows. Finally, this research has resulted in the development of the first 32-Bit asynchronous RISC-V CPU core. Called the Redback RISC, the architecture is a structure of pipeline rings composed of computational oscillations linked with flow completeness relationships. It has been written using NELL, a commercial description/synthesis tool that outputs standard Verilog. The Redback has been analysed and compared to two approximately equivalent industry standard 32-Bit synchronous RISC-V cores (PicoRV32 and Rocket) that are already fabricated and used in industry. While the NCL implementation is larger than both commercial cores it has similar performance and lower power compared to the PicoRV32. The implementation results were also compared against an existing NCL design tool flow (UNCLE), which showed how much the results of these implementation strategies differ. The Redback RISC has achieved a similar level of throughput and 43% better power and 34% better energy compared to one of the synchronous cores with the same benchmark test and test conditions, such as input supply voltage. However, it was shown that area is the biggest drawback for NCL CPU design. The core is roughly 2.5× larger than synchronous designs. On the other hand, its area is still 2.9× smaller than previous designs using UNCLE tools. The area penalty is largely due to the unavoidable translation into a dual-rail topology when using the standard NCL cell library.

    Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing.