22 research outputs found

    Test and Testability of Asynchronous Circuits

    Full text link
    The ever-increasing transistor shrinkage and higher clock frequencies are causing serious clock distribution, power management, and reliability issues. Asynchronous design is predicted to have a significant role in tackling these challenges because of its distributed control mechanism and on-demand, rather than continuous, switching activity. Null Convention Logic (NCL) is a robust and low-power asynchronous paradigm that introduces new challenges to test and testability algorithms because 1) the lack of deterministic timing in NCL complicates the management of test timing, 2) all NCL gates are state-holding and even simple combinational circuits show sequential behaviour, and 3) stuck-at faults on gate internal feedback (GIF) of NCL gates do not always cause an incorrect output and therefore are undetectable by automatic test pattern generation (ATPG) algorithms. Existing test methods for NCL use clocked hardware to control the timing of test. Such test hardware could introduce metastability issues into otherwise highly robust NCL devices. Also, existing test techniques for NCL handle the high-statefulness of NCL circuits by excessive incorporation of test hardware which imposes additional area, propagation delay and power consumption. This work, first, proposes a clockless self-timed ATPG that detects all faults on the gate inputs and a share of the GIF faults with no added design for test (DFT). Then, the efficacy of quiescent current (IDDQ) test for detecting GIF faults undetectable by a DFT-less ATPG is investigated. Finally, asynchronous test hardware, including test points, a scan cell, and an interleaved scan architecture, is proposed for NCL-based circuits. To the extent of our knowledge, this is the first work that develops clockless, self-timed test techniques for NCL while minimising the need for DFT, and also the first work conducted on IDDQ test of NCL. The proposed methods are applied to multiple NCL circuits with up to 2,633 NCL gates (10,000 CMOS Boolean gates), in 180 and 45 nm technologies and show average fault coverage of 88.98% for ATPG alone, 98.52% including IDDQ test, and 99.28% when incorporating test hardware. Given that this fault coverage includes detection of GIF faults, our work has 13% higher fault coverage than previous work. Also, because our proposed clockless test hardware eliminates the need for double-latching, it reduces the average area and delay overhead of previous studies by 32% and 50%, respectively

    Spatial parallelism in the routers of asynchronous on-chip networks

    Get PDF
    State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Timing division multiplexing (TDM) flow control methods have been utilized in asynchronous on-chip networks extensively. The synchronization required by TDM leads to significant speed penalties. Compared with using TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead.This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks.Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found out that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed.SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than the virtual channel (VC) flow control method -- the most used TDM method. The major design problem of SDM is the area consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler.Several asynchronous SDM routers are implemented using these new techniques. An asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Spatial parallelism in the routers of asynchronous on-chip networks

    Get PDF
    State-of-the-art multi-processor systems-on-chip use on-chip networks as their communication fabric. Although most on-chip networks are implemented synchronously, asynchronous on-chip networks have several advantages over their synchronous counterparts. Timing division multiplexing (TDM) flow control methods have been utilized in asynchronous on-chip networks extensively. The synchronization required by TDM leads to significant speed penalties. Compared with using TDM methods, spatial parallelism methods, such as the spatial division multiplexing (SDM) flow control method, achieve better network throughput with less area overhead.This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks.Channel slicing is a new pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also found out that the lookahead pipeline using early evaluated acknowledgement can be used in routers to further improve speed.SDM is a new flow control method proposed for asynchronous on-chip networks. It improves network throughput without introducing synchronization among buffers of different frames, which is required by TDM methods. It is also found that the area overhead of SDM is smaller than the virtual channel (VC) flow control method -- the most used TDM method. The major design problem of SDM is the area consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which significantly reduces the area overhead. This Clos switch is dynamically reconfigured by a new asynchronous Clos scheduler.Several asynchronous SDM routers are implemented using these new techniques. An asynchronous VC router is also reproduced for comparison. Performance analyses show that the SDM routers outperform the VC router in throughput, area overhead and energy efficiency.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Analysis of asynchronous routers for network-on-chip applications

    Get PDF
    Asynchronous circuit design has been conventionally regarded as a valid alternative to synchronous logic due to its potential for low consumption of resources, power and delay. This includes areas such as the communication infrastructure of modern multi core processors, the so-called Network-on-Chip (NoC) paradigm on which this thesis focus on. In recent times, the transistor downscaling and the increasing clock frequencies have pushed synchronous design to high static power and delay. As a result, the interest for asynchronous integrated routers and links has re-emerged, especially in fields with ultra-low power requirements such as embedded systems. In this thesis, we construct an asynchronous router using Verilog code based on architectures found in the literature. We analyze the functionality of each of the building blocks and verify the operation of the implemented routing algorithm and arbitration mechanism. In the future, the results obtained here are expected to enable a complete implementation of the router in Verilog and its posterior analysis of its scalability

    Methodologies and Toolflows for the Predictable Design of Reliable and Low-Power NoCs

    Get PDF
    There is today the unmistakable need to evolve design methodologies and tool ows for Network-on-Chip based embedded systems. In particular, the quest for low-power requirements is nowadays a more-than-ever urgent dilemma. Modern circuits feature billion of transistors, and neither power management techniques nor batteries capacity are able to endure the increasingly higher integration capability of digital devices. Besides, power concerns come together with modern nanoscale silicon technology design issues. On one hand, system failure rates are expected to increase exponentially at every technology node when integrated circuit wear-out failure mechanisms are not compensated for. However, error detection and/or correction mechanisms have a non-negligible impact on the network power. On the other hand, to meet the stringent time-to-market deadlines, the design cycle of such a distributed and heterogeneous architecture must not be prolonged by unnecessary design iterations. Overall, there is a clear need to better discriminate reliability strategies and interconnect topology solutions upfront, by ranking designs based on power metric. In this thesis, we tackle this challenge by proposing power-aware design technologies. Finally, we take into account the most aggressive and disruptive methodology for embedded systems with ultra-low power constraints, by migrating NoC basic building blocks to asynchronous (or clockless) design style. We deal with this challenge delivering a standard cell design methodology and mainstream CAD tool ows, in this way partially relaxing the requirement of using asynchronous blocks only as hard macros

    A Network-based Asynchronous Architecture for Cryptographic Devices

    Get PDF
    Institute for Computing Systems ArchitectureThe traditional model of cryptography examines the security of the cipher as a mathematical function. However, ciphers that are secure when specified as mathematical functions are not necessarily secure in real-world implementations. The physical implementations of ciphers can be extremely difficult to control and often leak socalled side-channel information. Side-channel cryptanalysis attacks have shown to be especially effective as a practical means for attacking implementations of cryptographic algorithms on simple hardware platforms, such as smart-cards. Adversaries can obtain sensitive information from side-channels, such as the timing of operations, power consumption and electromagnetic emissions. Some of the attack techniques require surprisingly little side-channel information to break some of the best known ciphers. In constrained devices, such as smart-cards, straightforward implementations of cryptographic algorithms can be broken with minimal work. Preventing these attacks has become an active and a challenging area of research. Power analysis is a successful cryptanalytic technique that extracts secret information from cryptographic devices by analysing the power consumed during their operation. A particularly dangerous class of power analysis, differential power analysis (DPA), relies on the correlation of power consumption measurements. It has been proposed that adding non-determinism to the execution of the cryptographic device would reduce the danger of these attacks. It has also been demonstrated that asynchronous logic has advantages for security-sensitive applications. This thesis investigates the security and performance advantages of using a network-based asynchronous architecture, in which the functional units of the datapath form a network. Non-deterministic execution is achieved by exploiting concurrent execution of instructions both with and without data-dependencies; and by forwarding register values between instructions with data-dependencies using randomised routing over the network. The executions of cryptographic algorithms on different architectural configurations are simulated, and the obtained power traces are subjected to DPA attacks. The results show that the proposed architecture introduces a level of non-determinism in the execution that significantly raises the threshold for DPA attacks to succeed. In addition, the performance analysis shows that the improved security does not degrade performance

    Built-In Self-Test (BIST) for Multi-Threshold NULL Convention Logic (MTNCL) Circuits

    Get PDF
    This dissertation proposes a Built-In Self-Test (BIST) hardware implementation for Multi-Threshold NULL Convention Logic (MTNCL) circuits. Two different methods are proposed: an area-optimized topology that requires minimal area overhead, and a test-performance-optimized topology that utilizes parallelism and internal hardware to reduce the overall test time through additional controllability points. Furthermore, an automated software flow is proposed to insert, simulate, and analyze an input MTNCL netlist to obtain a desired fault coverage, if possible, through iterative digital and fault simulations. The proposed automated flow is capable of producing both area-optimized and test-performance-optimized BIST circuits and scripts for digital and fault simulation using commercial software that may be utilized to manually verify or adjust further, if desired

    Doctor of Philosophy

    Get PDF
    dissertationCommunication surpasses computation as the power and performance bottleneck in forthcoming exascale processors. Scaling has made transistors cheap, but on-chip wires have grown more expensive, both in terms of latency as well as energy. Therefore, the need for low energy, high performance interconnects is highly pronounced, especially for long distance communication. In this work, we examine two aspects of the global signaling problem. The first part of the thesis focuses on a high bandwidth asynchronous signaling protocol for long distance communication. Asynchrony among intellectual property (IP) cores on a chip has become necessary in a System on Chip (SoC) environment. Traditional asynchronous handshaking protocol suffers from loss of throughput due to the added latency of sending the acknowledge signal back to the sender. We demonstrate a method that supports end-to-end communication across links with arbitrarily large latency, without limiting the bandwidth, so long as line variation can be reliably controlled. We also evaluate the energy and latency improvements as a result of the design choices made available by this protocol. The use of transmission lines as a physical interconnect medium shows promise for deep submicron technologies. In our evaluations, we notice a lower energy footprint, as well as vastly reduced wire latency for transmission line interconnects. We approach this problem from two sides. Using field solvers, we investigate the physical design choices to determine the optimal way to implement these lines for a given back-end-of-line (BEOL) stack. We also approach the problem from a system designer's viewpoint, looking at ways to optimize the lines for different performance targets. This work analyzes the advantages and pitfalls of implementing asynchronous channel protocols for communication over long distances. Finally, the innovations resulting from this work are applied to a network-on-chip design example and the resulting power-performance benefits are reported
    corecore