31 research outputs found

    Asynchronous techniques for system-on-chip design

    Get PDF
    SoC design will require asynchronous techniques as the large parameter variations across the chip will make it impossible to control delays in clock networks and other global signals efficiently. Initially, SoCs will be globally asynchronous and locally synchronous (GALS). But the complexity of the numerous asynchronous/synchronous interfaces required in a GALS will eventually lead to entirely asynchronous solutions. This paper introduces the main design principles, methods, and building blocks for asynchronous VLSI systems, with an emphasis on communication and synchronization. Asynchronous circuits with the only delay assumption of isochronic forks are called quasi-delay-insensitive (QDI). QDI is used in the paper as the basis for asynchronous logic. The paper discusses asynchronous handshake protocols for communication and the notion of validity/neutrality tests, and completion tree. Basic building blocks for sequencing, storage, function evaluation, and buses are described, and two alternative methods for the implementation of an arbitrary computation are explained. Issues of arbitration, and synchronization play an important role in complex distributed systems and especially in GALS. The two main asynchronous/synchronous interfaces needed in GALS-one based on synchronizer, the other on stoppable clock-are described and analyzed

    Autonomous Probabilistic Coprocessing with Petaflips per Second

    Full text link
    In this paper we present a concrete design for a probabilistic (p-) computer based on a network of p-bits, robust classical entities fluctuating between -1 and +1, with probabilities that are controlled through an input constructed from the outputs of other p-bits. The architecture of this probabilistic computer is similar to a stochastic neural network with the p-bit playing the role of a binary stochastic neuron, but with one key difference: there is no sequencer used to enforce an ordering of p-bit updates, as is typically required. Instead, we explore \textit{sequencerless} designs where all p-bits are allowed to flip autonomously and demonstrate that such designs can allow ultrafast operation unconstrained by available clock speeds without compromising the solution's fidelity. Based on experimental results from a hardware benchmark of the autonomous design and benchmarked device models, we project that a nanomagnetic implementation can scale to achieve petaflips per second with millions of neurons. A key contribution of this paper is the focus on a hardware metric −- flips per second −- as a problem and substrate-independent figure-of-merit for an emerging class of hardware annealers known as Ising Machines. Much like the shrinking feature sizes of transistors that have continually driven Moore's Law, we believe that flips per second can be continually improved in later technology generations of a wide class of probabilistic, domain specific hardware.Comment: 13 pages, 8 figures, 1 tabl

    Design And Synthesis Of Clockless Pipelines Based On Self-resetting Stage Logic

    Get PDF
    For decades, digital design has been primarily dominated by clocked circuits. With larger scales of integration made possible by improved semiconductor manufacturing techniques, relying on a clock signal to orchestrate logic operations across an entire chip became increasingly difficult. Motivated by this problem, designers are currently considering circuits which can operate without a clock. However, the wide acceptance of these circuits by the digital design community requires two ingredients: (i) a unified design methodology supported by widely available CAD tools, and (ii) a granularity of design techniques suitable for synthesizing large designs. Currently, there is no unified established design methodology to support the design and verification of these circuits. Moreover, the majority of clockless design techniques is conceived at circuit level, and is subsequently so fine-grain, that their application to large designs can have unacceptable area costs. Given these considerations, this dissertation presents a new clockless technique, called self-resetting stage logic (SRSL), in which the computation of a block is reset periodically from within the block itself. SRSL is used as a building block for three coarse-grain pipelining techniques: (i) Stage-controlled self-resetting stage logic (S-SRSL) Pipelines: In these pipelines, the control of the communication between stages is performed locally between each pair of stages. This communication is performed in a uni-directional manner in order to simplify its implementation. (ii) Pipeline-controlled self-resetting stage logic (P-SRSL) Pipelines: In these pipelines, the communication between each pair of stages in the pipeline is driven by the oscillation of the last pipeline stage. Their communication scheme is identical to the one used in S-SRSL pipelines. (iii) Delay-tolerant self-resetting stage logic (D-SRSL) Pipelines: While communication in these pipelines is local in nature in a manner similar to the one used in S-SRL pipelines, this communication is nevertheless extended in both directions. The result of this bi-directional approach is an increase in the capability of the pipeline to handle stages with random delay. Based on these pipelining techniques, a new design methodology is proposed to synthesize clockless designs. The synthesis problem consists of synthesizing an SRSL pipeline from a gate netlist with a minimum area overhead given a specified data rate. A two-phase heuristic algorithm is proposed to solve this problem. The goal of the algorithm is to pipeline a given datapath by minimizing the area occupied by inter-stage latches without violating any timing constraints. Experiments with this synthesis algorithm show that while P-SRSL pipelines can reach high throughputs in shallow pipelines, D-SRSL pipelines can achieve comparable throughputs in deeper pipelines

    Scalable Emulation of Sign-Problem−-Free Hamiltonians with Room Temperature p-bits

    Full text link
    The growing field of quantum computing is based on the concept of a q-bit which is a delicate superposition of 0 and 1, requiring cryogenic temperatures for its physical realization along with challenging coherent coupling techniques for entangling them. By contrast, a probabilistic bit or a p-bit is a robust classical entity that fluctuates between 0 and 1, and can be implemented at room temperature using present-day technology. Here, we show that a probabilistic coprocessor built out of room temperature p-bits can be used to accelerate simulations of a special class of quantum many-body systems that are sign-problem−-free or stoquastic, leveraging the well-known Suzuki-Trotter decomposition that maps a dd-dimensional quantum many body Hamiltonian to a dd+1-dimensional classical Hamiltonian. This mapping allows an efficient emulation of a quantum system by classical computers and is commonly used in software to perform Quantum Monte Carlo (QMC) algorithms. By contrast, we show that a compact, embedded MTJ-based coprocessor can serve as a highly efficient hardware-accelerator for such QMC algorithms providing several orders of magnitude improvement in speed compared to optimized CPU implementations. Using realistic device-level SPICE simulations we demonstrate that the correct quantum correlations can be obtained using a classical p-circuit built with existing technology and operating at room temperature. The proposed coprocessor can serve as a tool to study stoquastic quantum many-body systems, overcoming challenges associated with physical quantum annealers.Comment: Fixed minor typos and expanded Appendi

    Resource-Constrained Acquisition Circuits for Next Generation Neural Interfaces

    Get PDF
    The development of neural interfaces allowing the acquisition of signals from the cortex of the brain has seen an increasing amount of interest both in academic research as well as in the commercial space due to their ability to aid people with various medical conditions, such as spinal cord injuries, as well as their potential to allow more seamless interactions between people and machines. While it has already been demonstrated that neural implants can allow tetraplegic patients to control robotic arms, thus to an extent returning some motoric function, the current state of the art often involves the use of heavy table-top instruments connected by wires passing through the patient’s skull, thus making the applications impractical and chronically infeasible. Those limitations are leading to the development of the next generation of neural interfaces that will overcome those issues by being minimal in size and completely wireless, thus paving a way to the possibility of their chronic application. Their development however faces several challenges in numerous aspects of engineering due to constraints presented by their minimal size, amount of power available as well as the materials that can be utilised. The aim of this work is to explore some of those challenges and investigate novel circuit techniques that would allow the implementation of acquisition analogue front-ends under the presented constraints. This is facilitated by first giving an overview of the problematic of recording electrodes and their electrical characterisation in terms of their impedance profile and added noise that can be used to guide the design of analogue front-ends. Continuous time (CT) acquisition is then investigated as a promising signal digitisation technique alternative to more conventional methods in terms of its suitability. This is complemented by a description of practical implementations of a CT analogue-to-digital converter (ADC) including a novel technique of clockless stochastic chopping aimed at the suppression of flicker noise that commonly affects the acquisition of low-frequency signals. A compact design is presented, implementing a 450 nW, 5.5 bit ENOB CT ADC, occupying an area of 0.0288 mm2 in a 0.18 μm CMOS technology, making this the smallest presented design in literature to the best of our knowledge. As completely wireless neural implants rely on power delivered through wireless links, their supply voltage is often subject to large high frequency variations as well voltage uncertainty making it necessary to design reference circuits and voltage regulators providing stable reference voltage and supply in the constrained space afforded to them. This results in numerous challenges that are explored and a design of a practical implementation of a reference circuit and voltage regulator is presented. Two designs in a 0.35 μm CMOS technology are presented, showing respectively a measured PSRR of ≈60 dB and ≈53 dB at DC and a worst-case PSRR of ≈42 dB and ≈33 dB with a less than 1% standard deviation in the output reference voltage of 1.2 V while consuming a power of ≈7 μW. Finally, ΣΔ modulators are investigated for their suitability in neural signal acquisition chains, their properties explained and a practical implementation of a ΣΔ DC-coupled neural acquisition circuit presented. This implements a 10-kHz, 40 dB SNDR ΣΔ analogue front-end implemented in a 0.18 μm CMOS technology occupying a compact area of 0.044 μm2 per channel while consuming 31.1 μW per channel.Open Acces

    Asynchronous Techniques for System-on-Chip Design

    Full text link

    Dynamic Assertion-Based Verification for SystemC

    Get PDF
    SystemC has emerged as a de facto standard modeling language for hardware and embedded systems. However, the current standard does not provide support for temporal specifications. Specifically, SystemC lacks a mechanism for sampling the state of the model at different types of temporal resolutions, for observing the internal state of modules, and for integrating monitors efficiently into the model's execution. This work presents a novel framework for specifying and efficiently monitoring temporal assertions of SystemC models that removes these restrictions. This work introduces new specification language primitives that (1) expose the inner state of the SystemC kernel in a principled way, (2) allow for very fine control over the temporal resolution, and (3) allow sampling at arbitrary locations in the user code. An efficient modular monitoring framework presented here allows the integration of monitors into the execution of the model, while at the same time incurring low overhead and allowing for easy adoption. Instrumentation of the user code is automated using Aspect-Oriented Programming techniques, thereby allowing the integration of user-code-level sample points into the monitoring framework. While most related approaches optimize the size of the monitors, this work focuses on minimizing the runtime overhead of the monitors. Different encoding configurations are identified and evaluated empirically using monitors synthesized from a large benchmark of random and pattern temporal specifications. The framework and approaches described in this dissertation allow the adoption of assertion-based verification for SystemC models written using various levels of abstraction, from system level to register-transfer level. An advantage of this work is that many existing specification languages call be adopted to use the specification primitives described here, and the framework can easily be integrated into existing implementations of SystemC

    2-wire time independent asynchronous communications

    Get PDF
    Communications both to and between low end microprocessors represents a real cost in a number of industrial and consumer products. This thesis starts by examining the properties of protocols that help to minimize these expenses and comes to the conclusion that the derived set of properties define a new category of communications protocol : Time Independent Asynchronous ( TIA) communications. To show the utility of the TIA category we develop a novel TIA protocol that uses only 2-wires and general IO pins on each host. The protocol is analyzed using the Petri net based STG ( Signal Transition Graph) which is widely use to model asynchronous logic. It is shown that STGs do not accurately model the behavior of software driven systems and so a modified form called STG-FT ( STG For Threads) is developed to better model software systems. A simulator is created to take an STG-FT model and perform a full reachability tree analysis to prove correctness and analyze livelock and deadlock properties. The simulator can also examine the full reachability tree for every possible system state ( the cross product of all sub-system states), and analyze deadlock and livelock issues related to unexpected inputs and unusual situations. Reachability pruning algorithms are developed which decrease the search tree by a factor of approximately 250 million. The 2-wire protocol is implemented between a PC and an Atmel Tiny26 microprocessor, there is also a variant that works between microprocessors. Testing verifies the simulation results including an avoidable livelock condition with data throughput peaking at a useful 50 kilobits/second in both directions. The first practical application of 2-wire TIA is part of a novel debugger for the Atmel Tiny26 microprocessor. The approach can be extended to any microprocessor with general IO pins. TIA communications, developed in this thesis, is a serious contender whenever low end microprocessors must communicate with other processors. Consumer and industrial products may be able to achieve cost saving by using this new protocol
    corecore