58 research outputs found
Design of variation-tolerant synchronizers for multiple clock and voltage domains
PhD ThesisParametric variability increasingly affects the performance of electronic circuits as
the fabrication technology has reached the level of 32nm and beyond. These
parameters may include transistor Process parameters (such as threshold
voltage), supply Voltage and Temperature (PVT), all of which could have a
significant impact on the speed and power consumption of the circuit, particularly
if the variations exceed the design margins. As systems are designed with more
asynchronous protocols, there is a need for highly robust synchronizers and
arbiters. These components are often used as interfaces between communication
links of different timing domains as well as sampling devices for asynchronous
inputs coming from external components. These applications have created a need
for new robust designs of synchronizers and arbiters that can tolerate process,
voltage and temperature variations.
The aim of this study was to investigate how synchronizers and arbiters should be
designed to tolerate parametric variations. All investigations focused mainly on
circuit-level and transistor level designs and were modeled and simulated in the
UMC90nm CMOS technology process. Analog simulations were used to measure
timing parameters and power consumption along with a āMonte Carloā statistical
analysis to account for process variations.
Two main components of synchronizers and arbiters were primarily investigated:
flip-flop and mutual-exclusion element (MUTEX). Both components can violate the
input timing conditions, setup and hold window times, which could cause
metastability inside their bistable elements and possibly end in failures. The
mean-time between failures is an important reliability feature of any synchronizer
delay through the synchronizer.
The MUTEX study focused on the classical circuit, in addition to a number of
tolerance, based on increasing internal gain by adding current sources, reducing
the capacitive loading, boosting the transconductance of the latch, compensating
the existing Miller capacitance, and adding asymmetry to maneuver the metastable
point. The results showed that some circuits had little or almost no improvements,
while five techniques showed significant improvements by reducing Ļ and
maintaining high tolerance.
Three design approaches are proposed to provide variation-tolerant
synchronizers. wagging synchronizer proposed to First, the is significantly
increase reliability over that of the conventional two flip-flop synchronizer. The
robustness of the wagging technique can be enhanced by using robust Ļ latches or
adding one more cycle of synchronization. The second approach is the
Metastability Auto-Detection and Correction (MADAC) latch which relies on swiftly
detecting a metastable event and correcting it by enforcing the previously stored
logic value. This technique significantly reduces the resolution time down from
uncertain
synchronization technique is proposed to transfer signals between Multiple-
Voltage Multiple-Clock Domains (MVD/MCD) that do not require conventional
level-shifters between the domains or multiple power supplies within each
domain. This interface circuit uses a synchronous set and feedback reset protocol
which provides level-shifting and synchronization of all signals between the
domains, from a wide range of voltage-supplies and clock frequencies.
Overall, synchronizer circuits can tolerate variations to a greater extent by
employing the wagging technique or using a MADAC latch, while MUTEX tolerance
can suffice with small circuit modifications. Communication between MVD/MCD
can be achieved by an asynchronous handshake
without a need for adding level-shifters.The Saudi Arabian Embassy in London,
Umm Al-Qura University, Saudi Arabi
Solutions and application areas of flip-flop metastability
PhD ThesisThe state space of every continuous multi-stable system is bound to contain one or more
metastable regions where the net attraction to the stable states can be infinitely-small.
Flip-flops are among these systems and can take an unbounded amount of time to decide
which logic state to settle to once they become metastable. This problematic behavior is
often prevented by placing the setup and hold time conditions on the flip-flopās input.
However, in applications such as clock domain crossing where these constraints cannot
be placed flip-flops can become metastable and induce catastrophic failures. These
events are fundamentally impossible to prevent but their probability can be significantly
reduced by employing synchronizer circuits. The latter grant flip-flops longer decision
time at the expense of introducing latency in processing the synchronized input.
This thesis presents a collection of research work involving the phenomenon of
flip-flop metastability in digital systems. The main contributions include three novel
solutions for the problem of synchronization. Two of these solutions are speculative
methods that rely on duplicate state machines to pre-compute data-dependent states
ahead of the completion of synchronization. Speculation is a core theme of this thesis
and is investigated in terms of its functional correctness, cost efficacy and fitness for
being automated by electronic design automation tools. It is shown that speculation
can outperform conventional synchronization solutions in practical terms and is a viable
option for future technologies. The third solution attempts to address the problem of
synchronization in the more-specific context of variable supply voltages. Finally, the
thesis also identifies a novel application of metastability as a means of quantifying
intra-chip physical parameters. A digital sensor is proposed based on the sensitivity
of metastable flip-flops to changes in their environmental parameters and is shown to
have better precision while being more compact than conventional digital sensors
Recommended from our members
Network-on-Chip Synchronization
Technology scaling has enabled the number of cores within a System on Chip (SoC) to increase significantly. Globally Asynchronous Locally Synchronous (GALS) systems using Dynamic Voltage and Frequency Scaling (DVFS) operate each of these cores on distinct and dynamic clock domains. The main communication method between these cores is increasingly more likely to be a Network-on-Chip (NoC). Typically, the interfaces between these clock domains experience multi-cycle synchronization latencies due to their use of ābrute-forceā synchronizers. This dissertation aims to improve the performance of NoCs and thereby SoCs as a whole by reducing this synchronization latency.
First, a survey of NoC improvement techniques is presented. One such improvement technique: a multi-layer NoC, has been successfully simulated. Given how one of the most commonly used techniques is DVFS, a thorough analysis and simulation of brute-force synchronizer circuits in both current and future process technologies is presented. Unfortunately, a multi-cycle latency is unavoidable when using brute-force synchronizers, so predictive synchronizers which require only a single cycle of latency have been proposed.
To demonstrate the impact of these predictive synchronizer circuits at a high level, multi-core system simulations incorporating these circuits have been completed. Multiple forms of GALS NoC configurations have been simulated, including multi-synchronous, NoC-synchronous, and single-synchronizer. Speedup on the SPLASH benchmark suite was measured to directly quantify the performance benefit of predictive synchronizers in a full system. Additionally, Mean Time Between Failures (MTBF) has been calculated for each NoC synchronizer configuration to determine the reliability benefit possible when using predictive synchronizers
Design and Analysis of Metastable-Hardened, High-Performance, Low-Power Flip-Flops
With rapid technology scaling, flip-flops are becoming more susceptible to metastability due to tighter timing budgets and the more prominent effects of process, temperature, and voltage variation that can result in frequent setup and hold time violations. This thesis presents a detailed methodology and analysis on the design of metastable-hardened, high-performance, and low-power flip-flops.
The design of metastable-hardened flip-flops is focused on optimizing the value of Ļ mainly due to its exponential relationship with the metastability window Ī“ and the mean-time-between-failure (MTBF). Through small-signal modeling, Ļ is determined to be a function of the load capacitance and the transconductance in the cross-coupled inverter pair for a given flip-flop architecture. In most cases, the reduction of Ļ comes at the expense of increased delay and power. Hence, two new design metrics, the metastability-delay-product (MDP) and the metastability-power-delay-product (MPDP), are proposed to analyze the tradeoffs between delay, power and Ļ. Post-layout simulation results have shown that the proposed optimum MPDP design can reduce the metastability window Ī“ by at least an order of magnitude depending on the value of the settling time and the flip-flop architecture.
In this work, we have proposed two new flip-flop designs: the pre-discharge flip-flop (PDFF) and the sense-amplifier-transmission-gate (SATG) based flip-flop.
Both flip-flop architectures facilitate the usage in both single and dual-supply systems as reduced clock-swing flip-flop and level-converting flip-flop. With a cross-coupled inverter in the master-stage that increases the overall transconductance and a small load transistor associated with the critical node, the architecture of both the PDFF and the SATG is very attractive for the design of metastable-hardened, high-performance, and low-power flip-flops. The amount of overhead in delay, power, and area is all less than 10% under the optimum MPDP design scheme when compared to the traditional optimum PDP design.
In designing for metastable-hardened and soft-error tolerant flip-flops, the main methodology is to improve the metastability performance in the master-stage while applying the soft-error tolerant cell in the slave-stage for protection against soft-error. The proposed flip-flops, PDFF-SE and SATG-SE, both utilize a cross-coupled inverter on the critical path in the master-stage and generate the required differential signals to facilitate the usage of the Quatro soft-error tolerant cell in the slave-stage
An Energy-Efficient System with Timing-Reliable Error-Detection Sequentials
A new type of energy-efficient digital system that integrate EDS and DVS circuits has been developed. In these systems, EDS-monitored paths convert the PVT variations into timing variations. Nevertheless, the conversion can suffer from the reliability issue (extrinsic EDS-reliability). EDS circuits detect the unfavorable timing variations (so called ``error'') and guide DVS circuits to adjust the operating voltage to a proper lower level to save the energy. However, the error detection is generally susceptible to the metastability problem (intrinsic EDS-reliability) due to the synchronizer in EDS circuits. The MTBF due to metastability is exponentially related to the synchronizer delay.
This dissertation proposes a new EDS circuit deployment strategy to enhance the extrinsic EDS-reliability. This strategy requires neither buffer insertion nor an extra clock and is applicable for FPGA implementations. An FPGA-based Discrete Cosine Transform with EDS and DVS circuits deployed in this fashion demonstrates up to 16.5\% energy savings over a conventional design at equivalent frequency setting and image quality, with a 0.8\% logic element and 3.5\% maximum frequency penalties.
VBSs are proposed to improve the synchronizer delay under single low-voltage supply environments. A VBS consists of a Jamb latch and a switched-capacitor-based charge pump that provides a voltage boost to the Jamb Latch to speed up the metastability resolution. The charge pump can be either CVBS or MVBS. A new methodology for extracting the metastability parameters of synchronizers under changing biasing currents is proposed. For a 1-year MTBF specification, MVBS and CVBS show 2.0 to 2.7 and 5.1 to 9.8 times the delay improvement over the basic Jamb latch, respectively, without large power consumption. Optimization techniques including transistor sizing, FBB and dynamic implementation are further applied. For a common MTBF specification at typical PVT conditions, the optimized MVBS and CVBS show 2.97 to 7.57 and 4.14 to 8.13 times the delay improvement over the basic Jamb latch, respectively. In post-Layout simulations, MVBS and CVBS are 1.84 and 2.63 times faster than the basic Jamb latch, respectively
Circuit Techniques for Adaptive and Reliable High Performance Computing.
Increasing power density with process scaling has caused stagnation in the clock speed of modern microprocessors. Accordingly, designers have adopted message passing and shared memory based multicore architectures in order to keep up with the rapidly rising demand for computing throughput. At the same time, applications are not entirely parallel and improving single-thread performance continues to remain critical. Additionally, reliability is also worsening with process scaling, and margining for failures due to process and environmental variations in modern technologies consumes an increasingly large portion of the power/performance envelope. In the wake of multicore computing, reliability of signal synchronization between the cores is also becoming increasingly critical. This forces designers to search for alternate efficient methods to improve compute performance while addressing reliability. Accordingly, this dissertation presents innovative circuit and architectural techniques for variation-tolerance, performance and reliability targeted at datapath logic, signal synchronization and memories.
Firstly, a domino logic based design style for datapath logic is presented that uses Adaptive Robustness Tuning (ART) in addition to timing speculation to provide up to 71% performance gains over conventional domino logic in 32bx32b multiplier in 65nm CMOS. Margins are reduced until functionality errors are detected, that are used to guide the tuning.
Secondly, for signal synchronization across clock domains, a new class of dynamic logic based synchronizers with single-cycle synchronization latency is presented, where pulses, rather than stable intermediate voltages cause metastability. Such pulses are amplified using skewed inverters to improve mean time between failures by ~1e6x over jamb latches and double flip-flops at 2GHz in 65nm CMOS.
Thirdly, a reconfigurable sensing scheme for 6T SRAMs is presented that employs auto-zero calibration and pre-amplification to improve sensing reliability (by up to 1.2 standard deviations of NMOS threshold voltage in 28nm CMOS); this increased reliability is in turn traded for ~42% sensing speedup.
Finally, a main memory architecture design methodology to address reliability and power in the context of Exascale computing systems is presented. Based on 3D-stacked DRAMs, the methodology co-optimizes DRAM access energy, refresh power and the increased cost of error resilience, to meet stringent power and reliability constraints.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/107238/1/bharan_1.pd
Integrated Circuit Design for Radiation Sensing and Hardening.
Beyond the 1950s, integrated circuits have been widely used in a number of electronic devices surrounding peopleās lives. In addition to computing electronics, scientific and medical equipment have also been undergone a metamorphosis, especially in radiation related fields where compact and precision radiation detection systems for nuclear power plants, positron emission tomography (PET), and radiation hardened by design (RHBD) circuits for space applications fabricated in advanced manufacturing technologies are exposed to the non-negligible probability of soft errors by radiation impact events. The integrated circuit design for radiation measurement equipment not only leads to numerous advantages on size and power consumption, but also raises many challenges regarding the speed and noise to replace conventional design modalities. This thesis presents solutions to front-end receiver designs for radiation sensors as well as an error detection and correction method to microprocessor designs under the condition of soft error occurrence.
For the first preamplifier design, a novel technique that enhances the bandwidth and suppresses the input current noise by using two inductors is discussed. With the dual-inductor TIA signal processing configuration, one can reduce the fabrication cost, the area overhead, and the power consumption in a fast readout package. The second front-end receiver is a novel detector capacitance compensation technique by using the Miller effect. The fabricated CSA exhibits minimal variation in the pulse shape as the detector capacitance is increased. Lastly, a modified D flip-flop is discussed that is called Razor-Lite using charge-sharing at internal nodes to provide a compact EDAC design for modern well-balanced processors and RHBD against soft errors by SEE.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/111548/1/iykwon_1.pd
The Fifth NASA Symposium on VLSI Design
The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design
Asynchronous Bypass Channels Improving Performance for Multi-synchronous Network-on-chips
Dr. Paul V. Gratz Network-on-Chip (NoC) designs have emerged as a replacement for traditional shared-bus designs for on-chip communications. As with all current VLSI design, however, reducing power consumption in NoCs is a critical challenge. One approach to reduce power is to dynamically scale the voltage and frequency of each network node or groups of nodes (DVFS). Another approach to reduce power consumption is to replace the balanced clock tree with a globally-asynchronous, locally-synchronous (GALS) clocking scheme. NoCs implemented with either of these schemes, however, tend to have high latencies as packets must be synchronized at the intermediate nodes between source and destination. In this work, we propose a novel router microarchitecture which offers superior performance versus typical synchroniz- ing router designs. Our approach features Asynchronous Bypass Channels (ABCs) at intermediate nodes thus avoiding synchronization delay. We also propose a new network topology and routing algorithm that leverage the advantages of the bypass channel offered by our router design. Our experiments show that our design improves the performance of a conventional synchronizing design with similar resources by up to 26 percent at low loads and increases saturation throughput by up to 50 percent
Reconfigurable time interval measurement circuit incorporating a programmable gain time difference amplifier
PhD ThesisAs further advances are made in semiconductor manufacturing technology the performance of circuits is continuously increasing. Unfortunately, as the technology node descends deeper into the nanometre region, achieving the potential performance gain is becoming more of a challenge; due not only to the effects of process variation but also to the reduced timing margins between signals within the circuit creating timing problems. Production Standard Automatic Test Equipment (ATE) is incapable of performing internal timing measurements due, first to the lack of accessibility and second to the overall timing accuracy of the tester which is grossly inadequate. To address these issue āon-chipā time measurement circuits have been developed in a similar way that built in self-test (BIST) evolved for āon-chipā logic testing.
This thesis describes the design and analysis of three time amplifier circuits. The analysis undertaken considers the operational aspects related to gain and input dynamic range, together with the robustness of the circuits to the effects of process, voltage and temperature (PVT) variations. The design which had the best overall performance was subsequently compared to a benchmark design, which used the ābuffer delay offsetā technique for time amplification, and showed a marked 6.5 times improvement on the dynamic range extending this from 40 ps to 300ps. The new design was also more robust to the effects of PVT variations.
The new time amplifier design was further developed to include an adjustable gain capability which could be varied in steps of approximately 7.5 from 4 to 117. The time amplifier was then connected to a 32-stage tapped delay line to create a reconfigurable time measurement circuit with an adjustable resolution range from 15 down to 0.5 ps and a dynamic range from 480 down to 16 ps depending upon the gain setting. The overall footprint of the measurement circuit, together with its calibration module occupies an area of 0.026 mm2
The final circuit, overall, satisfied the main design criteria for āon-chipā time measurement circuitry, namely, it has a wide dynamic range, high resolution, robust to the effects of PVT and has a small area overhead.Umm Al-Qura University
- ā¦