9 research outputs found
Design and Analysis of Metastable-Hardened, High-Performance, Low-Power Flip-Flops
With rapid technology scaling, flip-flops are becoming more susceptible to metastability due to tighter timing budgets and the more prominent effects of process, temperature, and voltage variation that can result in frequent setup and hold time violations. This thesis presents a detailed methodology and analysis on the design of metastable-hardened, high-performance, and low-power flip-flops.
The design of metastable-hardened flip-flops is focused on optimizing the value of τ mainly due to its exponential relationship with the metastability window δ and the mean-time-between-failure (MTBF). Through small-signal modeling, τ is determined to be a function of the load capacitance and the transconductance in the cross-coupled inverter pair for a given flip-flop architecture. In most cases, the reduction of τ comes at the expense of increased delay and power. Hence, two new design metrics, the metastability-delay-product (MDP) and the metastability-power-delay-product (MPDP), are proposed to analyze the tradeoffs between delay, power and τ. Post-layout simulation results have shown that the proposed optimum MPDP design can reduce the metastability window δ by at least an order of magnitude depending on the value of the settling time and the flip-flop architecture.
In this work, we have proposed two new flip-flop designs: the pre-discharge flip-flop (PDFF) and the sense-amplifier-transmission-gate (SATG) based flip-flop.
Both flip-flop architectures facilitate the usage in both single and dual-supply systems as reduced clock-swing flip-flop and level-converting flip-flop. With a cross-coupled inverter in the master-stage that increases the overall transconductance and a small load transistor associated with the critical node, the architecture of both the PDFF and the SATG is very attractive for the design of metastable-hardened, high-performance, and low-power flip-flops. The amount of overhead in delay, power, and area is all less than 10% under the optimum MPDP design scheme when compared to the traditional optimum PDP design.
In designing for metastable-hardened and soft-error tolerant flip-flops, the main methodology is to improve the metastability performance in the master-stage while applying the soft-error tolerant cell in the slave-stage for protection against soft-error. The proposed flip-flops, PDFF-SE and SATG-SE, both utilize a cross-coupled inverter on the critical path in the master-stage and generate the required differential signals to facilitate the usage of the Quatro soft-error tolerant cell in the slave-stage
Recommended from our members
Network-on-Chip Synchronization
Technology scaling has enabled the number of cores within a System on Chip (SoC) to increase significantly. Globally Asynchronous Locally Synchronous (GALS) systems using Dynamic Voltage and Frequency Scaling (DVFS) operate each of these cores on distinct and dynamic clock domains. The main communication method between these cores is increasingly more likely to be a Network-on-Chip (NoC). Typically, the interfaces between these clock domains experience multi-cycle synchronization latencies due to their use of “brute-force” synchronizers. This dissertation aims to improve the performance of NoCs and thereby SoCs as a whole by reducing this synchronization latency.
First, a survey of NoC improvement techniques is presented. One such improvement technique: a multi-layer NoC, has been successfully simulated. Given how one of the most commonly used techniques is DVFS, a thorough analysis and simulation of brute-force synchronizer circuits in both current and future process technologies is presented. Unfortunately, a multi-cycle latency is unavoidable when using brute-force synchronizers, so predictive synchronizers which require only a single cycle of latency have been proposed.
To demonstrate the impact of these predictive synchronizer circuits at a high level, multi-core system simulations incorporating these circuits have been completed. Multiple forms of GALS NoC configurations have been simulated, including multi-synchronous, NoC-synchronous, and single-synchronizer. Speedup on the SPLASH benchmark suite was measured to directly quantify the performance benefit of predictive synchronizers in a full system. Additionally, Mean Time Between Failures (MTBF) has been calculated for each NoC synchronizer configuration to determine the reliability benefit possible when using predictive synchronizers
An Energy-Efficient System with Timing-Reliable Error-Detection Sequentials
A new type of energy-efficient digital system that integrate EDS and DVS circuits has been developed. In these systems, EDS-monitored paths convert the PVT variations into timing variations. Nevertheless, the conversion can suffer from the reliability issue (extrinsic EDS-reliability). EDS circuits detect the unfavorable timing variations (so called ``error'') and guide DVS circuits to adjust the operating voltage to a proper lower level to save the energy. However, the error detection is generally susceptible to the metastability problem (intrinsic EDS-reliability) due to the synchronizer in EDS circuits. The MTBF due to metastability is exponentially related to the synchronizer delay.
This dissertation proposes a new EDS circuit deployment strategy to enhance the extrinsic EDS-reliability. This strategy requires neither buffer insertion nor an extra clock and is applicable for FPGA implementations. An FPGA-based Discrete Cosine Transform with EDS and DVS circuits deployed in this fashion demonstrates up to 16.5\% energy savings over a conventional design at equivalent frequency setting and image quality, with a 0.8\% logic element and 3.5\% maximum frequency penalties.
VBSs are proposed to improve the synchronizer delay under single low-voltage supply environments. A VBS consists of a Jamb latch and a switched-capacitor-based charge pump that provides a voltage boost to the Jamb Latch to speed up the metastability resolution. The charge pump can be either CVBS or MVBS. A new methodology for extracting the metastability parameters of synchronizers under changing biasing currents is proposed. For a 1-year MTBF specification, MVBS and CVBS show 2.0 to 2.7 and 5.1 to 9.8 times the delay improvement over the basic Jamb latch, respectively, without large power consumption. Optimization techniques including transistor sizing, FBB and dynamic implementation are further applied. For a common MTBF specification at typical PVT conditions, the optimized MVBS and CVBS show 2.97 to 7.57 and 4.14 to 8.13 times the delay improvement over the basic Jamb latch, respectively. In post-Layout simulations, MVBS and CVBS are 1.84 and 2.63 times faster than the basic Jamb latch, respectively
Low-Power and Error-Resilient VLSI Circuits and Systems.
Efficient low-power operation is critically important for the success of the next-generation signal processing applications. Device and supply voltage have been continuously scaled to meet a more constrained power envelope, but scaling has created resiliency challenges, including increasing timing faults and soft errors. Our research aims at designing low-power and robust circuits and systems for signal processing by drawing circuit, architecture, and algorithm approaches.
To gain an insight into the system faults due to supply voltage reduction, we researched the two primary effects that determine the minimum supply voltage (VMIN) in Intel’s tri-gate CMOS technology, namely process variations and gate-dielectric soft breakdown. We determined that voltage scaling increases the timing window that sequential circuits are vulnerable. Thus, we proposed a new hold-time violation metric to define hold-time VMIN, which has been adopted as a new design standard.
Device scaling increases soft errors which affect circuit reliability. Through extensive soft error characterization using two 65nm CMOS test chips, we studied the soft error mechanisms and its dependence on supply voltage and clock frequency. This study laid the foundation of the first 65nm DSP chip design for a NASA spaceflight project. To mitigate such random errors, we proposed a new confidence-driven architecture that effectively enhances the error resiliency of deeply scaled CMOS and post-CMOS circuits.
Designing low-power resilient systems can effectively leverage application-specific algorithmic approaches. To explore design opportunities in the algorithmic domain, we demonstrate an application-specific detection and decoding processor for multiple-input multiple-output (MIMO) wireless communication. To enhance the receive error rate for a robust wireless communication, we designed a joint detection and decoding technique by enclosing detection and decoding in an iterative loop to enhance both interference cancellation and error reduction. A proof-of-concept chip design was fabricated for the next-generation 4x4 256QAM MIMO systems. Through algorithm-architecture optimizations and low-power circuit techniques, our design achieves significant improvements in throughput, energy efficiency and error rate, paving the way for future developments in this area.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/110323/1/uchchen_1.pd
Design and Validation of Network-on-Chip Architectures for the Next Generation of Multi-synchronous, Reliable, and Reconfigurable Embedded Systems
NETWORK-ON-CHIP (NoC) design is today at a crossroad. On one hand, the
design principles to efficiently implement interconnection networks in the
resource-constrained on-chip setting have stabilized. On the other hand,
the requirements on embedded system design are far from stabilizing. Embedded
systems are composed by assembling together heterogeneous components featuring
differentiated operating speeds and ad-hoc counter measures must be adopted
to bridge frequency domains. Moreover, an unmistakable trend toward enhanced
reconfigurability is clearly underway due to the increasing complexity of applications.
At the same time, the technology effect is manyfold since it provides unprecedented
levels of system integration but it also brings new severe constraints
to the forefront: power budget restrictions, overheating concerns, circuit delay and
power variability, permanent fault, increased probability of transient faults.
Supporting different degrees of reconfigurability and flexibility in the parallel
hardware platform cannot be however achieved with the incremental evolution of
current design techniques, but requires a disruptive approach and a major increase
in complexity. In addition, new reliability challenges cannot be solved by using
traditional fault tolerance techniques alone but the reliability approach must be
also part of the overall reconfiguration methodology.
In this thesis we take on the challenge of engineering a NoC architectures for
the next generation systems and we provide design methods able to overcome the
conventional way of implementing multi-synchronous, reliable and reconfigurable
NoC. Our analysis is not only limited to research novel approaches to the specific
challenges of the NoC architecture but we also co-design the solutions in a single
integrated framework. Interdependencies between different NoC features are
detected ahead of time and we finally avoid the engineering of highly optimized solutions
to specific problems that however coexist inefficiently together in the final
NoC architecture. To conclude, a silicon implementation by means of a testchip
tape-out and a prototype on a FPGA board validate the feasibility and effectivenes
Design of CMOS Digital Silicon Photomultipliers with ToF for Positron Emission Tomography
This thesis presents a contribution to the design of single-photon detectors for
medical imaging. Specifically, the focus has been on the development of a pixel
capable of single-photon counting in CMOS technology, and the associated
sensor thereof. These sensors can work under low light conditions and provide
timing information to determine the time-stamp of the incoming photons.
For instance, this is particularly attractive for applications that rely either on
time-of-flight measurements or on exponential decay determination of the light
source, like positron emission tomography or fluorescence-lifetime imaging,
respectively. This thesis proposes the study of the pixel architecture to optimize
its performance in terms of sensitivity, linearity and signal to noise ratio.
The design of the pixel has followed a bottom-up approach, taking care of
the smallest building block and studying how the different architecture choices
affect performance. Among the various building blocks needed, special emphasis has been placed on the following:
• the Single-Photon Avalanche Diode (SPAD), a photodiode able to detect
photons one by one;
• the front-end circuitry of this diode, commonly called quenching and
recharge circuit;
• the Time-to-Digital Converter (TDC), which determines the timing performance of the pixel.
The proposed architectural exploration provides a comprehensive insight
into the design space of the pixel, allowing to determine the optimum design
points in terms of sensor sensitivity, linearity or signal to noise ratio, thus helping designers to navigate through non-straightforward trade-offs.
The proposed TDC is based on a voltage-controlled ring oscillator, since this
architecture provides moderate time resolutions while keeping the footprint,
the power, and conversion time relatively small. Two pseudo-differential delay
stages have been studied, one with cross-coupled PMOS transistors and the
other with cross-coupled inverters. Analytical studies and simulations have
shown that cross-coupled inverters are the most appropriate to implement
the TDC because they achieve better time resolution with smaller energy per
conversion than cross-coupled PMOS transistor stages.
A 1.3×1.3 mm2 pixel has been implemented in an 110 nm CMOS image sensor technology, to have the benefits of sub-micron technologies along with the
cleanliness of CMOS image sensor technologies. The fabricated chips have been
used to characterize the single-photon avalanche diodes. The results agree with
expectations: a maximum photon detection probability of 46 % and a median
dark count rate of 0.4 Hz/µm2 with an excess voltage of 3 V. Furthermore, the
characterization of the TDC shows that the time resolution is below 100 ps,
which agrees with post-layout simulations. The differential non-linearity is
±0.4LSB, and the integral non-linearity is ±6.1LSB.
Photoemission occurs during characterization - an indication that the avalanches are not quenched properly. The cause of this has been identified to be
in the design of the SPAD and the quenching circuit. SPADs are sensitive devices which maximum reverse current must be well defined and limited by the
quenching circuit, otherwise unwanted effects like excessive cross-talk, noise,
and power consumption may happen. Although this issue limits the operation
of the implemented pixel, the information obtained during the characterization
will help to avoid mistakes in future implementations
Nano-intrinsic security primitives for internet of everything
With the advent of Internet-enabled electronic devices and mobile computer systems, maintaining data security is one of the most important challenges in modern civilization. The innovation of physically unclonable functions (PUFs) shows great potential for enabling low-cost low-power authentication, anti-counterfeiting and beyond on the semiconductor chips. This is because secrets in a PUF are hidden in the randomness of the physical properties of desirably identical devices, making it extremely difficult, if not impossible, to extract them. Hence, the basic idea of PUF is to take advantage of inevitable non-idealities in the physical domain to create a system that can provide an innovative way to secure device identities, sensitive information, and their communications. While the physical variation exists everywhere, various materials, systems, and technologies have been considered as the source of unpredictable physical device variation in large scales for generating security primitives. The purpose of this project is to develop emerging solid-state memory-based security primitives and examine their robustness as well as feasibility. Firstly, the author gives an extensive overview of PUFs. The rationality, classification, and application of PUF are discussed. To objectively compare the quality of PUFs, the author formulates important PUF properties and evaluation metrics. By reviewing previously proposed constructions ranging from conventional standard complementary metal-oxide-semiconductor (CMOS) components to emerging non-volatile memories, the quality of different PUFs classes are discussed and summarized. Through a comparative analysis, emerging non-volatile redox-based resistor memories (ReRAMs) have shown the potential as promising candidates for the next generation of low-cost, low-power, compact in size, and secure PUF. Next, the author presents novel approaches to build a PUF by utilizing concatenated two layers of ReRAM crossbar arrays. Upon concatenate two layers, the nonlinear structure is introduced, and this results in the improved uniformity and the avalanche characteristic of the proposed PUF. A group of cell readout method is employed, and it supports a massive pool of challenge-response pairs of the nonlinear ReRAM-based PUF. The non-linear PUF construction is experimentally assessed using the evaluation metrics, and the quality of randomness is verified using predictive analysis. Last but not least, random telegraph noise (RTN) is studied as a source of entropy for a true random number generation (TRNG). RTN is usually considered a disadvantageous feature in the conventional CMOS designs. However, in combination with appropriate readout scheme, RTN in ReRAM can be used as a novel technique to generate quality random numbers. The proposed differential readout-based design can maintain the quality of output by reducing the effect of the undesired noise from the whole system, while the controlling difficulty of the conventional readout method can be significantly reduced. This is advantageous as the differential readout circuit can embrace the resistance variation features of ReRAMs without extensive pre-calibration. The study in this thesis has the potential to enable the development of cost-efficient and lightweight security primitives that can be integrated into modern computer mobile systems and devices for providing a high level of security
New Views for Stochastic Computing: From Time-Encoding to Deterministic Processing
University of Minnesota Ph.D. dissertation.July 2018. Major: Electrical/Computer Engineering. Advisor: David Lilja. 1 computer file (PDF); xi, 149 pages.Stochastic computing (SC), a paradigm first introduced in the 1960s, has received considerable attention in recent years as a potential paradigm for emerging technologies and ''post-CMOS'' computing. Logical computation is performed on random bitstreams where the signal value is encoded by the probability of obtaining a one versus a zero. This unconventional representation of data offers some intriguing advantages over conventional weighted binary. Implementing complex functions with simple hardware (e.g., multiplication using a single AND gate), tolerating soft errors (i.e., bit flips), and progressive precision are the primary advantages of SC. The obvious disadvantage, however, is latency. A stochastic representation is exponentially longer than conventional binary radix. Long latencies translate into high energy consumption, often higher than that of their binary counterpart. Generating bit streams is also costly. Factoring in the cost of the bit-stream generators, the overall hardware cost of an SC implementation is often comparable to a conventional binary implementation. This dissertation begins by proposing a highly unorthodox idea: performing computation with digital constructs on time-encoded analog signals. We introduce a new, energy-efficient, high-performance, and much less costly approach for SC using time-encoded pulse signals. We explore the design and implementation of arithmetic operations on time-encoded data and discuss the advantages, challenges, and potential applications. Experimental results on image processing applications show up to 99% performance speedup, 98% saving in energy dissipation, and 40% area reduction compared to prior stochastic implementations. We further introduce a low-cost approach for synthesizing sorting network circuits based on deterministic unary bit-streams. Synthesis results show more than 90% area and power savings compared to the costs of the conventional binary implementation. Time-based encoding of data is then exploited for fast and energy-efficient processing of data with the developed sorting circuits. Poor progressive precision is the main challenge with the recently developed deterministic methods of SC. We propose a high-quality down-sampling method which significantly improves the processing time and the energy consumption of these deterministic methods by pseudo-randomizing bitstreams. We also propose two novel deterministic methods of processing bitstreams by using low-discrepancy sequences. We further introduce a new advantage to SC paradigm-the skew tolerance of SC circuits. We exploit this advantage in developing polysynchronous clocking, a design strategy for optimizing the clock distribution network of SC systems. Finally, as the first study of its kind to the best of our knowledge, we rethink the memory system design for SC. We propose a seamless stochastic system, StochMem, which features analog memory to trade the energy and area overhead of data conversion for computation accuracy