35 research outputs found

    Architecture Independent Timing Speculation Techniques in VLSI Circuits.

    Full text link
    Conventional digital circuits must ensure correct operation throughout a wide range of operating conditions including process, voltage, and temperature variation. These conditions have an effect on circuit delays, and safety margins must be put in place which come at a power and performance cost. The Razor system proposed eliminating these timing margins by running a circuit with occasional timing errors and correcting the errors when they occur. Several existing Razor style designs have been proposed, however prior to this work, Razor could not be applied blindly or automatically to designs, as the various error correction schemes modified the architecture of the target design. Because of the architectural invasiveness and design complexities of these techniques, no published Razor style system had been applied to a complete existing commercial processor. Additionally, in all prior Razor-style systems, there is a fundamental tradeoff between speculation window and short path, or minimum delay, constraints, limiting the technique鈥檚 effectiveness. This thesis introduces the concept of Razor using two-phase latch based timing. By identifying and utilizing time borrowing as an error correction mechanism, it allows for Razor to be applied without the need to reload data or replay instructions. This allows for Razor to be blindly and automatically applied to existing designs without detailed knowledge of internal architecture. Additionally, latch based Razor allows for large speculation windows, up to 100% of nominal circuit delay, because it breaks the connection between minimum delay constraints and speculation window. By demonstrating how to transform conventional flip-flop based designs, including those which make use of clock gating, to two-phase latch based timing, Razor can be automatically added to a large set of existing digital designs. Two forms of latch based Razor are proposed. First, Bubble Razor involves rippling stall cycles throughout a circuit in response to timing errors and is applied to the ARM Cortex-M3 processor, the first ever application of a Razor technique to a complete, existing processor design. Additional work applies Bubble Razor to the ARM Cortex-R4 processor. The second latch based Razor technique, Voltage Razor, uses voltage boosting to correct for timing errors.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/102461/1/mfojtik_1.pd

    An asynchronous instruction length decoder

    Get PDF
    Journal ArticleAbstract-This paper describes an investigation of potential advantages and pitfalls of applying an asynchronous design methodology to an advanced microprocessor architecture. A prototype complex instruction set length decoding and steering unit was implemented using self-timed circuits. [The Revolving Asynchronous Pentium庐 Processor Instruction Decoder (RAPPID) design implemented the complete Pentium II庐 32-bit MMX instruction set.] The prototype chip was fabricated on a 0.25-CMOS process and tested successfully. Results show significant advantages-in particular, performance of 2.5-4.5 instructions per nanosecond-with manageable risks using this design technology. The prototype achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as the fastest commercial 400-MHz clocked circuit fabricated on the same process

    An asynchronous instruction length decoder

    Get PDF
    Journal ArticleThis paper describes an investigation of potential advantages and pitfalls of applying an asynchronous design methodology to an advanced microprocessor architecture. A prototype complex instruction set length decoding and steering unit was implemented using self-timed circuits. [The Revolving Asynchronous Pentium庐 Processor Instruction Decoder (RAPPID) design implemented the complete Pentium II庐 32-bit MMX instruction set.] The prototype chip was fabricated on a 0.25-CMOS process and tested successfully. Results show significant advantages-in particular, performance of 2.5-4.5 instructions per nanosecond-with manageable risks using this design technology. The prototype achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as the fastest commercial 400-MHz clocked circuit fabricated on the same process

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    NASA Space Engineering Research Center Symposium on VLSI Design

    Get PDF
    The NASA Space Engineering Research Center (SERC) is proud to offer, at its second symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories and the electronics industry. These featured speakers share insights into next generation advances that will serve as a basis for future VLSI design. Questions of reliability in the space environment along with new directions in CAD and design are addressed by the featured speakers

    Improvement of hardware reliability with aging monitors

    Get PDF

    Automatic test pattern generation for asynchronous circuits

    Get PDF
    The testability of integrated circuits becomes worse with transistor dimensions reaching nanometer scales. Testing, the process of ensuring that circuits are fabricated without defects, becomes inevitably part of the design process; a technique called design for test (DFT). Asynchronous circuits have a number of desirable properties making them suitable for the challenges posed by modern technologies, but are severely limited by the unavailability of EDA tools for DFT and automatic test-pattern generation (ATPG). This thesis is motivated towards developing test generation methodologies for asynchronous circuits. In total four methods were developed which are aimed at two different fault models: stuck-at faults at the basic logic gate level and transistor-level faults. The methods were evaluated using a set of benchmark circuits and compared favorably to previously published work. First, ABALLAST is a partial-scan DFT method adapting the well-known BALLAST technique for asynchronous circuits where balanced structures are used to guide the selection of the state-holding elements that will be scanned. The test inputs are automatically provided by a novel test pattern generator, which uses time frame unrolling to deal with the remaining, non-scanned sequential C-elements. The second method, called AGLOB, uses algorithms from strongly-connected components in graph graph theory as a method for finding the optimal position of breaking the loops in the asynchronous circuit and adding scan registers. The corresponding ATPG method converts cyclic circuits into acyclic for which standard tools can provide test patterns. These patterns are then automatically converted for use in the original cyclic circuits. The third method, ASCP, employs a new cycle enumeration method to find the loops present in a circuit. Enumerated cycles are then processed using an efficient set covering heuristic to select the scan elements for the circuit to be tested.Applying these methods to the benchmark circuits shows an improvement in fault coverage compared to previous work, which, for some circuits, was substantial. As no single method consistently outperforms the others in all benchmarks, they are all valuable as a designer鈥檚 suite of tools for testing. Moreover, since they are all scan-based, they are compatible and thus can be simultaneously used in different parts of a larger circuit. In the final method, ATRANTE, the main motivation of developing ATPG is supplemented by transistor level test generation. It is developed for asynchronous circuits designed using a State Transition Graph (STG) as their specification. The transistor-level circuit faults are efficiently mapped onto faults that modify the original STG. For each potential STG fault, the ATPG tool provides a sequence of test vectors that expose the difference in behavior to the output ports. The fault coverage obtained was 52-72 % higher than the coverage obtained using the gate level tests. Overall, four different design for test (DFT) methods for automatic test pattern generation (ATPG) for asynchronous circuits at both gate and transistor level were introduced in this thesis. A circuit extraction method for representing the asynchronous circuits at a higher level of abstraction was also implemented. Developing new methods for the test generation of asynchronous circuits in this thesis facilitates the test generation for asynchronous designs using the CAD tools available for testing the synchronous designs. Lessons learned and the research questions raised due to this work will impact the future work to probe the possibilities of developing robust CAD tools for testing the future asynchronous designs

    Precise Timing of Digital Signals: Circuits and Applications

    Get PDF
    With the rapid advances in process technologies, the performance of state-of-the-art integrated circuits is improving steadily. The drive for higher performance is accompanied with increased emphasis on meeting timing constraints not only at the design phase but during device operation as well. Fortunately, technology advancements allow for even more precise control of the timing of digital signals, an advantage which can be used to provide solutions that can address some of the emerging timing issues. In this thesis, circuit and architectural techniques for the precise timing of digital signals are explored. These techniques are demonstrated in applications addressing timing issues in modern digital systems. A methodology for slow-speed timing characterization of high-speed pipelined datapaths is proposed. The technique uses a clock-timing circuit to create shifted versions of a slow-speed clock. These clocks control the data flow in the pipeline in the test mode. Test results show that the design provides an average timing resolution of 52.9ps in 0.18渭m CMOS technology. Results also demonstrate the ability of the technique to track the performance of high-speed pipelines at a reduced clock frequency and to test the clock-timing circuit itself. In order to achieve higher resolutions than that of an inverter/buffer stage, a differential (vernier) delay line is commonly used. To allow for the design of differential delay lines with programmable delays, a digitally-controlled delay-element is proposed. The delay element is monotonic and achieves a high degree of transfer characteristics' (digital code vs. delay) linearity. Using the proposed delay element, a sub-1ps resolution is demonstrated experimentally in 0.18渭m CMOS. The proposed delay element with a fixed delay step of 2ps is used to design a high-precision all-digital phase aligner. High-precision phase alignment has many applications in modern digital systems such as high-speed memory controllers, clock-deskew buffers, and delay and phase-locked loops. The design is based on a differential delay line and a variation tolerant phase detector using redundancy. Experimental results show that the phase aligner's range is from -264ps to +247ps which corresponds to an average delay step of approximately 2.43ps. For various input phase difference values, test results show that the difference is reduced to less than 2ps at the output of the phase aligner. On-chip time measurement is another application that requires precise timing. It has applications in modern automatic test equipment and on-chip characterization of jitter and skew. In order to achieve small conversion time, a flash time-to-digital converter is proposed. Mismatch between the various delay comparators limits the time measurement precision. This is demonstrated through an experiment in which a 6-bit, 2.5ps resolution flash time-to-digital converter provides an effective resolution of only 4-bits. The converter achieves a maximum conversion rate of 1.25GSa/s

    Low-cost and efficient fault detection and diagnosis schemes for modern cores

    Get PDF
    Continuous improvements in transistor scaling together with microarchitectural advances have made possible the widespread adoption of high-performance processors across all market segments. However, the growing reliability threats induced by technology scaling and by the complexity of designs are challenging the production of cheap yet robust systems. Soft error trends are haunting, especially for combinational logic, and parity and ECC codes are therefore becoming insufficient as combinational logic turns into the dominant source of soft errors. Furthermore, experts are warning about the need to also address intermittent and permanent faults during processor runtime, as increasing temperatures and device variations will accelerate inherent aging phenomena. These challenges specially threaten the commodity segments, which impose requirements that existing fault tolerance mechanisms cannot offer. Current techniques based on redundant execution were devised in a time when high penalties were assumed for the sake of high reliability levels. Novel light-weight techniques are therefore needed to enable fault protection in the mass market segments. The complexity of designs is making post-silicon validation extremely expensive. Validation costs exceed design costs, and the number of discovered bugs is growing, both during validation and once products hit the market. Fault localization and diagnosis are the biggest bottlenecks, magnified by huge detection latencies, limited internal observability, and costly server farms to generate test outputs. This thesis explores two directions to address some of the critical challenges introduced by unreliable technologies and by the limitations of current validation approaches. We first explore mechanisms for comprehensively detecting multiple sources of failures in modern processors during their lifetime (including transient, intermittent, permanent and also design bugs). Our solutions embrace a paradigm where fault tolerance is built based on exploiting high-level microarchitectural invariants that are reusable across designs, rather than relying on re-execution or ad-hoc block-level protection. To do so, we decompose the basic functionalities of processors into high-level tasks and propose three novel runtime verification solutions that combined enable global error detection: a computation/register dataflow checker, a memory dataflow checker, and a control flow checker. The techniques use the concept of end-to-end signatures and allow designers to adjust the fault coverage to their needs, by trading-off area, power and performance. Our fault injection studies reveal that our methods provide high coverage levels while causing significantly lower performance, power and area costs than existing techniques. Then, this thesis extends the applicability of the proposed error detection schemes to the validation phases. We present a fault localization and diagnosis solution for the memory dataflow by combining our error detection mechanism, a new low-cost logging mechanism and a diagnosis program. Selected internal activity is continuously traced and kept in a memory-resident log whose capacity can be expanded to suite validation needs. The solution can catch undiscovered bugs, reducing the dependence on simulation farms that compute golden outputs. Upon error detection, the diagnosis algorithm analyzes the log to automatically locate the bug, and also to determine its root cause. Our evaluations show that very high localization coverage and diagnosis accuracy can be obtained at very low performance and area costs. The net result is a simplification of current debugging practices, which are extremely manual, time consuming and cumbersome. Altogether, the integrated solutions proposed in this thesis capacitate the industry to deliver more reliable and correct processors as technology evolves into more complex designs and more vulnerable transistors.El continuo escalado de los transistores junto con los avances microarquitect贸nicos han posibilitado la presencia de potentes procesadores en todos los segmentos de mercado. Sin embargo, varios problemas de fiabilidad est谩n desafiando la producci贸n de sistemas robustos. Las predicciones de "soft errors" son inquietantes, especialmente para la l贸gica combinacional: soluciones como ECC o paridad se est谩n volviendo insuficientes a medida que dicha l贸gica se convierte en la fuente predominante de soft errors. Adem谩s, los expertos est谩n alertando acerca de la necesidad de detectar otras fuentes de fallos (causantes de errores permanentes e intermitentes) durante el tiempo de vida de los procesadores. Los segmentos "commodity" son los m谩s vulnerables, ya que imponen unos requisitos que las t茅cnicas actuales de fiabilidad no ofrecen. Estas soluciones (generalmente basadas en re-ejecuci贸n) fueron ideadas en un tiempo en el que con tal de alcanzar altos nivel de fiabilidad se asum铆an grandes costes. Son por tanto necesarias nuevas t茅cnicas que permitan la protecci贸n contra fallos en los segmentos m谩s populares. La complejidad de los dise帽os est谩 encareciendo la validaci贸n "post-silicon". Su coste excede el de dise帽o, y el n煤mero de errores descubiertos est谩 aumentando durante la validaci贸n y ya en manos de los clientes. La localizaci贸n y el diagn贸stico de errores son los mayores problemas, empeorados por las altas latencias en la manifestaci贸n de errores, por la poca observabilidad interna y por el coste de generar las se帽ales esperadas. Esta tesis explora dos direcciones para tratar algunos de los retos causados por la creciente vulnerabilidad hardware y por las limitaciones de los enfoques de validaci贸n. Primero exploramos mecanismos para detectar m煤ltiples fuentes de fallos durante el tiempo de vida de los procesadores (errores transitorios, intermitentes, permanentes y de dise帽o). Nuestras soluciones son de un paradigma donde la fiabilidad se construye explotando invariantes microarquitect贸nicos gen茅ricos, en lugar de basarse en re-ejecuci贸n o en protecci贸n ad-hoc. Para ello descomponemos las funcionalidades b谩sicas de un procesador y proponemos tres soluciones de `runtime verification' que combinadas permiten una detecci贸n de errores a nivel global. Estas tres soluciones son: un verificador de flujo de datos de registro y de computaci贸n, un verificador de flujo de datos de memoria y un verificador de flujo de control. Nuestras t茅cnicas usan el concepto de firmas y permiten a los dise帽adores ajustar los niveles de protecci贸n a sus necesidades, mediante compensaciones en 谩rea, consumo energ茅tico y rendimiento. Nuestros estudios de inyecci贸n de errores revelan que los m茅todos propuestos obtienen altos niveles de protecci贸n, a la vez que causan menos costes que las soluciones existentes. A continuaci贸n, esta tesis explora la aplicabilidad de estos esquemas a las fases de validaci贸n. Proponemos una soluci贸n de localizaci贸n y diagn贸stico de errores para el flujo de datos de memoria que combina nuestro mecanismo de detecci贸n de errores, junto con un mecanismo de logging de bajo coste y un programa de diagn贸stico. Cierta actividad interna es continuamente registrada en una zona de memoria cuya capacidad puede ser expandida para satisfacer las necesidades de validaci贸n. La soluci贸n permite descubrir bugs, reduciendo la necesidad de calcular los resultados esperados. Al detectar un error, el algoritmo de diagn贸stico analiza el registro para autom谩ticamente localizar el bug y determinar su causa. Nuestros estudios muestran un alto grado de localizaci贸n y de precisi贸n de diagn贸stico a un coste muy bajo de rendimiento y 谩rea. El resultado es una simplificaci贸n de las pr谩cticas actuales de depuraci贸n, que son enormemente manuales, inc贸modas y largas. En conjunto, las soluciones de esta tesis capacitan a la industria a producir procesadores m谩s fiables, a medida que la tecnolog铆a evoluciona hacia dise帽os m谩s complejos y m谩s vulnerables
    corecore