33 research outputs found

    Testability and redundancy techniques for improved yield and reliability of CMOS VLSI circuits

    The research presented in this thesis is concerned with the design of fault-tolerant integrated circuits as a contribution to the design of fault-tolerant systems. The economical manufacture of very large area ICs will necessitate the incorporation of fault-tolerance features of the kind routinely employed in current high-density dynamic random access memories. Furthermore, the growing use of ICs in safety-critical applications and/or hostile environments, in addition to the prospect of single-chip systems, will mandate the use of fault tolerance for improved reliability. A fault-tolerant IC must be able to detect and correct all possible faults that may affect its operation. The ability of a chip to detect its own faults is not only necessary for fault tolerance, but is also regarded as the ultimate solution to the problem of testing. Off-line periodic testing is selected for this research because it achieves better coverage of physical faults and requires less extra hardware than on-line error detection techniques. Tests for CMOS stuck-open faults are shown to detect all other faults. Simple test sequence generation procedures for the detection of all faults are derived. The test sequences generated by these procedures produce a trivial output, thereby greatly simplifying the task of test response analysis. A further advantage of the proposed test generation procedures is that they do not require the enumeration of faults. The implementation of built-in self-test is considered, and it is shown that the hardware overhead is comparable to that associated with pseudo-random and pseudo-exhaustive techniques while achieving a much higher fault coverage through the use of the proposed test generation procedures. Consideration of the problem of testing the test circuitry led to the conclusion that complete test coverage may be achieved if separate chips cooperate in testing each other's untested parts. An alternative approach towards complete test coverage would be to design the test circuitry so that it is as distributed as possible and is tested as it performs its function. Fault correction relies on the provision of spare units and a means of reconfiguring the circuit so that the faulty units are discarded. This raises the question of the optimum size of a unit. A mathematical model linking yield and reliability is therefore developed to answer this question and to study the effects of such parameters as the amount of redundancy, the size of the additional circuitry required for testing and reconfiguration, and the effect of periodic testing on reliability. The stringent requirement on the size of the reconfiguration logic is illustrated by applying the model to a typical example. Another important result concerns the effect of periodic testing on reliability: it is shown that periodic off-line testing can achieve approximately the same level of reliability as on-line testing, even when the time between tests is many hundreds of hours.
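
    The abstract does not reproduce the model itself. As a rough illustration of the kind of yield-versus-redundancy trade-off such a model captures, the sketch below (a minimal sketch, not the thesis's formulation) combines a simple Poisson defect model per unit with m-of-n sparing and a yield penalty for the test/reconfiguration overhead; the function names and all parameter values are hypothetical.

        import math
        from math import comb

        def unit_yield(defects_per_unit: float) -> float:
            """Poisson model: probability that a single functional unit is defect-free."""
            return math.exp(-defects_per_unit)

        def module_yield(n_required: int, n_spares: int,
                         defects_per_unit: float, overhead_defects: float) -> float:
            """Yield of a module needing n_required good units out of
            n_required + n_spares, plus defect-free test/reconfiguration logic."""
            y = unit_yield(defects_per_unit)
            n_total = n_required + n_spares
            # At least n_required of the n_total units must be defect-free.
            units_ok = sum(comb(n_total, k) * y**k * (1 - y)**(n_total - k)
                           for k in range(n_required, n_total + 1))
            # The test/reconfiguration circuitry itself must also be defect-free.
            return units_ok * math.exp(-overhead_defects)

        # Hypothetical numbers: 64 required units, 0.02 defects/unit, varying spares.
        for spares in (0, 2, 4, 8):
            print(spares, round(module_yield(64, spares, 0.02, 0.05), 3))

    Under these assumed numbers, a handful of spare units raises module yield from roughly 26% to above 90%, while the fixed overhead term shows why the reconfiguration logic must stay small.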

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10⁻⁶, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10⁻⁶. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While larger than current fault-tolerant designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of the model development, an improved model is derived for NAND multiplexing. It is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results.
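
    As a purely illustrative companion to the reported numbers (not the thesis's CAM yield model or its NAND multiplexing analysis), the sketch below estimates array yield under a generic row-sparing scheme given a per-device failure probability; the array dimensions and spare counts are hypothetical.

        from math import comb

        def block_ok(p_fail: float, devices: int) -> float:
            """Probability that a block of 'devices' devices has no failed device."""
            return (1.0 - p_fail) ** devices

        def array_yield(rows_needed: int, spare_rows: int,
                        devices_per_row: int, p_fail: float) -> float:
            """Row-sparing model: the array works if at least rows_needed of the
            rows_needed + spare_rows rows are entirely fault-free."""
            q = block_ok(p_fail, devices_per_row)
            total = rows_needed + spare_rows
            return sum(comb(total, k) * q**k * (1 - q)**(total - k)
                       for k in range(rows_needed, total + 1))

        # Hypothetical CAM-like array: 512 rows of 1024 devices, p = 3e-6.
        print(array_yield(512, 0, 1024, 3e-6))   # no redundancy: ~0.21
        print(array_yield(512, 32, 1024, 3e-6))  # 32 spare rows: ~1.0

    Even this crude model reproduces the qualitative result of the abstract: modest redundancy turns an array that is unusable at such device failure rates into one with near-perfect yield.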

    Low-cost and efficient fault detection and diagnosis schemes for modern cores

    Continuous improvements in transistor scaling together with microarchitectural advances have made possible the widespread adoption of high-performance processors across all market segments. However, the growing reliability threats induced by technology scaling and by the complexity of designs are challenging the production of cheap yet robust systems. Soft error trends are alarming, especially for combinational logic, and parity and ECC codes are becoming insufficient as combinational logic turns into the dominant source of soft errors. Furthermore, experts warn about the need to also address intermittent and permanent faults during processor runtime, as increasing temperatures and device variations will accelerate inherent aging phenomena. These challenges especially threaten the commodity segments, which impose requirements that existing fault tolerance mechanisms cannot meet. Current techniques based on redundant execution were devised at a time when high penalties were accepted for the sake of high reliability levels. Novel lightweight techniques are therefore needed to enable fault protection in the mass-market segments. The complexity of designs is making post-silicon validation extremely expensive. Validation costs exceed design costs, and the number of discovered bugs is growing, both during validation and once products hit the market. Fault localization and diagnosis are the biggest bottlenecks, magnified by huge detection latencies, limited internal observability, and costly server farms needed to generate test outputs. This thesis explores two directions to address some of the critical challenges introduced by unreliable technologies and by the limitations of current validation approaches. We first explore mechanisms for comprehensively detecting multiple sources of failures in modern processors during their lifetime (including transient, intermittent, and permanent faults, as well as design bugs). Our solutions embrace a paradigm where fault tolerance is built by exploiting high-level microarchitectural invariants that are reusable across designs, rather than by relying on re-execution or ad-hoc block-level protection. To do so, we decompose the basic functionalities of processors into high-level tasks and propose three novel runtime verification solutions that, combined, enable global error detection: a computation/register dataflow checker, a memory dataflow checker, and a control flow checker. The techniques use the concept of end-to-end signatures and allow designers to adjust the fault coverage to their needs by trading off area, power, and performance. Our fault injection studies reveal that our methods provide high coverage levels while incurring significantly lower performance, power, and area costs than existing techniques. This thesis then extends the applicability of the proposed error detection schemes to the validation phases. We present a fault localization and diagnosis solution for the memory dataflow that combines our error detection mechanism, a new low-cost logging mechanism, and a diagnosis program. Selected internal activity is continuously traced and kept in a memory-resident log whose capacity can be expanded to suit validation needs. The solution can catch undiscovered bugs, reducing the dependence on simulation farms that compute golden outputs. Upon error detection, the diagnosis algorithm analyzes the log to automatically locate the bug and determine its root cause.
Our evaluations show that very high localization coverage and diagnosis accuracy can be obtained at very low performance and area costs. The net result is a simplification of current debugging practices, which are extremely manual, time-consuming, and cumbersome. Altogether, the integrated solutions proposed in this thesis enable the industry to deliver more reliable and correct processors as technology evolves into more complex designs and more vulnerable transistors.
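
    As a minimal sketch of the end-to-end signature idea (not the thesis's actual microarchitectural checkers), the fragment below folds every value a producer sends and a consumer receives into running signatures and compares them at a check point; the class name and mixing function are hypothetical stand-ins for hardware signature registers.

        from dataclasses import dataclass

        @dataclass
        class SignatureChannel:
            """End-to-end signature: the producer folds every value it sends into a
            running signature; the consumer does the same on receipt. A mismatch at
            the check point reveals a corruption anywhere along the path."""
            producer_sig: int = 0
            consumer_sig: int = 0

            @staticmethod
            def _fold(sig: int, value: int) -> int:
                # Simple 32-bit mixing step standing in for a hardware signature
                # register; not the encoding used in the thesis.
                return ((sig * 0x01000193) ^ value) & 0xFFFFFFFF

            def produce(self, value: int) -> int:
                self.producer_sig = self._fold(self.producer_sig, value)
                return value  # value then travels through the (possibly faulty) datapath

            def consume(self, value: int) -> None:
                self.consumer_sig = self._fold(self.consumer_sig, value)

            def check(self) -> bool:
                return self.producer_sig == self.consumer_sig

        # Fault-free run: signatures agree at the end-to-end check.
        ch = SignatureChannel()
        for v in (10, 20, 30):
            ch.consume(ch.produce(v))
        assert ch.check()

        # A single value corrupted in flight is caught by the same check.
        ch = SignatureChannel()
        ch.consume(ch.produce(10))
        ch.consume(ch.produce(20) ^ 0x4)   # bit flip between producer and consumer
        ch.consume(ch.produce(30))
        assert not ch.check()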

    Development of Robust Analog and Mixed-Signal Circuits in the Presence of Process-Voltage-Temperature Variations

    Continued improvements of transceiver systems-on-a-chip play a key role in the advancement of mobile telecommunication products as well as wireless systems in biomedical and remote sensing applications. This dissertation addresses the problems of escalating CMOS process variability and system complexity that diminish the reliability and testability of integrated systems, especially relating to the analog and mixed-signal blocks. The proposed design techniques and circuit-level attributes are aligned with current built-in testing and self-calibration trends for integrated transceivers. In this work, the main focus is on enhancing the performance of analog and mixed-signal blocks with digitally adjustable elements as well as with automatic analog tuning circuits, which are experimentally applied to conventional blocks in the receiver path in order to demonstrate the concepts. The use of digitally controllable elements to compensate for variations is exemplified with two circuits. First, a distortion cancellation method for baseband operational transconductance amplifiers is proposed that enables a third-order intermodulation (IM3) improvement of up to 22dB. Fabricated in a 0.13µm CMOS process with 1.2V supply, a transconductance-capacitor lowpass filter with the linearized amplifiers has a measured IM3 below -70dB (with 0.2V peak-to-peak input signal) and 54.5dB dynamic range over its 195MHz bandwidth. The second circuit is a 3-bit two-step quantizer with adjustable reference levels, which was designed and fabricated in 0.18µm CMOS technology as part of a continuous-time Sigma-Delta analog-to-digital converter system. With 5mV resolution at a 400MHz sampling frequency, the quantizer's static power dissipation is 24mW and its die area is 0.4mm². An alternative to electrical power detectors is introduced by outlining a strategy for built-in testing of analog circuits with on-chip temperature sensors. Comparisons of an amplifier's measurement results at 1GHz with the measured DC voltage output of an on-chip temperature sensor show that the amplifier's power dissipation can be monitored and its 1-dB compression point can be estimated with less than 1dB error. The sensor has a tunable sensitivity up to 200mV/mW, a power detection range measured up to 16mW, and it occupies a die area of 0.012mm² in standard 0.18µm CMOS technology. Finally, an analog calibration technique is discussed to lessen the mismatch between transistors in the differential high-frequency signal path of analog CMOS circuits. The proposed methodology involves auxiliary transistors that sense the existing mismatch as part of a feedback loop for error minimization. It was assessed by performing statistical Monte Carlo simulations of a differential amplifier and a double-balanced mixer designed in CMOS technologies.
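
    As a back-of-the-envelope illustration of the temperature-sensor-based built-in-testing idea, the snippet below converts a change in the sensor's DC output into an estimate of dissipated power using the reported 200mV/mW sensitivity; the voltage readings are hypothetical, and the dissertation's actual calibration and 1-dB compression-point extraction procedure are not reproduced here.

        def power_from_sensor(delta_v_mV: float,
                              sensitivity_mV_per_mW: float = 200.0) -> float:
            """Linear sensor model: dissipated power (mW) inferred from the change
            in the on-chip temperature sensor's DC output voltage (mV)."""
            return delta_v_mV / sensitivity_mV_per_mW

        # Hypothetical sweep: sensor output rise measured at increasing input levels.
        readings_mV = [40.0, 80.0, 160.0, 310.0]
        for dv in readings_mV:
            print(f"{dv:6.1f} mV  ->  {power_from_sensor(dv):5.2f} mW dissipated")

    Tracking where the inferred dissipation departs from the small-signal trend is one way such a monitor could flag gain compression, which is consistent with the estimation strategy described in the abstract.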

    Parker solar probe: four years of discoveries at solar cycle minimum

    Launched on 12 Aug. 2018, NASA's Parker Solar Probe had completed 13 of its scheduled 24 orbits around the Sun by Nov. 2022. The mission's primary science goal is to determine the structure and dynamics of the Sun's coronal magnetic field, understand how the solar corona and wind are heated and accelerated, and determine what processes accelerate energetic particles. Parker Solar Probe has returned a treasure trove of science data that far exceeded expectations in quality, significance, and quantity, leading to a significant number of discoveries reported in nearly 700 peer-reviewed publications. The first four years of the 7-year primary mission have fallen mostly during solar minimum conditions, with few major solar events. Starting with orbit 8 (i.e., 28 Apr. 2021), Parker flew through the magnetically dominated corona, i.e., sub-Alfvénic solar wind, which is one of the mission's primary objectives. In this paper, we present an overview of the scientific advances made mainly during the first four years of the Parker Solar Probe mission, which go well beyond the three science objectives: (1) Trace the flow of energy that heats and accelerates the solar corona and solar wind; (2) Determine the structure and dynamics of the plasma and magnetic fields at the sources of the solar wind; and (3) Explore mechanisms that accelerate and transport energetic particles.
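
    For readers unfamiliar with the term, "sub-Alfvénic" means the solar-wind speed falls below the local Alfvén speed, i.e. the Alfvén Mach number M_A = v_sw / v_A drops below 1. The short sketch below evaluates v_A = B / sqrt(mu0 * rho) for a proton plasma; the plasma parameters are hypothetical, only roughly representative of the near-Sun environment, and are not Parker Solar Probe measurements.

        import math

        MU0 = 4e-7 * math.pi          # vacuum permeability, H/m
        M_PROTON = 1.67262192e-27     # proton mass, kg

        def alfven_speed(B_tesla: float, n_per_m3: float) -> float:
            """Alfven speed v_A = B / sqrt(mu0 * rho) for a proton plasma."""
            rho = n_per_m3 * M_PROTON
            return B_tesla / math.sqrt(MU0 * rho)

        # Hypothetical near-Sun values: B ~ 550 nT, n ~ 1000 cm^-3, v_sw ~ 200 km/s.
        v_a = alfven_speed(550e-9, 1000e6)
        v_sw = 200e3
        print(f"v_A = {v_a/1e3:6.1f} km/s")
        print(f"M_A = {v_sw / v_a:4.2f}  (sub-Alfvenic if < 1)")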