373 research outputs found
Efficient protection of the pipeline core for safety-critical processor-based systems
The increasing number of safety-critical commercial
applications has generated a need for components with high
levels of reliability. As CMOS process sizes continue to shrink,
the reliability of ICs is negatively affected since they become
more sensitive to transient faults. New circuit designs must take
this fact into consideration, and incorporate adequate protection
against the effects of transient faults. This paper presents a
novel method for protecting the pipelined execution unit of an
embedded processor. It is based on a self-configured architecture
with hybrid redundancy that can mask single and multiple
errors, which can occur on storage elements due to transient
or permanent faults. This concept can be easily applied to any
processing architecture of this nature with a high safety integrity
level. Results from error-injection experiments are also reported
that show that this design can maintain a non-interrupted and
failure-free operation under single and double errors with a
probability that exceeds 99.4%
Fully Automated Radiation Hardened by Design Circuit Construction
abstract: A fully automated logic design methodology for radiation hardened by design (RHBD) high speed logic using fine grained triple modular redundancy (TMR) is presented. The hardening techniques used in the cell library are described and evaluated, with a focus on both layout techniques that mitigate total ionizing dose (TID) and latchup issues and flip-flop designs that mitigate single event transient (SET) and single event upset (SEU) issues. The base TMR self-correcting master-slave flip-flop is described and compared to more traditional hardening techniques. Additional refinements are presented, including testability features that disable the self-correction to allow detection of manufacturing defects. The circuit approach is validated for hardness using both heavy ion and proton broad beam testing. For synthesis and auto place and route, the methodology and circuits leverage commercial logic design automation tools. These tools are glued together with custom CAD tools designed to enable easy conversion of standard single redundant hardware description language (HDL) files into hardened TMR circuitry. The flow allows hardening of any synthesizable logic at clock frequencies comparable to unhardened designs and supports standard low-power techniques, e.g. clock gating and supply voltage scaling.Dissertation/ThesisPh.D. Electrical Engineering 201
Avoiding core's DUE & SDC via acoustic wave detectors and tailored error containment and recovery
The trend of downsizing transistors and operating voltage scaling has made the processor chip more sensitive against radiation phenomena making soft errors an important challenge. New reliability techniques for handling soft errors in the logic and memories that allow meeting the desired failures-in-time (FIT) target are key to keep harnessing the benefits of Moore's law. The failure to scale the soft error rate caused by particle strikes, may soon limit the total number of cores that one may have running at the same time. This paper proposes a light-weight and scalable architecture to eliminate silent data corruption errors (SDC) and detected unrecoverable errors (DUE) of a core. The architecture uses acoustic wave detectors for error detection. We propose to recover by confining the errors in the cache hierarchy, allowing us to deal with the relatively long detection latencies. Our results show that the proposed mechanism protects the whole core (logic, latches and memory arrays) incurring performance overhead as low as 0.60%. © 2014 IEEE.Peer ReviewedPostprint (author's final draft
Advanced information processing system for advanced launch system: Hardware technology survey and projections
The major goals of this effort are as follows: (1) to examine technology insertion options to optimize Advanced Information Processing System (AIPS) performance in the Advanced Launch System (ALS) environment; (2) to examine the AIPS concepts to ensure that valuable new technologies are not excluded from the AIPS/ALS implementations; (3) to examine advanced microprocessors applicable to AIPS/ALS, (4) to examine radiation hardening technologies applicable to AIPS/ALS; (5) to reach conclusions on AIPS hardware building blocks implementation technologies; and (6) reach conclusions on appropriate architectural improvements. The hardware building blocks are the Fault-Tolerant Processor, the Input/Output Sequencers (IOS), and the Intercomputer Interface Sequencers (ICIS)
Statistical Reliability Estimation of Microprocessor-Based Systems
What is the probability that the execution state of a given microprocessor running a given application is correct, in a certain working environment with a given soft-error rate? Trying to answer this question using fault injection can be very expensive and time consuming. This paper proposes the baseline for a new methodology, based on microprocessor error probability profiling, that aims at estimating fault injection results without the need of a typical fault injection setup. The proposed methodology is based on two main ideas: a one-time fault-injection analysis of the microprocessor architecture to characterize the probability of successful execution of each of its instructions in presence of a soft-error, and a static and very fast analysis of the control and data flow of the target software application to compute its probability of success. The presented work goes beyond the dependability evaluation problem; it also has the potential to become the backbone for new tools able to help engineers to choose the best hardware and software architecture to structurally maximize the probability of a correct execution of the target softwar
Radiation Hardened by Design Methodologies for Soft-Error Mitigated Digital Architectures
abstract: Digital architectures for data encryption, processing, clock synthesis, data transfer, etc. are susceptible to radiation induced soft errors due to charge collection in complementary metal oxide semiconductor (CMOS) integrated circuits (ICs). Radiation hardening by design (RHBD) techniques such as double modular redundancy (DMR) and triple modular redundancy (TMR) are used for error detection and correction respectively in such architectures. Multiple node charge collection (MNCC) causes domain crossing errors (DCE) which can render the redundancy ineffectual. This dissertation describes techniques to ensure DCE mitigation with statistical confidence for various designs. Both sequential and combinatorial logic are separated using these custom and computer aided design (CAD) methodologies.
Radiation vulnerability and design overhead are studied on VLSI sub-systems including an advanced encryption standard (AES) which is DCE mitigated using module level coarse separation on a 90-nm process with 99.999% DCE mitigation. A radiation hardened microprocessor (HERMES2) is implemented in both 90-nm and 55-nm technologies with an interleaved separation methodology with 99.99% DCE mitigation while achieving 4.9% increased cell density, 28.5 % reduced routing and 5.6% reduced power dissipation over the module fences implementation. A DMR register-file (RF) is implemented in 55 nm process and used in the HERMES2 microprocessor. The RF array custom design and the decoders APR designed are explored with a focus on design cycle time. Quality of results (QOR) is studied from power, performance, area and reliability (PPAR) perspective to ascertain the improvement over other design techniques.
A radiation hardened all-digital multiplying pulsed digital delay line (DDL) is designed for double data rate (DDR2/3) applications for data eye centering during high speed off-chip data transfer. The effect of noise, radiation particle strikes and statistical variation on the designed DDL are studied in detail. The design achieves the best in class 22.4 ps peak-to-peak jitter, 100-850 MHz range at 14 pJ/cycle energy consumption. Vulnerability of the non-hardened design is characterized and portions of the redundant DDL are separated in custom and auto-place and route (APR). Thus, a range of designs for mission critical applications are implemented using methodologies proposed in this work and their potential PPAR benefits explored in detail.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
NASA Space Engineering Research Center for VLSI System Design
This annual report outlines the activities of the past year at the NASA SERC on VLSI Design. Highlights for this year include the following: a significant breakthrough was achieved in utilizing commercial IC foundries for producing flight electronics; the first two flight qualified chips were designed, fabricated, and tested and are now being delivered into NASA flight systems; and a new technology transfer mechanism has been established to transfer VLSI advances into NASA and commercial systems
A Structured Design Methodology for High Performance VLSI Arrays
abstract: The geometric growth in the integrated circuit technology due to transistor scaling also with system-on-chip design strategy, the complexity of the integrated circuit has increased manifold. Short time to market with high reliability and performance is one of the most competitive challenges. Both custom and ASIC design methodologies have evolved over the time to cope with this but the high manual labor in custom and statistic design in ASIC are still causes of concern. This work proposes a new circuit design strategy that focuses mostly on arrayed structures like TLB, RF, Cache, IPCAM etc. that reduces the manual effort to a great extent and also makes the design regular, repetitive still achieving high performance. The method proposes making the complete design custom schematic but using the standard cells. This requires adding some custom cells to the already exhaustive library to optimize the design for performance. Once schematic is finalized, the designer places these standard cells in a spreadsheet, placing closely the cells in the critical paths. A Perl script then generates Cadence Encounter compatible placement file. The design is then routed in Encounter. Since designer is the best judge of the circuit architecture, placement by the designer will allow achieve most optimal design. Several designs like IPCAM, issue logic, TLB, RF and Cache designs were carried out and the performance were compared against the fully custom and ASIC flow. The TLB, RF and Cache were the part of the HEMES microprocessor.Dissertation/ThesisPh.D. Electrical Engineering 201
Error Detection and Diagnosis for System-on-Chip in Space Applications
Tesis por compendio de publicacionesLos componentes electrónicos comerciales, comúnmente llamados componentes
Commercial-Off-The-Shelf (COTS) están presentes en multitud de dispositivos habituales
en nuestro día a día. Particularmente, el uso de microprocesadores y sistemas en chip (SoC)
altamente integrados ha favorecido la aparición de dispositivos electrónicos cada vez más
inteligentes que sostienen el estilo de vida y el avance de la sociedad moderna. Su uso se
ha generalizado incluso en aquellos sistemas que se consideran críticos para la seguridad,
como vehículos, aviones, armamento, dispositivos médicos, implantes o centrales eléctricas.
En cualquiera de ellos, un fallo podría tener graves consecuencias humanas o económicas.
Sin embargo, todos los sistemas electrónicos conviven constantemente con factores internos
y externos que pueden provocar fallos en su funcionamiento. La capacidad de un sistema
para funcionar correctamente en presencia de fallos se denomina tolerancia a fallos, y es
un requisito en el diseño y operación de sistemas críticos.
Los vehículos espaciales como satélites o naves espaciales también hacen uso de
microprocesadores para operar de forma autónoma o semi autónoma durante su vida útil,
con la dificultad añadida de que no pueden ser reparados en órbita, por lo que se consideran
sistemas críticos. Además, las duras condiciones existentes en el espacio, y en particular
los efectos de la radiación, suponen un gran desafío para el correcto funcionamiento de los
dispositivos electrónicos. Concretamente, los fallos transitorios provocados por radiación
(conocidos como soft errors) tienen el potencial de ser una de las mayores amenazas para
la fiabilidad de un sistema en el espacio.
Las misiones espaciales de gran envergadura, típicamente financiadas públicamente
como en el caso de la NASA o la Agencia Espacial Europea (ESA), han tenido
históricamente como requisito evitar el riesgo a toda costa por encima de cualquier
restricción de coste o plazo. Por ello, la selección de componentes resistentes a la radiación
(rad-hard) específicamente diseñados para su uso en el espacio ha sido la metodología
imperante en el paradigma que hoy podemos denominar industria espacial tradicional, u
Old Space. Sin embargo, los componentes rad-hard tienen habitualmente un coste mucho
más alto y unas prestaciones mucho menores que otros componentes COTS equivalentes.
De hecho, los componentes COTS ya han sido utilizados satisfactoriamente en misiones
de la NASA o la ESA cuando las prestaciones requeridas por la misión no podían ser
cubiertas por ningún componente rad-hard existente.
En los últimos años, el acceso al espacio se está facilitando debido en gran parte a la
entrada de empresas privadas en la industria espacial. Estas empresas no siempre buscan
evitar el riesgo a toda costa, sino que deben perseguir una rentabilidad económica, por
lo que hacen un balance entre riesgo, coste y plazo mediante gestión del riesgo en un
paradigma denominado Nuevo Espacio o New Space. Estas empresas a menudo están
interesadas en entregar servicios basados en el espacio con las máximas prestaciones y el mayor beneficio posibles, para lo cual los componentes rad-hard son menos atractivos
debido a su mayor coste y menores prestaciones que los componentes COTS existentes.
Sin embargo, los componentes COTS no han sido específicamente diseñados para su uso
en el espacio y típicamente no incluyen técnicas específicas para evitar que los efectos de
la radiación afecten su funcionamiento. Los componentes COTS se comercializan tal cual
son, y habitualmente no es posible modificarlos para mejorar su resistencia a la radiación.
Además, los elevados niveles de integración de los sistemas en chip (SoC) complejos
de altas prestaciones dificultan su observación y la aplicación de técnicas de tolerancia
a fallos. Este problema es especialmente relevante en el caso de los microprocesadores.
Por tanto, existe un gran interés en el desarrollo de técnicas que permitan conocer y
mejorar el comportamiento de los microprocesadores COTS bajo radiación sin modificar
su arquitectura y sin interferir en su funcionamiento para facilitar su uso en el espacio y
con ello maximizar las prestaciones de las misiones espaciales presentes y futuras.
En esta Tesis se han desarrollado técnicas novedosas para detectar, diagnosticar y
mitigar los errores producidos por radiación en microprocesadores y sistemas en chip
(SoC) comerciales, utilizando la interfaz de traza como punto de observación. La interfaz de
traza es un recurso habitual en los microprocesadores modernos, principalmente enfocado
a soportar las tareas de desarrollo y depuración del software durante la fase de diseño. Sin
embargo, una vez el desarrollo ha concluido, la interfaz de traza típicamente no se utiliza
durante la fase operativa del sistema, por lo que puede ser reutilizada sin coste. La interfaz
de traza constituye un punto de conexión viable para observar el comportamiento de un
microprocesador de forma no intrusiva y sin interferir en su funcionamiento.
Como resultado de esta Tesis se ha desarrollado un módulo IP capaz de recabar
y decodificar la información de traza de un microprocesador COTS moderno de altas
prestaciones. El IP es altamente configurable y personalizable para adaptarse a diferentes
aplicaciones y tipos de procesadores. Ha sido diseñado y validado utilizando el dispositivo
Zynq-7000 de Xilinx como plataforma de desarrollo, que constituye un dispositivo COTS
de interés en la industria espacial. Este dispositivo incluye un procesador ARM Cortex-A9
de doble núcleo, que es representativo del conjunto de microprocesadores hard-core
modernos de altas prestaciones. El IP resultante es compatible con la tecnología ARM
CoreSight, que proporciona acceso a información de traza en los microprocesadores ARM.
El IP incorpora técnicas para detectar errores en el flujo de ejecución y en los datos de la
aplicación ejecutada utilizando la información de traza, en tiempo real y con muy baja
latencia. El IP se ha validado en campañas de inyección de fallos y también en radiación con
protones y neutrones en instalaciones especializadas. También se ha combinado con otras
técnicas de tolerancia a fallos para construir técnicas híbridas de mitigación de errores.
Los resultados experimentales obtenidos demuestran su alta capacidad de detección y
potencialidad en el diagnóstico de errores producidos por radiación.
El resultado de esta Tesis, desarrollada en el marco de un Doctorado Industrial entre
la Universidad Carlos III de Madrid (UC3M) y la empresa Arquimea, se ha transferido satisfactoriamente al entorno empresarial en forma de un proyecto financiado por la
Agencia Espacial Europea para continuar su desarrollo y posterior explotación.Commercial electronic components, also known as Commercial-Off-The-Shelf (COTS),
are present in a wide variety of devices commonly used in our daily life. Particularly, the
use of microprocessors and highly integrated System-on-Chip (SoC) devices has fostered
the advent of increasingly intelligent electronic devices which sustain the lifestyles and the
progress of modern society. Microprocessors are present even in safety-critical systems,
such as vehicles, planes, weapons, medical devices, implants, or power plants. In any of
these cases, a fault could involve severe human or economic consequences. However, every
electronic system deals continuously with internal and external factors that could provoke
faults in its operation. The capacity of a system to operate correctly in presence of faults
is known as fault-tolerance, and it becomes a requirement in the design and operation of
critical systems.
Space vehicles such as satellites or spacecraft also incorporate microprocessors to
operate autonomously or semi-autonomously during their service life, with the additional
difficulty that they cannot be repaired once in-orbit, so they are considered critical systems.
In addition, the harsh conditions in space, and specifically radiation effects, involve a big
challenge for the correct operation of electronic devices. In particular, radiation-induced
soft errors have the potential to become one of the major risks for the reliability of systems
in space.
Large space missions, typically publicly funded as in the case of NASA or European
Space Agency (ESA), have followed historically the requirement to avoid the risk at any
expense, regardless of any cost or schedule restriction. Because of that, the selection of
radiation-resistant components (known as rad-hard) specifically designed to be used in
space has been the dominant methodology in the paradigm of traditional space industry,
also known as “Old Space”. However, rad-hard components have commonly a much higher
associated cost and much lower performance that other equivalent COTS devices. In fact,
COTS components have already been used successfully by NASA and ESA in missions
that requested such high performance that could not be satisfied by any available rad-hard
component.
In the recent years, the access to space is being facilitated in part due to the irruption
of private companies in the space industry. Such companies do not always seek to avoid
the risk at any cost, but they must pursue profitability, so they perform a trade-off between
risk, cost, and schedule through risk management in a paradigm known as “New Space”.
Private companies are often interested in deliver space-based services with the maximum
performance and maximum benefit as possible. With such objective, rad-hard components
are less attractive than COTS due to their higher cost and lower performance.
However, COTS components have not been specifically designed to be used in space
and typically they do not include specific techniques to avoid or mitigate the radiation effects in their operation. COTS components are commercialized “as is”, so it is not
possible to modify them to improve their susceptibility to radiation effects. Moreover,
the high levels of integration of complex, high-performance SoC devices hinder their
observability and the application of fault-tolerance techniques. This problem is especially
relevant in the case of microprocessors. Thus, there is a growing interest in the development
of techniques allowing to understand and improve the behavior of COTS microprocessors
under radiation without modifying their architecture and without interfering with their
operation. Such techniques may facilitate the use of COTS components in space and
maximize the performance of present and future space missions.
In this Thesis, novel techniques have been developed to detect, diagnose, and
mitigate radiation-induced errors in COTS microprocessors and SoCs using the trace
interface as an observation point. The trace interface is a resource commonly found
in modern microprocessors, mainly intended to support software development and
debugging activities during the design phase. However, it is commonly left unused
during the operational phase of the system, so it can be reused with no cost. The trace
interface constitutes a feasible connection point to observe microprocessor behavior in a
non-intrusive manner and without disturbing processor operation.
As a result of this Thesis, an IP module has been developed capable to gather and
decode the trace information of a modern, high-end, COTS microprocessor. The IP is highly
configurable and customizable to support different applications and processor types. The
IP has been designed and validated using the Xilinx Zynq-7000 device as a development
platform, which is an interesting COTS device for the space industry. This device features a
dual-core ARM Cortex-A9 processor, which is a good representative of modern, high-end,
hard-core microprocessors. The resulting IP is compatible with the ARM CoreSight
technology, which enables access to trace information in ARM microprocessors. The IP is
able to detect errors in the execution flow of the microprocessor and in the application data
using trace information, in real time and with very low latency. The IP has been validated
in fault injection campaigns and also under proton and neutron irradiation campaigns in
specialized facilities. It has also been combined with other fault-tolerance techniques
to build hybrid error mitigation approaches. Experimental results demonstrate its high
detection capabilities and high potential for the diagnosis of radiation-induced errors.
The result of this Thesis, developed in the framework of an Industrial Ph.D. between the
University Carlos III of Madrid (UC3M) and the company Arquimea, has been successfully
transferred to the company business as a project sponsored by European Space Agency to
continue its development and subsequent commercialization.Programa de Doctorado en Ingeniería Eléctrica, Electrónica y Automática por la Universidad Carlos III de MadridPresidenta: María Luisa López Vallejo.- Secretario: Enrique San Millán Heredia.- Vocal: Luigi Di Lill
Concepts for on-board satellite image registration. Volume 3: Impact of VLSI/VHSIC on satellite on-board signal processing
Anticipated major advances in integrated circuit technology in the near future are described as well as their impact on satellite onboard signal processing systems. Dramatic improvements in chip density, speed, power consumption, and system reliability are expected from very large scale integration. Improvements are expected from very large scale integration enable more intelligence to be placed on remote sensing platforms in space, meeting the goals of NASA's information adaptive system concept, a major component of the NASA End-to-End Data System program. A forecast of VLSI technological advances is presented, including a description of the Defense Department's very high speed integrated circuit program, a seven-year research and development effort
- …