Search CORE

947 research outputs found

X-Rel: Energy-Efficient and Low-Overhead Approximate Reliability Framework for Error-Tolerant Applications Deployed in Critical Systems

Author: Akbari Omid
Hochberger Christian
Shafique Muhammad
Vafaei Jafar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/07/2023
Field of study

Triple Modular Redundancy (TMR) is one of the most common techniques in fault-tolerant systems, in which the output is determined by a majority voter. However, the design diversity of replicated modules and/or soft errors that are more likely to happen in the nanoscale era may affect the majority voting scheme. Besides, the significant overheads of the TMR scheme may limit its usage in energy consumption and area-constrained critical systems. However, for most inherently error-resilient applications such as image processing and vision deployed in critical systems (like autonomous vehicles and robotics), achieving a given level of reliability has more priority than precise results. Therefore, these applications can benefit from the approximate computing paradigm to achieve higher energy efficiency and a lower area. This paper proposes an energy-efficient approximate reliability (X-Rel) framework to overcome the aforementioned challenges of the TMR systems and get the full potential of approximate computing without sacrificing the desired reliability constraint and output quality. The X-Rel framework relies on relaxing the precision of the voter based on a systematical error bounding method that leverages user-defined quality and reliability constraints. Afterward, the size of the achieved voter is used to approximate the TMR modules such that the overall area and energy consumption are minimized. The effectiveness of employing the proposed X-Rel technique in a TMR structure, for different quality constraints as well as with various reliability bounds are evaluated in a 15-nm FinFET technology. The results of the X-Rel voter show delay, area, and energy consumption reductions of up to 86%, 87%, and 98%, respectively, when compared to those of the state-of-the-art approximate TMR voters.Comment: This paper has been published in IEEE Transactions on Very Large Scale Integration (VLSI) System

arXiv.org e-Print Archive

Review of Fault Mitigation Approaches for Deep Neural Networks for Computer Vision in Autonomous Driving

Author: Cerino Mattia
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 21/07/2020
Field of study

The aim of this work is to identify and present challenges and risks related to the employment of DNNs in Computer Vision for Autonomous Driving. Nowadays one of the major technological challenges is to choose the right technology among the abundance that is available on the market. Specifically, in this thesis it is collected a synopsis of the state-of-the-art architectures, techniques and methodologies adopted for building fault-tolerant hardware and ensuring robustness in DNNs-based Computer Vision applications for Autonomous Driving

AMS Tesi di Laurea

Deviation-tolerant computation in concurrent failure-prone hardware

Author: Marculescu D.
Stanley-Marbell P.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2008
Field of study

Repository TU/e

Pure OAI Repository

Development and certification of mixed-criticality embedded systems based on probabilistic timing analysis

Author: Agirre Troncoso Irune
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2018
Field of study

An increasing variety of emerging systems relentlessly replaces or augments the functionality of mechanical subsystems with embedded electronics. For quantity, complexity, and use, the safety of such subsystems is an increasingly important matter. Accordingly, those systems are subject to safety certification to demonstrate system's safety by rigorous development processes and hardware/software constraints. The massive augment in embedded processors' complexity renders the arduous certification task significantly harder to achieve. The focus of this thesis is to address the certification challenges in multicore architectures: despite their potential to integrate several applications on a single platform, their inherent complexity imperils their timing predictability and certification. Recently, the Measurement-Based Probabilistic Timing Analysis (MBPTA) technique emerged as an alternative to deal with hardware/software complexity. The innovation that MBPTA brings about is, however, a major step from current certification procedures and standards. The particular contributions of this Thesis include: (i) the definition of certification arguments for mixed-criticality integration upon multicore processors. In particular we propose a set of safety mechanisms and procedures as required to comply with functional safety standards. For timing predictability, (ii) we present a quantitative approach to assess the likelihood of execution-time exceedance events with respect to the risk reduction requirements on safety standards. To this end, we build upon the MBPTA approach and we present the design of a safety-related source of randomization (SoR), that plays a key role in the platform-level randomization needed by MBPTA. And (iii) we evaluate current certification guidance with respect to emerging high performance design trends like caches. Overall, this Thesis pushes the certification limits in the use of multicore and MBPTA technology in Critical Real-Time Embedded Systems (CRTES) and paves the way towards their adoption in industry.Una creciente variedad de sistemas emergentes reemplazan o aumentan la funcionalidad de subsistemas mecánicos con componentes electrónicos embebidos. El aumento en la cantidad y complejidad de dichos subsistemas electrónicos así como su cometido, hacen de su seguridad una cuestión de creciente importancia. Tanto es así que la comercialización de estos sistemas críticos está sujeta a rigurosos procesos de certificación donde se garantiza la seguridad del sistema mediante estrictas restricciones en el proceso de desarrollo y diseño de su hardware y software. Esta tesis trata de abordar los nuevos retos y dificultades dadas por la introducción de procesadores multi-núcleo en dichos sistemas críticos: aunque su mayor rendimiento despierta el interés de la industria para integrar múltiples aplicaciones en una sola plataforma, suponen una mayor complejidad. Su arquitectura desafía su análisis temporal mediante los métodos tradicionales y, asimismo, su certificación es cada vez más compleja y costosa. Con el fin de lidiar con estas limitaciones, recientemente se ha desarrollado una novedosa técnica de análisis temporal probabilístico basado en medidas (MBPTA). La innovación de esta técnica, sin embargo, supone un gran cambio cultural respecto a los estándares y procedimientos tradicionales de certificación. En esta línea, las contribuciones de esta tesis están agrupadas en tres ejes principales: (i) definición de argumentos de seguridad para la certificación de aplicaciones de criticidad-mixta sobre plataformas multi-núcleo. Se definen, en particular, mecanismos de seguridad, técnicas de diagnóstico y reacción de faltas acorde con el estándar IEC 61508 sobre una arquitectura multi-núcleo de referencia. Respecto al análisis temporal, (ii) presentamos la cuantificación de la probabilidad de exceder un límite temporal y su relación con los requisitos de reducción de riesgos derivados de los estándares de seguridad funcional. Con este fin, nos basamos en la técnica MBPTA y presentamos el diseño de una fuente de números aleatorios segura; un componente clave para conseguir las propiedades aleatorias requeridas por MBPTA a nivel de plataforma. Por último, (iii) extrapolamos las guías actuales para la certificación de arquitecturas multi-núcleo a una solución comercial de 8 núcleos y las evaluamos con respecto a las tendencias emergentes de diseño de alto rendimiento (caches). Con estas contribuciones, esta tesis trata de abordar los retos que el uso de procesadores multi-núcleo y MBPTA implican en el proceso de certificación de sistemas críticos de tiempo real y facilita, de esta forma, su adopción por la industria.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Distributed photovoltaic systems: Utility interface issues and their present status

Author: Hassan M.
Klein J.
Publication venue
Publication date
Field of study

Major technical issues involving the integration of distributed photovoltaics (PV) into electric utility systems are defined and their impacts are described quantitatively. An extensive literature search, interviews, and analysis yielded information about the work in progress and highlighted problem areas in which additional work and research are needed. The findings from the literature search were used to determine whether satisfactory solutions to the problems exist or whether satisfactory approaches to a solution are underway. It was discovered that very few standards, specifications, or guidelines currently exist that will aid industry in integrating PV into the utility system. Specific areas of concern identified are: (1) protection, (2) stability, (3) system unbalance, (4) voltage regulation and reactive power requirements, (5) harmonics, (6) utility operations, (7) safety, (8) metering, and (9) distribution system planning and design

NASA Technical Reports Server

Tiny Groups Tackle Byzantine Adversaries

Author: Jaiyeola Mercy O.
Patron Kyle
Saia Jared
Young Maxwell
Zhou Qian M.
Publication venue
Publication date: 08/01/2018
Field of study

A popular technique for tolerating malicious faults in open distributed systems is to establish small groups of participants, each of which has a non-faulty majority. These groups are used as building blocks to design attack-resistant algorithms. Despite over a decade of active research, current constructions require group sizes of

O(\log n)

, where

n

is the number of participants in the system. This group size is important since communication and state costs scale polynomially with this parameter. Given the stubbornness of this logarithmic barrier, a natural question is whether better bounds are possible. Here, we consider an attacker that controls a constant fraction of the total computational resources in the system. By leveraging proof-of-work (PoW), we demonstrate how to reduce the group size exponentially to

O(\log\log n)

while maintaining strong security guarantees. This reduction in group size yields a significant improvement in communication and state costs.Comment: This work is supported by the National Science Foundation grant CCF 1613772 and a C Spire Research Gif

arXiv.org e-Print Archive

Crossref