Search CORE

17,327 research outputs found

DeSyRe: on-Demand System Reliability

Author: Armato Antonino
Bouganis Christos-Savvas
Falsafi Babak
Gaydadjiev Georgi
Isaza Sebastian
Malek Alirad
Mariani Riccardo
Pnevmatikatos Dionisios N
Pradhan Dhiraj K
Rauwerda Gerard
Seepers Robert
Shafik Rishad Ahmed
Sourdis Ioannis
Strydis Christos
Sunesen Kim
Theodoropoulos Dimitris
Tzilis Stavros
Vavouras Michail
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

Southampton (e-Prints Soton)

Crossref

EUR Research Repository

Chalmers Research

Chalmers Publication Library

Explore Bristol Research

Swepub

Fault-Tolerant Adaptive Parallel and Distributed Simulation

Author: Armaroli Lorenzo
D'Angelo Gabriele
Ferretti Stefano
Marzolla Moreno
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

Discrete Event Simulation is a widely used technique that is used to model and analyze complex systems in many fields of science and engineering. The increasingly large size of simulation models poses a serious computational challenge, since the time needed to run a simulation can be prohibitively large. For this reason, Parallel and Distributes Simulation techniques have been proposed to take advantage of multiple execution units which are found in multicore processors, cluster of workstations or HPC systems. The current generation of HPC systems includes hundreds of thousands of computing nodes and a vast amount of ancillary components. Despite improvements in manufacturing processes, failures of some components are frequent, and the situation will get worse as larger systems are built. In this paper we describe FT-GAIA, a software-based fault-tolerant extension of the GAIA/ART\`IS parallel simulation middleware. FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some protection against byzantine failures since synchronization messages are replicated as well, so that the receiving entity can identify and discard corrupted messages. We provide an experimental evaluation of FT-GAIA on a running prototype. Results show that a high degree of fault tolerance can be achieved, at the cost of a moderate increase in the computational load of the execution units.Comment: Proceedings of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2016

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

NASA space station automation: AI-based technology review

Author: Firschein O.
Georgeff M. P.
Kautz W. H.
Levitt K. N.
Neumann P.
Park W.
Poggio A. A.
Rom R. J.
Publication venue
Publication date
Field of study

Research and Development projects in automation for the Space Station are discussed. Artificial Intelligence (AI) based automation technologies are planned to enhance crew safety through reduced need for EVA, increase crew productivity through the reduction of routine operations, increase space station autonomy, and augment space station capability through the use of teleoperation and robotics. AI technology will also be developed for the servicing of satellites at the Space Station, system monitoring and diagnosis, space manufacturing, and the assembly of large space structures

NASA Technical Reports Server

Case study: Bio-inspired self-adaptive strategy for spike-based PID controller

Author: Harkin Jim
Jiménez Fernández Ángel Francisco
Linares Barranco Alejandro
Liu Junxiu
McDaid Liam
McElholm Malachy
Publication venue: IEEE Computer Society
Publication date: 01/01/2015
Field of study

A key requirement for modern large scale neuromorphic systems is the ability to detect and diagnose faults and to explore self-correction strategies. In particular, to perform this under area-constraints which meet scalability requirements of large neuromorphic systems. A bio-inspired online fault detection and self-correction mechanism for neuro-inspired PID controllers is presented in this paper. This strategy employs a fault detection unit for online testing of the PID controller; uses a fault detection manager to perform the detection procedure across multiple controllers, and a controller selection mechanism to select an available fault-free controller to provide a corrective step in restoring system functionality. The novelty of the proposed work is that the fault detection method, using synapse models with excitatory and inhibitory responses, is applied to a robotic spike-based PID controller. The results are presented for robotic motor controllers and show that the proposed bioinspired self-detection and self-correction strategy can detect faults and re-allocate resources to restore the controller’s functionality. In particular, the case study demonstrates the compactness (~1.4% area overhead) of the fault detection mechanism for large scale robotic controllers.Ministerio de Economía y Competitividad TEC2012-37868-C04-0

idUS. Depósito de Investigación Universidad de Sevilla

Quality assessment technique for ubiquitous software and middleware

Author: Hong G.Y.
James H.
Ryu H.
Publication venue: 'Massey University'
Publication date: 01/01/2006
Field of study

The new paradigm of computing or information systems is ubiquitous computing systems. The technology-oriented issues of ubiquitous computing systems have made researchers pay much attention to the feasibility study of the technologies rather than building quality assurance indices or guidelines. In this context, measuring quality is the key to developing high-quality ubiquitous computing products. For this reason, various quality models have been defined, adopted and enhanced over the years, for example, the need for one recognised standard quality model (ISO/IEC 9126) is the result of a consensus for a software quality model on three levels: characteristics, sub-characteristics, and metrics. However, it is very much unlikely that this scheme will be directly applicable to ubiquitous computing environments which are considerably different to conventional software, trailing a big concern which is being given to reformulate existing methods, and especially to elaborate new assessment techniques for ubiquitous computing environments. This paper selects appropriate quality characteristics for the ubiquitous computing environment, which can be used as the quality target for both ubiquitous computing product evaluation processes ad development processes. Further, each of the quality characteristics has been expanded with evaluation questions and metrics, in some cases with measures. In addition, this quality model has been applied to the industrial setting of the ubiquitous computing environment. These have revealed that while the approach was sound, there are some parts to be more developed in the future

Massey Research Online