Fault-Tolerant Adaptive Parallel and Distributed Simulation
Discrete Event Simulation is a widely used technique for modeling
and analyzing complex systems in many fields of science and engineering. The
increasingly large size of simulation models poses a serious computational
challenge, since the time needed to run a simulation can be prohibitively
large. For this reason, Parallel and Distributed Simulation techniques have
been proposed to take advantage of the multiple execution units found in
multicore processors, clusters of workstations or HPC systems. The current
generation of HPC systems includes hundreds of thousands of computing nodes and
a vast amount of ancillary components. Despite improvements in manufacturing
processes, failures of some components are frequent, and the situation will get
worse as larger systems are built. In this paper we describe FT-GAIA, a
software-based fault-tolerant extension of the GAIA/ARTÌS parallel simulation
middleware. FT-GAIA transparently replicates simulation entities and
distributes them on multiple execution nodes. This allows the simulation to
tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some
protection against byzantine failures since synchronization messages are
replicated as well, so that the receiving entity can identify and discard
corrupted messages. We provide an experimental evaluation of FT-GAIA on a
running prototype. Results show that a high degree of fault tolerance can be
achieved, at the cost of a moderate increase in the computational load of the
execution units.
Comment: Proceedings of the IEEE/ACM International Symposium on Distributed
Simulation and Real Time Applications (DS-RT 2016
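The message-replication idea in the abstract above can be illustrated with a small sketch. This is not FT-GAIA code: the function name `vote` and the parameter `f` are hypothetical. It assumes that each message arrives as several copies, one per replica of the sender, and that at most `f` copies are corrupted, so a value is accepted only when at least `f + 1` identical copies are received.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(copies: Sequence[Hashable], f: int) -> Optional[Hashable]:
    """Return the value carried by at least f + 1 identical copies, else None.

    Assumption (not from the paper's code): at most f copies are corrupted,
    so any value seen f + 1 times must have come from a correct replica.
    """
    if not copies:
        return None
    value, count = Counter(copies).most_common(1)[0]
    return value if count >= f + 1 else None


# Three replicas of the sender emit the same event; one copy is corrupted.
received = ["move(3,4)", "move(3,4)", "move(9,9)"]
print(vote(received, f=1))
```

With three copies and `f = 1`, the two matching copies outvote the corrupted one. If too few matching copies arrive, the sketch returns `None` rather than guessing.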
Case study: Bio-inspired self-adaptive strategy for spike-based PID controller
A key requirement for modern large-scale
neuromorphic systems is the ability to detect and diagnose faults
and to explore self-correction strategies. In particular, this must
be done under area constraints that meet the scalability requirements
of large neuromorphic systems. A bio-inspired online fault
detection and self-correction mechanism for neuro-inspired PID
controllers is presented in this paper. This strategy employs a
fault detection unit for online testing of the PID controller, a
fault detection manager to perform the detection procedure
across multiple controllers, and a controller selection mechanism
to select an available fault-free controller to provide a corrective
step in restoring system functionality. The novelty of the
proposed work is that the fault detection method, using synapse
models with excitatory and inhibitory responses, is applied to a
robotic spike-based PID controller. The results are presented for
robotic motor controllers and show that the proposed bio-inspired
self-detection and self-correction strategy can detect
faults and re-allocate resources to restore the controller’s
functionality. In particular, the case study demonstrates the
compactness (~1.4% area overhead) of the fault detection
mechanism for large-scale robotic controllers.
Ministerio de Economía y Competitividad TEC2012-37868-C04-0
Multilevel Clustering Fault Model for IC Manufacture
A hierarchical approach to the construction of compound distributions for
process-induced faults in IC manufacture is proposed. Within this framework,
the negative binomial distribution is treated as a level-1 model. The
hierarchical approach to fault distribution offers an integrated picture of how
fault density varies from region to region within a wafer, from wafer to wafer
within a batch, and so on. A theory of compound-distribution hierarchies is
developed by means of generating functions. A study of the correlations that
naturally appear in microelectronics due to the batch character of IC
manufacture, is proposed. Taking these correlations into account is of
significant importance for developing procedures for statistical quality
control in IC manufacture. With respect to applications, hierarchies of yield
means and yield probability-density functions are considered.
Comment: 10 pages, the International Conference "Micro- and Nanoelectronics-2003"
(ICMNE-2003), Zvenigorod, Moscow district, Russia, October 6-10, 200
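As background for the abstract above, the standard way the negative binomial arises as a compound distribution can be written with generating functions. The symbols below ($N$, $\lambda$, $\lambda_0$, $\alpha$, $G$, $Y$, $H$) are assumptions chosen for illustration and are not taken from the paper. The sketch assumes the fault count $N$ on a chip is Poisson with a gamma-distributed mean.

```latex
% Poisson fault count with random mean \lambda:
G(s \mid \lambda) = \mathbb{E}\!\left[s^{N} \mid \lambda\right] = e^{\lambda (s - 1)}

% \lambda gamma-distributed with mean \lambda_0 and shape (clustering) parameter \alpha:
G_1(s) = \mathbb{E}_{\lambda}\!\left[e^{\lambda (s - 1)}\right]
       = \left(1 + \frac{\lambda_0 (1 - s)}{\alpha}\right)^{-\alpha}

% Yield is the probability of zero faults:
Y = G_1(0) = \left(1 + \frac{\lambda_0}{\alpha}\right)^{-\alpha}

% One further level: let \lambda_0 itself vary (e.g. from wafer to wafer) with density h:
G_2(s) = \int_0^{\infty} \left(1 + \frac{\lambda_0 (1 - s)}{\alpha}\right)^{-\alpha} h(\lambda_0)\, d\lambda_0
```

Here $G_1$ is the generating function of the negative binomial distribution, and $G_2$ shows, under the stated assumptions, how averaging over a higher-level parameter produces the next level of a hierarchy.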
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect- and fault-free system would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.
Improving Aircraft Engines Prognostics and Health Management via Anticipated Model-Based Validation of Health Indicators
The aircraft engine manufacturing industry is subject to many dependability constraints imposed by certification authorities and by its economic context. In particular, the costs induced by unscheduled maintenance, delays, and cancellations make it necessary to ensure a minimum level of availability. For this purpose, Prognostics and Health Management (PHM) is used as a means to perform online periodic assessment of the engines’ health status. The whole PHM methodology is based on the processing of variables that reflect the system’s health status, named Health Indicators (HI). The collection of HI is an on-board embedded task which has to be specified before entry into service because of retrofit costs. However, the development of PHM systems is currently treated as a marginal task in the industry, and it is observed that, most of the time, the set of HI is defined too late and only in a qualitative way. In this paper, the authors propose a novel development methodology for PHM systems centered on an anticipated model-based validation of HI. This validation uses uncertainty propagation to simulate the distributions of HI, including the randomness of parameters. The paper also defines performance metrics and criteria for the validation of the HI set. Finally, the methodology is applied to the development of a PHM solution for an aircraft engine actuation loop. It reveals a lack of performance of the original set of HI and allows new ones to be defined in order to meet the specifications before entry into service.
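The validation step described above, propagating parameter uncertainty through a model to obtain the distribution of a health indicator, can be sketched with plain Monte Carlo sampling. Everything in this example is a hypothetical stand-in: the toy model `health_indicator`, the parameter names `gain` and `friction`, their distributions, and the 1% false-alarm target are assumptions for illustration, not the authors' actuation-loop model or metrics.

```python
import random
import statistics


def health_indicator(gain: float, friction: float) -> float:
    """Toy stand-in for a system model: the indicator grows with friction."""
    return friction / gain


def simulate_hi(friction_mean: float, n: int, rng: random.Random) -> list[float]:
    """Propagate parameter randomness through the model by random sampling."""
    samples = []
    for _ in range(n):
        gain = rng.gauss(10.0, 0.5)               # assumed unit-to-unit spread
        friction = rng.gauss(friction_mean, 0.2)  # assumed operating variability
        samples.append(health_indicator(gain, friction))
    return samples


rng = random.Random(0)  # fixed seed so the run is repeatable
healthy = simulate_hi(friction_mean=2.0, n=5000, rng=rng)
degraded = simulate_hi(friction_mean=3.0, n=5000, rng=rng)

# Threshold chosen so that about 1% of healthy samples raise a false alarm.
threshold = statistics.quantiles(healthy, n=100)[98]
false_alarm_rate = sum(h > threshold for h in healthy) / len(healthy)
detection_rate = sum(d > threshold for d in degraded) / len(degraded)
print(f"threshold={threshold:.3f} "
      f"false_alarm={false_alarm_rate:.3f} detection={detection_rate:.3f}")
```

Comparing the simulated healthy and degraded distributions against a threshold gives one possible quantitative criterion for an indicator before any in-service data exist. A real study would replace the toy model with a physical or numerical model of the monitored system.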
NASA space station automation: AI-based technology review
Research and Development projects in automation for the Space Station are discussed. Artificial Intelligence (AI) based automation technologies are planned to enhance crew safety through reduced need for EVA, increase crew productivity through the reduction of routine operations, increase space station autonomy, and augment space station capability through the use of teleoperation and robotics. AI technology will also be developed for the servicing of satellites at the Space Station, system monitoring and diagnosis, space manufacturing, and the assembly of large space structures.