DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-free and fault-free system would heavily, even prohibitively, impact the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. To reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.
Development of real time diagnostics and feedback algorithms for JET in view of the next step
Real time control of many plasma parameters will be an essential aspect in
the development of reliable high performance operation of Next Step Tokamaks.
The main prerequisites for any feedback scheme are the precise real-time
determination of the quantities to be controlled, requiring top quality and
highly reliable diagnostics, and the availability of robust control algorithms.
A new set of real time diagnostics was recently implemented on JET to prove the
feasibility of determining, with high accuracy and time resolution, the most
important plasma quantities. With regard to feedback algorithms, new
model–based controllers were developed to allow a more robust control of
several plasma parameters. Both diagnostics and algorithms were successfully
used in several experiments, ranging from H-mode plasmas to configurations with
ITBs. Since elaboration of computationally heavy measurements is often
required, significant attention was devoted to non-algorithmic methods like
Digital or Cellular Neural/Nonlinear Networks. The real-time hardware and
software architectures adopted are also described, with particular attention to
their relevance to ITER.
Comment: 12th International Congress on Plasma Physics, 25-29 October 2004,
Nice (France)
Data-driven Soft Sensors in the Process Industry
In the last two decades, Soft Sensors have established themselves as a valuable alternative to the traditional means for the acquisition of critical process variables, process monitoring and other tasks related to process control. This paper discusses characteristics of process industry data which are critical for the development of data-driven Soft Sensors. These characteristics are common to a large number of process industry fields, such as the chemical industry, bioprocess industry, steel industry, etc. The focus of this work is on data-driven Soft Sensors because of their growing popularity, already demonstrated usefulness and huge, though not yet fully realised, potential. The main contributions of this work are a comprehensive selection of case studies covering the three most important Soft Sensor application fields, a general introduction to the most popular Soft Sensor modelling techniques, and a discussion of some open issues in Soft Sensor development and maintenance together with their possible solutions.
Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators
In this paper, we evaluate the criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications’ output, correlating the number of corrupted elements with their spatial locality. We also provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude.
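The three metrics named above (number of corrupted elements, their spatial locality, and dataset-wise mean relative error) can be illustrated with a minimal sketch. The function names, the `tolerance` parameter, and the pattern labels `"single"`, `"line"`, and `"multi-row"` are assumptions made for this illustration and are not taken from the paper; the output is assumed to be a small 2D matrix.

```python
def compare_outputs(golden, observed, tolerance=0.0):
    """Compare an observed 2D output against the fault-free (golden) output.

    Returns the positions of corrupted elements and the mean relative
    error taken over the whole dataset (corrupted or not).
    """
    corrupted = []   # (row, col) of elements whose relative error exceeds tolerance
    rel_errors = []  # relative error of every element
    for r, (g_row, o_row) in enumerate(zip(golden, observed)):
        for c, (g, o) in enumerate(zip(g_row, o_row)):
            # Fall back to absolute error when the golden value is zero.
            rel = abs(o - g) / abs(g) if g != 0 else abs(o - g)
            rel_errors.append(rel)
            if rel > tolerance:
                corrupted.append((r, c))
    mean_rel_error = sum(rel_errors) / len(rel_errors)
    return corrupted, mean_rel_error


def spatial_pattern(corrupted):
    """Label the spatial locality of corrupted elements (hypothetical labels)."""
    if not corrupted:
        return "none"
    if len(corrupted) == 1:
        return "single"
    rows = {r for r, _ in corrupted}
    cols = {c for _, c in corrupted}
    if len(rows) == 1 or len(cols) == 1:
        return "line"
    return "multi-row"


golden = [[1.0, 2.0], [4.0, 8.0]]
observed = [[1.0, 3.0], [4.0, 8.0]]  # one element corrupted
corrupted, mre = compare_outputs(golden, observed)
print(len(corrupted), spatial_pattern(corrupted), mre)
```

Raising `tolerance` models an application that accepts imprecise results: the same mismatch is then no longer counted as a corrupted element, which is why plain mismatch detection alone can overstate criticality.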
We apply the selected metrics to experimental results obtained in various radiation test campaigns for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while Xeon Phi is more reliable when executing particle interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures.
This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research
project grant 99999.007556/2015-02, EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement
n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA.
Postprint (author's final draft)
Towards Accurate Estimation of Error Sensitivity in Computer Systems
Fault injection is an increasingly important method for assessing, measuring, and observing the system-level impact of hardware and software faults in computer systems. This thesis presents the results of a series of experimental studies in which fault injection was used to investigate the impact of bit-flip errors on program execution. The studies were motivated by the fact that transient hardware faults in microprocessors can cause bit-flip errors that can propagate to the microprocessor's instruction set architecture (ISA) registers and main memory. As the rate of such hardware faults is expected to increase with technology scaling, there is a need to better understand how these errors (known as ‘soft errors’) influence program execution, especially in safety-critical systems.
Using ISA-level fault injection, we investigate how five aspects, or factors, influence the error sensitivity of a program. We define error sensitivity as the conditional probability that a bit-flip error in live data in an ISA register or main-memory word will cause a program to produce silent data corruption (SDC; i.e., an erroneous result). We also consider the estimation of a measure called SDC count, which represents the number of ISA-level bit flips that cause an SDC.
The five factors addressed are (a) the inputs processed by a program, (b) the level of compiler optimization, (c) the implementation of the program in the source code, (d) the fault model (single bit flips vs double bit flips) and (e) the fault-injection technique (inject-on-write vs inject-on-read). Our results show that these factors affect the error sensitivity in many ways; some factors strongly impact the error sensitivity or SDC count whereas others show a weaker impact.
For example, our experiments show that single bit flips tend to cause SDCs more often than double bit flips; compiler optimization positively impacts the SDC count but not necessarily the error sensitivity; the error sensitivity varies between 20% and 50% among the programs we tested; and variations in input affect the error sensitivity significantly for most of the tested programs.
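The definitions of error sensitivity and SDC count can be made concrete with a toy sketch. It is not the thesis's ISA-level tooling: it assumes a hypothetical "program" that is an ordinary Python function over a list of 8-bit words, treats every input word as live data, injects every possible single bit flip before the program reads its input, and counts the runs whose result differs from the fault-free one. All names (`flip_bit`, `exhaustive_campaign`) are illustrative.

```python
def flip_bit(word, bit, width=8):
    """Return `word` with one bit inverted, kept within `width` bits."""
    return (word ^ (1 << bit)) & ((1 << width) - 1)


def exhaustive_campaign(program, data, width=8):
    """Inject every single bit flip into `data` and classify the outcome.

    Returns (sdc_count, total_injections, error_sensitivity), where
    error sensitivity is estimated as sdc_count / total_injections.
    Crashes and detected errors are not modelled in this toy setting:
    every run is either benign or a silent data corruption (SDC).
    """
    golden = program(list(data))  # fault-free reference result
    sdc_count = 0
    total = 0
    for index in range(len(data)):
        for bit in range(width):
            faulty = list(data)
            faulty[index] = flip_bit(faulty[index], bit, width)
            total += 1
            if program(faulty) != golden:
                sdc_count += 1  # wrong result with no error indication
    return sdc_count, total, sdc_count / total


# Toy workload: the maximum of three 8-bit values.
sdc, total, sensitivity = exhaustive_campaign(max, [3, 200, 7])
print(sdc, total, round(sensitivity, 3))
```

Even this toy shows the input dependence noted in the abstract: for `max`, only flips in the largest word, or flips that push another word above it, change the result, so a different input list yields a different sensitivity.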
Medical image computing and computer-aided medical interventions applied to soft tissues. Work in progress in urology
Until recently, Computer-Aided Medical Interventions (CAMI) and Medical
Robotics have focused on rigid and non-deformable anatomical structures.
Nowadays, special attention is paid to soft tissues, raising complex issues due
to their mobility and deformation. Mini-invasive digestive surgery was probably
one of the first fields where soft tissues were handled through the development
of simulators, tracking of anatomical structures and specific assistance
robots. However, other clinical domains, for instance urology, are also concerned.
Indeed, laparoscopic surgery, new tumour destruction techniques (e.g. HIFU,
radiofrequency, or cryoablation), increasingly early detection of cancer, and
use of interventional and diagnostic imaging modalities, recently opened new
challenges to the urologists and scientists involved in CAMI. Over the last
five years, this has resulted in a very significant increase in research and
development of computer-aided urology systems. In this paper, we describe
the main problems related to computer-aided diagnosis and therapy of soft
tissues and give a survey of the different types of assistance offered to the
urologist: robotization, image fusion, surgical navigation. Both research
projects and operational industrial systems are discussed.
Using machine learning techniques to evaluate multicore soft error reliability
Virtual platform frameworks have been extended
to allow earlier soft error analysis of more realistic multicore
systems (i.e., real software stacks, state-of-the-art ISAs). The
high observability and simulation performance of the underlying
frameworks make it possible to generate and collect more error/failure-related data, considering complex software stack configurations,
in a reasonable time. When dealing with sizeable failure-related
data sets obtained from multiple fault campaigns, it is essential to
filter out parameters (i.e., features) without a direct relationship
with the system soft error analysis. In this regard, this paper proposes the use of supervised and unsupervised machine learning
techniques, aiming to eliminate non-relevant information as well
as identify the correlation between fault injection results and
application and platform characteristics. This novel approach
provides engineers with appropriate means to
investigate new and more efficient fault mitigation techniques.
The underlying approach is validated with an extensive data set
gathered from more than 1.2 million fault injections, comprising
several benchmarks, a Linux OS and parallelization libraries
(e.g., MPI, OpenMP), as well as through a realistic automotive
case study