Search CORE

7,034 research outputs found

DeSyRe: on-Demand System Reliability

Author: Armato Antonino
Bouganis Christos-Savvas
Falsafi Babak
Gaydadjiev Georgi
Isaza Sebastian
Malek Alirad
Mariani Riccardo
Pnevmatikatos Dionisios N
Pradhan Dhiraj K
Rauwerda Gerard
Seepers Robert
Shafik Rishad Ahmed
Sourdis Ioannis
Strydis Christos
Sunesen Kim
Theodoropoulos Dimitris
Tzilis Stavros
Vavouras Michail
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

Southampton (e-Prints Soton)

EUR Research Repository

Chalmers Research

Chalmers Publication Library

Explore Bristol Research

Autonomous Fault Detection in Self-Healing Systems using Restricted Boltzmann Machines

Author: Barker Adam
Dobson Simon
Schneider Chris
Publication venue
Publication date: 07/01/2015
Field of study

Autonomously detecting and recovering from faults is one approach for reducing the operational complexity and costs associated with managing computing environments. We present a novel methodology for autonomously generating investigation leads that help identify systems faults, and extends our previous work in this area by leveraging Restricted Boltzmann Machines (RBMs) and contrastive divergence learning to analyse changes in historical feature data. This allows us to heuristically identify the root cause of a fault, and demonstrate an improvement to the state of the art by showing feature data can be predicted heuristically beyond a single instance to include entire sequences of information.Comment: Published and presented in the 11th IEEE International Conference and Workshops on Engineering of Autonomic and Autonomous Systems (EASe 2014

arXiv.org e-Print Archive

CiteSeerX

St Andrews Research Repository

Policy Enforcement with Proactive Libraries

Author: Mariani Leonardo
Micucci Daniela
Riganelli Oliviero
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/05/2017
Field of study

Software libraries implement APIs that deliver reusable functionalities. To correctly use these functionalities, software applications must satisfy certain correctness policies, for instance policies about the order some API methods can be invoked and about the values that can be used for the parameters. If these policies are violated, applications may produce misbehaviors and failures at runtime. Although this problem is general, applications that incorrectly use API methods are more frequent in certain contexts. For instance, Android provides a rich and rapidly evolving set of APIs that might be used incorrectly by app developers who often implement and publish faulty apps in the marketplaces. To mitigate this problem, we introduce the novel notion of proactive library, which augments classic libraries with the capability of proactively detecting and healing misuses at run- time. Proactive libraries blend libraries with multiple proactive modules that collect data, check the correctness policies of the libraries, and heal executions as soon as the violation of a correctness policy is detected. The proactive modules can be activated or deactivated at runtime by the users and can be implemented without requiring any change to the original library and any knowledge about the applications that may use the library. We evaluated proactive libraries in the context of the Android ecosystem. Results show that proactive libraries can automati- cally overcome several problems related to bad resource usage at the cost of a small overhead.Comment: O. Riganelli, D. Micucci and L. Mariani, "Policy Enforcement with Proactive Libraries" 2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), Buenos Aires, Argentina, 2017, pp. 182-19

arXiv.org e-Print Archive

Crossref

Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications

Author: Balasubramanian Vivek
Cervone Guido
Hu Weiming
Jha Shantenu
Lefebvre Matthieu
Lei Wenjie
Tromp Jeroen
Turilli Matteo
Publication venue
Publication date: 16/05/2018
Field of study

Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability they pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two distinct exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising thereof. We propose EnTK as an important addition to the suite of tools in support of production scientific computing

arXiv.org e-Print Archive

Crossref