45 research outputs found

Restart-Based Fault-Tolerance: System Design and Schedulability Analysis

Author: Abdi Fardin
Caccamo Marco
Mancuso Renato
Tabish Rohan
Publication date: 01/01/2017

Embedded systems in safety-critical environments are continuously required to deliver more performance and functionality, while being expected to provide verified safety guarantees. Nonetheless, platform-wide software verification (required for safety) is often expensive. Therefore, design methods that enable the use of components such as real-time operating systems (RTOS), without requiring their correctness to guarantee safety, are necessary. In this paper, we propose a design approach to deploy safe-by-design embedded systems. To attain this goal, we rely on a small core of verified software to handle faults in applications and the RTOS and recover from them, while ensuring that the timing constraints of safety-critical tasks are always satisfied. Faults are detected by monitoring the application timing, and fault recovery is achieved via full platform restart and software reload, enabled by the short restart time of embedded systems. Schedulability analysis is used to ensure that the timing constraints of critical plant control tasks are always satisfied in spite of faults and consequent restarts. We derive schedulability results for four restart-tolerant task models. We use a simulator to evaluate and compare the performance of the considered scheduling models.
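As a rough illustration of the kind of check this abstract refers to, the sketch below applies textbook fixed-priority response-time analysis with a restart penalty added. The fault model is an assumption made only for illustration: at most one fault per response window, a full platform restart costing a hypothetical `restart_time`, and re-execution of the longest job at or above the analysed task's priority. The names `Task`, `response_time` and `schedulable` are hypothetical; this is not one of the four task models derived in the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    wcet: int      # C_i: worst-case execution time
    period: int    # T_i
    deadline: int  # D_i, assumed <= T_i


def response_time(tasks: List[Task], i: int, restart_time: int) -> Optional[int]:
    """Worst-case response time of tasks[i], or None if it can miss its deadline.

    `tasks` is ordered by decreasing priority (index 0 is the highest).
    Hypothetical fault model: at most one fault hits any response window; the
    platform then restarts (cost `restart_time`) and the longest job at or
    above priority i must be re-executed from scratch.
    """
    task = tasks[i]
    penalty = restart_time + max(t.wcet for t in tasks[: i + 1])
    r = task.wcet + penalty
    while True:
        # Standard interference from higher-priority periodic tasks.
        interference = sum(math.ceil(r / hp.period) * hp.wcet for hp in tasks[:i])
        r_next = task.wcet + penalty + interference
        if r_next > task.deadline:
            return None  # deadline can be missed under this fault model
        if r_next == r:
            return r     # fixed point reached
        r = r_next


def schedulable(tasks: List[Task], restart_time: int) -> bool:
    return all(response_time(tasks, i, restart_time) is not None
               for i in range(len(tasks)))


if __name__ == "__main__":
    ts = [Task("A", 1, 10, 10), Task("B", 2, 20, 20)]
    print([response_time(ts, i, 3) for i in range(len(ts))])  # [5, 8]
    print(schedulable(ts, 3), schedulable(ts, 15))            # True False
```

With a 3-unit restart both tasks fit, while a 15-unit restart already pushes task A past its 10-unit deadline. The single-fault penalty is deliberately simple and pessimistic; models that allow several faults or partial re-execution would change the penalty term.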

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Combinators and bisimulation proofs for restartable systems

Author: Prasad K. V. S.
Publication venue: The University of Edinburgh
Publication date: 01/01/1987

Edinburgh Research Archive

Improving quality of service in application clusters

Author: Corsava S.
Getov Vladimir
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2003

Quality of service (QoS) requirements, which include availability, integrity, performance and responsiveness, are increasingly needed by science and engineering applications. Rising computational demands and data mining present a new challenge in the IT world. As our needs for more processing, research and analysis increase, performance and reliability degrade exponentially. In this paper, we present a software system that manages quality of service for Unix-based distributed application clusters. Our approach is synthetic and involves intelligent agents that make use of static and dynamic ontologies to monitor, diagnose and correct faults at run time over a private network. Finally, we provide experimental results from our pilot implementation in a production environment.

Crossref

WestminsterResearch


A Uniform Programming Abstraction for Effecting Autonomic Adaptations onto Software Systems

Author: Kaiser Gail E.
Phung Dan
Valetto Giuseppe
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2005

Most general-purpose work towards autonomic or self-managing systems has emphasized the front end of the feedback control loop, with some also concerned with controlling the back-end enactment of runtime adaptations, but usually employing an effector technology peculiar to one type of target system. While completely generic "one size fits all" effector technologies seem implausible, we propose a general-purpose programming model and interaction layer that abstracts away from the peculiarities of target-specific effectors, enabling a uniform approach to controlling and coordinating the low-level execution of reconfigurations, repairs, micro-reboots, etc.

Columbia University Academic Commons


A Uniform Programming Abstraction for Effecting Autonomic Adaptations onto Software Systems

Author: Kaiser Gail E.
Valetto Giuseppe
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2005

Most general-purpose work towards autonomic or self-managing systems has emphasized the front end of the feedback control loop, with some also concerned with controlling the back-end enactment of runtime adaptations, but usually employing an effector technology peculiar to one type of target system. While completely generic 'one size fits all' effector technologies seem implausible, we propose a general-purpose programming model and interaction layer that abstracts away from the peculiarities of target-specific effectors, enabling a uniform approach to controlling and coordinating the low-level execution of reconfigurations, repairs, micro-reboots, etc.

Columbia University Academic Commons

An early implementation of revised Algol 68

Author: Schlichting J.J.F.M.
Publication date: 14/11/1989

CWI's Institutional Repository

Autonomous Recovery in Componentized Internet Applications

Author: Candea George
Fox Armando
Kawamoto Shinichi
Kiciman Emre
Publication venue: Springer Science and Business Media LLC
Publication date: 15/12/2006

In this paper, we show how to reduce the downtime of J2EE applications by rapidly and automatically recovering from transient and intermittent software failures, without requiring application modifications. Our prototype combines three application-agnostic techniques: macroanalysis for fault detection and localization, microrebooting for rapid recovery, and external management of recovery actions. The individual techniques are autonomous and work across a wide range of componentized Internet applications, making them well suited to the rapidly changing software of Internet services. The proposed framework has been integrated with JBoss, an open-source J2EE application server. Our prototype provides an execution platform that can automatically recover J2EE applications within seconds of the manifestation of a fault. Our system can provide a subset of a system's active end users with the illusion of continuous uptime, in spite of failures occurring behind the scenes, even when there is no functional redundancy in the system.

Infoscience - École polytechnique fédérale de Lausanne

Behind the Last Line of Defense -- Surviving SoC Faults and Intrusions

Author: Esteves-Verissimo Paulo
Gouveia Inês Pinto
Völp Marcus
Publication date: 03/05/2020

Today, leveraging the enormous modular power, diversity and flexibility of manycore systems-on-a-chip (SoCs) requires careful orchestration of complex resources, a task left to low-level software, e.g. hypervisors. In current architectures, this software forms a single point of failure and a worthwhile target for attacks: once it is compromised, adversaries gain access to all information and full control over the platform and the environment it controls. This paper proposes Midir, an enhanced manycore architecture, effecting a paradigm shift from SoCs to distributed SoCs. Midir changes the way platform resources are controlled by retrofitting tile-based fault containment through well-known mechanisms, while securing low-overhead quorum-based consensus on all critical operations, in particular privilege management and, thus, management of containment domains. By allowing versatile redundancy management, Midir promotes resilience for all software levels, including the lowest ones. We explain this architecture, its associated algorithms and hardware mechanisms, and show, for the example of a Byzantine fault-tolerant microhypervisor, that it outperforms the highly efficient MinBFT by one order of magnitude.

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg