10 research outputs found

Keep Net Working - On a Dependable and Fast Networking Stack

Author: Bos H.J.
Dirk V.
Hruby T.
Tanenbaum A. S.
Publication venue: IEEE
Publication date: 01/01/2012

Optimizing decomposition of software architecture for local recovery

Author: Hasan Sözer
Bedir Tekinerdoğan
Mehmet Akşit
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

The increasing size and complexity of software systems have led to an amplified number of potential failures and as such make it harder to ensure software reliability. Since it is usually hard to prevent all failures, fault tolerance techniques have become more important. An essential element of fault tolerance is the recovery from failures. Local recovery is an effective approach whereby only the erroneous parts of the system are recovered while the other parts remain available. To achieve local recovery, the architecture needs to be decomposed into separate units that can be recovered in isolation. Usually, there are many alternative ways to decompose the system into recoverable units, and each of these decomposition alternatives performs differently with respect to availability and performance metrics. We propose a systematic approach dedicated to optimizing the decomposition of software architecture for local recovery. The approach provides systematic guidelines to depict the design space of the possible decomposition alternatives, to reduce the design space with respect to domain and stakeholder constraints, and to balance the feasible alternatives with respect to availability and performance. The approach is supported by an integrated set of tools and illustrated for the open-source MPlayer software.

Crossref

Bilkent University Institutional Repository

eResearch@Ozyegin

Fine-grained fault tolerance using device checkpoints

Author: Asim Kadav
Matthew J. Renzelmann
Michael M. Swift
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Flora: a framework for decomposing software architecture to introduce local recovery

Author: Akşit M.
Sözer H.
Tekinerdoǧan B.
Publication venue: 'Wiley'
Publication date: 01/01/2009

The decomposition of software architecture into modular units is usually driven by the required quality concerns. In this paper we focus on the impact of the local recovery concern on the decomposition of the software system. To achieve local recovery, the system needs to be decomposed into separate units that can be recovered in isolation. However, the decomposition required for recovery is usually not aligned with the decomposition based on functional concerns. Moreover, introducing local recovery to a software system while preserving the existing decomposition is not trivial and requires substantial development and maintenance effort. To reduce this effort we propose a framework that supports the decomposition and implementation of software architecture for local recovery. The framework provides reusable abstractions for defining recoverable units and the necessary coordination and communication protocols for recovery. We discuss our experiences in the application and evaluation of the framework for introducing local recovery to the open-source media player called MPlayer. Copyright 2009 John Wiley & Sons, Ltd.

Bilkent University Institutional Repository

Optimizing decomposition of software architecture for local recovery

Author: Akşit M.
Sözer H.
Tekinerdoǧan B.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The increasing size and complexity of software systems have led to an amplified number of potential failures and as such make it harder to ensure software reliability. Since it is usually hard to prevent all failures, fault tolerance techniques have become more important. An essential element of fault tolerance is the recovery from failures. Local recovery is an effective approach whereby only the erroneous parts of the system are recovered while the other parts remain available. To achieve local recovery, the architecture needs to be decomposed into separate units that can be recovered in isolation. Usually, there are many alternative ways to decompose the system into recoverable units, and each of these decomposition alternatives performs differently with respect to availability and performance metrics. We propose a systematic approach dedicated to optimizing the decomposition of software architecture for local recovery. The approach provides systematic guidelines to depict the design space of the possible decomposition alternatives, to reduce the design space with respect to domain and stakeholder constraints, and to balance the feasible alternatives with respect to availability and performance. The approach is supported by an integrated set of tools and illustrated for the open-source MPlayer software. © 2011 Springer Science+Business Media, LLC.

Bilkent University Institutional Repository

Contract Testing for Reliable Embedded Systems

Author: Schmidlin Fajardo Silva Raul
Publication date: 01/01/2013

Embedded systems comprise diverse technologies, which complicates their design. Electronic System Level Design creates virtual prototypes of the target system and so enables early analysis of a system composed of electronics and software. However, the concrete interaction between hardware modules, and between hardware and software, is left to late development stages and the building of real prototypes. Interaction between components is generally assumed to be correct, and this assumption is made implicitly during development because component interaction is not considered in the functional design. While single components are mostly thoroughly tested and guarantee certain reliability levels, their interaction relies on interfaces that are often underspecified. Although component usage is mostly specified, operational constraints are often left out. Finally, neither the interaction between components nor the interaction with the environment and the user is ensured. Generally, only functional integration tests are executed and corner cases are left out, leaving faults uncovered that manifest as failures only later, when their cost is higher. Therefore, this work addresses component interaction through the specification of interfaces, test generation and real-time test execution. The specification is based on the design-by-contract approach from software, which specifies the semantics of component interaction in addition to the syntactic definition through functions. In the first part of this work, a specification for the interaction between hardware modules is given. With automatic real-time test execution, the fulfillment of the specified preconditions for correct component operation can be checked. In component-based design, the component is trusted, and thus its functionality is assumed to be correct when certain postconditions are specified. In a correct component assembly, the postconditions of components fulfill the preconditions of other components, resulting in an operational system.
The specification of preconditions follows the definition of environmental properties, acceptable input sequences for interfacing pins, and acceptable signal parameters such as voltage levels, slope times, delays and glitches. Postconditions are defined by describing a functionality together with its accompanying constraints, such as timing. These parameters are determined automatically during operation by a testing circuit. The testing circuit signals parameters that violate the specification, and a failure is detected. The chosen parameters can hint at the reason for the failure, providing evidence of a circuit fault. Using the example of an Inter-Integrated Circuit (I2C) communication system, we define contracts and compare contract violation, fault categorization and failure occurrence under signal fault injection. To complete this work, support for fault analysis in electronic system level design is given. For this, the data transfers between the high-level models used in the design are augmented with the defined contract parameters. With a specific interface, digital faults that the system can track are generated for transactions whose signal parameters violate the contract. In this way, recovery mechanisms for synchronous communication are proposed and tested. In the second part, the interaction between hardware and software is tackled by providing special methods for developing device drivers. For this, we not only specify the interface between hardware and software but also map the hardware control elements to software, partially generating the software interface for a device. This is necessary because drivers handle devices with internal control elements, such as registers, data streams and interrupts, that cannot be represented in software. This systematic composition of drivers facilitates the development of a device interface called the device mechanism. It is the lowest layer of a two-layer architecture for driver development.
The device mechanism carries out access to the device and exports a pure software interface. This interface is based on the device implementation and is thus fully specified. Further data processing required for compliance with the operating system or application is carried out in the driver policy, the layer on top of it. With the definition of a software layer for device control, contracts specifying the constraints of this interface are proposed. These contracts are based on the implementation constraints of the device and on its dynamic behavior. Therefore, an extended finite state machine models the dynamic behavior of the device. Based on it, functions of the device mechanism can be augmented with preconditions on the state or on state machine variables. These conditions are then checked at runtime. After a function executes, its postconditions, such as timing, are ensured. This guarantees that different driver policies, operating systems or firmware use the same device mechanism while fulfilling its constraints. Using the example of a Philips webcam, we develop a complete driver for Linux based on our architecture and create contracts for its device mechanism. Following the systematic composition and the contract approach avoids driver bugs that would otherwise violate the allowed values for device data and the execution order of device protocols.
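
The contract idea in the abstract above, preconditions on a state machine checked at runtime and a postcondition checked after a call, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class `DeviceMechanism`, its states `IDLE` and `STREAMING`, the variable `frames_read`, the 4-byte placeholder frame and the exception `ContractViolation` are all hypothetical names chosen for illustration, and the timing postcondition mentioned in the abstract is replaced here by a simpler size check.

```python
class ContractViolation(Exception):
    """Raised when a precondition or postcondition does not hold."""


class DeviceMechanism:
    """Hypothetical device mechanism guarded by a small state machine."""

    FRAME_SIZE = 4  # assumed frame size, for illustration only

    def __init__(self):
        self.state = "IDLE"   # current state of the modeled device
        self.frames_read = 0  # a state machine variable

    def _require(self, condition, message):
        # Runtime contract check shared by all interface functions.
        if not condition:
            raise ContractViolation(message)

    def start_stream(self):
        # Precondition on the state: streaming can only start from IDLE.
        self._require(self.state == "IDLE", "start_stream requires IDLE")
        self.state = "STREAMING"

    def read_frame(self):
        # Precondition on the state: frames exist only while streaming.
        self._require(self.state == "STREAMING", "read_frame requires STREAMING")
        frame = b"\x00" * self.FRAME_SIZE  # placeholder for real device data
        self.frames_read += 1
        # Postcondition: the returned frame has the expected size.
        self._require(len(frame) == self.FRAME_SIZE, "unexpected frame size")
        return frame

    def stop_stream(self):
        # Precondition on the state: only an active stream can be stopped.
        self._require(self.state == "STREAMING", "stop_stream requires STREAMING")
        self.state = "IDLE"
```

In this sketch, a driver policy that calls `read_frame()` before `start_stream()` gets a `ContractViolation` instead of silently violating the device protocol's execution order.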

Heidelberger Dokumentenserver

Safe and automatic live update

Author: Giuffrida C.
Publication venue: Amsterdam: Vrije Universiteit
Publication date: 01/01/2014

Tanenbaum, A.S. [Promotor]

CiteSeerX

VU Research Portal

Failure Resilience for Device Drivers

Author: Andrew S. Tanenbaum
Ben Gras
Herbert Bos
Jorrit N. Herder
Philip Homburg

Studies have shown that device drivers and extensions contain 3–7 times more bugs than other code and thus are more likely to fail. Therefore, we present a failure-resilient operating system that can recover from dead device drivers and other critical components, primarily by monitoring and replacing malfunctioning components on the fly, transparently to applications and without user intervention. This paper focuses on the post-mortem recovery procedure. We explain the workings of our defect detection mechanism, the policy-driven recovery procedure, and the post-restart reintegration of components. Furthermore, we discuss the concrete steps taken to recover from network, block device, and character device driver failures. Finally, we evaluate our recovery mechanism using performance measurements, software fault injection, and an analysis of the reengineering effort.

CiteSeerX