
    Resilience in Numerical Methods: A Position on Fault Models and Methodologies

    Future extreme-scale computer systems may expose silent data corruption (SDC) to applications, in order to save energy or increase performance. However, resilience research struggles to come up with useful abstract programming models for reasoning about SDC. Existing work randomly flips bits in running applications, but this only shows average-case behavior for a low-level, artificial hardware model. Algorithm developers need to understand worst-case behavior with the higher-level data types they actually use, in order to make their algorithms more resilient. Also, we know so little about how SDC may manifest in future hardware that it seems premature to draw conclusions about the average case. We argue instead that numerical algorithms can benefit from a numerical unreliability fault model, where faults manifest as unbounded perturbations to floating-point data. Algorithms can use inexpensive "sanity" checks that bound or exclude error in the results of computations. Given a selective reliability programming model that requires reliability only when and where needed, such checks can make algorithms reliable despite unbounded faults. Sanity checks, and in general a healthy skepticism about the correctness of subroutines, are wise even if hardware is perfectly reliable. Comment: Position paper.
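
    The sanity-check idea can be sketched in a few lines of Python. The sketch below assumes a cheap, possibly faulty routine and a reliable fallback (both names hypothetical) and uses a simple norm bound as the inexpensive check; it illustrates the selective-reliability pattern in general rather than any particular algorithm from the paper.

        import numpy as np

        def checked_step(apply_unreliable, apply_reliable, b, growth_bound=1e8):
            # Run the cheap routine that may suffer silent data corruption.
            x = apply_unreliable(b)
            # Inexpensive sanity check: exclude non-finite values and results
            # whose norm grew implausibly relative to the input.
            if (not np.all(np.isfinite(x))
                    or np.linalg.norm(x) > growth_bound * (np.linalg.norm(b) + 1.0)):
                # Selective reliability: pay for the reliable path only when needed.
                x = apply_reliable(b)
            return x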

    Optimised configuration of sensors for fault tolerant control of an electro-magnetic suspension system

    For any given system, the number and location of sensors can affect the closed-loop performance as well as the reliability of the system. Hence, one problem in control system design is the selection of the sensors in some optimum sense that considers both system performance and reliability. Although existing methods address some of these aspects, in this work a design framework dealing with both control and reliability aspects is presented. The proposed framework is able to identify the best sensor set for which optimum performance is achieved even under single or multiple sensor failures, with minimum sensor redundancy. The proposed systematic framework combines linear quadratic Gaussian control, fault tolerant control and multiobjective optimisation. The efficacy of the proposed framework is shown via appropriate simulations on an electro-magnetic suspension system.
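
    As a rough illustration of the kind of search such a framework performs, the Python sketch below enumerates sensor subsets and keeps the Pareto-optimal ones with respect to nominal performance, worst-case performance under sensor failures, and sensor count. The two cost functions stand in for an LQG design and its fault-tolerant re-evaluation; all names are placeholders, not the paper's method.

        from itertools import combinations

        def pareto_sensor_sets(sensors, nominal_cost, worst_case_failure_cost):
            # Evaluate every candidate sensor subset on three costs (lower is better).
            candidates = []
            for r in range(1, len(sensors) + 1):
                for subset in combinations(sensors, r):
                    candidates.append((subset,
                                       nominal_cost(subset),
                                       worst_case_failure_cost(subset),
                                       len(subset)))

            def dominates(a, b):
                # a dominates b: no worse on every cost, strictly better on at least one.
                return (all(x <= y for x, y in zip(a[1:], b[1:]))
                        and any(x < y for x, y in zip(a[1:], b[1:])))

            return [c for c in candidates
                    if not any(dominates(d, c) for d in candidates)]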

    Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs

    A depth first search (DFS) tree is a fundamental data structure for solving graph problems. The classical algorithm [SiComp74] for building a DFS tree requires O(m+n) time for a given graph G having n vertices and m edges. Recently, Baswana et al. [SODA16] presented a simple algorithm for updating the DFS tree of an undirected graph after an edge/vertex update in Õ(n) time. However, their algorithm is strictly sequential. We present an algorithm achieving similar bounds that can be adapted easily to the parallel environment. In the parallel model, a DFS tree can be computed from scratch using m processors in expected Õ(1) time [SiComp90] on an EREW PRAM, whereas the best deterministic algorithm takes Õ(√n) time [SiComp90, JAlg93] on a CRCW PRAM. Our algorithm can be used to develop optimal (up to polylog n factors) deterministic algorithms for maintaining fully dynamic DFS and fault tolerant DFS of an undirected graph. (1) Parallel fully dynamic DFS: given an arbitrary online sequence of vertex/edge updates, we can maintain a DFS tree of an undirected graph in Õ(1) time per update using m processors on an EREW PRAM. (2) Parallel fault tolerant DFS: an undirected graph can be preprocessed to build a data structure of size O(m) such that for a set of k updates (where k is constant) in the graph, the updated DFS tree can be computed in Õ(1) time using n processors on an EREW PRAM. Moreover, our fully dynamic DFS algorithm provides, in a seamless manner, nearly optimal (up to polylog n factors) algorithms for maintaining a DFS tree in the semi-streaming model and a restricted distributed model. These are the first parallel, semi-streaming and distributed algorithms for maintaining a DFS tree in the dynamic setting. Comment: Accepted to appear in SPAA'17, 32 pages, 5 figures.
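
    For reference, the classical static construction the abstract builds on can be written as a short Python routine; it builds a DFS tree of an undirected graph in O(m+n) time. This is only the sequential baseline, not the parallel, dynamic, or fault tolerant algorithms described above.

        def dfs_tree(adj, root):
            # adj maps each vertex to a list of its neighbours.
            # Returns parent[v], the parent of v in the DFS tree (None for the root).
            parent = {}
            visited = set()
            stack = [(root, None)]
            while stack:
                u, p = stack.pop()
                if u in visited:
                    continue
                visited.add(u)
                parent[u] = p
                for v in adj[u]:
                    if v not in visited:
                        stack.append((v, u))
            return parent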

    Kompics: a message-passing component model for building distributed systems

    The Kompics component model and programming framework was designed to simplify the development of increasingly complex distributed systems. Systems built with Kompics leverage multi-core machines out of the box, and they can be dynamically reconfigured to support hot software upgrades. A simulation framework enables deterministic debugging and reproducible performance evaluation of unmodified Kompics distributed systems. We describe the component model and show how to program and compose event-based distributed systems. We present the architectural patterns and abstractions that Kompics facilitates and we highlight a case study of a complex distributed middleware that we have built with Kompics. We show how our approach enables systematic development and evaluation of large-scale and dynamic distributed systems.
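
    A generic flavour of the message-passing component style can be sketched in Python: each component owns an event queue and a handler that runs on its own thread, and components are wired together explicitly. This is a toy illustration of the programming model, not the actual Kompics API, and it omits Kompics's typed ports and channels.

        import queue
        import threading

        class Component:
            # A toy event-driven component: events arrive on an inbox and are
            # handled sequentially on the component's own thread.
            def __init__(self):
                self.inbox = queue.Queue()
                self.outputs = []
                threading.Thread(target=self._loop, daemon=True).start()

            def connect(self, other):
                # Wire this component's output to another component's inbox.
                self.outputs.append(other)

            def trigger(self, event):
                for c in self.outputs:
                    c.inbox.put(event)

            def handle(self, event):
                # Override in subclasses to define event handlers.
                pass

            def _loop(self):
                while True:
                    self.handle(self.inbox.get())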

    Self-stabilizing sorting algorithms

    A distributed system consists of a set of machines which do not share a global memory. Depending on the connectivity of the network, each machine gets a partial view of the global state. Transient failures in one area of the network may go unnoticed in other areas and may cause the system to go to an illegal global state. However, if the system were self-stabilizing, it would be guaranteed that, regardless of the current state, the system would recover to a legal configuration in a finite number of moves. The traditional way of creating reliable systems is to make components redundant; self-stabilization allows systems to be fault tolerant through software as well. This is an evolving paradigm in the design of robust distributed systems. The ability to recover spontaneously from an arbitrary state makes self-stabilizing systems immune to transient failures or perturbations of the system state, such as changes in network topology. This thesis presents an O(nh) fault-tolerant distributed sorting algorithm for a tree network, where n is the number of nodes in the system and h is the height of the tree. Fault-tolerance is achieved using Dijkstra's paradigm of self-stabilization, a method of non-masking fault-tolerance that embeds the fault-tolerance within the algorithm. Varghese's counter flushing method is used to achieve synchronization among processes in the system. In the distributed sorting problem, each node is given a value and an id, both of which are non-corruptible. The idea is to have each node take a specific value based on its id. The algorithm handles transient faults by weeding out false information in the system. Nodes can start with completely false information concerning the values and ids of the system, yet the intended behavior is still achieved. Nodes are also allowed to crash and re-enter the system later, and new nodes may enter the system.
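
    The flavour of self-stabilization can be shown on the simplest possible topology: a chain of nodes that repeatedly apply a local compare-and-swap rule with a neighbour. Starting from any (possibly corrupted) configuration, the system converges to the sorted, legal state. The Python sketch below illustrates that idea only; it is not the thesis's O(nh) tree algorithm or its counter-flushing synchronization.

        def stabilizing_round(values):
            # One round: every node compares its held value with its right
            # neighbour and swaps if they are out of order (a local move).
            moved = False
            for i in range(len(values) - 1):
                if values[i] > values[i + 1]:
                    values[i], values[i + 1] = values[i + 1], values[i]
                    moved = True
            return moved

        def stabilize(values):
            # From an arbitrary starting state, repeated local rounds reach
            # the legal (sorted) configuration in finitely many moves.
            while stabilizing_round(values):
                pass
            return values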