
    ISIS and META projects

    The ISIS project has developed a new methodology, virtual synchrony, for writing robust distributed software. High-performance multicast, large-scale applications, and wide area networks are the focus of interest. Several interesting applications that exploit the strengths of ISIS, including an NFS-compatible replicated file system, are being developed. The META project addresses distributed control in a soft real-time environment incorporating feedback. This domain encompasses examples as diverse as monitoring inventory and consumption on a factory floor, and performing load balancing on a distributed computing system. One of the first uses of META is for distributed application management: the tasks of configuring a distributed program, dynamically adapting to failures, and monitoring its performance. Recent progress and current plans are reported.
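
    The virtual synchrony model mentioned above lets a group of processes observe membership changes (views) and multicasts in a consistent order. The sketch below is a minimal, hypothetical illustration of that programming model; the ProcessGroup class and every name in it are invented for illustration and do not correspond to the actual ISIS Toolkit API.

        # Hypothetical sketch of the virtual-synchrony programming model described
        # above: group members observe the same sequence of views (membership
        # changes) and receive multicasts within the view in which they were sent.
        # None of these names correspond to the real ISIS Toolkit API.

        class ProcessGroup:
            def __init__(self, name):
                self.name = name
                self.members = []          # current view: list of member ids
                self.view_id = 0           # monotonically increasing view number
                self.handlers = []         # application callbacks for messages

            def join(self, member_id):
                """Install a new view that includes the joining member."""
                self.members.append(member_id)
                self.view_id += 1
                self._deliver_view_change()

            def leave(self, member_id):
                """Install a new view after a member leaves or fails."""
                self.members.remove(member_id)
                self.view_id += 1
                self._deliver_view_change()

            def multicast(self, sender, payload):
                """Deliver a message to every member of the current view."""
                for member in self.members:
                    for handler in self.handlers:
                        handler(member, sender, payload, self.view_id)

            def on_message(self, handler):
                self.handlers.append(handler)

            def _deliver_view_change(self):
                print(f"view {self.view_id}: members = {self.members}")


        if __name__ == "__main__":
            group = ProcessGroup("replicated-file-servers")
            group.on_message(lambda m, s, p, v: print(f"{m} got {p!r} from {s} in view {v}"))
            group.join("srv-a")
            group.join("srv-b")
            group.multicast("srv-a", "update block 17")
            group.leave("srv-b")   # simulates a failure: survivors agree on the new view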

    Programming with process groups: Group and multicast semantics

    Process groups are a natural tool for distributed programming and are increasingly important in distributed computing environments. Discussed here is a new architecture that arose from an effort to simplify Isis process group semantics. The findings include a refined notion of how the clients of a group should be treated, what the properties of a multicast primitive should be when systems contain large numbers of overlapping groups, and a new construct called the causality domain. A system based on this architecture is now being implemented in collaboration with the Chorus and Mach projects.
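
    One property such a multicast primitive typically provides is causal delivery: a message is delivered only after every message that causally precedes it has been delivered. The snippet below is a generic, textbook-style sketch of that delivery rule using vector clocks; it is not the Isis primitive, and the CausalReceiver class and all names in it are hypothetical.

        # Generic sketch of causally ordered delivery with vector clocks: a message
        # from process p carrying clock V is deliverable at q once q has delivered
        # exactly V[p]-1 messages from p and at least V[r] messages from every other
        # process r. This illustrates the delivery property, not the Isis protocol.

        from collections import defaultdict

        class CausalReceiver:
            def __init__(self, processes):
                self.delivered = defaultdict(int)   # messages delivered per sender
                self.pending = []                   # (sender, vclock, payload)
                self.processes = processes

            def _deliverable(self, sender, vclock):
                if vclock[sender] != self.delivered[sender] + 1:
                    return False
                return all(vclock[p] <= self.delivered[p]
                           for p in self.processes if p != sender)

            def receive(self, sender, vclock, payload):
                self.pending.append((sender, dict(vclock), payload))
                progress = True
                while progress:
                    progress = False
                    for msg in list(self.pending):
                        s, v, body = msg
                        if self._deliverable(s, v):
                            self.pending.remove(msg)
                            self.delivered[s] += 1
                            print(f"deliver {body!r} from {s}")
                            progress = True


        if __name__ == "__main__":
            r = CausalReceiver(["p1", "p2"])
            # p2's message depends on p1's first message but arrives before it:
            r.receive("p2", {"p1": 1, "p2": 1}, "reply")    # buffered
            r.receive("p1", {"p1": 1, "p2": 0}, "request")  # delivers both, in causal order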

    The ISIS project: Fault-tolerance in large distributed systems

    The semi-annual status report covers activities of the ISIS project during the second half of 1989. The project had several independent objectives: (1) At the level of the ISIS Toolkit, ISIS release V2.0 was completed, containing bypass communication protocols. Performance of the system is greatly enhanced by this change, but the initial software release is limited in some respects. (2) The Meta project focused on the definition of the Lomita programming language for specifying rules that monitor sensors for conditions of interest and trigger appropriate reactions. This design was completed, and implementation of Lomita is underway on the Meta 2.0 platform. (3) The Deceit file system effort completed a prototype. It is planned to make Deceit available for use in two hospital information systems. (4) A long-haul communication subsystem project was completed and can be used as part of ISIS. This effort resulted in tools for linking ISIS systems on different LANs together over long-haul communication lines. (5) Magic Lantern, a graphical tool for building application monitoring and control interfaces, is included as part of the general ISIS releases.
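
    Item (2) describes Lomita as a language for rules that watch sensors for conditions of interest and trigger reactions. Lomita's own syntax is not shown in the report; the sketch below only illustrates the general rule pattern (a condition over sensor readings paired with a reaction) in Python, with names invented for the example.

        # Hypothetical illustration of the rule-based monitoring pattern described
        # for Lomita: each rule pairs a condition over current sensor readings with
        # a reaction to run when the condition holds. This is not Lomita syntax.

        rules = [
            {
                "name": "cpu-overload",
                "condition": lambda s: s.get("cpu_load", 0.0) > 0.9,
                "reaction": lambda s: print("reaction: migrate work off overloaded node"),
            },
            {
                "name": "disk-nearly-full",
                "condition": lambda s: s.get("disk_free_mb", 1e9) < 100,
                "reaction": lambda s: print("reaction: page the operator"),
            },
        ]

        def evaluate(readings):
            """Run every rule whose condition holds for the current readings."""
            for rule in rules:
                if rule["condition"](readings):
                    rule["reaction"](readings)

        evaluate({"cpu_load": 0.95, "disk_free_mb": 2048})   # fires cpu-overload only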

    Time constrained fault tolerance and management framework for k-connected distributed wireless sensor networks based on composite event detection

    Wireless sensor nodes are themselves complex systems in which a variety of components interact in intricate ways. In enterprise scenarios it becomes highly important to hide the details of the underlying sensor networks from the applications and to guarantee a minimum level of reliability of the system. One of the challenges in achieving this level of reliability is overcoming the failures frequently faced by sensor networks due to their tight integration with the environment. Failures can generate false information, which may trigger incorrect business processes and result in additional costs. Sensor networks are inherently fault prone due to the shared wireless communication medium; sensor nodes can lose synchrony and their programs can reach arbitrary states. Since on-site maintenance is not feasible, sensor network applications should be capable of local, communication-efficient self-healing. Also, to the best of our knowledge and based on an extensive survey of the related literature, no general framework exists that addresses all the fault issues one may encounter in a WSN. As one of the main goals of enterprise applications is to reduce the cost of business processes, a complete and general fault tolerance and management framework for WSNs, irrespective of node types and deployment conditions, is proposed; it would help mitigate the propagation of failures into the business environment, reduce installation and maintenance costs, and provide the deployment flexibility needed for unobtrusive installation.
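
    A common idea behind composite event detection, which the title refers to, is to accept a physical event only when several nearby nodes report it within a short time window, while an isolated report is treated as a suspected node fault. The sketch below is a generic, hypothetical illustration of that quorum rule, not the framework proposed in the paper; the threshold values and names are invented.

        # Generic sketch of composite event detection: a physical event is accepted
        # only if at least `quorum` neighbouring nodes report it within a short time
        # window; an isolated report is treated as a suspected node fault. This is
        # an illustration of the idea, not the paper's actual framework.

        from collections import defaultdict

        WINDOW_SECONDS = 5.0
        QUORUM = 3

        def classify(reports, quorum=QUORUM, window=WINDOW_SECONDS):
            """reports: list of (timestamp, node_id, event_type) tuples."""
            by_event = defaultdict(list)
            for ts, node, event in reports:
                by_event[event].append((ts, node))

            decisions = {}
            for event, hits in by_event.items():
                hits.sort()
                confirmed = False
                for i in range(len(hits)):
                    # distinct nodes reporting the same event within the window
                    nodes = {n for t, n in hits if 0 <= t - hits[i][0] <= window}
                    if len(nodes) >= quorum:
                        confirmed = True
                        break
                decisions[event] = "confirmed event" if confirmed else "suspected node fault"
            return decisions

        print(classify([
            (0.0, "n1", "fire"), (1.2, "n2", "fire"), (2.5, "n3", "fire"),  # confirmed
            (0.5, "n7", "flood"),                                           # isolated -> fault
        ]))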

    Formal and Fault Tolerant Design

    Software quality and reliability were for a long time verified at the post-implementation level (test, fault scenario ...). The design of embedded systems and digital circuits is more and more complex because of integration density and heterogeneity. Now almost ¾ of digital circuits contain at least one processor, that is, can execute software code. In other words, co-design is the usual case, and traditional verification by simulation is no longer practical. Moreover, the increase in integration density comes with a decrease in the reliability of the components. So fault detection, diagnostic techniques and introspection are essential for defect tolerance, fault tolerance and self-repair of safety-critical systems. The use of a formal specification language is considered the foundation of a real validation. What we would like to emphasize is that refinement (from an abstract model to the point where the system will be implemented) could and should be formal too, in order to ensure the traceability of requirements, to manage such development projects and so to design fault-tolerant systems correct by proven construction. Such a thorough approach can be achieved by the automation or semi-automation of the refinement process. We have studied how to ensure the traceability of these requirements in a component-based approach. Reliability and fault tolerance can be seen here as particular refinement steps. For instance, a given formal specification of a system/component may be refined by adding redundancy (data, computation, component) and be verified to be fault-tolerant w.r.t. some given fault scenarios. A self-repair component can be defined as the refinement of its original form enhanced with error detection. We describe in this paper the PCSI project (Zero Defect Systems) based on the B Method, VHDL and PSL. The three modeling approaches can collaborate and guarantee the co-design of embedded systems in which the requirements and fault-tolerance aspects are taken into account from the beginning and formally verified throughout the implementation process.
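
    The refinement-by-redundancy step described above can be made concrete with a simple analogy: triple modular redundancy, where three replicas of a computation run on the same input and a majority vote masks a single faulty replica. The sketch below is only that analogy, written in Python; the PCSI project itself works in the B Method, VHDL and PSL, and none of the names here come from it.

        # Illustration of one redundancy pattern that a fault-tolerance refinement
        # step can introduce: triple modular redundancy (TMR). Three replicas of a
        # computation run on the same input and a majority vote masks a single
        # faulty replica. This is an analogy in Python, not B/VHDL/PSL.

        from collections import Counter

        def tmr(replicas, x):
            """Run the replicas on x and return the majority result (or raise)."""
            results = [f(x) for f in replicas]
            value, votes = Counter(results).most_common(1)[0]
            if votes < 2:
                raise RuntimeError("no majority: more than one replica faulty")
            return value

        def correct(x):
            return x * x

        def faulty(x):            # models a replica hit by a fault scenario
            return x * x + 1

        print(tmr([correct, correct, faulty], 6))   # 36: the single fault is masked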

    Behavior Trees in Robotics and AI: An Introduction

    A Behavior Tree (BT) is a way to structure the switching between different tasks in an autonomous agent, such as a robot or a virtual entity in a computer game. BTs are a very efficient way of creating complex systems that are both modular and reactive. These properties are crucial in many applications, which has led to the spread of BTs from computer game programming to many branches of AI and Robotics. In this book, we will first give an introduction to BTs, then we describe how BTs relate to, and in many cases generalize, earlier switching structures. These ideas are then used as a foundation for a set of efficient and easy-to-use design principles. Properties such as safety, robustness, and efficiency are important for an autonomous system, and we describe a set of tools for formally analyzing these using a state space description of BTs. With the new analysis tools, we can formalize the descriptions of how BTs generalize earlier approaches. We also show the use of BTs in automated planning and machine learning. Finally, we describe an extended set of tools to capture the behavior of Stochastic BTs, where the outcomes of actions are described by probabilities. These tools enable the computation of both success probabilities and time to completion.
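
    The switching structure described above has well-known tick semantics: a Sequence node ticks its children left to right and stops at the first child that does not succeed, while a Fallback (Selector) node stops at the first child that does not fail. The classes below sketch those standard semantics in Python as a generic illustration; they are not code from the book, and the example tree and names are invented.

        # Minimal Behavior Tree sketch with the standard Sequence and Fallback
        # (Selector) semantics: each tick propagates down the tree and returns
        # SUCCESS, FAILURE or RUNNING. Generic illustration, not the book's code.

        SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

        class Action:
            def __init__(self, name, fn):
                self.name, self.fn = name, fn
            def tick(self):
                return self.fn()

        class Sequence:
            """Succeeds only if every child succeeds, in order."""
            def __init__(self, children):
                self.children = children
            def tick(self):
                for child in self.children:
                    status = child.tick()
                    if status != SUCCESS:
                        return status           # FAILURE or RUNNING short-circuits
                return SUCCESS

        class Fallback:
            """Succeeds as soon as any child succeeds (tried in order)."""
            def __init__(self, children):
                self.children = children
            def tick(self):
                for child in self.children:
                    status = child.tick()
                    if status != FAILURE:
                        return status           # SUCCESS or RUNNING short-circuits
                return FAILURE

        # Example: recharge if the battery is low, otherwise do the task.
        battery_low = False
        tree = Fallback([
            Sequence([
                Action("battery_low?", lambda: SUCCESS if battery_low else FAILURE),
                Action("recharge",     lambda: RUNNING),
            ]),
            Action("do_task", lambda: SUCCESS),
        ])
        print(tree.tick())   # SUCCESS: battery is fine, so the task branch runs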

    A Taxonomy of Workflow Management Systems for Grid Computing

    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects world-wide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of state-of-the-art Grid workflow systems, but also identifies the areas that need further research.
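
    Grid workflows of the kind surveyed here are typically modelled as directed acyclic graphs of tasks with dependencies, and an engine runs each task once all of its predecessors have completed. The sketch below is a generic illustration of that execution model (here sequential, in topological order); it is not any of the surveyed systems, and the task names are invented.

        # Generic sketch of the workflow-execution model the survey classifies:
        # tasks form a directed acyclic graph and each task runs once all of its
        # dependencies have completed (here, sequentially in topological order).
        # Illustration only; no surveyed Grid workflow system is reproduced here.

        from graphlib import TopologicalSorter   # standard library, Python 3.9+

        def run_workflow(tasks, dependencies):
            """tasks: {name: callable}; dependencies: {name: set of prerequisite names}."""
            for name in TopologicalSorter(dependencies).static_order():
                print(f"running {name}")
                tasks[name]()

        run_workflow(
            tasks={
                "stage_in":  lambda: None,   # fetch input data to the compute site
                "simulate":  lambda: None,   # main computation
                "analyse":   lambda: None,   # post-processing
                "stage_out": lambda: None,   # publish results
            },
            dependencies={
                "simulate":  {"stage_in"},
                "analyse":   {"simulate"},
                "stage_out": {"analyse"},
            },
        )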