51 research outputs found

    Programming Languages for Distributed Computing Systems

    Get PDF
    When distributed systems first appeared, they were programmed in traditional sequential languages, usually with the addition of a few library procedures for sending and receiving messages. As distributed applications became more commonplace and more sophisticated, this ad hoc approach became less satisfactory. Researchers all over the world began designing new programming languages specifically for implementing distributed applications. These languages and their history, their underlying principles, their design, and their use are the subject of this paper. We begin by giving our view of what a distributed system is, illustrating with examples to avoid confusion on this important and controversial point. We then describe the three main characteristics that distinguish distributed programming languages from traditional sequential languages, namely, how they deal with parallelism, communication, and partial failures. Finally, we discuss 15 representative distributed languages to give the flavor of each. These examples include languages based on message passing, rendezvous, remote procedure call, objects, and atomic transactions, as well as functional languages, logic languages, and distributed data structure languages. The paper concludes with a comprehensive bibliography listing over 200 papers on nearly 100 distributed programming languages

    Doctor of Philosophy

    Get PDF
    dissertationA modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has a potential to change the way we analyze and debug software systems. Recorded once, the execution of the system becomes an independent artifact, which can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable the deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing complete execution of the entire operating system with an overhead of several percents, on a realistic workload, and with minimal installation costs. To enable an intuitive interface of constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that enables an analysis algorithm to be programmed against the state of the recorded system through familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay

    The shared data-object model as a paradigm for programming distributed systems

    Get PDF

    Universality in algorithmic self-assembly

    Get PDF
    Tile-based self-assembly is a model of algorithmic crystal growth in which square tiles represent molecules that bind to each other via specific and variable-strength bonds on their four sides, driven by random mixing in solution but constrained by the local binding rules of the tile bonds. In the late 1990s, Erik Winfree introduced a discrete mathematical model of DNA tile assembly called the abstract Tile Assembly Mode. Winfree proved that the Tile Assembly Model is computationally universal, i.e., that any Turing machine can be encoded into a finite set of tile types whose self-assembly simulates that Turing machine. In this thesis, we investigate tile-based self-assembly systems that exhibit Turing universality, geometric universality and intrinsic universality. We first establish a novel characterization of the computably enumerable languages in terms of self-assembly--the proof of which is a novel proof of the Turing-universality of the Tile Assembly Model in which a particular Turing machine is simulated on all inputs in parallel in the two-dimensional discrete Euclidean plane. Then we prove that the multiple temperature tile assembly model (introduced by Aggarwal, Cheng, Goldwasser, Kao, and Schweller) exhibits a kind of geometric universality in the sense that there is a small (constant-size) universal tile set that can be programmed via deliberate changes in the system temperature to uniquely produce any finite shape. Just as other models of computation such as the Turing machine and cellular automaton are known to be intrinsically universal, (e.g., Turing machines can simulate other Turing machines, and cellular automata other cellular automata), we show that tile assembly systems satisfying a natural condition known as local consistency are able to simulate other locally consistent tile assembly systems. In other words, we exhibit a particular locally consistent tile assembly system that can simulate the behavior--as opposed to only the final result--of any other locally consistent tile assembly system. Finally, we consider the notion of universal fault-tolerance in algorithmic self-assembly with respect to the two-handed Tile Assembly Model, in which large aggregations of tiles may attach to each other, in contrast to the seeded Tile Assembly Model, in which tiles aggregate one at a time to a single specially-designated seed assembly. We introduce a new model of fault-tolerance in self-assembly: the fuzzy temperature model of faults having the following informal characterization: the system temperature is normally 2, but may drift down to 1, allowing unintended temperature-1 growth for an arbitrary period of time. Our main construction, which is a tile set capable of uniquely producing an n×nn \times n square with log n unique tile types in the fuzzy temperature model, is not universal but presents novel technique that we hope will ultimately pave the way for a universal fuzzy-fault-tolerant tile assembly system in the future

    Self-management for large-scale distributed systems

    Get PDF
    Autonomic computing aims at making computing systems self-managing by using autonomic managers in order to reduce obstacles caused by management complexity. This thesis presents results of research on self-management for large-scale distributed systems. This research was motivated by the increasing complexity of computing systems and their management. In the first part, we present our platform, called Niche, for programming self-managing component-based distributed applications. In our work on Niche, we have faced and addressed the following four challenges in achieving self-management in a dynamic environment characterized by volatile resources and high churn: resource discovery, robust and efficient sensing and actuation, management bottleneck, and scale. We present results of our research on addressing the above challenges. Niche implements the autonomic computing architecture, proposed by IBM, in a fully decentralized way. Niche supports a network-transparent view of the system architecture simplifying the design of distributed self-management. Niche provides a concise and expressive API for self-management. The implementation of the platform relies on the scalability and robustness of structured overlay networks. We proceed by presenting a methodology for designing the management part of a distributed self-managing application. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. In the second part, we discuss robustness of management and data consistency, which are necessary in a distributed system. Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time consuming and error prone. We propose the abstraction of Robust Management Elements, which are able to heal themselves under continuous churn. Our approach is based on replicating a management element using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. For data consistency, we propose a majority-based distributed key-value store supporting multiple consistency levels that is based on a peer-to-peer network. The store enables the tradeoff between high availability and data consistency. Using majority allows avoiding potential drawbacks of a master-based consistency control, namely, a single-point of failure and a potential performance bottleneck. In the third part, we investigate self-management for Cloud-based storage systems with the focus on elasticity control using elements of control theory and machine learning. We have conducted research on a number of different designs of an elasticity controller, including a State-Space feedback controller and a controller that combines feedback and feedforward control. We describe our experience in designing an elasticity controller for a Cloud-based key-value store using state-space model that enables to trade-off performance for cost. We describe the steps in designing an elasticity controller. We continue by presenting the design and evaluation of ElastMan, an elasticity controller for Cloud-based elastic key-value stores that combines feedforward and feedback control

    Modeling and verifying the FlexRay physical layer protocol with reachability checking of timed automata

    Get PDF
    In this thesis, I report on the verification of the resilience of the FlexRay automotive bus protocol's physical layer protocol against glitches during message transmission and drifting clocks. This entailed modeling a significant part of this industrially used communictation protocol and the underlying hardware as well as the possible error scenarios in fine detail. Verifying such a complex model with model-checking led me to the development of data-structures and algorithms able to handle the associated complexity using only reasonable resources. This thesis presents such data-structures and algorithms for reachability checking of timed automata. It also present modeling principles enabling the construction of timed automata models that can be efficiently checked, as well as the models arrived at. Finally, it reports on the verified resilience of FlexRay's physical layer protocol against specific patterns of glitches under varying assumptions about the underlying hardware, like clock drift.In dieser Dissertation berichte ich über den Nachweis der Resilienz des Bitübertragungsprotokolls für die physikalische Schicht des FlexRay-Fahrzeugbusprotokolls gegenüber Übertragungsfehlern und Uhrenverschiebung. Dafür wurde es notwendig, einen signifikanten Teil dieses industriell genutzten Kommunikationsprotokolls mit seiner Hardwareumgebung und die möglichen Fehlerszenarien detailliert zu modellieren. Ein so komplexes Modell mittels Modellprüfung zu überprüfen führte mich zur Entwicklung von Datenstrukturen und Algorithmen, die die damit verbundene Komplexität mit vernünftigen Ressourcenanforderungen bewältigen können. Diese Dissertation stellt solche Datenstrukturen und Algorithmen zur Erreichbarkeitsprüfung gezeiteter Automaten vor. Sie stellt auch Modellierungsprinzipien vor, die es ermöglichen, Modelle in Form gezeiteter Automaten zu konstruieren, die effizient überprüft werden können, sowie die erstellten Modelle. Schließlich berichtet sie über die überprüfte Resilienz des FlexRay-Bitübertragungsprotokolls gegenüber spezifischen Übertragungsfehlermustern unter verschiedenen Annahmen über die Hardwareumgebung, wie etwa die Uhrenverschiebung.DFG: SFB/TRR 14 "AVACS - Automatische Verifikation und Analyse komplexer Systeme

    Software redundancy: what, where, how

    Get PDF
    Software systems have become pervasive in everyday life and are the core component of many crucial activities. An inadequate level of reliability may determine the commercial failure of a software product. Still, despite the commitment and the rigorous verification processes employed by developers, software is deployed with faults. To increase the reliability of software systems, researchers have investigated the use of various form of redundancy. Informally, a software system is redundant when it performs the same functionality through the execution of different elements. Redundancy has been extensively exploited in many software engineering techniques, for example for fault-tolerance and reliability engineering, and in self-adaptive and self- healing programs. Despite the many uses, though, there is no formalization or study of software redundancy to support a proper and effective design of software. Our intuition is that a systematic and formal investigation of software redundancy will lead to more, and more effective uses of redundancy. This thesis develops this intuition and proposes a set of ways to characterize qualitatively as well as quantitatively redundancy. We first formalize the intuitive notion of redundancy whereby two code fragments are considered redundant when they perform the same functionality through different executions. On the basis of this abstract and general notion, we then develop a practical method to obtain a measure of software redundancy. We prove the effectiveness of our measure by showing that it distinguishes between shallow differences, where apparently different code fragments reduce to the same underlying code, and deep code differences, where the algorithmic nature of the computations differs. We also demonstrate that our measure is useful for developers, since it is a good predictor of the effectiveness of techniques that exploit redundancy. Besides formalizing the notion of redundancy, we investigate the pervasiveness of redundancy intrinsically found in modern software systems. Intrinsic redundancy is a form of redundancy that occurs as a by-product of modern design and development practices. We have observed that intrinsic redundancy is indeed present in software systems, and that it can be successfully exploited for good purposes. This thesis proposes a technique to automatically identify equivalent method sequences in software systems to help developers assess the presence of intrinsic redundancy. We demonstrate the effectiveness of the technique by showing that it identifies the majority of equivalent method sequences in a system with good precision and performance

    The embedded operating system project

    Get PDF
    This progress report describes research towards the design and construction of embedded operating systems for real-time advanced aerospace applications. The applications concerned require reliable operating system support that must accommodate networks of computers. The report addresses the problems of constructing such operating systems, the communications media, reconfiguration, consistency and recovery in a distributed system, and the issues of realtime processing. A discussion is included on suitable theoretical foundations for the use of atomic actions to support fault tolerance and data consistency in real-time object-based systems. In particular, this report addresses: atomic actions, fault tolerance, operating system structure, program development, reliability and availability, and networking issues. This document reports the status of various experiments designed and conducted to investigate embedded operating system design issues

    Performance Analysis of Recoverable Flight Control Systems Subject to Neutron-Induced Upsets Using Hybrid Dynamical Models

    Get PDF
    It has been observed that atmospheric neutrons can produce single-event upsets in digital flight control hardware. Potentially, they can reduce system performance and introduce a safety hazard. One experimental system-level approach investigated to help mitigate the effects of these upsets is NASA Langley\u27s Recoverable Computer System. It employs rollback error recovery using dual-lock-step processors together with new fault tolerant architectures and communication subsystems. In this dissertation, a class of stochastic hybrid dynamical models, which consists of a jump-linear system and a stochastic finite-state automaton, is used to describe the performance of a Boeing 737 aircraft system in closed-loop with a Recoverable Computer System. The jump-linear system models the switched dynamics of the closed-loop system due to the presence of controller recoveries. Each dynamical model in the jump-linear system was obtained separately using system identification techniques and high fidelity flight simulation software. The stochastic finite-state automaton approximates the recovery logic of the Recoverable Computer System. The upsets process is modeled by either an independent, identically distributed process or a first-order Markov chain. Mean-square stability and output tracking performance of the recoverable flight control system are analyzed theoretically via a model-equivalent Markov jump-linear system of the stochastic hybrid model. The model was validated using data from a controlled experiment at NASA Langley, where simulated neutron-induced upsets were injected into the system at a desired rate. The effects on the output tracking performance of a simulated aircraft were then directly observed and quantified. The model was then used to analyze a neutron-based experiment on the Recoverable Computer System at the Los Alamos National Laboratory. This model predicts that the experimental flight control system, when functioning as designed, will provide robust control performance in the presence of neutron-induced single-event upsets at normal atmospheric levels
    corecore