24 research outputs found
Supporting fault-tolerant parallel programming in Linda
Linda is a language for programming parallel applications whose most notable feature is a distributed shared memory called tuple space. While suitable for a wide variety of programs, one shortcoming of the language as commonly defined and implemented is a lack of support for writing programs that can tolerate failures in the underlying computing platform. This paper describes FT-Linda, a version of Linda that addresses this problem by providing two major enhancements that facilitate the writing of fault-tolerant applications: stable tuple spaces and atomic execution of tuple space operations. The former is a type of stable storage in which tuple values are guaranteed to persist across failures, while the latter allows collections of tuple operations to be executed in an all-or-nothing fashion despite failures and concurrency. The design of these enhancements is presented in detail and illustrated by examples drawn from both the Linda and fault-tolerance domains. An implementation of FT-Linda for a network of workstations is also described. The design is based on replicating the contents of stable tuple spaces to provide failure resilience and then updating the copies using atomic multicast. This strategy allows an efficient implementation in which only a single multicast message is needed for each atomic collection of tuple space operations.
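As a rough illustration of the all-or-nothing behaviour the abstract describes, the following is a minimal single-process sketch in Python. It is not FT-Linda's syntax or implementation: the class `TupleSpace`, its method names, and the use of `None` as a wildcard are assumptions made for this example, and it models neither stable storage nor replication via atomic multicast.

```python
import threading


class TupleSpace:
    """Toy in-memory tuple space (hypothetical API, not FT-Linda itself)."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    @staticmethod
    def _matches(pattern, tup):
        # Assumption: None in a pattern acts as a wildcard field.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def out(self, tup):
        """Deposit a tuple."""
        with self._lock:
            self._tuples.append(tuple(tup))

    def rd(self, pattern):
        """Return a matching tuple without removing it, or None (non-blocking)."""
        with self._lock:
            return next(
                (t for t in self._tuples if self._matches(pattern, t)), None
            )

    def atomic(self, takes, puts):
        """Withdraw one match per pattern in `takes` and deposit all of `puts`.

        Either every step happens or none does: the work is done on a copy,
        which replaces the real contents only if all withdrawals succeed.
        """
        with self._lock:
            remaining = list(self._tuples)
            taken = []
            for pattern in takes:
                match = next(
                    (t for t in remaining if self._matches(pattern, t)), None
                )
                if match is None:
                    return None  # abort; self._tuples is left untouched
                remaining.remove(match)
                taken.append(match)
            remaining.extend(tuple(p) for p in puts)
            self._tuples = remaining
            return taken
```

A worker could, for example, claim a task and record that it is in progress in one step with `ts.atomic(takes=[("task", None)], puts=[("in_progress", 1)])`. If the task tuple is missing, nothing is deposited.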
Designing the Next Generation of Real-Time Control, Communication, and Computations for Large Power Systems
The power grid is not only a network interconnecting generators and loads through a transmission and distribution system, but is overlaid with a communication and control system that enables economic and secure operation. This multilayered infrastructure has evolved over many decades utilizing new technologies as they have appeared. This evolution has been slow and incremental, as the operation of the power system consisting of vertically integrated utilities has, until recently, changed very little. The monitoring of the grid is still done by a hierarchical design with polling for data at scanning rates in seconds that reflects the conceptual design of the 1960s. This design was adequate for vertically integrated utilities with limited feedback and wide-area controls; however, the thesis of this paper is that the changing environment, in both policy and technology, requires a new look at the operation of the power grid and a complete redesign of the control, communication and computation infrastructure. We provide several examples of novel control and communication regimes for such a new infrastructure.
Leveraging the next-generation power grid: Data sharing and associated partnerships
Data delivery in the power grid today is, for the most part, hard-coded, tedious to implement and change, and does not provide any real end-to-end guarantees. Applications have started to emerge that require real-time data delivery in order to provide a wide-area assessment of the health of the power grid. This paper presents two novel communication infrastructures that facilitate the delivery of power data to intended recipients, each based on a different communication paradigm. The necessity of forming and managing trusted partnerships in either framework is further discussed.
A Communication Framework for Fault-tolerant Parallel Execution
PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availability. A communicating parallel program must employ checkpoint-restart and/or process redundancy to make continuous forward progress in such an unreliable environment. A communication model based on one-sided Put/Get calls, pioneered by the Linda system, is a good match as processes can execute their communication operations independently and asynchronously. However, Linda and its many variants are not designed for communicating processes that are replicated or independently restarted from checkpoints. The key problem is that a single logical operation that impacts the global program state may be executed by different instances of the same process at different times leading to semantic inconsistency. This paper presents the design, execution model, implementation, and validation of a communication layer for robust execution on volatile nodes. The research leads to a practical way to employ idle PCs for latency-tolerant parallel computing applications.
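The inconsistency described above, where one logical operation is issued more than once by replicas or restarted instances, can be illustrated with a small Python sketch. The abstract does not state the paper's actual mechanism. This example assumes one common approach: every logical operation is tagged with a process id and a sequence number, so a repeated Put is ignored and a repeated Get returns the value seen the first time. The class `DedupStore` and its method signatures are hypothetical.

```python
class DedupStore:
    """Toy key-value store that deduplicates logical operations.

    Assumption for this sketch: (proc, seq) uniquely names one logical
    operation, regardless of which replica or restart executes it.
    """

    def __init__(self):
        self._data = {}
        self._applied_puts = set()  # logical Put operations already applied
        self._get_log = {}          # logical Get operation -> value first returned

    def put(self, proc, seq, key, value):
        """Apply a Put once; return False if this logical Put was seen before."""
        op = (proc, seq)
        if op in self._applied_puts:
            return False
        self._applied_puts.add(op)
        self._data[key] = value
        return True

    def get(self, proc, seq, key):
        """Return the value observed when this logical Get first ran."""
        op = (proc, seq)
        if op not in self._get_log:
            self._get_log[op] = self._data.get(key)
        return self._get_log[op]
```

With this scheme, an instance restarted from a checkpoint that replays an earlier Get sees the value it saw originally, even if the key has since been overwritten.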