13,226 research outputs found
Maintaining consistency in distributed systems
In systems designed as assemblies of independently developed components, concurrent access to data or data structures normally arises within individual programs, and is controlled using mutual exclusion constructs, such as semaphores and monitors. Where data is persistent and/or sets of operation are related to one another, transactions or linearizability may be more appropriate. Systems that incorporate cooperative styles of distributed execution often replicate or distribute data within groups of components. In these cases, group oriented consistency properties must be maintained, and tools based on the virtual synchrony execution model greatly simplify the task confronting an application developer. All three styles of distributed computing are likely to be seen in future systems - often, within the same application. This leads us to propose an integrated approach that permits applications that use virtual synchrony with concurrent objects that respect a linearizability constraint, and vice versa. Transactional subsystems are treated as a special case of linearizability
A Theory of Partitioned Global Address Spaces
Partitioned global address space (PGAS) is a parallel programming model for
the development of applications on clusters. It provides a global address space
partitioned among the cluster nodes, and is supported in programming languages
like C, C++, and Fortran by means of APIs. In this paper we provide a formal
model for the semantics of single instruction, multiple data programs using
PGAS APIs. Our model reflects the main features of popular real-world APIs such
as SHMEM, ARMCI, GASNet, GPI, and GASPI.
A key feature of PGAS is the support for one-sided communication: a node may
directly read and write the memory located at a remote node, without explicit
synchronization with the processes running on the remote side. One-sided
communication increases performance by decoupling process synchronization from
data transfer, but requires the programmer to reason about appropriate
synchronizations between reads and writes. As a second contribution, we propose
and investigate robustness, a criterion for correct synchronization of PGAS
programs. Robustness corresponds to acyclicity of a suitable happens-before
relation defined on PGAS computations. The requirement is finer than the
classical data race freedom and rules out most false error reports.
Our main result is an algorithm for checking robustness of PGAS programs. The
algorithm makes use of two insights. Using combinatorial arguments we first
show that, if a PGAS program is not robust, then there are computations in a
certain normal form that violate happens-before acyclicity. Intuitively,
normal-form computations delay remote accesses in an ordered way. We then
devise an algorithm that checks for cyclic normal-form computations.
Essentially, the algorithm is an emptiness check for a novel automaton model
that accepts normal-form computations in streaming fashion. Altogether, we
prove the robustness problem is PSpace-complete
Monitoring Partially Synchronous Distributed Systems using SMT Solvers
In this paper, we discuss the feasibility of monitoring partially synchronous
distributed systems to detect latent bugs, i.e., errors caused by concurrency
and race conditions among concurrent processes. We present a monitoring
framework where we model both system constraints and latent bugs as
Satisfiability Modulo Theories (SMT) formulas, and we detect the presence of
latent bugs using an SMT solver. We demonstrate the feasibility of our
framework using both synthetic applications where latent bugs occur at any time
with random probability and an application involving exclusive access to a
shared resource with a subtle timing bug. We illustrate how the time required
for verification is affected by parameters such as communication frequency,
latency, and clock skew. Our results show that our framework can be used for
real-life applications, and because our framework uses SMT solvers, the range
of appropriate applications will increase as these solvers become more
efficient over time.Comment: Technical Report corresponding to the paper accepted at Runtime
Verification (RV) 201
Recommended from our members
Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures
The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and computing infrastructure factors. Several communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to a several orders of magnitude increase in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors.This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks.A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems including System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu
- …