Skip to main content
Article thumbnail
Location of Repository

Finding missing synchronization in a distributed computation using controlled re-execution

By Neeraj Mittal and Vijay K. Garg

Abstract

Correct distributed programs are hard to write. Not surprisingly, distributed systems are especially vulnerable to software faults. Testing and debugging is an important way to improve the reliability of distributed systems.A distributed debugger equipped with the mechanism to re-execute the traced computation in a controlled fashion can greatly facilitate the detection and localization of bugs. This approach gives rise to a general problem of predicate control, which takes a computation and a safety property specified on the computation as inputs, and produces a controlled computation, with added synchronization, that maintains the given safety property as output. We devise efficient control algorithms for two classes of useful predicates, namely region predicates and disjunctive predicates. For the former, we prove that the control algorithm is optimal in the sense that it guarantees maximum concurrency possible in the controlled computation. For the latter, we prove that our control algorithm generates the least number of synchronization dependencies and therefore has optimal message-complexity. Furthermore, we provide a necessary and sufficient condition under which it is possible to efficiently compute a minimal controlling synchronization for a general predicate. We also give an algorithm to compute such a synchronization under the condition provided

Year: 2004
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.7004
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://users.ece.utexas.edu/~g... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.