    Online Trace Reordering for Efficient Representation of Event Partial Orders

    Distributed and parallel applications not only have distributed state but are often inherently non-deterministic, making them significantly more challenging to monitor and debug. A fundamental challenge when working with such applications is determining the order in which actions are performed by the application. A naive approach would be to impose a single total order on all actions, i.e., given any two actions or events, one must happen before the other. A global order, however, is often misleading: two events in two different processes may be causally independent even though one occurred before the other. A partial order of events therefore serves as the fundamental data structure for ordering events in distributed and parallel applications. Traditionally, Fidge/Mattern timestamps have been used to represent event partial orders. The size of a vector timestamp depends on the number of parallel entities (traces) in the application, e.g., processes or threads. A major limitation of Fidge/Mattern timestamps is that the total size of the timestamps does not scale for large systems with hundreds or thousands of traces. Taylor proposed an efficient offset-based scheme for representing large event partial orders by storing deltas between the timestamps of successive events. Offset-based schemes have been shown to be significantly more space efficient when the traces that communicate the most are close to each other when generating the deltas (offsets). In Taylor's offset-based schemes the optimal order of traces is computed offline. In this work we adapt the offset-based schemes to dynamically reorder traces and demonstrate that very efficient, scalable representations of event partial orders can be generated in an online setting, requiring as few as 100 bytes/event for storing partial-order event data for applications with around 1000 processes.
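
    The following Python sketch (not from the paper; all names are illustrative) shows the two ideas the abstract relies on: how Fidge/Mattern vector timestamps are updated at local and receive events, and how successive timestamps on a trace can be stored as sparse deltas (offsets) instead of full N-entry vectors.

        # Sketch: Fidge/Mattern vector timestamps and a sparse delta (offset) encoding.
        # Illustrative only; Taylor's actual offset scheme and the online trace
        # reordering described in the paper are more involved.

        def local_event(vc, pid):
            """Advance this trace's own component on an internal event."""
            vc = list(vc)
            vc[pid] += 1
            return vc

        def on_receive(vc, pid, msg_vc):
            """Merge the sender's timestamp component-wise, then tick locally."""
            vc = [max(a, b) for a, b in zip(vc, msg_vc)]
            vc[pid] += 1
            return vc

        def delta_encode(prev_vc, curr_vc):
            """Record only (index, new_value) pairs for components that changed."""
            return [(i, c) for i, (p, c) in enumerate(zip(prev_vc, curr_vc)) if p != c]

        def delta_decode(prev_vc, delta):
            """Rebuild the full timestamp from the previous one plus its delta."""
            vc = list(prev_vc)
            for i, value in delta:
                vc[i] = value
            return vc

        # Example with 4 traces: trace 0 performs a local event, then receives from trace 2.
        n = 4
        t1 = local_event([0] * n, pid=0)                  # [1, 0, 0, 0]
        t2 = on_receive(t1, pid=0, msg_vc=[0, 0, 3, 0])   # [2, 0, 3, 0]
        print(delta_encode(t1, t2))                       # [(0, 2), (2, 3)]: 2 of 4 entries stored

    The sketch only conveys the intuition that consecutive timestamps differ in few components; the paper's offset scheme additionally exploits the order of the traces themselves, which is why placing frequently communicating traces near each other (offline in Taylor's scheme, online in this work) makes the offsets cheaper to encode.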

    Efficient Vector Time with Dynamic Process Creation and Termination

    Many distributed algorithms require knowledge of the causal relationships between events. Examples include optimistic recovery protocols, distributed debugging systems, and causal distributed shared memory. Determining causal relationships can be difficult, however, because there is no global clock and local clocks cannot be perfectly synchronized. Vector time is a useful abstraction for capturing the causal relationships between events and, unlike Lamport's logical clocks, allows identification of concurrent events. Drawbacks of vector time include transmission and logging overhead, since the size of a vector clock grows linearly with the number of processes. This paper presents a technique to reduce these overheads for applications that dynamically create and destroy processes and log event information with attached vector timestamps. The reduction in logging overhead comes at the expense of a more complicated timestamp comparison protocol and more sophisticated data structures for mai..
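
    A brief Python sketch (illustrative only, not the paper's comparison protocol) of the property the abstract highlights: with vector timestamps, unlike Lamport clocks, one can decide whether one event happened before another or whether the two are concurrent.

        # Sketch: comparing fixed-length vector timestamps.
        # With dynamic process creation and termination the paper's timestamps are
        # not plain fixed-length vectors; this shows only the basic ordering test.

        def happened_before(a, b):
            """True iff the event stamped a causally precedes the event stamped b."""
            return all(x <= y for x, y in zip(a, b)) and a != b

        def concurrent(a, b):
            """True iff neither event causally precedes the other."""
            return not happened_before(a, b) and not happened_before(b, a)

        e1 = [2, 1, 0]
        e2 = [3, 1, 0]
        e3 = [0, 0, 4]
        print(happened_before(e1, e2))  # True: e1 <= e2 component-wise and e1 != e2
        print(concurrent(e2, e3))       # True: neither timestamp dominates the other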