
    Efficient Pattern Search in Large, Partial-Order Data Sets

    The behaviour of a large, distributed system is inherently complex. One step towards making this behaviour more understandable to a user involves instrumenting the system and collecting data about its execution. We can model the data as traces (representing various sequential entities in the system, such as single-threaded processes) that contain both events local to the trace and communication events involving another trace. Visualizing this data provides a modest benefit to users: it makes basic interactions in the system clearer and, with some user effort, lets them determine more complex interactions. Unfortunately, visualization by itself is not an adequate solution, especially for large numbers of events and complex interactions among traces. A search facility can make this event data more useful. Previous work has proposed various frameworks and algorithms that could form the core of such a search facility; however, shortcomings in the completeness of the frameworks and in the efficiency of the algorithms resulted in an inconsistent, incomplete, and inefficient solution. This thesis takes steps to remedy this situation. We propose a provably complete framework for determining precedence between sets of events, and we propose additions to a previous pattern-specification language so that it can specify a wider variety of search patterns. We improve the efficiency of the existing search algorithm and provide a new, more efficient algorithm that processes a pattern in a fundamentally different way. Furthermore, the proposed improvements have been implemented and are analysed empirically.
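    Precedence between events on different traces is commonly decided with vector timestamps. The sketch below illustrates that idea and two possible notions of precedence between sets of events. It is a minimal sketch under stated assumptions, not the framework proposed in the thesis: the functions `set_strongly_precedes` and `set_weakly_precedes` are hypothetical definitions chosen for illustration.

```python
from typing import Sequence

# A vector timestamp: entry i counts the events of trace i that this event "knows about".
VectorClock = tuple[int, ...]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """Happened-before via vector timestamps: a -> b iff a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


def set_strongly_precedes(A: Sequence[VectorClock], B: Sequence[VectorClock]) -> bool:
    """Hypothetical set relation: every event of A happened before every event of B."""
    return all(happened_before(a, b) for a in A for b in B)


def set_weakly_precedes(A: Sequence[VectorClock], B: Sequence[VectorClock]) -> bool:
    """Hypothetical set relation: some event of A happened before some event of B."""
    return any(happened_before(a, b) for a in A for b in B)


if __name__ == "__main__":
    # Trace 0: e1, then e2 sends a message; trace 1: f1, then f2 receives it.
    e1, e2 = (1, 0), (2, 0)
    f1, f2 = (0, 1), (2, 2)
    print(happened_before(e1, f2), concurrent(e1, f1))
    print(set_strongly_precedes([e1, f1], [e2]), set_weakly_precedes([e1, f1], [e2]))
```

    The two set relations show why a framework for set precedence needs care: the set {e1, f1} weakly precedes {e2} (because e1 happened before e2) but does not strongly precede it (because f1 is concurrent with e2).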

    Online Monitoring of Distributed Systems Using Causal Event Patterns

    Event monitoring and logging, that is, recording the communication events between processes, is a critical component in many highly reliable distributed systems. The event logs enable the identification of certain safety-condition violations, such as race conditions and mutual-exclusion violations, because safety is generally contingent on a specific causally ordered pattern of process communication. Previous efforts at finding such patterns have often focused on offline techniques, which are unable to identify operational problems as they occur. Online monitoring tools exist, but they are often restricted to identifying a specific violation condition, such as a deadlock or a race condition, using dedicated data structures. We address the more general problem of detecting causally related event patterns that can be used to identify various undesired behaviours in the system. The main challenge for online pattern matching is the need to store partial matches to the pattern, as they may combine with future events to form a complete match. Unlike pattern matching in most other domains, causally ordered patterns can span a potentially unbounded number of events, and efficiently searching through this large collection poses a significant challenge. We present an efficient online causal-event-pattern-matching framework that bounds the number of partial matches it stores by reporting only a representative subset of pattern matches. We define a subset of matches as representative if it has at least one occurrence of each event in the pattern on each process; this definition is applicable to a large class of distributed applications. Our first pattern-matching algorithm, OCEP, uses backtracking to efficiently find a representative subset from the history of events. An evaluation of the framework shows that OCEP is capable of handling several frequently occurring violation patterns at the event rates of some representative distributed applications.
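    The definition of a representative subset can be made concrete as a coverage condition over (pattern event, process) pairs. The sketch below is one possible reading of that definition, not OCEP itself: the `Match` representation and the greedy `RepresentativeReporter` are hypothetical, and the reporter bounds only the number of reported complete matches (at most one per pair), not the partial matches discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    # Hypothetical representation: one (pattern-event label, process id, event index) per pattern event.
    bindings: tuple[tuple[str, int, int], ...]

    def coverage(self) -> set[tuple[str, int]]:
        """The (pattern event, process) pairs this match contributes."""
        return {(label, proc) for label, proc, _ in self.bindings}


def is_representative(subset: list[Match], all_matches: list[Match]) -> bool:
    """True if the subset covers every (pattern event, process) pair that any match covers."""
    needed = set().union(*(m.coverage() for m in all_matches))
    have = set().union(*(m.coverage() for m in subset))
    return needed <= have


class RepresentativeReporter:
    """Greedy online filter: report a match only if it covers a not-yet-covered pair."""

    def __init__(self) -> None:
        self.covered: set[tuple[str, int]] = set()
        self.reported: list[Match] = []

    def offer(self, m: Match) -> bool:
        new_pairs = m.coverage() - self.covered
        if not new_pairs:
            return False  # adds nothing to the representative set
        self.covered |= new_pairs
        self.reported.append(m)
        return True


if __name__ == "__main__":
    m1 = Match((("send", 0, 1), ("recv", 1, 1)))
    m2 = Match((("send", 0, 2), ("recv", 1, 2)))  # same pairs as m1
    m3 = Match((("send", 2, 1), ("recv", 1, 3)))  # "send" on a new process
    r = RepresentativeReporter()
    print([r.offer(m) for m in (m1, m2, m3)])
```

    Under this reading, the greedy filter always produces a representative subset of the matches it has seen, though not necessarily the smallest one.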
Our second algorithm, Ananke, introduces causality-based rules in the search pattern that can be used to specify the removal of an event from the maintained history. We used some of the most frequently occurring types of concurrency bugs in real-world applications to show that the desired causal order of events can be used to specify such removal rules. More importantly, these rules maintain a finite history while still reporting a representative set of matches within a millisecond in most cases.
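    To illustrate how a causality-based removal rule keeps the history finite, the sketch below stores events with vector timestamps and lets a pluggable rule discard stored events made redundant by a new arrival. This is a minimal sketch, not Ananke's rule language: the `History` class and the `superseded_on_same_process` rule are hypothetical, and whether such a rule preserves a representative set of matches depends on the pattern being searched.

```python
from dataclasses import dataclass
from typing import Callable

VectorClock = tuple[int, ...]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """Happened-before via vector timestamps: a -> b iff a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


@dataclass(frozen=True)
class Event:
    label: str
    process: int
    clock: VectorClock


class History:
    """Event history with a pluggable, causality-based removal rule (hypothetical)."""

    def __init__(self, removal_rule: Callable[[Event, Event], bool]) -> None:
        self.events: list[Event] = []
        self.removal_rule = removal_rule

    def add(self, new: Event) -> None:
        # Drop every stored event that the rule says the new event makes redundant.
        self.events = [old for old in self.events if not self.removal_rule(old, new)]
        self.events.append(new)


def superseded_on_same_process(old: Event, new: Event) -> bool:
    """Hypothetical rule: an older event with the same label on the same process,
    which happened before the new one, is no longer kept."""
    return (
        old.label == new.label
        and old.process == new.process
        and happened_before(old.clock, new.clock)
    )


if __name__ == "__main__":
    h = History(superseded_on_same_process)
    for i in range(1, 1001):
        h.add(Event("lock", 0, (i, 0)))
        h.add(Event("lock", 1, (0, i)))
    print(len(h.events))  # one "lock" event per process survives
```

    With this rule, the history holds at most one event per (label, process) pair, so its size no longer grows with the length of the execution.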