
    GoTcha: An Interactive Debugger for GoT-Based Distributed Systems

    Debugging distributed systems is hard. Most techniques developed for debugging such systems rely on either extensive model checking or postmortem analysis of logs and traces. Interactive debugging is typically effective only in single-threaded, single-process applications, and is rarely applied to distributed systems. While the live observation of state changes using interactive debuggers is effective, it comes with a host of problems in distributed scenarios. In this paper, we discuss the requirements an interactive debugger for distributed systems should meet, the role the underlying distributed model plays in facilitating the debugger, and the implementation of our interactive debugger: GoTcha. GoTcha is a browser-based interactive debugger for distributed systems built on the Global Object Tracker (GoT) programming model. We show how the GoT model facilitates the debugger, and the features that the debugger can offer. We also demonstrate a typical debugging workflow.

    Detailed Diagnosis of Performance Anomalies in Sensornets

    We address the problem of analysing performance anomalies in sensor networks. In this paper, we propose an approach that uses the local flash storage of the motes for logging system data, in combination with online statistical analysis. Our results show not only that this is a feasible method but that the overhead is significantly lower than that of communication-centric methods, and that interesting patterns can be revealed when calculating the correlation of large data sets of separate event types.

    On-stack replacement, distilled

    On-stack replacement (OSR) is an essential technology for adaptive optimization, allowing changes to code actively executing in a managed runtime. The engineering aspects of OSR are well known among VM architects, with several implementations available to date. However, OSR is yet to be explored as a general means to transfer execution between related program versions, which can pave the road to unprecedented applications that stretch beyond VMs. We aim at filling this gap with a constructive and provably correct OSR framework, allowing a class of general-purpose transformation functions to yield a special-purpose replacement. We describe and evaluate an implementation of our technique in LLVM. As a novel application of OSR, we present a feasibility study on debugging of optimized code, showing how our techniques can be used to fix variables holding incorrect values at breakpoints due to optimizations.

    BurstProbe: Debugging Time-Critical Data Delivery in Wireless Sensor Networks

    In this paper we present BurstProbe, a new technique to accurately measure link burstiness in a wireless sensor network employed for time-critical data delivery. Measurement relies on shared probing slots that are embedded in the transmission schedule and used by nodes to assess link burstiness over time. The acquired link burstiness information can be stored in the node's flash memory and relied upon to diagnose transmission problems when missed deadlines occur. Thus, accurate diagnosis is achieved in a distributed manner and without the overhead of transmitting rich measurement data to a central collection point. For the purpose of evaluation we have implemented BurstProbe in the GinMAC WSN protocol, and we are able to demonstrate that it is an accurate tool to debug time-critical data delivery. In addition, we analyze the cost of implementing BurstProbe and investigate its effectiveness.

    Uncovering Bugs in Distributed Storage Systems during Testing (not in Production!)

    Testing distributed systems is challenging due to multiple sources of nondeterminism. Conventional testing techniques, such as unit, integration and stress testing, are ineffective in preventing serious but subtle bugs from reaching production. Formal techniques, such as TLA+, can only verify high-level specifications of systems at the level of logic-based models, and fall short of checking the actual executable code. In this paper, we present a new methodology for testing distributed systems. Our approach applies advanced systematic testing techniques to thoroughly check that the executable code adheres to its high-level specifications, which significantly improves coverage of important system behaviors. Our methodology has been applied to three distributed storage systems in the Microsoft Azure cloud computing platform. In the process, numerous bugs were identified, reproduced, confirmed and fixed. These bugs required a subtle combination of concurrency and failures, making them extremely difficult to find with conventional testing techniques. An important advantage of our approach is that a bug is uncovered in a small setting and witnessed by a full system trace, which dramatically increases the productivity of debugging.