5 research outputs found
Out-Of-Place debugging: a debugging architecture to reduce debugging interference
Context. Recent studies show that developers spend most of their programming
time testing, verifying and debugging software. As applications become more and
more complex, developers demand more advanced debugging support to ease the
software development process.
Inquiry. Since the 70's many debugging solutions were introduced. Amongst
them, online debuggers provide a good insight on the conditions that led to a
bug, allowing inspection and interaction with the variables of the program.
However, most of the online debugging solutions introduce \textit{debugging
interference} to the execution of the program, i.e. pauses, latency, and
evaluation of code containing side-effects.
Approach. This paper investigates a novel debugging technique called
\outofplace debugging. The goal is to minimize the debugging interference
characteristic of online debugging while allowing online remote capabilities.
An \outofplace debugger transfers the program execution and application state
from the debugged application to the debugger application, both running in
different processes.
Knowledge. On the one hand, \outofplace debugging allows developers to debug
applications remotely, overcoming the need of physical access to the machine
where the debugged application is running. On the other hand, debugging happens
locally on the remote machine avoiding latency. That makes it suitable to be
deployed on a distributed system and handle the debugging of several processes
running in parallel.
Grounding. We implemented a concrete out-of-place debugger for the Pharo
Smalltalk programming language. We show that our approach is practical by
performing several benchmarks, comparing our approach with a classic remote
online debugger. We show that our prototype debugger outperforms by a 1000
times a traditional remote debugger in several scenarios. Moreover, we show
that the presence of our debugger does not impact the overall performance of an
application.
Importance. This work combines remote debugging with the debugging experience
of a local online debugger. Out-of-place debugging is the first online
debugging technique that can minimize debugging interference while debugging a
remote application. Yet, it still keeps the benefits of online debugging ( e.g.
step-by-step execution). This makes the technique suitable for modern
applications which are increasingly parallel, distributed and reactive to
streams of data from various sources like sensors, UI, network, etc
A debugging approach for live Big Data applications
International audienceMany frameworks exist for programmers to develop and deploy Big Data applications such as Hadoop Map/Reduce and Apache Spark. However, very little debugging support is currently provided in those frameworks. When an error occurs, developers are lost in trying to understand what has happened from the information provided in log files. Recently, new solutions allow developers to record & replay the application execution, but replaying is not always affordable when hours of computation need to be re-executed. In this paper, we present an online approach that allows developers to debug Big Data applications in isolation by moving the debugging session to an external process when a halting point is reached. We introduce IDRA MR , our prototype implementation in Pharo. IDRA MR centralizes the debugging of parallel applications by introducing novel debugging concepts, such as composite debugging events, and the ability to dynamically update both the code of the debugged application and the same configuration of the running framework. We validate our approach by debugging both application and configuration failures for two driving scenarios. The scenarios are implemented and executed using Port, our Map/Reduce framework for Pharo, also introduced in this paper
Recommended from our members
Automated Testing and Debugging for Big Data Analytics
The prevalence of big data analytics in almost every large-scale software system has generated a substantial push to build data-intensive scalable computing (DISC) frameworks such as Google MapReduce and Apache Spark that can fully harness the power of existing data centers. However, frameworks once used by domain experts are now being leveraged by data scientists, business analysts, and researchers. This shift in user demographics calls for immediate advancements in the development, debugging, and testing practices of big data applications, which are falling behind compared to the DISC framework design and implementation. In practice, big data applications often fail as users are unable to test all behaviors emerging from interleaving dataflow operators, user-defined functions, and framework's code. "Testing based on a random sample" rarely guarantees the reliability and "trial and error" and "print" debugging methods are expensive and time-consuming. Thus, the current practice of developing a big data application must be improved and the tools built to enhance the developer's productivity must adapt to the distinct characteristics of data-intensive scalable computing. By synthesizing ideas from software engineering and database systems, our hypothesis is that we can design effective and scalable testing and debugging algorithms for big data analytics without compromising the performance and efficiency of the underlying DISC framework. To design such techniques, we investigate how we can build interactive and responsive debugging primitives that significantly reduce the debugging time, yet do not pose much performance overhead on big data applications. Furthermore, we investigate how we can leverage data provenance techniques from databases and fault-isolation algorithms from software engineering to pinpoint the minimal subset of failure-inducing inputs efficiently. To improve the reliability of big data analytics, we investigate how we can abstract the semantics of dataflow operators and use them in tandem with the semantics of user-defined functions to generate a minimum set of synthetic test inputs capable of revealing more defects than the entire input dataset.To examine the first hypothesis, we introduce interactive, real-time debugging primitives for big data analytics through innovative and scalable debugging features such as simulated breakpoint, dynamic watchpoint, and crash culprit identification. Second, we design a new automated fault localization approach that combines insights from both the software engineering and database literature to bring delta debugging closer to a reality in the big data applications by leveraging data provenance and by constructing systems optimizations for debugging provenance queries. Lastly, we devise a new symbolic-execution based white-box testing algorithm for big data applications that abstracts the implementation of dataflow operators using logical specifications instead of modeling their implementations and combines them with the semantics of any arbitrary user-defined function. We instantiate the idea of an interactive debugging algorithm as BigDebug, the idea of an automated debugging algorithm as BigSift, and the idea of symbolic execution-based testing as BigTest. Our investigation shows that the interactive debugging primitives can scale to terabytes---our record-level tracing incurs less than 25% overhead on average and provides up to 100% time saving compared to the baseline replay debugger. Second, we observe that by combining data provenance with delta debugging, we can identify the minimum faulty input in just under 30% of the original job execution time. Lastly, we verify that by abstracting dataflow operators using logical specifications, we can efficiently generate the most concise test data suitable for local testing while revealing twice as many faults as prior approaches. Our investigations collectively demonstrate that developer productivity can be significantly improved through effective and scalable testing and debugging techniques for big data analytics, without impacting the DISC framework's performance. This dissertation affirms the feasibility of automated debugging and testing techniques for big data analytics---techniques that were previously considered infeasible for large-scale data processing
Recommended from our members
Scalable Systems for Large Scale Dynamic Connected Data Processing
As the proliferation of sensors rapidly make the Internet-of-Things (IoT) a reality, the devices and sensors in this ecosystem—such as smartphones, video cameras, home automation systems, and autonomous vehicles—constantly map out the real-world producing unprecedented amounts of dynamic, connected data that captures complex and diverse relations. Unfortunately, existing big data processing and machine learning frameworks are ill-suited for analyzing such dynamic connected data and face several challenges when employed for this purpose.This dissertation focuses on the design and implementation of scalable systems for dynamic connected data processing. We discuss simple abstractions that make it easy to operate on such data, efficient data structures for state management, and computation models that reduce redundant work. We also describe how bridging theory and practice with algorithms and techniques that leverage approximation and streaming theory can significantly speed up connected data computations. The systems described in this dissertation achieve more than an order of magnitude improvement over the state-of-the-art