1,006 research outputs found
Efficient and Deterministic Record & Replay for Actor Languages
With the ubiquity of parallel commodity hardware, developers turn to high-level concurrency models such as the actor model to lower the complexity of concurrent software. However, debugging concurrent software is hard, especially for concurrency models with a limited set of supporting tools. Such tools often deal only with the underlying threads and locks, which obscures the view on e.g. actors and messages and thereby introduces additional complexity. To improve on this situation, we present a low-overhead record & replay approach for actor languages. It allows one to debug concurrency issues deterministically based on a previously recorded trace. Our evaluation shows that the average run-time overhead for tracing on benchmarks from the Savina suite is 10% (min. 0%, max. 20%). For Acme-Air, a modern web application, we see a maximum increase of 1% in latency for HTTP requests and about 1.4 MB/s of trace data. These results are a first step towards deterministic replay debugging of actor systems in production
Simplifying Contract-Violating Traces
Contract conformance is hard to determine statically, prior to the deployment
of large pieces of software. A scalable alternative is to monitor for contract
violations post-deployment: once a violation is detected, the trace
characterising the offending execution is analysed to pinpoint the source of
the offence. A major drawback with this technique is that, often, contract
violations take time to surface, resulting in long traces that are hard to
analyse. This paper proposes a methodology together with an accompanying tool
for simplifying traces and assisting contract-violation debugging.Comment: In Proceedings FLACOS 2012, arXiv:1209.169
Optimizing Abstract Abstract Machines
The technique of abstracting abstract machines (AAM) provides a systematic
approach for deriving computable approximations of evaluators that are easily
proved sound. This article contributes a complementary step-by-step process for
subsequently going from a naive analyzer derived under the AAM approach, to an
efficient and correct implementation. The end result of the process is a two to
three order-of-magnitude improvement over the systematically derived analyzer,
making it competitive with hand-optimized implementations that compute
fundamentally less precise results.Comment: Proceedings of the International Conference on Functional Programming
2013 (ICFP 2013). Boston, Massachusetts. September, 201
Causal-Consistent Replay Debugging for Message Passing Programs
Debugging of concurrent systems is a tedious and error-prone activity. A main issue is that there is no guarantee that a bug that appears in the original computation is replayed inside the debugger. This problem is usually tackled by so-called replay debugging, which allows the user to record a program execution and replay it inside the debugger. In this paper, we present a novel technique for replay debugging that we call controlled causal-consistent replay. Controlled causal-consistent replay allows the user to record a program execution and, in contrast to traditional replay debuggers, to reproduce a visible misbehavior inside the debugger including all and only its causes. In this way, the user is not distracted by the actions of other, unrelated processes
Capturing High-level Nondeterminism in Concurrent Programs for Practical Concurrency Model Agnostic Record and Replay
With concurrency being integral to most software systems, developers combine high-level concurrency models in the same application to tackle each problem with appropriate abstractions. While languages and libraries offer a wide range of concurrency models, debugging support for applications that combine them has not yet gained much attention. Record & replay aids debugging by deterministically reproducing recorded bugs, but is typically designed for a single concurrency model only. This paper proposes a practical concurrency-model-agnostic record & replay approach for multi-paradigm concurrent programs, i.e. applications that combine concurrency models. Our approach traces high-level non- deterministic events by using a uniform model-agnostic trace format and infrastructure. This enables ordering- based record & replay support for a wide range of concurrency models, and thereby enables debugging of applications that combine them. In addition, it allows language implementors to add new concurrency mod- els and reuse the model-agnostic record & replay support. We argue that a concurrency-model-agnostic record & replay is practical and enables advanced debugging support for a wide range of concurrency models. The evaluation shows that our approach is expressive and flexible enough to support record & replay of applications using threads & locks, communicating event loops, communicating sequential processes, software transactional memory and combinations of those concurrency models. For the actor model, we reach recording performance competitive with an optimized special-purpose record & replay solution. The average recording overhead on the Savina actor benchmark suite is 10% (min. 0%, max. 23%). The performance for other concurrency models and combinations thereof is at a similar level. We believe our concurrency-model-agnostic approach helps developers of applications that mix and match concurrency models. We hope that this substrate inspires new tools and languages making building and maintaining of multi-paradigm concurrent applications simpler and safer
Safety-Critical Learning of Robot Control with Temporal Logic Specifications
Reinforcement learning (RL) is a promising approach. However, success is
limited to real-world applications, because ensuring safe exploration and
facilitating adequate exploitation is a challenge for controlling robotic
systems with unknown models and measurement uncertainties. The learning problem
becomes even more difficult for complex tasks over continuous state-action. In
this paper, we propose a learning-based robotic control framework consisting of
several aspects: (1) we leverage Linear Temporal Logic (LTL) to express complex
tasks over infinite horizons that are translated to a novel automaton
structure; (2) we detail an innovative reward scheme for LTL satisfaction with
a probabilistic guarantee. Then, by applying a reward shaping technique, we
develop a modular policy-gradient architecture exploiting the benefits of the
automaton structure to decompose overall tasks and enhance the performance of
learned controllers; (3) by incorporating Gaussian Processes (GPs) to estimate
the uncertain dynamic systems, we synthesize a model-based safe exploration
during the learning process using Exponential Control Barrier Functions (ECBFs)
that generalize systems with high-order relative degrees; (4) to further
improve the efficiency of exploration, we utilize the properties of LTL
automata and ECBFs to propose a safe guiding process. Finally, we demonstrate
the effectiveness of the framework via several robotic environments. We show an
ECBF-based modular deep RL algorithm that achieves near-perfect success rates
and safety guarding with high probability confidence during training.Comment: Under Review. arXiv admin note: text overlap with arXiv:2102.1285
Asynchronous Snapshots of Actor Systems for Latency-Sensitive Applications
The actor model is popular for many types of server applications. Efficient snapshotting of applications is crucial in the deployment of pre-initialized applications or moving running applications to different machines, e.g for debugging purposes. A key issue is that snapshotting blocks all other operations. In modern latency-sensitive applications, stopping the application to persist its state needs to be avoided, because users may not tolerate the increased request latency. In order to minimize the impact of snapshotting on request latency, our approach persists the application’s state asynchronously by capturing partial heaps, completing snapshots step by step. Additionally, our solution is transparent and supports arbitrary object graphs. We prototyped our snapshotting approach on top of the Truffle/Graal platform and evaluated it with the Savina benchmarks and the Acme Air microservice application. When performing a snapshot every thousand Acme Air requests, the number of slow requests ( 0.007% of all requests) with latency above 100ms increases by 5.43%. Our Savina microbenchmark results detail how different utilization patterns impact snapshotting cost. To the best of our knowledge, this is the first system that enables asynchronous snapshotting of actor applications, i.e. without stop-the-world synchronization, and thereby minimizes the impact on latency. We thus believe it enables new deployment and debugging options for actor systems
- …