23 research outputs found
A Delta Debugger for ILP Query Execution
Because query execution is the most crucial part of Inductive Logic
Programming (ILP) algorithms, a lot of effort is invested in developing faster
execution mechanisms. These execution mechanisms typically have a low-level
implementation, making them hard to debug. Moreover, other factors such as the
complexity of the problems handled by ILP algorithms and size of the code base
of ILP data mining systems make debugging at this level a very difficult job.
In this work, we present the trace-based debugging approach currently used in
the development of new execution mechanisms in hipP, the engine underlying the
ACE Data Mining system. This debugger uses the delta debugging algorithm to
automatically reduce the total time needed to expose bugs in ILP execution,
thus making manual debugging step much lighter.Comment: Paper presented at the 16th Workshop on Logic-based Methods in
Programming Environments (WLPE2006
Provenance, Incremental Evaluation, and Debugging in Datalog
The Datalog programming language has recently found increasing traction in research and industry. Driven by its clean declarative semantics, along with its conciseness and ease of use, Datalog has been adopted for a wide range of important applications, such as program analysis, graph problems, and networking. To enable this adoption, modern Datalog engines have implemented advanced language features and high-performance evaluation of Datalog programs. Unfortunately, critical infrastructure and tooling to support Datalog users and developers are still missing. For example, there are only limited tools addressing the crucial debugging problem, where developers can spend up to 30% of their time finding and fixing bugs.
This thesis addresses Datalog’s tooling gaps, with the ultimate goal of improving the productivity of Datalog programmers. The first contribution is centered around the critical problem of debugging: we develop a new debugging approach that explains the execution steps taken to produce a faulty output. Crucially, our debugging method can be applied for large-scale applications without substantially sacrificing performance. The second contribution addresses the problem of incremental evaluation, which is necessary when program inputs change slightly, and results need to be recomputed. Incremental evaluation allows this recomputation to happen more efficiently, without discarding the previous results and recomputing from scratch. Finally, the last contribution provides a new incremental debugging approach that identifies the root causes of faulty outputs that occur after an incremental evaluation. Incremental debugging focuses on the relationship between input and output and can provide debugging suggestions to amend the inputs so that faults no longer occur. These techniques, in combination, form a corpus of critical infrastructure and tooling developments for Datalog, allowing developers and users to use Datalog more productively
Recommended from our members
Complaint Driven Training Data Debugging for Machine Learning Workflows
As the need for machine learning (ML) increases rapidly across all industry sectors, so has theinterest in building ML platforms that manage and automate parts of the ML life-cycle. This has enabled companies to use ML inference as a part of their downstream analytics or their applications. Unfortunately, debugging unexpected outcomes in the result of these ML workflows remains a necessary but difficult task of the ML life-cycle. The challenge of debugging ML workflows is that it requires reasoning about the correctness of the workflow logic, the datasets used for inference and training, the models, and interactions between them. Even if the workflow logic is correct, errors in the data used across the ML workflow can still lead to wrong outcomes. In short, developers are not just debugging the code, but also the data.
We advocate in favor of a complaint driven approach towards specifying and debugging data errors in ML workflows. The approach takes as input user specified complaints specified as constraints over the final or intermediate outputs of workflows that use trained ML models. The approach outputs explanations in the form of specific operator(s) or data subsets, and how they may be changed to address the constraint violations.
In this thesis we make the first steps towards our complaint driven approach to data debugging. As a stepping stone, we focus our attention on complaints specified on top of relational workflows that use ML model inference and whose errors are caused by errors in ML model’s training data. To the best of our knowledge, we contribute the first debugging system for this task, which we call Rain. In response to a user complaint, Rain ranks the ML model’s training examples based on their ability to address the user’s complaint if they were removed. Our experiments show that users can use Rain to debug training data errors by specifying complaints over aggregations of model predictions without having to specify the correct label for each individual prediction.
Unfortunately, Rain’s latency may be prohibitive for use in interactive applications like analytical dashboards or business intelligence tools where users are likely to observe errors and complain. To address Rain’s latency problem when scaling to large ML models and training sets, we propose Rain++. Rain++ pushes the majority of Rain’s computation offline ahead of user interaction, achieving orders of magnitude online latency improvements compared to Rain.
To go beyond Rain’s and Rain++’s approach that evaluates individual training example deletionsindependently we propose MetaRain, a framework for training classifiers that detect training data corruptions in response to user complaints. Thanks to the generality of MetaRain, users can adapt the classifiers chosen to the training corruptions and the complaints they seek to resolve. Our experiments indicate that making use of this ability results in improved debugging outcomes.
Last but not least, we study the problem of updating relational workflow results in response tochanges to the inference ML model used. This can be leveraged by current or future complaint driven debugging systems that repeatedly change the model and reevaluate the relational workflow. We propose FaDE, a compiler that generates efficient code for the workflow update problem by casting it as view maintenance under input tuple deletions. Our experiments indicate that the code generated by FaDE has orders of magnitude lower latency than existing view maintenance systems
Fundamental Approaches to Software Engineering
This open access book constitutes the proceedings of the 23rd International Conference on Fundamental Approaches to Software Engineering, FASE 2020, which took place in Dublin, Ireland, in April 2020, and was held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020. The 23 full papers, 1 tool paper and 6 testing competition papers presented in this volume were carefully reviewed and selected from 81 submissions. The papers cover topics such as requirements engineering, software architectures, specification, software quality, validation, verification of functional and non-functional properties, model-driven development and model transformation, software processes, security and software evolution
Combining SOA and BPM Technologies for Cross-System Process Automation
This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, WSDL). Besides discussing major weaknesses of the existing, custom-built, solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed solution. This includes a general approach, consisting of four distinct steps, as well as specific action items that are to be performed for every step. The discussion also covers language and tool support and challenges arising from the transformation
Towards lightweight, low-latency network function virtualisation at the network edge
Communication networks are witnessing a dramatic growth in the number of connected mobile devices, sensors and the Internet of Everything (IoE) equipment, which have been estimated to exceed 50 billion by 2020, generating zettabytes of traffic each year. In addition, networks are stressed to serve the increased capabilities of the mobile devices (e.g., HD cameras) and to fulfil the users' desire for always-on, multimedia-oriented, and low-latency connectivity.
To cope with these challenges, service providers are exploiting softwarised, cost-effective, and flexible service provisioning, known as Network Function Virtualisation (NFV). At the same time, future networks are aiming to push services to the edge of the network, to close physical proximity from the users, which has the potential to reduce end-to-end latency, while increasing the flexibility and agility of allocating resources. However, the heavy footprint of today's NFV platforms and their lack of dynamic, latency-optimal orchestration prevents them from being used at the edge of the network.
In this thesis, the opportunities of bringing NFV to the network edge are identified. As a concrete solution, the thesis presents Glasgow Network Functions (GNF), a container-based NFV framework that allocates and dynamically orchestrates lightweight virtual network functions (vNFs) at the edge of the network, providing low-latency network services (e.g., security functions or content caches) to users. The thesis presents a powerful formalisation for the latency-optimal placement of edge vNFs and provides an exact solution using Integer Linear Programming, along with a placement scheduler that relies on Optimal Stopping Theory to efficiently re-calculate the placement following roaming users and temporal changes in latency characteristics.
The results of this work demonstrate that GNF's real-world vNF examples can be created and hosted on a variety of hosting devices, including VMs from public clouds and low-cost edge devices typically found at the customer's premises. The results also show that GNF can carefully manage the placement of vNFs to provide low-latency guarantees, while minimising the number of vNF migrations required by the operators to keep the placement latency-optimal
Embedded System Design
A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues