81 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
Recommended from our members
Complaint Driven Training Data Debugging for Machine Learning Workflows
As the need for machine learning (ML) increases rapidly across all industry sectors, so has theinterest in building ML platforms that manage and automate parts of the ML life-cycle. This has enabled companies to use ML inference as a part of their downstream analytics or their applications. Unfortunately, debugging unexpected outcomes in the result of these ML workflows remains a necessary but difficult task of the ML life-cycle. The challenge of debugging ML workflows is that it requires reasoning about the correctness of the workflow logic, the datasets used for inference and training, the models, and interactions between them. Even if the workflow logic is correct, errors in the data used across the ML workflow can still lead to wrong outcomes. In short, developers are not just debugging the code, but also the data.
We advocate in favor of a complaint driven approach towards specifying and debugging data errors in ML workflows. The approach takes as input user specified complaints specified as constraints over the final or intermediate outputs of workflows that use trained ML models. The approach outputs explanations in the form of specific operator(s) or data subsets, and how they may be changed to address the constraint violations.
In this thesis we make the first steps towards our complaint driven approach to data debugging. As a stepping stone, we focus our attention on complaints specified on top of relational workflows that use ML model inference and whose errors are caused by errors in ML model’s training data. To the best of our knowledge, we contribute the first debugging system for this task, which we call Rain. In response to a user complaint, Rain ranks the ML model’s training examples based on their ability to address the user’s complaint if they were removed. Our experiments show that users can use Rain to debug training data errors by specifying complaints over aggregations of model predictions without having to specify the correct label for each individual prediction.
Unfortunately, Rain’s latency may be prohibitive for use in interactive applications like analytical dashboards or business intelligence tools where users are likely to observe errors and complain. To address Rain’s latency problem when scaling to large ML models and training sets, we propose Rain++. Rain++ pushes the majority of Rain’s computation offline ahead of user interaction, achieving orders of magnitude online latency improvements compared to Rain.
To go beyond Rain’s and Rain++’s approach that evaluates individual training example deletionsindependently we propose MetaRain, a framework for training classifiers that detect training data corruptions in response to user complaints. Thanks to the generality of MetaRain, users can adapt the classifiers chosen to the training corruptions and the complaints they seek to resolve. Our experiments indicate that making use of this ability results in improved debugging outcomes.
Last but not least, we study the problem of updating relational workflow results in response tochanges to the inference ML model used. This can be leveraged by current or future complaint driven debugging systems that repeatedly change the model and reevaluate the relational workflow. We propose FaDE, a compiler that generates efficient code for the workflow update problem by casting it as view maintenance under input tuple deletions. Our experiments indicate that the code generated by FaDE has orders of magnitude lower latency than existing view maintenance systems
LIPIcs, Volume 258, SoCG 2023, Complete Volume
LIPIcs, Volume 258, SoCG 2023, Complete Volum
Bridging Dense and Sparse Maximum Inner Product Search
Maximum inner product search (MIPS) over dense and sparse vectors have
progressed independently in a bifurcated literature for decades; the latter is
better known as top- retrieval in Information Retrieval. This duality exists
because sparse and dense vectors serve different end goals. That is despite the
fact that they are manifestations of the same mathematical problem. In this
work, we ask if algorithms for dense vectors could be applied effectively to
sparse vectors, particularly those that violate the assumptions underlying
top- retrieval methods. We study IVF-based retrieval where vectors are
partitioned into clusters and only a fraction of clusters are searched during
retrieval. We conduct a comprehensive analysis of dimensionality reduction for
sparse vectors, and examine standard and spherical KMeans for partitioning. Our
experiments demonstrate that IVF serves as an efficient solution for sparse
MIPS. As byproducts, we identify two research opportunities and demonstrate
their potential. First, we cast the IVF paradigm as a dynamic pruning technique
and turn that insight into a novel organization of the inverted index for
approximate MIPS for general sparse vectors. Second, we offer a unified regime
for MIPS over vectors that have dense and sparse subspaces, and show its
robustness to query distributions
Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries
This two-volume set LNCS 12962 and 12963 constitutes the thoroughly refereed proceedings of the 7th International MICCAI Brainlesion Workshop, BrainLes 2021, as well as the RSNA-ASNR-MICCAI Brain Tumor Segmentation (BraTS) Challenge, the Federated Tumor Segmentation (FeTS) Challenge, the Cross-Modality Domain Adaptation (CrossMoDA) Challenge, and the challenge on Quantification of Uncertainties in Biomedical Image Quantification (QUBIQ). These were held jointly at the 23rd Medical Image Computing for Computer Assisted Intervention Conference, MICCAI 2020, in September 2021. The 91 revised papers presented in these volumes were selected form 151 submissions. Due to COVID-19 pandemic the conference was held virtually. This is an open access book
Cultures and Traditions of Wordplay and Wordplay Research
This volume focuses on realisations of wordplay in different cultures and social and historical contexts, and brings together various research traditions of approaching wordplay. Together with the volume DWP 7, it assembles selected papers presented at the interdisciplinary conference The Dynamics of Wordplay / La dynamique du jeu de mots (Trier, 2016) and stresses the inherent dynamicity of wordplay and wordplay research
Remote Sensing Data Compression
A huge amount of data is acquired nowadays by different remote sensing systems installed on satellites, aircrafts, and UAV. The acquired data then have to be transferred to image processing centres, stored and/or delivered to customers. In restricted scenarios, data compression is strongly desired or necessary. A wide diversity of coding methods can be used, depending on the requirements and their priority. In addition, the types and properties of images differ a lot, thus, practical implementation aspects have to be taken into account. The Special Issue paper collection taken as basis of this book touches on all of the aforementioned items to some degree, giving the reader an opportunity to learn about recent developments and research directions in the field of image compression. In particular, lossless and near-lossless compression of multi- and hyperspectral images still remains current, since such images constitute data arrays that are of extremely large size with rich information that can be retrieved from them for various applications. Another important aspect is the impact of lossless compression on image classification and segmentation, where a reasonable compromise between the characteristics of compression and the final tasks of data processing has to be achieved. The problems of data transition from UAV-based acquisition platforms, as well as the use of FPGA and neural networks, have become very important. Finally, attempts to apply compressive sensing approaches in remote sensing image processing with positive outcomes are observed. We hope that readers will find our book useful and interestin
Tools and Algorithms for the Construction and Analysis of Systems
This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-Comp and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, exibility, and efficiency of tools and algorithms for building computer-controlled systems
Head-Driven Phrase Structure Grammar
Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism)
Automated Deduction – CADE 28
This open access book constitutes the proceeding of the 28th International Conference on Automated Deduction, CADE 28, held virtually in July 2021. The 29 full papers and 7 system descriptions presented together with 2 invited papers were carefully reviewed and selected from 76 submissions. CADE is the major forum for the presentation of research in all aspects of automated deduction, including foundations, applications, implementations, and practical experience. The papers are organized in the following topics: Logical foundations; theory and principles; implementation and application; ATP and AI; and system descriptions
- …