
    Learnings from a Retail Recommendation System on Billions of Interactions at bol.com

    Recommender systems are ubiquitous on the modern internet, where they help users find items they might like. We discuss the design of a large-scale recommender system handling billions of interactions on a European e-commerce platform. We present two studies on enhancing the predictive performance of this system with both algorithmic and systems-related approaches. First, we evaluate neural network-based approaches on proprietary data from our e-commerce platform, and confirm recent results indicating that the benefits of these methods for predictive performance are limited, while they exhibit severe scalability bottlenecks. Next, we investigate the impact of reducing the response latency of our serving system, and conduct an A/B test on the live platform with more than 19 million user sessions, which confirms that the latency reduction correlates with a significant increase in business-relevant metrics. We discuss the implications of our findings for real-world recommendation systems and future research on scalable session-based recommendation.
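    To ground the scalability discussion, the sketch below shows a co-occurrence-based session recommender, the kind of simple, scalable baseline that studies like this compare against neural approaches. It is an illustrative toy, not bol.com's actual system; all names and data are made up.

```python
from collections import defaultdict

def build_cooccurrence(sessions):
    """Count how often each pair of items appears in the same session."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        unique_items = set(session)
        for a in unique_items:
            for b in unique_items:
                if a != b:
                    counts[a][b] += 1
    return counts

def recommend(counts, current_session, k=5):
    """Score candidate items by co-occurrence with the session's items."""
    scores = defaultdict(int)
    for item in current_session:
        for candidate, c in counts[item].items():
            if candidate not in current_session:
                scores[candidate] += c
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy usage: three historical sessions, then a recommendation for a new one.
history = [["a", "b", "c"], ["a", "c"], ["b", "d"]]
counts = build_cooccurrence(history)
print(recommend(counts, ["a"]))  # ['c', 'b']
```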

    Taming Technical Bias in Machine Learning Pipelines

    Machine Learning (ML) is commonly used to automate decisions in domains as varied as credit and lending, medical diagnosis, and hiring. These decisions are consequential, compelling us to carefully balance the benefits of efficiency against the potential risks. Much of the conversation about the risks centers on bias, a term that the technical community uses ever more frequently but that is still poorly understood. In this paper we focus on technical bias, a type of bias that has so far received limited attention and that the data engineering community is well-equipped to address. We discuss dimensions of technical bias that can arise throughout the ML lifecycle, particularly bias introduced by preprocessing decisions or post-deployment issues. We present results of our recent work, and discuss future research directions. Our overall goal is to support the development of systems that expose the knobs of responsibility to data scientists, allowing them to detect instances of technical bias and to mitigate it when possible.
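    As a concrete illustration of technical bias from a preprocessing decision (our toy example, not one taken from the paper), consider how dropping rows with missing values can silently change group representation in the training data:

```python
import pandas as pd

# Toy data: 'income' is missing far more often for group B, a pattern
# that typically reflects how the data was collected, not the people.
df = pd.DataFrame({
    "group":  ["A"] * 6 + ["B"] * 6,
    "income": [40, 55, 38, 62, 47, 51, None, None, None, 45, None, 58],
})

shares = lambda frame: frame["group"].value_counts(normalize=True)
print(shares(df))       # A: 0.50, B: 0.50
cleaned = df.dropna()   # a seemingly neutral cleaning step
print(shares(cleaned))  # A: 0.75, B: 0.25
# Group B shrinks from half to a quarter of the training data: technical
# bias introduced purely by the preprocessing decision, not by the model.
```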

    Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines

    Software systems that learn from data with machine learning (ML) are used in critical decision-making processes. Unfortunately, real-world experience shows that the pipelines for data preparation, feature encoding, and model training in ML systems are often brittle with respect to their input data. As a consequence, data scientists have to run different kinds of data-centric what-if analyses to evaluate the robustness and reliability of such pipelines, e.g., with respect to data errors or preprocessing techniques. These what-if analyses follow a common pattern: they take an existing ML pipeline, create a pipeline variant by introducing a small change, and execute this pipeline variant to see how the change impacts the pipeline's output score. Applying existing analysis techniques to ML pipelines is technically challenging, as they are hard to integrate into existing pipeline code and their execution introduces large overheads due to repeated work.

    We propose mlwhatif to address these integration and efficiency challenges for data-centric what-if analyses on ML pipelines. mlwhatif enables data scientists to declaratively specify what-if analyses for an ML pipeline, and to automatically generate, optimize, and execute the required pipeline variants. Our approach employs pipeline patches to specify changes to the data, operators, and models of a pipeline. Based on these patches, we define a multi-query optimizer for efficiently executing the resulting pipeline variants jointly, with four subsumption-based optimization rules. Subsequently, we detail how to implement the pipeline variant generation and optimizer of mlwhatif. For that, we instrument native ML pipelines written in Python to extract dataflow plans with re-executable operators.

    We experimentally evaluate mlwhatif and find that its speedup scales linearly with the number of pipeline variants in applicable cases, and is invariant to the input data size. In end-to-end experiments with four analyses on more than 60 pipelines, we show speedups of up to 13x compared to sequential execution, and find that the speedup is invariant to the model and featurization in the pipeline. Furthermore, we confirm the low instrumentation overhead of mlwhatif.
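    To make the what-if pattern concrete, here is a deliberately naive sketch (ours, not the mlwhatif API): each data-corruption "patch" yields a pipeline variant that is re-executed from scratch, which is exactly the redundant work that mlwhatif's multi-query optimizer is designed to share across variants.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def corrupt_feature(X, column, fraction, rng):
    """A toy 'pipeline patch': zero out a fraction of one feature column."""
    X = X.copy()
    rows = rng.choice(len(X), size=int(len(X) * fraction), replace=False)
    X[rows, column] = 0.0
    return X

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Naive what-if loop: one full retraining per pipeline variant.
for fraction in [0.0, 0.1, 0.3]:
    X_variant = corrupt_feature(X_train, column=0, fraction=fraction, rng=rng)
    model = LogisticRegression(max_iter=1000).fit(X_variant, y_train)
    print(f"corruption={fraction:.0%} -> accuracy={model.score(X_test, y_test):.3f}")
```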