Towards A Platform and Benchmark Suite for Model Training on Dynamic Datasets

Abstract

Machine learning (ML) is often applied in use cases where training data evolves and/or grows over time. Training must incorporate data changes to maintain high model quality; however, this is often challenging and expensive due to large datasets and models. In contrast, ML researchers often train and evaluate ML models on static datasets or under artificial assumptions about data dynamics. This gap between research and practice is largely due to (i) the absence of an open-source platform that manages dynamic datasets at scale and supports pluggable policies for when and on what data to train, and (ii) the lack of representative open-source benchmarks for ML training on dynamic datasets. To address this gap, we propose to design a platform that enables ML researchers and practitioners to explore training and data selection policies, while alleviating the burdens of managing large dynamic datasets and orchestrating recurring training jobs. We also propose to build an accompanying benchmark suite that integrates public dynamic datasets and ML models from a variety of representative use cases.
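
To illustrate what "pluggable policies for when and on what data to train" might look like, the following minimal Python sketch separates a trigger policy (when to retrain) from a selection policy (what data to use). All names here (TriggerPolicy, SelectionPolicy, CountTrigger, RecencySelection, Sample) are hypothetical assumptions for illustration only; the abstract does not specify the platform's actual API.

    # Illustrative sketch only; class and method names are assumptions,
    # not the proposed platform's API.
    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from typing import Iterable, List


    @dataclass
    class Sample:
        key: str
        timestamp: float


    class TriggerPolicy(ABC):
        """Decides *when* a new training job should be launched."""

        @abstractmethod
        def should_train(self, num_new_samples: int) -> bool:
            ...


    class SelectionPolicy(ABC):
        """Decides *what* data the next training job should see."""

        @abstractmethod
        def select(self, candidates: Iterable[Sample]) -> List[Sample]:
            ...


    class CountTrigger(TriggerPolicy):
        """Retrain once a fixed number of new samples has arrived."""

        def __init__(self, threshold: int) -> None:
            self.threshold = threshold
            self.seen = 0

        def should_train(self, num_new_samples: int) -> bool:
            self.seen += num_new_samples
            if self.seen >= self.threshold:
                self.seen = 0
                return True
            return False


    class RecencySelection(SelectionPolicy):
        """Keep only the most recent samples, up to a fixed budget."""

        def __init__(self, budget: int) -> None:
            self.budget = budget

        def select(self, candidates: Iterable[Sample]) -> List[Sample]:
            ordered = sorted(candidates, key=lambda s: s.timestamp, reverse=True)
            return ordered[: self.budget]

Under this kind of interface, a platform could orchestrate recurring training jobs by consulting the trigger policy as new data arrives and handing the selection policy's output to the training job, while researchers swap in alternative policies without touching data management or orchestration code.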
