2 research outputs found
DMLR: Data-centric Machine Learning Research -- Past, Present and Future
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and
meetings prior, in this report we outline the relevance of community engagement
and infrastructure development for the creation of next-generation public
datasets that will advance machine learning science. We chart a path forward as
a collective effort to sustain the creation and maintenance of these datasets
and methods towards positive scientific, societal and business impact.Comment: This editorial report accompanies the inaugural Data-centric Machine
Learning Research (DMLR) Workshop that took place at ICML 2023
https://dmlr.ai
DataPerf: Benchmarks for Data-Centric AI Development
Machine learning research has long focused on models rather than datasets,
and prominent datasets are used for common ML tasks without regard to the
breadth, difficulty, and faithfulness of the underlying problems. Neglecting
the fundamental importance of data has given rise to inaccuracy, bias, and
fragility in real-world applications, and research is hindered by saturation
across existing dataset benchmarks. In response, we present DataPerf, a
community-led benchmark suite for evaluating ML datasets and data-centric
algorithms. We aim to foster innovation in data-centric AI through competition,
comparability, and reproducibility. We enable the ML community to iterate on
datasets, instead of just architectures, and we provide an open, online
platform with multiple rounds of challenges to support this iterative
development. The first iteration of DataPerf contains five benchmarks covering
a wide spectrum of data-centric techniques, tasks, and modalities in vision,
speech, acquisition, debugging, and diffusion prompting, and we support hosting
new contributed benchmarks from the community. The benchmarks, online
evaluation platform, and baseline implementations are open source, and the
MLCommons Association will maintain DataPerf to ensure long-term benefits to
academia and industry.Comment: NeurIPS 2023 Datasets and Benchmarks Trac