Data-Centric Diet: Effective Multi-center Dataset Pruning for Medical Image Segmentation

Chen, Mingjin; He, Yongkang; Lu, Yongyi; Yang, Zhijing

Publication date: 2 August 2023

Abstract

This paper addresses dataset pruning for dense labeling problems, in which a significant fraction of the training set can be removed without sacrificing much accuracy. We observe that, on standard medical image segmentation benchmarks, metrics based on the loss gradient norm of individual training examples, as used in image classification, fail to identify the important samples. To address this issue, we propose a data pruning method that takes into account the training dynamics on target regions by means of the Dynamic Average Dice (DAD) score. To the best of our knowledge, we are among the first to address data importance in dense labeling tasks in medical image analysis, and we make the following contributions: (1) we investigate the underlying causes through rigorous empirical analysis, and (2) we identify an effective data pruning approach for dense labeling problems. Our solution can serve as a strong yet simple baseline for selecting important examples for medical image segmentation with combined data sources.

Comment: Accepted by ICML workshops 202
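The abstract names the Dynamic Average Dice (DAD) score but does not define it. The sketch below assumes one plausible reading: each training example's Dice score on the target region is recorded at every epoch, the per-epoch scores are averaged into a DAD value, and examples are then ranked by that value and pruned. The function names (`dice`, `record_epoch`, `dynamic_average_dice`, `prune_by_dad`), the per-epoch recording, and the default of keeping the lowest-scoring (hardest) examples are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers illustrating a DAD-style pruning score.
# ASSUMPTION: DAD = mean over training epochs of each example's Dice on the
# target (foreground) region; the paper's exact definition may differ.


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Dice overlap 2*|P & T| / (|P| + |T|) for flat binary (0/1) masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the score defined (and equal to 1.0) when both masks are empty.
    return (2.0 * inter + eps) / (total + eps)


def record_epoch(history: Dict[str, List[float]], case_id: str,
                 pred: Sequence[int], target: Sequence[int]) -> None:
    """Append this epoch's Dice for one training example to its history."""
    history.setdefault(case_id, []).append(dice(pred, target))


def dynamic_average_dice(history: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each example's per-epoch Dice scores (assumed DAD definition)."""
    return {case_id: sum(s) / len(s) for case_id, s in history.items() if s}


def prune_by_dad(scores: Dict[str, float], keep_fraction: float,
                 keep_hardest: bool = True) -> List[str]:
    """Return the IDs of the examples to keep after pruning.

    ASSUMPTION: a low average Dice marks a hard, informative example, so by
    default the lowest-scoring examples are kept; set keep_hardest=False
    to keep the highest-scoring ones instead.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Ascending order puts the lowest average Dice first.
    ranked = sorted(scores, key=scores.get, reverse=not keep_hardest)
    n_keep = max(1, round(keep_fraction * len(ranked)))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Toy per-epoch Dice histories for four hypothetical cases.
    history = {
        "case_a": [0.20, 0.35, 0.40],
        "case_b": [0.80, 0.90, 0.95],
        "case_c": [0.50, 0.60, 0.70],
        "case_d": [0.10, 0.15, 0.20],
    }
    scores = dynamic_average_dice(history)
    print(prune_by_dad(scores, keep_fraction=0.5))  # ['case_d', 'case_a']
```

Under this assumed definition, averaging over epochs rather than using only the final-epoch Dice is what lets the score reflect training dynamics: an example that the model fits late or inconsistently receives a lower average even if its final Dice is high.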

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2308.01189

Last time updated on 06/08/2023