34 research outputs found
One-Step Abductive Multi-Target Learning with Diverse Noisy Samples: An Application to Tumour Segmentation for Breast Cancer
One-step abductive multi-target learning (OSAMTL) is an approach proposed to
handle complex noisy labels. However, OSAMTL is not suitable for the situation
where diverse noisy samples (DNS) are provided for a learning task. In this
paper, giving definition of DNS, we propose one-step abductive multi-target
learning with DNS (OSAMTL-DNS) to expand the original OSAMTL to a wider range
of tasks that handle complex noisy labels. Applying OSAMTL-DNS to tumour
segmentation for breast cancer in medical histopathology whole slide image
analysis, we show that OSAMTL-DNS is able to enable various state-of-the-art
approaches for learning from noisy labels to achieve significantly more
rational predictions.
Comment: The proofs provided in the supplementary need further careful
consideration
Impact of Noisy Labels on Dental Deep Learning—Calculus Detection on Bitewing Radiographs
Supervised deep learning requires labeled data. On medical images, data is often labeled inconsistently (e.g., too large) with varying accuracies. We aimed to assess the impact of such label noise on dental calculus detection on bitewing radiographs. On 2584 bitewings, calculus was accurately labeled using bounding boxes (BBs); the BBs were then artificially increased and decreased stepwise, resulting in 30 consistently and 9 inconsistently noisy datasets. An object detection network (YOLOv5) was trained on each dataset and evaluated on noisy and accurate test data. Training on accurately labeled data yielded an mAP50 of 0.77 (SD: 0.01). When trained on consistently too small BBs, model performance significantly decreased on accurate and noisy test data. The performance of models trained on consistently too large BBs decreased immediately on accurate test data (e.g., 200% BBs: mAP50: 0.24; SD: 0.05; p < 0.05), but only after drastically increasing BBs on noisy test data (e.g., 70,000%: mAP50: 0.75; SD: 0.01; p < 0.05). Models trained on inconsistent BB sizes showed a significant decrease in performance when deviating 20% or more from the original when tested on noisy data (mAP50: 0.74; SD: 0.02; p < 0.05), or 30% or more when tested on accurate data (mAP50: 0.76; SD: 0.01; p < 0.05). In conclusion, accurate predictions need accurately labeled data in the training process. Testing on noisy data may disguise the effects of noisy training data. Researchers should be aware of the relevance of accurately annotated data, especially when testing model performance.
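The stepwise enlargement and shrinking of bounding boxes described in this abstract can be illustrated with a minimal sketch. It assumes boxes in (x_min, y_min, x_max, y_max) pixel format and that the percentage scales width and height about the box centre; the abstract does not say whether the percentage refers to side length or area, so `scale_box`, `iou`, and their parameters are hypothetical names chosen for illustration only.

```python
def scale_box(box, percent, img_w, img_h):
    """Scale a box about its centre (assumption: percent applies to side length).

    percent=100 leaves the box unchanged; the result is clipped to the image.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) / 2.0 * percent / 100.0
    half_h = (y1 - y0) / 2.0 * percent / 100.0
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(img_w), cx + half_w),
        min(float(img_h), cy + half_h),
    )


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical accurate label on a 200x200 image.
clean = (40.0, 40.0, 60.0, 60.0)
for percent in (50, 100, 200):
    noisy = scale_box(clean, percent, img_w=200, img_h=200)
    print(percent, noisy, iou(clean, noisy))
```

Under the side-length assumption, a box scaled to 200% overlaps its original with an IoU of 0.25, which is below the 0.5 IoU threshold that mAP50 uses. That is one plausible reading of why performance on accurate test data drops quickly for enlarged boxes, not a claim about the authors' exact procedure.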
Weakly Supervised Medical Image Segmentation With Soft Labels and Noise Robust Loss
Recent advances in deep learning algorithms have led to significant benefits
for solving many medical image analysis problems. Training deep learning models
commonly requires large datasets with expert-labeled annotations. However,
acquiring expert-labeled annotations is not only expensive but also subjective
and error-prone, and inter-/intra-observer variability introduces noise into
the labels. This is particularly a problem when using deep learning models
for segmenting medical images due to the ambiguous anatomical boundaries.
Image-based medical diagnosis tools using deep learning models trained with
incorrect segmentation labels can lead to false diagnoses and treatment
suggestions. Multi-rater annotations might be better suited to train deep
learning models with small training sets compared to single-rater annotations.
The aim of this paper was to develop and evaluate a method to generate
probabilistic labels based on multi-rater annotations and anatomical knowledge
of the lesion features in MRI and a method to train segmentation models on
these probabilistic labels with normalized active-passive loss as a
"noise-tolerant loss" function. The model was evaluated by comparing it to
binary ground truth for 17 knee MRI scans for clinical segmentation and
detection of bone marrow lesions (BML). The proposed method improved precision
by 14, recall by 22, and Dice score by 8 percent compared to a binary
cross-entropy loss function.
Overall, the results of this work suggest that the proposed normalized
active-passive loss using soft labels successfully mitigated the effects of
noisy labels.
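As a rough illustration of what an active-passive loss on soft labels could look like, the sketch below combines a normalized cross-entropy term (active) with a reverse cross-entropy term (passive). The abstract does not state which active and passive terms were paired or how soft labels enter the formula, so the pairing, the soft-label extension, and the names `apl_loss`, `alpha`, `beta`, and `label_floor` are all assumptions made for illustration.

```python
import numpy as np


def apl_loss(probs, soft_labels, alpha=1.0, beta=1.0, eps=1e-7, label_floor=1e-4):
    """Sketch of an active-passive loss on soft labels (assumed NCE + RCE form).

    probs:        (N, K) predicted class probabilities, rows sum to 1, K >= 2.
    soft_labels:  (N, K) probabilistic labels, rows sum to 1.
    """
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(soft_labels, dtype=float), 0.0, 1.0)
    log_p = np.log(p)

    # Active term: cross-entropy divided by the sum of cross-entropies
    # against every possible one-hot label, so the value lies in [0, 1].
    nce = (q * log_p).sum(axis=1) / log_p.sum(axis=1)

    # Passive term: reverse cross-entropy; log(0) in the label is replaced
    # by log(label_floor) to keep the value finite.
    rce = -(p * np.log(np.clip(q, label_floor, 1.0))).sum(axis=1)

    return float((alpha * nce + beta * rce).mean())


# Two-class example (lesion vs. background) with hypothetical values.
soft = np.array([[0.8, 0.2]])            # probabilistic label from several raters
agree = np.array([[0.85, 0.15]])         # prediction close to the soft label
disagree = np.array([[0.10, 0.90]])      # prediction far from the soft label
print(apl_loss(agree, soft), apl_loss(disagree, soft))
```

The weights `alpha` and `beta` trade off the two terms; in this sketch they default to 1 purely for simplicity.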
Detecting Label Noise via Leave-One-Out Cross-Validation
We present a simple algorithm for identifying and correcting real-valued
noisy labels from a mixture of clean and corrupted sample points using Gaussian
process regression. A heteroscedastic noise model is employed, in which
additive Gaussian noise terms with independent variances are associated with
each and all of the observed labels. Optimizing the noise model using maximum
likelihood estimation leads to the containment of the GPR model's predictive
error by the posterior standard deviation in leave-one-out cross-validation. A
multiplicative update scheme is proposed for solving the maximum likelihood
estimation problem under non-negative constraints. While we provide proof of
convergence for certain special cases, the multiplicative scheme has
empirically demonstrated monotonic convergence behavior in virtually all our
numerical experiments. We show that the presented method can pinpoint corrupted
sample points and lead to better regression models when trained on synthetic
and real-world scientific data sets.
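The leave-one-out quantities mentioned in this abstract have a standard closed form for Gaussian process regression, which the following sketch computes with a per-point noise variance vector. It does not reproduce the paper's multiplicative update scheme: the noise variances are simply held fixed, and the RBF kernel, the data, and the names `rbf_kernel` and `loo_predictions` are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(x, length_scale=1.0):
    """Squared-exponential kernel matrix for 1-D inputs."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def loo_predictions(x, y, noise_var, length_scale=1.0):
    """Closed-form leave-one-out predictive mean and std for GPR.

    noise_var holds one noise variance per observed label (heteroscedastic).
    """
    k_y = rbf_kernel(x, length_scale) + np.diag(noise_var)
    k_inv = np.linalg.inv(k_y)
    diag = np.diag(k_inv)
    loo_mean = y - (k_inv @ y) / diag   # prediction for y_i without point i
    loo_std = np.sqrt(1.0 / diag)       # predictive std for y_i without point i
    return loo_mean, loo_std


# Synthetic data: a smooth function with one corrupted label.
x = np.linspace(0.0, 6.0, 25)
y = np.sin(x)
y[12] += 2.0
noise_var = np.full_like(x, 1e-2)       # fixed here; the paper optimizes these

loo_mean, loo_std = loo_predictions(x, y, noise_var)
z = np.abs(y - loo_mean) / loo_std      # LOO error in units of predictive std
suspect = int(np.argmax(z))             # index of the most suspicious label
print(suspect, z[suspect])
```

A label whose leave-one-out error is many predictive standard deviations wide is a natural candidate for a corrupted point; raising that point's noise variance until the error is contained is the intuition behind the maximum likelihood approach the abstract describes.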
Analyze the Robustness of Classifiers under Label Noise
This study explores the robustness of label noise classifiers, aiming to
enhance model resilience against noisy data in complex real-world scenarios.
Label noise in supervised learning, characterized by erroneous or imprecise
labels, significantly impairs model performance. This research focuses on the
increasingly pertinent issue of label noise's impact on practical applications.
Addressing the prevalent challenge of inaccurate training data labels, we
integrate adversarial machine learning (AML) and importance reweighting
techniques. Our approach involves employing convolutional neural networks (CNN)
as the foundational model, with an emphasis on parameter adjustment for
individual training samples. This strategy is designed to heighten the model's
focus on samples critically influencing performance.
Comment: 21 pages, 11 figures
Curriculum Guided Domain Adaptation in the Dark
Addressing the rising concerns of privacy and security, domain adaptation in
the dark aims to adapt a black-box source trained model to an unlabeled target
domain without access to any source data or source model parameters. The need
for domain adaptation of black-box predictors becomes even more pronounced to
protect intellectual property as deep learning based solutions are becoming
increasingly commercialized. Current methods distill noisy predictions on the
target data obtained from the source model to the target model, and/or separate
clean/noisy target samples before adapting using traditional noisy label
learning algorithms. However, these methods do not utilize the easy-to-hard
learning nature of the clean/noisy data splits. Also, none of the existing
methods are end-to-end, and require a separate fine-tuning stage and an initial
warmup stage. In this work, we present Curriculum Adaptation for Black-Box
(CABB) which provides a curriculum guided adaptation approach to gradually
train the target model, first on target data with high confidence (clean)
labels, and later on target data with noisy labels. CABB utilizes
Jensen-Shannon divergence as a better criterion for clean-noisy sample
separation, compared to the traditional criterion of cross entropy loss. Our
method utilizes co-training of a dual-branch network to suppress error
accumulation resulting from confirmation bias. The proposed approach is
end-to-end trainable and does not require any extra finetuning stage, unlike
existing methods. Empirical results on standard domain adaptation datasets show
that CABB outperforms existing state-of-the-art black-box DA models and is
comparable to white-box domain adaptation models.
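The clean-noisy separation criterion named in this abstract can be sketched as follows: compute the Jensen-Shannon divergence between the target model's predicted distribution and the one-hot pseudo-label obtained from the black-box source model, and treat low-divergence samples as clean. The fixed threshold, the data, and the names `js_divergence` and `split_clean_noisy` are assumptions; the abstract does not describe how CABB sets the split.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def split_clean_noisy(target_probs, pseudo_labels, threshold=0.5):
    """Split sample indices by JSD between prediction and one-hot pseudo-label.

    threshold is a hypothetical fixed cut-off used only for this sketch.
    """
    clean, noisy = [], []
    for i, (probs, label) in enumerate(zip(target_probs, pseudo_labels)):
        one_hot = [1.0 if k == label else 0.0 for k in range(len(probs))]
        (clean if js_divergence(probs, one_hot) < threshold else noisy).append(i)
    return clean, noisy


# Hypothetical target-model predictions and source-model pseudo-labels.
target_probs = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]]
pseudo_labels = [0, 0]
clean_idx, noisy_idx = split_clean_noisy(target_probs, pseudo_labels)
print(clean_idx, noisy_idx)
```

Unlike cross-entropy, which is unbounded as the predicted probability of the pseudo-label approaches zero, this divergence stays within a fixed range, which makes a single cut-off easier to reason about in a sketch like this one.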