Training Deep Neural Networks on Noisy Labels with Bootstrapping
Current state-of-the-art deep learning systems for visual object recognition
and detection use purely supervised training with regularization such as
dropout to avoid overfitting. The performance depends critically on the amount
of labeled examples, and in current practice the labels are assumed to be
unambiguous and accurate. However, this assumption often does not hold; e.g. in
recognition, class labels may be missing; in detection, objects in the image
may not be localized; and in general, the labeling may be subjective. In this
work we propose a generic way to handle noisy and incomplete labeling by
augmenting the prediction objective with a notion of consistency. We consider a
prediction consistent if the same prediction is made given similar percepts,
where the notion of similarity is between deep network features computed from
the input data. In experiments we demonstrate that our approach yields
substantial robustness to label noise on several datasets. On MNIST handwritten
digits, we show that our model is robust to label corruption. On the Toronto
Face Database, we show that our model handles well the case of subjective
labels in emotion recognition, achieving state-of-the-art results, and can
also benefit from unlabeled face images with no modification to our method. On
the ILSVRC2014 detection challenge data, we show that our approach extends to
very deep networks, high resolution images and structured outputs, and results
in improved scalable detection.
Bootstrapping Deep Neural Networks from Approximate Image Processing Pipelines
Complex image processing and computer vision systems often consist of a
processing pipeline of functional modules. We intend to replace parts or all of
a target pipeline with deep neural networks to achieve benefits such as
increased accuracy or reduced computational requirement. To acquire a large
amount of labeled data necessary to train the deep neural network, we propose a
workflow that leverages the target pipeline to create a significantly larger
labeled training set automatically, without prior domain knowledge of the
target pipeline. We show experimentally that despite the noise introduced by
automated labeling and only using a very small initially labeled data set, the
trained deep neural networks can achieve similar or even better performance
than the components they replace, while in some cases also reducing
computational requirements.
Comment: 6 pages, 5 figures
Unsupervised Label Noise Modeling and Loss Correction
Despite being robust to small amounts of label noise, convolutional neural
networks trained with stochastic gradient methods have been shown to easily fit
random labels. When there is a mixture of correct and mislabelled targets,
networks tend to fit the former before the latter. This suggests using a
suitable two-component mixture model as an unsupervised generative model of
sample loss values during training to allow online estimation of the
probability that a sample is mislabelled. Specifically, we propose a beta
mixture to estimate this probability and correct the loss by relying on the
network prediction (the so-called bootstrapping loss). We further adapt mixup
augmentation to drive our approach a step further. Experiments on CIFAR-10/100
and TinyImageNet demonstrate a robustness to label noise that substantially
outperforms recent state-of-the-art. Source code is available at
https://git.io/fjsvE
Comment: Accepted to ICML 201
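The loss correction described in this abstract (relying on the network prediction when a sample is likely mislabelled) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `p_noisy` argument (standing in for the per-sample mislabelling probability that the abstract says is estimated by a beta mixture), and the choice between hard and soft predictions are all assumptions made here. The mixture fitting and the mixup step are not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bootstrapped_cross_entropy(logits, label, p_noisy, hard=True):
    """Cross-entropy against a mix of the given label and the model's own guess.

    p_noisy is a hypothetical per-sample weight in [0, 1]: 0 trusts the
    given label fully, 1 trusts the network prediction fully.
    """
    probs = softmax(logits)
    n = len(probs)
    given = [1.0 if k == label else 0.0 for k in range(n)]
    if hard:
        # Hard variant: one-hot vector on the currently predicted class.
        pred = max(range(n), key=probs.__getitem__)
        guess = [1.0 if k == pred else 0.0 for k in range(n)]
    else:
        # Soft variant: the predicted distribution itself.
        guess = probs
    target = [(1.0 - p_noisy) * g + p_noisy * q for g, q in zip(given, guess)]
    # Clamp probabilities away from zero so log() cannot fail.
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs) if t > 0.0)
```

With `p_noisy=0.0` the function reduces to ordinary cross-entropy on the given label; as `p_noisy` approaches 1 the given label is ignored in favour of the prediction.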
A Light CNN for Deep Face Representation with Noisy Labels
The volume of convolutional neural network (CNN) models proposed for face
recognition has been continuously growing larger to better fit large amounts of
training data. When training data are obtained from the internet, the labels are
likely to be ambiguous and inaccurate. This paper presents a Light CNN
framework to learn a compact embedding on the large-scale face data with
massive noisy labels. First, we introduce a variation of maxout activation,
called Max-Feature-Map (MFM), into each convolutional layer of CNN. Different
from maxout activation that uses many feature maps to linearly approximate an
arbitrary convex activation function, MFM does so via a competitive
relationship. MFM can not only separate noisy and informative signals but also
play the role of feature selection between two feature maps. Second, three
networks are carefully designed to obtain better performance while reducing
the number of parameters and computational costs. Lastly, a semantic
bootstrapping method is proposed to make the prediction of the networks more
consistent with noisy labels. Experimental results show that the proposed
framework can utilize large-scale noisy data to learn a Light model that is
efficient in computational costs and storage spaces. The learned single network
with a 256-D representation achieves state-of-the-art results on various face
benchmarks without fine-tuning. The code is released on
https://github.com/AlfredXiangWu/LightCNN.
Comment: arXiv admin note: text overlap with arXiv:1507.04844. The models are
released on https://github.com/AlfredXiangWu/LightCNN, IEEE Transactions on
Information Forensics and Security, 201
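The Max-Feature-Map idea in this abstract (a competitive choice between two feature maps) can be illustrated with a short sketch. It assumes the channels are split into two halves and paired by position, with an element-wise maximum taken over each pair; that pairing, the flat-list representation of a feature map, and the function name are assumptions for illustration rather than details stated in the abstract.

```python
def max_feature_map(channels):
    """Element-wise max over paired feature maps.

    `channels` is a list of 2k equally sized feature maps, each a flat
    list of floats. Map i is paired with map i + k (assumed pairing),
    so the output has k maps.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of feature maps")
    half = len(channels) // 2
    return [
        [max(a, b) for a, b in zip(channels[i], channels[i + half])]
        for i in range(half)
    ]
```

For example, `max_feature_map([[1.0, -2.0], [0.5, 3.0]])` keeps the larger activation at each position and halves the channel count.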
Learning Deep Networks from Noisy Labels with Dropout Regularization
Large datasets often have unreliable labels, such as those obtained from
Amazon's Mechanical Turk or social media platforms, and classifiers trained on
mislabeled datasets often exhibit poor performance. We present a simple,
effective technique for accounting for label noise when training deep neural
networks. We augment a standard deep network with a softmax layer that models
the label noise statistics. Then, we train the deep network and noise model
jointly via end-to-end stochastic gradient descent on the (perhaps mislabeled)
dataset. The augmented model is overdetermined, so in order to encourage the
learning of a non-trivial noise model, we apply dropout regularization to the
weights of the noise model during training. Numerical experiments on noisy
versions of the CIFAR-10 and MNIST datasets show that the proposed dropout
technique outperforms state-of-the-art methods.
Comment: Published at 2016 IEEE 16th International Conference on Data Mining
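The extra softmax layer that models label noise can be sketched as a transition matrix applied to the base network's class probabilities. This is a hedged illustration under assumptions not stated in the abstract: the matrix is parameterised row-wise by a softmax over free weights, entry (i, j) is read as the probability of observing noisy label j given true class i, and the names are hypothetical. The dropout applied to the noise-model weights during training is not shown.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_label_probs(clean_probs, noise_weights):
    """Map clean class probabilities to noisy-label probabilities.

    Each row of noise_weights is passed through a softmax to give one
    row of the (assumed) transition matrix T, where T[i][j] approximates
    P(noisy label = j | true class = i).
    """
    transition = [softmax(row) for row in noise_weights]
    n = len(clean_probs)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]
```

Training would then compare the output of `noisy_label_probs` with the observed (possibly wrong) labels, while the clean probabilities are used at test time.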
Limited Gradient Descent: Learning With Noisy Labels
Label noise may affect the generalization of classifiers, and the effective
learning of main patterns from samples with noisy labels is an important
challenge. Recent studies have shown that deep neural networks tend to
prioritize the learning of simple patterns over the memorization of noise
patterns. This suggests a possible method to search for the best generalization
that learns the main pattern until the noise begins to be memorized.
Traditional approaches often employ a clean validation set to find the best
stop timing of learning, i.e., early stopping. However, the generalization
performance of such methods relies on the quality of validation sets. Further,
in practice, a clean validation set is sometimes difficult to obtain. To solve
this problem, we propose a method that can estimate the optimal stopping timing
without a clean validation set, called limited gradient descent. We modified
the labels of a few samples in a noisy dataset to obtain false labels and to
create a reverse pattern. By monitoring the learning progress of the noisy and
reverse samples, we can determine the stop timing of learning. In this paper,
we also theoretically provide some necessary conditions on learning with noisy
labels. Experimental results on CIFAR-10 and CIFAR-100 datasets demonstrate
that our approach has a comparable generalization performance to methods
relying on a clean validation set. Thus, on the noisy Clothing-1M dataset, our
approach surpasses methods that rely on a clean validation set.
Safeguarded Dynamic Label Regression for Generalized Noisy Supervision
Learning with noisy labels, which aims to reduce expensive labor on accurate
annotations, has become imperative in the Big Data era. Previous noise
transition based methods have achieved promising results and presented a
theoretical guarantee on performance in the case of class-conditional noise.
However, this type of approach critically depends on an accurate
pre-estimation of the noise transition, which is usually impractical.
A subsequent improvement adapts the pre-estimation along with the training
progress via a Softmax layer. However, the parameters in the Softmax layer must
be heavily tuned and the performance is fragile, owing to the ill-posed
stochastic approximation. To address these issues, we propose a Latent Class-Conditional
Noise model (LCCN) that naturally embeds the noise transition under a Bayesian
framework. By projecting the noise transition into a Dirichlet-distributed
space, the learning is constrained on a simplex based on the whole dataset,
instead of some ad-hoc parametric space. We then deduce a dynamic label
regression method for LCCN to iteratively infer the latent labels, to
stochastically train the classifier and to model the noise. Our approach
safeguards the bounded update of the noise transition, which avoids the previous
arbitrary tuning via a batch of samples. We further generalize LCCN for
open-set noisy labels and the semi-supervised setting. We perform extensive
experiments with the controllable noise data sets, CIFAR-10 and CIFAR-100, and
the agnostic noise data sets, Clothing1M and WebVision17. The experimental
results have demonstrated that the proposed model outperforms several
state-of-the-art methods.
Comment: Submitted to Transactions on Image Processing
Derivative Manipulation for General Example Weighting
Real-world large-scale datasets usually contain noisy labels and are
imbalanced. Therefore, we propose derivative manipulation (DM), a novel and
general example weighting approach for training robust deep models under these
adverse conditions.
DM has two main merits. First, loss function and example weighting are common
techniques in the literature. DM reveals their connection (a loss function does
example weighting) and is a replacement of both. Second, although a loss
defines an example weighting scheme by its derivative, designing a loss
requires considering whether it is differentiable. DM is instead more flexible:
it directly modifies the derivative, so the loss can take a non-elementary form
too. Technically, DM defines an emphasis density function by a derivative
magnitude function. DM is generic in that diverse weighting schemes can be
derived.
Extensive experiments on both vision and language tasks prove DM's
effectiveness.
Decoupling "when to update" from "how to update"
Deep learning requires data. A useful approach to obtain data is to be
creative and mine data from various sources that were created for different
purposes. Unfortunately, this approach often leads to noisy labels. In this
paper, we propose a meta algorithm for tackling the noisy labels problem. The
key idea is to decouple "when to update" from "how to update". We demonstrate
the effectiveness of our algorithm by mining data for gender classification by
combining the Labeled Faces in the Wild (LFW) face recognition dataset with a
textual genderizing service, which leads to a noisy dataset. While our approach
is very simple to implement, it leads to state-of-the-art results. We analyze
some convergence properties of the proposed algorithm.
Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
Instance segmentation of unknown objects from images is regarded as relevant
for several robot skills including grasping, tracking and object sorting.
Recent results in computer vision have shown that large hand-labeled datasets
enable high segmentation performance. To overcome the time-consuming process of
manually labeling data for new environments, we present a transfer learning
approach for robots that learn to segment objects by interacting with their
environment in a self-supervised manner. Our robot pushes unknown objects on a
table and uses information from optical flow to create training labels in the
form of object masks. To achieve this, we fine-tune an existing DeepMask
network for instance segmentation on the self-labeled training data acquired by
the robot. We evaluate our trained network (SelfDeepMask) on a set of real
images showing challenging and cluttered scenes with novel objects. Here,
SelfDeepMask outperforms the DeepMask network trained on the COCO dataset by
9.5% in average precision. Furthermore, we combine our approach with recent
approaches for training with noisy labels in order to better cope with induced
label noise.
Comment: Extended version and code release of accepted IROS 2019 paper