Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank
For many applications, the collection of labeled data is expensive and laborious.
Exploitation of unlabeled data during training is thus a long pursued objective
of machine learning. Self-supervised learning addresses this by positing an
auxiliary task (different, but related to the supervised task) for which data
is abundantly available. In this paper, we show how ranking can be used as a
proxy task for some regression problems. As another contribution, we propose an
efficient backpropagation technique for Siamese networks which prevents the
redundant computation introduced by the multi-branch network architecture. We
apply our framework to two regression problems: Image Quality Assessment (IQA)
and Crowd Counting. For both we show how to automatically generate ranked image
sets from unlabeled data. Our results show that networks trained to regress to
the ground truth targets for labeled data and to simultaneously learn to rank
unlabeled data obtain significantly better, state-of-the-art results for both
IQA and crowd counting. In addition, we show that measuring network uncertainty
on the self-supervised proxy task is a good measure of informativeness of
unlabeled data. This can be used to drive an algorithm for active learning and
we show that this reduces labeling effort by up to 50%. Comment: Accepted at TPAMI. (Keywords: Learning from rankings, image quality
assessment, crowd counting, active learning). arXiv admin note: text overlap
with arXiv:1803.0309
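A minimal sketch of this idea in PyTorch (not the authors' implementation; the margin, the weight alpha, and all names are illustrative assumptions): a shared network regresses to ground-truth targets on labeled images while a margin ranking loss is applied to pairs of unlabeled images whose ordering is known by construction. Reusing one module for both branches is also what avoids the redundant computation of a naive multi-branch Siamese setup.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
ranking = nn.MarginRankingLoss(margin=0.5)  # margin is an assumed hyperparameter

def joint_loss(model, x_lab, y_lab, x_hi, x_lo, alpha=1.0):
    """x_lab/y_lab: labeled images and ground-truth targets.
    x_hi/x_lo: unlabeled image pairs where, by construction, the first image
    should receive the higher score (e.g. the less distorted image for IQA,
    or the larger crop for crowd counting)."""
    reg = mse(model(x_lab).squeeze(-1), y_lab)
    s_hi = model(x_hi).squeeze(-1)          # both branches share one network
    s_lo = model(x_lo).squeeze(-1)
    target = torch.ones_like(s_hi)          # "first input ranks higher"
    rank = ranking(s_hi, s_lo, target)
    return reg + alpha * rank
```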
Leveraging Unlabeled Data for Crowd Counting by Learning to Rank
We propose a novel crowd counting approach that leverages abundantly
available unlabeled crowd imagery in a learning-to-rank framework. To induce a
ranking of cropped images, we use the observation that any sub-image of a
crowded scene image is guaranteed to contain the same number or fewer persons
than the super-image. This allows us to address the problem of limited size of
existing datasets for crowd counting. We collect two crowd scene datasets from
Google using keyword searches and query-by-example image retrieval,
respectively. We demonstrate how to efficiently learn from these unlabeled
datasets by incorporating learning-to-rank in a multi-task network which
simultaneously ranks images and estimates crowd density maps. Experiments on
two of the most challenging crowd counting datasets show that our approach
obtains state-of-the-art results. Comment: Accepted by CVPR1
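A small sketch of how such ranked sets can be generated (the number of levels and the scale factor are assumptions, not the paper's exact protocol): nested, centered crops of one unlabeled crowd image, ordered so that each crop contains at most as many people as the one before it.

```python
from PIL import Image

def ranked_crops(img: Image.Image, levels: int = 4, scale: float = 0.75):
    """Return crops ordered from largest (most people) to smallest (fewest).
    Each crop is centered inside the previous one, so the containment relation
    guarantees the ranking without any person annotations."""
    crops, (w, h) = [], img.size
    cx, cy = img.size[0] / 2, img.size[1] / 2
    for _ in range(levels):
        left, upper = int(cx - w / 2), int(cy - h / 2)
        crops.append(img.crop((left, upper, left + int(w), upper + int(h))))
        w, h = w * scale, h * scale   # shrink the window for the next level
    return crops
```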
Semi-supervised Skin Lesion Segmentation via Transformation Consistent Self-ensembling Model
Automatic skin lesion segmentation on dermoscopic images is an essential
component in computer-aided diagnosis of melanoma. Recently, many fully
supervised deep learning based methods have been proposed for automatic skin
lesion segmentation. However, these approaches require massive pixel-wise
annotation from experienced dermatologists, which is very costly and
time-consuming. In this paper, we present a novel semi-supervised method for
skin lesion segmentation by leveraging both labeled and unlabeled data. The
network is optimized by the weighted combination of a common supervised loss
for labeled inputs only and a regularization loss for both labeled and
unlabeled data. To utilize the unlabeled data, our method encourages the
network-in-training to make consistent predictions for the same input under
different regularizations. For the semi-supervised segmentation problem, we
enhance the effect of regularization for pixel-level predictions by introducing
a transformation-consistent scheme (including rotation and flipping) in our
self-ensembling model. With only 300 labeled training samples,
our method sets a new record on the benchmark of the International Skin Imaging
Collaboration (ISIC) 2017 skin lesion segmentation challenge. Such a result
clearly surpasses fully supervised state-of-the-art methods trained with 2,000
labeled images. Comment: BMVC 201
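A rough sketch of the training objective (assumed details; the transformation set, teacher network, and weighting schedule follow the paper only loosely): a supervised term on labeled images plus a transformation-consistent term that compares the prediction of a rotated input with the rotated prediction of the original input.

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(student, teacher, x_lab, y_lab, x_all, w_cons, k=1):
    """student/teacher: segmentation networks (the teacher could be an EMA
    copy of the student in a self-ensembling setup -- an assumption here).
    k: number of 90-degree rotations applied as the consistency transform."""
    sup = F.cross_entropy(student(x_lab), y_lab)      # labeled images only

    pred_of_rotated = student(torch.rot90(x_all, k, dims=(2, 3)))
    with torch.no_grad():
        rotated_pred = torch.rot90(teacher(x_all), k, dims=(2, 3))
    cons = F.mse_loss(torch.softmax(pred_of_rotated, dim=1),
                      torch.softmax(rotated_pred, dim=1))

    # w_cons is typically ramped up from zero during training (assumed schedule)
    return sup + w_cons * cons
```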
Cross Domain Knowledge Learning with Dual-branch Adversarial Network for Vehicle Re-identification
The widespread adoption of vehicles has made everyday life more convenient over
the last decades. However, the sheer number of vehicles poses the critical but
challenging problem of vehicle re-identification (reID). To date, most vehicle
reID algorithms conduct both training and testing on the same annotated dataset
under full supervision. Even a well-trained model, however, suffers a severe
performance drop due to the domain bias between the training dataset and
real-world scenes.
To address this problem, this paper proposes a domain adaptation framework
for vehicle reID (DAVR), which narrows the cross-domain bias by fully
exploiting the labeled data from the source domain to adapt to the target
domain. DAVR develops an image-to-image translation network named Dual-branch
Adversarial Network (DAN), which translates images from the (well-labeled)
source domain into the style of the (unlabeled) target domain without any
annotation while preserving identity information from the source domain. The
generated images, whose styles better match the target domain, are then
employed to train the vehicle reID model through a proposed attention-based
feature learning model. With the proposed framework, the trained reID model
adapts better to the various scenes encountered in real-world situations.
Comprehensive experimental results demonstrate that DAVR achieves excellent
performance on both the VehicleID and VeRi-776 datasets. Comment: arXiv admin
note: substantial text overlap with arXiv:1903.0786
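A very small sketch of the second stage only (assumed names; the translation network G and the attention-based feature learning details are not shown): once source images have been rendered in the target-domain style, they keep their original identity labels and are used to train the reID model with a standard classification loss.

```python
import torch
import torch.nn.functional as F

def reid_step(reid_model, G, optimizer, src_images, src_ids):
    """G: a pre-trained image-to-image translation network (trained separately,
    not shown); src_ids: identity labels carried over from the source domain."""
    with torch.no_grad():
        translated = G(src_images)        # source content, target-domain style
    logits = reid_model(translated)       # identity classification head
    loss = F.cross_entropy(logits, src_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```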
Multimodal Co-Training for Selecting Good Examples from Webly Labeled Video
We tackle the problem of learning concept classifiers from videos on the web
without using manually labeled data. Although metadata attached to videos
(e.g., video titles, descriptions) can be of help collecting training data for
the target concept, the collected data is often very noisy. The main challenge
is therefore how to select good examples from noisy training data. Previous
approaches firstly learn easy examples that are unlikely to be noise and then
gradually learn more complex examples. However, hard examples that are much
different from easy ones are never learned. In this paper, we propose an
approach called multimodal co-training (MMCo) for selecting good examples from
noisy training data. MMCo jointly learns classifiers for multiple modalities
that complement each other to select good examples. Since MMCo selects examples
by consensus of multimodal classifiers, a hard example for one modality can
still be used as a training example by exploiting the power of the other
modalities. The algorithm is very simple and easily implemented but yields
consistent and significant boosts in example selection and classification
performance on the FCVID and YouTube-8M benchmarks.
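A toy sketch of the consensus step (threshold and names are assumptions): each modality's classifier scores how likely an example truly shows the target concept, and an example is selected when the averaged cross-modality score is high, so that one modality's hard example can still be rescued by the others.

```python
import numpy as np

def select_by_consensus(scores_per_modality, threshold=0.5):
    """scores_per_modality: list of arrays, one per modality, holding each
    example's predicted probability of truly depicting the target concept.
    Returns indices of examples whose consensus score clears the threshold."""
    consensus = np.mean(np.stack(scores_per_modality, axis=0), axis=0)
    return np.where(consensus >= threshold)[0]
```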
Active Learning using Deep Bayesian Networks for Surgical Workflow Analysis
For many applications in the field of computer assisted surgery, such as
providing the position of a tumor, specifying the most probable tool required
next by the surgeon or determining the remaining duration of surgery, methods
for surgical workflow analysis are a prerequisite. Machine learning based
approaches often serve as the basis for surgical workflow analysis. In general,
machine learning algorithms, such as convolutional neural networks (CNNs),
require large
amounts of labeled data. While data is often available in abundance, many tasks
in surgical workflow analysis need data annotated by domain experts, making it
difficult to obtain a sufficient amount of annotations.
The aim of using active learning to train a machine learning model is to
reduce the annotation effort. Active learning methods determine which unlabeled
data points would provide the most information according to some metric, such
as prediction uncertainty. Experts will then be asked to only annotate these
data points. The model is then retrained with the new data and used to select
further data for annotation. Recently, active learning has been applied to CNN
by means of Deep Bayesian Networks (DBN). These networks make it possible to
assign uncertainties to predictions.
In this paper, we present a DBN-based active learning approach adapted for
image-based surgical workflow analysis tasks. Furthermore, by using a recurrent
architecture, we extend this network to video-based surgical workflow analysis.
We evaluate these approaches on the Cholec80 dataset by performing instrument
presence detection and surgical phase segmentation. Here we are able to show
that using a DBN-based active learning approach for selecting what data points
to annotate next outperforms a baseline based on randomly selecting data
points.
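A compact sketch of the selection criterion (assumed details: Monte-Carlo dropout as the Bayesian approximation, predictive entropy as the uncertainty measure, and an arbitrary annotation budget): the unlabeled samples on which the model is most uncertain are the ones handed to the expert.

```python
import torch

def mc_dropout_entropy(model, x, n_samples=20):
    model.train()                          # keep dropout active at inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1)
                             for _ in range(n_samples)]).mean(dim=0)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=1)  # per-sample entropy

def select_for_annotation(model, unlabeled_x, budget=32):
    scores = mc_dropout_entropy(model, unlabeled_x)
    return torch.topk(scores, k=budget).indices   # most informative samples
```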
Adversarial Attacks on Graph Neural Networks via Meta Learning
Deep learning models for graphs have advanced the state of the art on many
tasks. Despite their recent success, little is known about their robustness. We
investigate training time attacks on graph neural networks for node
classification that perturb the discrete graph structure. Our core principle is
to use meta-gradients to solve the bilevel problem underlying training-time
attacks, essentially treating the graph as a hyperparameter to optimize. Our
experiments show that small graph perturbations consistently lead to a strong
decrease in performance for graph convolutional networks, and even transfer to
unsupervised embeddings. Remarkably, the perturbations created by our algorithm
can misguide the graph neural networks such that they perform worse than a
simple baseline that ignores all relational information. Our attacks do not
assume any knowledge about or access to the target classifiers. Comment: ICLR submission
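A heavily simplified sketch of the underlying idea (not the paper's full meta-gradient attack, which differentiates through the inner training loop; the surrogate GCN interface is an assumption): score every possible edge flip by the gradient of the training loss with respect to a dense adjacency matrix and greedily flip the highest-scoring entry.

```python
import torch
import torch.nn.functional as F

def flip_one_edge(surrogate, adj, features, labels, train_mask):
    """surrogate: a GCN taking (features, adjacency); adj: dense 0/1 matrix."""
    adj = adj.clone().requires_grad_(True)
    loss = F.cross_entropy(surrogate(features, adj)[train_mask],
                           labels[train_mask])
    grad = torch.autograd.grad(loss, adj)[0]
    # Adding an edge helps the attack if its gradient is positive, removing one
    # helps if the gradient is negative; score both options in a single matrix.
    score = grad * (1 - 2 * adj)
    idx = int(torch.argmax(score))
    i, j = idx // adj.size(1), idx % adj.size(1)
    adj = adj.detach()
    adj[i, j] = 1 - adj[i, j]
    adj[j, i] = adj[i, j]                  # keep the perturbed graph undirected
    return adj
```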
Injecting Relational Structural Representation in Neural Networks for Question Similarity
Effectively using full syntactic parsing information in Neural Networks (NNs)
to solve relational tasks, e.g., question similarity, is still an open problem.
In this paper, we propose to inject structural representations in NNs by (i)
learning an SVM model using Tree Kernels (TKs) on relatively few pairs of
questions (a few thousand), as gold standard (GS) training data is typically
scarce, (ii) predicting labels on a very large corpus of question pairs, and
(iii) pre-training NNs on such a large corpus. The results on Quora and SemEval
question similarity datasets show that NNs trained with our approach can learn
more accurate models, especially after fine-tuning on GS. Comment: ACL201
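A minimal sketch of steps (i)-(ii) (assumed interface; the tree kernel is supplied as a precomputed Gram matrix and its computation over parse trees is not shown): the TK-SVM learned on the small gold-standard set produces silver labels for the large corpus, which is then used to pre-train the NN before fine-tuning on GS.

```python
from sklearn.svm import SVC

def silver_labels_from_tk_svm(K_gold, y_gold, K_corpus_vs_gold):
    """K_gold: tree-kernel Gram matrix among gold-standard question pairs;
    K_corpus_vs_gold: kernel values between the large unlabeled corpus and the
    gold-standard pairs. Returns weak (silver) labels for the corpus."""
    svm = SVC(kernel="precomputed")
    svm.fit(K_gold, y_gold)                 # (i) TK-SVM on a few GS pairs
    return svm.predict(K_corpus_vs_gold)    # (ii) label the large corpus
```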
Weakly Supervised Vessel Segmentation in X-ray Angiograms by Self-Paced Learning from Noisy Labels with Suggestive Annotation
The segmentation of coronary arteries in X-ray angiograms by convolutional
neural networks (CNNs) is promising yet limited by the requirement of precisely
annotating all pixels in a large number of training images, which is extremely
labor-intensive especially for complex coronary trees. To alleviate the burden
on the annotator, we propose a novel weakly supervised training framework that
learns from noisy pseudo labels generated from automatic vessel enhancement,
rather than accurate labels obtained by fully manual annotation. A typical
self-paced learning scheme is used to make the training process robust against
label noise; however, it is challenged by the systematic biases in pseudo
labels, which lead to decreased performance of CNNs at test time. To solve this
problem, we propose an annotation-refining self-paced learning framework
(AR-SPL) to correct the potential errors using suggestive annotation. An
elaborate model-vesselness uncertainty estimation is also proposed to minimize
the annotation cost of suggestive annotation, based not only on the CNN in
training but also on the geometric features of coronary arteries derived
directly from raw data. Experiments show that our proposed framework achieves
1) comparable accuracy to fully supervised learning, which also significantly
outperforms other weakly supervised learning frameworks; 2) largely reduced
annotation cost, i.e., 75.18% of annotation time is saved, and only 3.46% of
image regions are required to be annotated; and 3) an efficient intervention
process, leading to superior performance with even fewer manual interactions.
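A tiny sketch of the hard self-paced weighting at the heart of such schemes (a generic formulation, not the exact AR-SPL objective): only samples whose current loss falls below an age parameter lambda contribute to the update, and lambda is raised as training proceeds so harder samples are admitted gradually.

```python
import torch

def self_paced_loss(per_sample_loss, lam):
    """per_sample_loss: un-reduced loss values (per pixel or per image);
    lam: current age parameter, increased over the course of training."""
    v = (per_sample_loss < lam).float()        # 0/1 easiness weights
    n_selected = v.sum().clamp(min=1.0)        # avoid division by zero
    return (v * per_sample_loss).sum() / n_selected
```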
Transformation Consistent Self-ensembling Model for Semi-supervised Medical Image Segmentation
Deep convolutional neural networks have achieved remarkable progress on a
variety of medical image computing tasks. A common problem when applying
supervised deep learning methods to medical images is the lack of labeled data,
which is very expensive and time-consuming to collect. In this paper, we
present a novel semi-supervised method for medical image segmentation, where
the network is optimized by the weighted combination of a common supervised
loss for labeled inputs only and a regularization loss for both labeled and
unlabeled data. To utilize the unlabeled data, our method encourages the
consistent predictions of the network-in-training for the same input under
different regularizations. For the semi-supervised segmentation problem, we
enhance the effect of regularization for pixel-level predictions by introducing
a transformation-consistent scheme (including rotation and flipping) in our
self-ensembling model. We have extensively validated the proposed semi-supervised method
on three typical yet challenging medical image segmentation tasks: (i) skin
lesion segmentation from dermoscopy images on International Skin Imaging
Collaboration (ISIC) 2017 dataset, (ii) optic disc segmentation from fundus
images on Retinal Fundus Glaucoma Challenge (REFUGE) dataset, and (iii) liver
segmentation from volumetric CT scans on Liver Tumor Segmentation Challenge
(LiTS) dataset. Compared to state-of-the-art methods, our proposed method shows
superior segmentation performance on challenging 2D/3D medical images,
demonstrating the effectiveness of our semi-supervised method for medical image
segmentation. Comment: Accepted by IEEE Transactions on Neural Networks and
Learning Systems