3,573 research outputs found
Learning to Segment Every Thing
Most methods for object instance segmentation require all training examples
to be labeled with segmentation masks. This requirement makes it expensive to
annotate new categories and has restricted instance segmentation models to ~100
well-annotated classes. The goal of this paper is to propose a new partially
supervised training paradigm, together with a novel weight transfer function,
that enables training instance segmentation models on a large set of categories
all of which have box annotations, but only a small fraction of which have mask
annotations. These contributions allow us to train Mask R-CNN to detect and
segment 3000 visual concepts using box annotations from the Visual Genome
dataset and mask annotations from the 80 classes in the COCO dataset. We
evaluate our approach in a controlled study on the COCO dataset. This work is a
first step towards instance segmentation models that have broad comprehension
of the visual world
Self-Transfer Learning for Fully Weakly Supervised Object Localization
Recent advances in deep learning have achieved remarkable performance in
various challenging computer vision tasks. Especially in object localization,
deep convolutional neural networks outperform traditional approaches by
extracting data/task-driven features instead of hand-crafted features.
Although location information for regions of interest (ROIs) gives a good prior
for object localization, it requires heavy human annotation effort. Thus a
weakly supervised framework for object localization is
introduced. The term "weakly" means that this framework only uses image-level
labeled datasets to train a network. With the help of transfer learning which
adopts weight parameters of a pre-trained network, the weakly supervised
learning framework for object localization performs well because the
pre-trained network already has well-trained class-specific features. However,
those approaches cannot be used for some applications which do not have
pre-trained networks or well-localized large-scale images. Medical image
analysis is a representative example of such applications because it is
impossible to obtain such pre-trained networks. In this work, we present a
"fully" weakly
supervised framework for object localization ("semi"-weakly is the counterpart
which uses pre-trained filters for weakly supervised localization) named
self-transfer learning (STL). It jointly optimizes the classification and
localization networks. By controlling the supervision level of the
localization network, STL helps the localization network focus on correct ROIs
without any type of prior. We evaluate the proposed STL framework using two
medical image datasets, chest X-rays and mammograms, and achieve significantly
better localization performance compared to previous weakly supervised
approaches.
Comment: 9 pages, 4 figure
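As a rough illustration of what "controlling a supervision level" could mean, the sketch below blends a classification loss and a localization loss with a single weight. The convex combination, the name `alpha`, and the use of binary cross-entropy are assumptions made for this example; the abstract does not specify the loss or its schedule.

```python
import math


def binary_cross_entropy(prob, label):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def joint_loss(cls_prob, loc_prob, label, alpha):
    """Blend the two branch losses (hypothetical form, not the paper's stated one).

    alpha = 0 trains only the classification branch; alpha = 1 trains only
    the localization branch. Both branches see the same image-level label.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    cls_loss = binary_cross_entropy(cls_prob, label)
    loc_loss = binary_cross_entropy(loc_prob, label)
    return (1.0 - alpha) * cls_loss + alpha * loc_loss
```

Raising `alpha` over the course of training would shift the emphasis from classification toward localization; whether STL uses such a schedule is not stated in the abstract.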
Deconvolutional Feature Stacking for Weakly-Supervised Semantic Segmentation
A weakly-supervised semantic segmentation framework with a tied
deconvolutional neural network is presented. Each deconvolution layer in the
framework consists of unpooling and deconvolution operations. 'Unpooling'
upsamples the input feature map based on unpooling switches defined by the
corresponding convolution layer's pooling operation. 'Deconvolution' convolves
the unpooled input features using convolutional weights tied to the
corresponding convolution layer's convolution operation. The
unpooling-deconvolution combination helps to eliminate less discriminative
features in the feature extraction stage, since the output features of the
deconvolution layer are reconstructed from the most discriminative unpooled
features instead of the raw ones. This reduces false positives
in the pixel-level inference stage. The feature maps restored from all
deconvolution layers constitute a rich discriminative feature set across
different abstraction levels. Those features are stacked and selectively
used for generating class-specific activation maps. Under weak supervision
(image-level labels), the proposed framework shows promising results on lesion
segmentation in medical images (chest X-rays) and achieves state-of-the-art
performance on the PASCAL VOC segmentation dataset in the same experimental
condition
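The unpooling step described above can be sketched on a single 2D feature map. This is a minimal pure-Python illustration that assumes 2x2 non-overlapping max pooling and even map dimensions; the function names are hypothetical and the weight-tied deconvolution is not shown.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling that also records where each maximum came from."""
    pooled, switches = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, switch_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            switch_row.append(position)  # the "unpooling switch"
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool_2x2(pooled, switches, rows, cols):
    """Place each pooled value back at its recorded position, zeros elsewhere."""
    out = [[0.0] * cols for _ in range(rows)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (r, c) in zip(pooled_row, switch_row):
            out[r][c] = value
    return out


fmap = [[1, 2, 0, 0],
        [3, 4, 0, 5],
        [0, 0, 7, 0],
        [6, 0, 0, 0]]
pooled, switches = max_pool_2x2(fmap)
restored = unpool_2x2(pooled, switches, 4, 4)
```

Only the strongest activation in each window survives in `restored`, which is the sense in which unpooling discards less discriminative responses.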
Weakly Supervised Adversarial Domain Adaptation for Semantic Segmentation in Urban Scenes
Semantic segmentation, a pixel-level vision task, has developed rapidly with
the use of convolutional neural networks (CNNs). Training CNNs requires a large
amount of labeled data, but manually annotating data is difficult. To reduce
manual effort, several synthetic datasets have been released in recent years.
However, they still differ from real scenes, so a model trained on synthetic
data (source domain) does not achieve good performance
on real urban scenes (target domain). In this paper, we propose a weakly
supervised adversarial domain adaptation method to improve segmentation
performance from synthetic data to real scenes, which consists of three deep
neural networks. To be specific, a detection and segmentation ("DS" for short)
model focuses on detecting objects and predicting the segmentation map; a
pixel-level domain classifier ("PDC" for short) tries to distinguish which
domain the image features come from; an object-level domain classifier ("ODC"
for short) discriminates which domain the objects come from and predicts the
object classes. PDC and ODC are treated as the discriminators, and DS is
considered the generator. Through adversarial learning, DS is supposed to learn
domain-invariant features. In experiments, our proposed method sets a new
record on the mIoU metric for the same problem.
Comment: To appear at TI
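For readers unfamiliar with the mIoU metric mentioned above, here is a minimal sketch of mean intersection-over-union on flattened label maps. It averages over classes that occur in either the prediction or the ground truth for a single image; benchmark implementations typically accumulate counts over a whole dataset and may handle absent classes differently, so treat this as illustrative only.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target)
                           if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target)
                    if p == cls or t == cls)
        if union:  # skip classes that never appear
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```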
Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis
Machine learning (ML) algorithms have made a tremendous impact in the field
of medical imaging. While medical imaging datasets have been growing in size, a
challenge for supervised ML algorithms that is frequently mentioned is the lack
of annotated data. As a result, various methods that can learn with less or
other types of supervision have been proposed. We review semi-supervised,
multiple instance, and transfer learning in medical imaging, in both
diagnosis/detection and segmentation tasks. We also discuss connections between
these learning scenarios, and opportunities for future research.
Comment: Submitted to Medical Image Analysi
STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation
Recently, significant improvement has been made on semantic object
segmentation due to the development of deep convolutional neural networks
(DCNNs). Training such a DCNN usually relies on a large number of images with
pixel-level segmentation masks, and annotating these images is very costly in
terms of both finance and human effort. In this paper, we propose a simple to
complex (STC) framework in which only image-level annotations are utilized to
learn DCNNs for semantic segmentation. Specifically, we first train an initial
segmentation network called Initial-DCNN with the saliency maps of simple
images (i.e., those with a single category of major object(s) and clean
background). These saliency maps can be automatically obtained by existing
bottom-up salient object detection techniques, where no supervision information
is needed. Then, a better network called Enhanced-DCNN is learned with
supervision from the predicted segmentation masks of simple images based on the
Initial-DCNN as well as the image-level annotations. Finally, more pixel-level
segmentation masks of complex images (two or more categories of objects with
cluttered background), which are inferred by using Enhanced-DCNN and
image-level annotations, are utilized as the supervision information to learn
the Powerful-DCNN for semantic segmentation. Our method utilizes K simple
images from Flickr.com and 10K complex images from PASCAL VOC to boost the
segmentation network step by step. Extensive experimental results on the PASCAL
VOC 2012 segmentation benchmark demonstrate the superiority of the proposed
STC framework over other state-of-the-art methods.
Comment: To Appear in IEEE Transactions on Pattern Analysis and Machine
Intelligenc
Hand Pose Estimation through Semi-Supervised and Weakly-Supervised Learning
We propose a method for hand pose estimation based on a deep regressor
trained on two different kinds of input. Raw depth data is fused with an
intermediate representation in the form of a segmentation of the hand into
parts. This intermediate representation contains important topological
information and provides useful cues for reasoning about joint locations. The
mapping from raw depth to segmentation maps is learned in a
semi/weakly-supervised way from two different datasets: (i) a synthetic dataset
created through a rendering pipeline including densely labeled ground truth
(pixelwise segmentations); and (ii) a dataset with real images for which ground
truth joint positions are available, but not dense segmentations. Loss for
training on real images is generated from a patch-wise restoration process,
which aligns tentative segmentation maps with a large dictionary of synthetic
poses. The underlying premise is that the domain shift between synthetic and
real data is smaller in the intermediate representation, where labels carry
geometric and topological meaning, than in the raw input domain. Experiments on
the NYU dataset show that the proposed training method decreases error on
joints over direct regression of joints from depth data by 15.7%.
Comment: 13 pages, 10 figures, 4 table
Review on Computer Vision in Gastric Cancer: Potential Efficient Tools for Diagnosis
Rapid diagnosis of gastric cancer is a great challenge for clinical doctors.
Computer vision for gastric cancer has progressed dramatically in recent years,
and this review focuses on advances during the past five years. Different
methods for data generation and augmentation are presented, and various
approaches to extracting discriminative features are compared and evaluated.
Classification and segmentation techniques are carefully discussed with a view
to more precise diagnosis and timely treatment. For classification,
various methods have been developed to better process specific image types,
such as rotated images that must be evaluated in real time (endoscopy), high
resolution images (histopathology), low diagnostic accuracy images (X-ray),
poor contrast images of soft tissue with cavities (CT), and images with
insufficient annotation. For detection and segmentation, traditional methods
and machine learning methods are compared. Application of these methods will
greatly reduce
the labor and time consumption for the diagnosis of gastric cancers
Exploiting Web Images for Weakly Supervised Object Detection
In recent years, the performance of object detection has advanced
significantly with the evolution of deep convolutional neural networks. However,
state-of-the-art object detection methods still rely on accurate bounding
box annotations that require extensive human labelling. Methods for object
detection without bounding box annotations, i.e., weakly supervised detection
methods, still lag far behind. As weakly supervised detection only uses image-level
labels and does not require the ground truth of bounding box location and label
of each object in an image, it is generally very difficult to distill knowledge
of the actual appearances of objects. Inspired by curriculum learning, this
paper proposes an easy-to-hard knowledge transfer scheme that incorporates easy
web images to provide prior knowledge of object appearance as a good starting
point. While exploiting large-scale free web imagery, we introduce a
sophisticated labour-free method to construct a web dataset with good diversity
in object appearance. After that, semantic relevance and distribution relevance
are introduced and utilized in the proposed curriculum training scheme. Our
end-to-end learning with the constructed web data achieves remarkable
improvement across most object classes, especially for the classes that are
often considered hard in other works
Learning to Detect Blue-white Structures in Dermoscopy Images with Weak Supervision
We propose a novel approach to identify one of the most significant
dermoscopic criteria in the diagnosis of Cutaneous Melanoma: the Blue-whitish
structure. In this paper, we achieve this goal in a Multiple Instance Learning
framework using only image-level labels of whether the feature is present or
not. As the output, we predict the image classification label and also
localize the feature in the image. Experiments are conducted on a challenging
dataset with results outperforming the state of the art. This study provides an
improvement on the scope of modelling for computerized image analysis of skin
lesions, in particular in that it puts forward a framework for identification
of dermoscopic local features from weakly-labelled data
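One common way to obtain both an image-level label and a localization from a Multiple Instance Learning setup is to score image patches and pool them with a maximum. The sketch below assumes max pooling, a fixed threshold, and a hypothetical `patch_scores` mapping from patch position to a score in [0, 1]; the abstract does not say which pooling or instance model the authors use.

```python
def mil_predict(patch_scores, threshold=0.5):
    """Max-pooling MIL: an image is positive if any patch is positive.

    patch_scores maps a patch position (row, col) to a score in [0, 1].
    Returns the image label, the image score, and the positive patches.
    """
    image_score = max(patch_scores.values())
    is_positive = image_score >= threshold
    located = sorted(pos for pos, score in patch_scores.items()
                     if score >= threshold) if is_positive else []
    return is_positive, image_score, located


scores = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.2, (1, 1): 0.6}
label, image_score, patches = mil_predict(scores)
```

Because training only needs the image-level label to supervise `image_score`, the per-patch scores come for free as a coarse localization map.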