Fully Convolutional Multi-Class Multiple Instance Learning
Multiple instance learning (MIL) can reduce the need for costly annotation in
tasks such as semantic segmentation by weakening the required degree of
supervision. We propose a novel MIL formulation of multi-class semantic
segmentation learning by a fully convolutional network. In this setting, we
seek to learn a semantic segmentation model from just weak image-level labels.
The model is trained end-to-end to jointly optimize the representation while
disambiguating the pixel-image label assignment. Fully convolutional training
accepts inputs of any size, does not need object proposal pre-processing, and
offers a pixelwise loss map for selecting latent instances. Our multi-class MIL
loss exploits the further supervision given by images with multiple labels. We
evaluate this approach through preliminary experiments on the PASCAL VOC
segmentation challenge.
Comment: in ICLR 201
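The idea of selecting latent instances from a pixelwise loss map can be sketched with a minimal NumPy example. This is an illustrative max-over-pixels MIL loss, not the paper's exact formulation: for each image-level label, the most confident pixel is treated as the latent positive instance.

```python
import numpy as np

def mil_seg_loss(score_map, image_labels):
    """Illustrative multi-class MIL loss for weakly supervised segmentation.

    score_map: (C, H, W) per-pixel class scores from a fully convolutional net.
    image_labels: iterable of class indices present in the image (weak labels).
    For each labelled class, the pixel with the highest posterior is treated
    as the latent positive instance and its log-probability is maximized.
    """
    # Softmax over classes at every pixel (stabilized by subtracting the max)
    e = np.exp(score_map - score_map.max(axis=0, keepdims=True))
    prob = e / e.sum(axis=0, keepdims=True)
    loss = 0.0
    for c in image_labels:
        # Max over all pixels selects the most confident latent instance
        loss -= np.log(prob[c].max() + 1e-12)
    return loss / max(len(image_labels), 1)
```

With a uniform score map the loss stays near -log(1/C); a single strong activation for a labelled class drives it toward zero, which is what lets the weak image-level signal shape the pixelwise predictions.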
A Multi-scale Multiple Instance Video Description Network
Generating natural language descriptions for in-the-wild videos is a
challenging task. Most state-of-the-art methods for solving this problem borrow
existing deep convolutional neural network (CNN) architectures (AlexNet,
GoogLeNet) to extract a visual representation of the input video. However,
these deep CNN architectures are designed for single-label centered-positioned
object classification. While they generate strong semantic features, they have
no inherent structure allowing them to detect multiple objects of different
sizes and locations in the frame. Our paper tries to solve this problem by
integrating the base CNN into several fully convolutional neural networks
(FCNs) to form a multi-scale network that handles multiple receptive field
sizes in the original image. FCNs, previously applied to image segmentation,
can generate class heat-maps efficiently compared to sliding window mechanisms,
and can easily handle multiple scales. To further handle the ambiguity over
multiple objects and locations, we incorporate the Multiple Instance Learning
mechanism (MIL) to consider objects in different positions and at different
scales simultaneously. We integrate our multi-scale multi-instance architecture
with a sequence-to-sequence recurrent neural network to generate sentence
descriptions based on the visual representation. Ours is the first end-to-end
trainable architecture that is capable of multi-scale region processing.
Evaluation on a Youtube video dataset shows the advantage of our approach
compared to the original single-scale whole frame CNN model. Our flexible and
efficient architecture can potentially be extended to support other video
processing tasks.
Comment: ICCV15 workshop on Closing the Loop Between Vision and Language
Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning
Convolutional neural networks (CNN) have achieved state of the art
performance on both classification and segmentation tasks. Applying CNNs to
microscopy images is challenging due to the lack of datasets labeled at the
single cell level. We extend the application of CNNs to microscopy image
classification and segmentation using multiple instance learning (MIL). We
present the adaptive Noisy-AND MIL pooling function, a new MIL operator that is
robust to outliers. Combining CNNs with MIL enables training CNNs using full
resolution microscopy images with global labels. We base our approach on the
similarity between the aggregation function used in MIL and pooling layers used
in CNNs. We show that training MIL CNNs end-to-end outperforms several previous
methods on both mammalian and yeast microscopy images without requiring any
segmentation steps.
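The Noisy-AND pooling function described above can be sketched in NumPy as follows. In the paper the threshold is a learned per-class network parameter and the slope is a fixed hyper-parameter; the values here are only illustrative defaults.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def noisy_and(instance_probs, a=10.0, b=0.5):
    """Sketch of the adaptive Noisy-AND MIL pooling operator.

    instance_probs: per-instance (e.g. per-cell or per-patch) class
        probabilities in [0, 1] for one bag (one image).
    a: fixed slope controlling how sharply the bag activates.
    b: threshold on the mean instance probability (learned in the paper;
       a plain constant here for illustration).
    The numerator/denominator normalization maps the raw sigmoid response
    onto [0, 1], so an all-negative bag scores 0 and an all-positive bag 1.
    """
    p_mean = np.mean(instance_probs)
    num = sigmoid(a * (p_mean - b)) - sigmoid(-a * b)
    den = sigmoid(a * (1.0 - b)) - sigmoid(-a * b)
    return num / den
```

Because the operator depends on the mean instance probability rather than the max, a handful of outlier instances cannot flip the bag label on their own, which is the robustness property the abstract highlights.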
DASNet: Reducing Pixel-level Annotations for Instance and Semantic Segmentation
Pixel-level annotation demands expensive human effort and limits the
performance of deep networks, which usually benefit from more such training
data. In this work we aim to achieve high quality instance and semantic
segmentation results over a small set of pixel-level mask annotations and a
large set of box annotations. The basic idea is exploring detection models to
simplify the pixel-level supervised learning task and thus reduce the required
amount of mask annotations. Our architecture, named DASNet, consists of three
modules: detection, attention, and segmentation. The detection module detects
all classes of objects, the attention module generates multi-scale
class-specific features, and the segmentation module recovers the binary masks.
Our method demonstrates substantially improved performance compared to existing
semi-supervised approaches on the PASCAL VOC 2012 dataset.
Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey
From autonomous driving to medical diagnosis, image segmentation is needed
everywhere. It is one of the indispensable tasks in computer vision, and it is
more complicated than many other vision tasks because it needs low-level
spatial information.
Basically, image segmentation can be of two types: semantic segmentation and
instance segmentation. The combined version of these two basic tasks is known
as panoptic segmentation. In recent years, the success of deep convolutional
neural networks (CNNs) has greatly influenced the field of segmentation and
produced various successful models. In this survey, we trace the evolution of
both semantic and instance segmentation work based on CNNs. We also give
comparative architectural details of some state-of-the-art models and discuss
their training details to provide a lucid understanding of hyper-parameter
tuning for those models. We further compare the performance of those models on
different datasets. Lastly, we give a glimpse of some state-of-the-art panoptic
segmentation models.
Comment: 38 pages, 29 figures, 8 tables
Deep Patch Learning for Weakly Supervised Object Classification and Discovery
Patch-level image representation is very important for object classification
and detection, since it is robust to spatial transformation, scale variation,
and cluttered background. Many existing methods require fine-grained
supervision (e.g., bounding-box annotations) to learn patch features; labeling
images at this level takes great effort and may limit their applications.
In this paper, we propose to learn patch features with weak supervision, i.e.,
only image-level labels. To achieve this goal, we treat images as bags
and patches as instances to integrate the weakly supervised multiple instance
learning constraints into deep neural networks. Also, our method integrates the
traditional multiple stages of weakly supervised object classification and
discovery into a unified deep convolutional neural network and optimizes the
network in an end-to-end way. The network processes the two tasks, object
classification and discovery, jointly, and shares hierarchical deep features.
Through this joint learning strategy, weakly supervised object classification
and discovery are beneficial to each other. We test the proposed method on the
challenging PASCAL VOC datasets. The results show that our method can obtain
state-of-the-art performance on object classification, and very competitive
results on object discovery, with faster testing speed than competitors.
Comment: Accepted by Pattern Recognition
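The bags-and-instances formulation above can be illustrated with a tiny sketch, assuming one common MIL convention (max over patch scores per class); the function name and shapes are hypothetical, not the paper's API.

```python
import numpy as np

def classify_and_discover(patch_scores):
    """Joint weakly supervised classification and discovery, MIL-style.

    patch_scores: (num_patches, num_classes) scores for the patches (the
    instances) of one image (the bag).
    The image-level score per class is the max over patches (the MIL
    constraint: a bag is positive iff at least one instance is), and the
    argmax patch serves as the discovered object location for that class.
    """
    image_scores = patch_scores.max(axis=0)   # classification output
    discovered = patch_scores.argmax(axis=0)  # discovery: best patch index
    return image_scores, discovered
```

Because both outputs come from the same score tensor, training the classification head with image-level labels simultaneously shapes the per-patch scores used for discovery, which is how the two tasks benefit each other.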
Self-Transfer Learning for Fully Weakly Supervised Object Localization
Recent advances in deep learning have achieved remarkable performance on
various challenging computer vision tasks. In object localization especially,
deep convolutional neural networks outperform traditional approaches by
extracting data/task-driven features instead of hand-crafted features.
Although location information of regions of interest (ROIs) gives a good
prior for object localization, it requires heavy annotation effort from human
annotators. Thus, a weakly supervised framework for object localization is
introduced. The term "weakly" means that this framework only uses image-level
labeled datasets to train a network. With the help of transfer learning which
adopts weight parameters of a pre-trained network, the weakly supervised
learning framework for object localization performs well because the
pre-trained network already has well-trained class-specific features. However,
those approaches cannot be used for some applications which do not have
pre-trained networks or well-localized large scale images. Medical image
analysis is a representative example of such applications because it is impossible
to obtain such pre-trained networks. In this work, we present a "fully" weakly
supervised framework for object localization ("semi"-weakly is the counterpart
which uses pre-trained filters for weakly supervised localization) named as
self-transfer learning (STL). It jointly optimizes both classification and
localization networks simultaneously. By controlling a supervision level of the
localization network, STL helps the localization network focus on correct ROIs
without any types of priors. We evaluate the proposed STL framework using two
medical image datasets, chest X-rays and mammograms, and achieve significantly
better localization performance compared to previous weakly supervised
approaches.
Comment: 9 pages, 4 figures
Global Weighted Average Pooling Bridges Pixel-level Localization and Image-level Classification
In this work, we first tackle the problem of simultaneous pixel-level
localization and image-level classification with only image-level labels for
fully convolutional network training. We investigate the global pooling method
which plays a vital role in this task. Classical global max pooling and average
pooling methods struggle to indicate the precise regions of objects. Therefore,
we revisit the global weighted average pooling (GWAP) method for this task and
propose the class-agnostic GWAP module and the class-specific GWAP module in
this paper. We evaluate the classification and pixel-level localization ability
on the ILSVRC benchmark dataset. Experimental results show that the proposed
GWAP module can better capture the regions of the foreground objects. We
further explore the knowledge transfer between the image classification task
and the region-based object detection task. We propose a multi-task framework
that combines our class-specific GWAP module with R-FCN. The framework is
trained with only a few ground-truth bounding boxes and large-scale image-level
labels. We evaluate this framework on the PASCAL VOC dataset. Experimental results
show that this framework can use the data with only image-level labels to
improve the generalization of the object detection model.
Comment: technical report
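Global weighted average pooling sits between the two classical operators the abstract contrasts. A minimal NumPy sketch, assuming the spatial weight map is produced elsewhere (in the paper it is predicted by the network; here it is simply an input):

```python
import numpy as np

def gwap(feature_map, weight_map):
    """Global weighted average pooling (class-agnostic variant, sketch).

    feature_map: (C, H, W) per-class score maps.
    weight_map: (H, W) non-negative spatial weights, normalized here to sum
        to 1 so the output is a weighted spatial average per class.
    Uniform weights recover global average pooling (GAP); a one-hot weight
    map recovers global max pooling (GMP) at the chosen pixel, which is why
    GWAP can localize better than either extreme.
    """
    w = weight_map / (weight_map.sum() + 1e-12)
    return (feature_map * w).sum(axis=(1, 2))
```

Because the weights concentrate the average on foreground pixels, the image-level classification gradient flows mainly through those pixels, tying pixel-level localization to the image-level objective.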
A Review on Deep Learning Techniques Applied to Semantic Segmentation
Image semantic segmentation is of growing interest to computer vision and
machine learning researchers. Many emerging applications need
accurate and efficient segmentation mechanisms: autonomous driving, indoor
navigation, and even virtual or augmented reality systems to name a few. This
demand coincides with the rise of deep learning approaches in almost every
field or application target related to computer vision, including semantic
segmentation or scene understanding. This paper provides a review on deep
learning methods for semantic segmentation applied to various application
areas. Firstly, we describe the terminology of this field as well as mandatory
background concepts. Next, the main datasets and challenges are presented to help
researchers decide which are the ones that best suit their needs and their
targets. Then, existing methods are reviewed, highlighting their contributions
and their significance in the field. Finally, quantitative results are given
for the described methods and the datasets in which they were evaluated,
following up with a discussion of the results. Lastly, we point out a set of
promising future works and draw our own conclusions about the state of the art
of semantic segmentation using deep learning techniques.
Comment: Submitted to TPAMI on Apr. 22, 201
Weakly Supervised Medical Diagnosis and Localization from Multiple Resolutions
Diagnostic imaging often requires the simultaneous identification of a
multitude of findings of varied size and appearance. Beyond global indication
of said findings, the prediction and display of localization information
improves trust in and understanding of results when augmenting clinical
workflow. Medical training data rarely includes more than global image-level
labels as segmentations are time-consuming and expensive to collect. We
introduce an approach to managing these practical constraints by applying a
novel architecture which learns at multiple resolutions while generating
saliency maps with weak supervision. Further, we parameterize the Log-Sum-Exp
pooling function with a learnable lower-bounded adaptation (LSE-LBA) to build
in a sharpness prior and better handle localizing abnormalities of different
sizes using only image-level labels. Applying this approach to interpreting
chest x-rays, we set the state of the art on 9 abnormalities in the NIH's CXR14
dataset while generating saliency maps with the highest resolution to date.
Comment: submitted to ECCV 201
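The LSE-LBA pooling above can be sketched in NumPy, assuming the lower-bounded adaptation takes the form r = r0 + exp(beta) with beta learned freely; parameter names here are illustrative.

```python
import numpy as np

def lse_lba_pool(saliency_map, beta=0.0, r0=0.0):
    """Log-Sum-Exp pooling with a lower-bounded adaptive sharpness (sketch).

    saliency_map: (H, W) per-pixel scores for one abnormality class.
    r = r0 + exp(beta) keeps the sharpness r strictly above the prior r0
    while beta can be optimized without constraints. Large r approaches max
    pooling (sharp, good for small findings); small r approaches mean
    pooling (diffuse, good for large findings).
    """
    r = r0 + np.exp(beta)
    s = saliency_map.ravel()
    m = s.max()  # subtract the max for numerical stability
    return m + np.log(np.mean(np.exp(r * (s - m)))) / r
```

Letting the network learn beta per model (rather than fixing r by hand) is what lets a single pooling operator cover abnormalities of very different sizes from image-level labels alone.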