Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis
Machine learning (ML) algorithms have made a tremendous impact in the field
of medical imaging. While medical imaging datasets have been growing in size, a
challenge for supervised ML algorithms that is frequently mentioned is the lack
of annotated data. As a result, various methods that can learn with less or
other types of supervision have been proposed. We review semi-supervised,
multiple-instance, and transfer learning in medical imaging, in both
diagnosis/detection and segmentation tasks. We also discuss connections between
these learning scenarios and opportunities for future research.
Comment: Submitted to Medical Image Analysis
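A simple instance of the semi-supervised scenario this survey covers is pseudo-labeling: a model trained on the small labeled set assigns labels to unlabeled examples it is confident about, which are then added to the training pool. A minimal NumPy sketch (the function name and the 0.9 threshold are illustrative assumptions, not from the survey):

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Select confident unlabeled samples and their predicted labels.

    probs: (n, k) array of class probabilities produced by a model
    trained on the labeled set. Returns the indices of samples whose
    top probability meets the threshold, plus their hard labels; the
    remaining samples stay unlabeled for the next round.
    """
    conf = probs.max(axis=1)
    keep = np.where(conf >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)

# Toy predictions on 4 unlabeled samples (2 classes).
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.10, 0.90],
                  [0.60, 0.40]])
idx, labels = pseudo_label(probs, threshold=0.9)
```

Only the two confident samples receive pseudo-labels; the borderline ones are left out rather than risk propagating errors.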
A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges
Co-saliency detection is a newly emerging and rapidly growing research area
in the computer vision community. As a novel branch of visual saliency, co-saliency
detection refers to the discovery of common and salient foregrounds from two or
more relevant images, and can be widely used in many computer vision tasks. The
existing co-saliency detection algorithms mainly consist of three components:
extracting effective features to represent the image regions, exploring the
informative cues or factors to characterize co-saliency, and designing
effective computational frameworks to formulate co-saliency. Although numerous
methods have been developed, the literature is still lacking a deep review and
evaluation of co-saliency detection techniques. In this paper, we aim at
providing a comprehensive review of the fundamentals, challenges, and
applications of co-saliency detection. Specifically, we provide an overview of
some related computer vision works, review the history of co-saliency
detection, summarize and categorize the major algorithms in this research area,
discuss some open issues in this area, present the potential applications of
co-saliency detection, and finally point out some unsolved challenges and
promising future works. We expect this review to be beneficial to both fresh
and senior researchers in this field, and give insights to researchers in other
related areas regarding the utility of co-saliency detection algorithms.
Comment: 28 pages, 12 figures, 3 tables
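The three components this review identifies (region features, co-saliency cues, and a fusion framework) can be illustrated with a toy scheme: a region is co-salient when it is salient in its own image and has close feature matches in the other images. This is an illustrative sketch only, not any surveyed algorithm:

```python
import numpy as np

def co_saliency(features, saliency):
    """Toy co-saliency scoring over a group of images.

    features: list of (n_i, d) region-feature arrays, one per image.
    saliency: list of (n_i,) single-image saliency scores.
    Returns a list of (n_i,) co-saliency scores: the product of the
    intra-image saliency cue and an inter-image "repeatedness" cue
    (similarity to the best-matching region in the other images).
    """
    out = []
    for i, (f, s) in enumerate(zip(features, saliency)):
        others = np.vstack([features[j]
                            for j in range(len(features)) if j != i])
        d = ((f[:, None, :] - others[None, :, :]) ** 2).sum(-1)
        repeat = np.exp(-d.min(axis=1))   # 1 for a perfect match, ~0 otherwise
        out.append(s * repeat)            # fusion: product of the two cues
    return out

# Two images, two regions each; the feature-0 region recurs in both.
feats = [np.array([[0.0], [5.0]]), np.array([[0.0], [9.0]])]
sal = [np.array([1.0, 1.0]), np.array([1.0, 1.0])]
scores = co_saliency(feats, sal)
```

The shared region keeps a high score in both images, while regions that are salient but unrepeated are suppressed, which is the defining behavior of co-saliency.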
Deep Patch Learning for Weakly Supervised Object Classification and Discovery
Patch-level image representation is very important for object classification
and detection, since it is robust to spatial transformation, scale variation,
and cluttered background. Many existing methods require fine-grained
supervision (e.g., bounding-box annotations) to learn patch features; such
labeling requires great effort and may limit their potential applications.
In this paper, we propose to learn patch features via weak supervision, i.e.,
only image-level labels. To achieve this goal, we treat images as bags
and patches as instances to integrate the weakly supervised multiple instance
learning constraints into deep neural networks. Also, our method integrates the
traditional multiple stages of weakly supervised object classification and
discovery into a unified deep convolutional neural network and optimizes the
network in an end-to-end way. The network handles the two tasks of object
classification and discovery jointly and shares hierarchical deep features.
Through this joint learning strategy, weakly supervised object classification
and discovery benefit each other. We test the proposed method on the
challenging PASCAL VOC datasets. The results show that our method can obtain
state-of-the-art performance on object classification, and very competitive
results on object discovery, with faster testing speed than competitors.
Comment: Accepted by Pattern Recognition
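The bags-and-instances formulation above is the core of multiple instance learning: an image (bag) is positive for a class if at least one of its patches (instances) is. A common way to enforce this constraint is max pooling over per-patch scores; the sketch below illustrates that aggregation only, not the paper's network:

```python
import numpy as np

def bag_scores(instance_scores):
    """Aggregate per-patch (instance) class scores into an image-level
    (bag-level) prediction with max pooling: the bag score for each
    class is the score of its most confident patch.

    instance_scores: (n_patches, n_classes) array of patch scores.
    Returns a (n_classes,) array of bag-level scores.
    """
    return instance_scores.max(axis=0)

# 3 patches, 2 classes: patch 1 supports class 0, patch 0 supports class 1.
patches = np.array([[0.1, 0.8],
                    [0.7, 0.2],
                    [0.3, 0.4]])
scores = bag_scores(patches)
```

Because only the maximum patch drives each class score, the gradient during training flows to the most responsible patch, which is what lets image-level labels localize objects.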
Incorporating Network Built-in Priors in Weakly-supervised Semantic Segmentation
Pixel-level annotations are expensive and time consuming to obtain. Hence,
weak supervision using only image tags could have a significant impact in
semantic segmentation. Recently, CNN-based methods have been proposed that
fine-tune pre-trained networks using image tags. Without additional information, this
leads to poor localization accuracy. This problem, however, was alleviated by
making use of objectness priors to generate foreground/background masks.
Unfortunately, these priors either require pixel-level annotations/bounding
boxes, or still yield inaccurate object boundaries. Here, we propose a novel
method to extract accurate masks from networks pre-trained for the task of
object recognition, thus forgoing external objectness modules. We first show
how foreground/background masks can be obtained from the activations of
higher-level convolutional layers of a network. We then show how to obtain
multi-class masks by the fusion of foreground/background ones with information
extracted from a weakly-supervised localization network. Our experiments
evidence that exploiting these masks in conjunction with a weakly-supervised
training loss yields state-of-the-art tag-based weakly-supervised semantic
segmentation results.
Comment: 14 pages, 11 figures, 8 tables. Accepted in IEEE Transactions on
Pattern Analysis and Machine Intelligence (IEEE TPAMI)
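The first step the abstract describes, deriving a foreground/background mask from the activations of a higher-level convolutional layer, can be sketched as fusing the channel responses into one saliency map and thresholding it. This is an illustrative stand-in, not the paper's exact procedure:

```python
import numpy as np

def foreground_mask(activations, threshold=0.5):
    """Derive a rough foreground/background mask from high-level
    convolutional activations of a recognition network.

    activations: (channels, h, w) feature maps. Channels are averaged
    into a single map, min-max normalized to [0, 1], and thresholded
    to obtain a binary foreground mask.
    """
    fused = activations.mean(axis=0)
    fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)
    return fused >= threshold

# Toy 2x2 activations where only the top-left cell responds strongly.
acts = np.zeros((4, 2, 2))
acts[:, 0, 0] = 1.0
mask = foreground_mask(acts)
```

The intuition, consistent with the abstract, is that deep layers of a recognition network already fire on object regions, so no external objectness module is needed to get a first mask.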
Weakly Supervised Learning of Affordances
Localizing functional regions of objects or affordances is an important
aspect of scene understanding. In this work, we cast the problem of affordance
segmentation as that of semantic image segmentation. In order to explore
various levels of supervision, we introduce a pixel-annotated affordance
dataset of 3090 images containing 9916 object instances with rich contextual
information in terms of human-object interactions. We use a deep convolutional
neural network within an expectation maximization framework to take advantage
of weakly labeled data like image level annotations or keypoint annotations. We
show that a further reduction in supervision is possible with a minimal loss in
performance when human pose is used as context.
Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images
The rapidly emerging field of computational pathology has the potential to
enable objective diagnosis, therapeutic response prediction and identification
of new morphological features of clinical relevance. However, deep
learning-based computational pathology approaches either require manual
annotation of gigapixel whole slide images (WSIs) in fully-supervised settings
or thousands of WSIs with slide-level labels in a weakly-supervised setting.
Moreover, whole slide level computational pathology methods also suffer from
domain adaptation and interpretability issues. These challenges have prevented
the broad adoption of computational pathology for clinical and research
purposes. Here we present CLAM - Clustering-constrained attention multiple
instance learning, an easy-to-use, high-throughput, and interpretable WSI-level
processing and learning method that only requires slide-level labels while
being data efficient, adaptable and capable of handling multi-class subtyping
problems. CLAM is a deep-learning-based weakly-supervised method that uses
attention-based learning to automatically identify sub-regions of high
diagnostic value in order to accurately classify the whole slide, while also
utilizing instance-level clustering over the representative regions identified
to constrain and refine the feature space. In three separate analyses, we
demonstrate the data efficiency and adaptability of CLAM and its superior
performance over standard weakly-supervised classification. We demonstrate that
CLAM models are interpretable and can be used to identify well-known and new
morphological features. We further show that models trained using CLAM are
adaptable to independent test cohorts, cell phone microscopy images, and
biopsies. CLAM is a general-purpose and adaptable method that can be used for a
variety of different computational pathology tasks in both clinical and
research settings.
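The attention-based aggregation at the heart of CLAM-style weakly supervised classification can be sketched as follows: each patch embedding receives a learned attention score, and the slide-level representation is the attention-weighted sum of patch embeddings. Here a single weight vector stands in for CLAM's small attention network; this is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def attention_pool(embeddings, w):
    """Attention-based MIL pooling over patch embeddings.

    embeddings: (n_patches, d) patch features from a WSI.
    w: (d,) attention parameters (stand-in for a learned attention
    network). Returns the pooled (d,) slide-level representation and
    the (n_patches,) attention weights, which also serve as an
    interpretable map of diagnostically relevant regions.
    """
    logits = embeddings @ w
    a = np.exp(logits - logits.max())
    a = a / a.sum()                 # softmax over patches
    return a @ embeddings, a

# 3 patches; the attention vector strongly prefers the first feature.
emb = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.0, 1.0]])
pooled, attn = attention_pool(emb, w=np.array([10.0, 0.0]))
```

Unlike max pooling, the attention weights are continuous, so every patch contributes to the gradient and the weights themselves can be visualized as a heatmap over the slide.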
Hand Pose Estimation through Semi-Supervised and Weakly-Supervised Learning
We propose a method for hand pose estimation based on a deep regressor
trained on two different kinds of input. Raw depth data is fused with an
intermediate representation in the form of a segmentation of the hand into
parts. This intermediate representation contains important topological
information and provides useful cues for reasoning about joint locations. The
mapping from raw depth to segmentation maps is learned in a
semi/weakly-supervised way from two different datasets: (i) a synthetic dataset
created through a rendering pipeline including densely labeled ground truth
(pixelwise segmentations); and (ii) a dataset with real images for which ground
truth joint positions are available, but not dense segmentations. Loss for
training on real images is generated from a patch-wise restoration process,
which aligns tentative segmentation maps with a large dictionary of synthetic
poses. The underlying premise is that the domain shift between synthetic and
real data is smaller in the intermediate representation, where labels carry
geometric and topological meaning, than in the raw input domain. Experiments on
the NYU dataset show that the proposed training method decreases error on
joints over direct regression of joints from depth data by 15.7%.
Comment: 13 pages, 10 figures, 4 tables
Salient Object Detection in the Deep Learning Era: An In-Depth Survey
As an essential problem in computer vision, salient object detection (SOD)
has attracted an increasing amount of research attention over the years. Recent
advances in SOD are predominantly led by deep learning-based solutions (named
deep SOD). To enable in-depth understanding of deep SOD, in this paper, we
provide a comprehensive survey covering various aspects, ranging from algorithm
taxonomy to unsolved issues. In particular, we first review deep SOD algorithms
from different perspectives, including network architecture, level of
supervision, learning paradigm, and object-/instance-level detection. Following
that, we summarize and analyze existing SOD datasets and evaluation metrics.
Then, we benchmark a large group of representative SOD models, and provide
detailed analyses of the comparison results. Moreover, we study the performance
of SOD algorithms under different attribute settings, which has not been
thoroughly explored previously, by constructing a novel SOD dataset with rich
attribute annotations covering various salient object types, challenging
factors, and scene categories. We further analyze, for the first time in the
field, the robustness of SOD models to random input perturbations and
adversarial attacks. We also look into the generalization and difficulty of
existing SOD datasets. Finally, we discuss several open issues of SOD and
outline future research directions.
Comment: Published in IEEE TPAMI. All the saliency prediction maps, our
constructed dataset with annotations, and code for evaluation are publicly
available at \url{https://github.com/wenguanwang/SODsurvey}
Keypoint Based Weakly Supervised Human Parsing
Fully convolutional networks (FCN) have achieved great success in human
parsing in recent years. In conventional human parsing tasks, pixel-level
labeling is required to guide the training, which usually involves enormous
human labeling effort. To ease this burden, we propose a novel weakly
supervised human parsing method which only requires simple object keypoint
annotations for learning. We develop an iterative learning method to generate
pseudo part segmentation masks from keypoint labels. With these pseudo masks,
we train an FCN network to output pixel-level human parsing predictions.
Furthermore, we develop a correlation network to perform joint prediction of
part and object segmentation masks and improve the segmentation performance.
Experimental results show that our weakly supervised method achieves very
competitive human parsing results. Although our method only uses simple
keypoint annotations for learning, it achieves performance comparable to
fully supervised methods that use expensive pixel-level annotations.
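A crude first iteration of the pseudo-mask generation described above can be sketched by assigning every pixel to the part of its nearest keypoint (a Voronoi-style partition). The paper's actual scheme is iterative and learned; this NumPy version is only an illustrative starting point:

```python
import numpy as np

def pseudo_part_mask(shape, keypoints, labels):
    """Generate a crude pseudo part-segmentation from keypoints.

    shape: (h, w) of the output mask.
    keypoints: (k, 2) array of (row, col) keypoint positions.
    labels: (k,) integer part ids, one per keypoint.
    Every pixel is assigned the part id of its nearest keypoint,
    giving a noisy mask that an FCN can then be trained on and refine.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([ys.ravel(), xs.ravel()], axis=1)          # (h*w, 2)
    d = ((pix[:, None, :] - keypoints[None, :, :]) ** 2).sum(-1)
    return labels[d.argmin(axis=1)].reshape(h, w)

# Two keypoints for parts 1 and 2 on a 4x4 image.
kps = np.array([[0, 0], [3, 3]])
mask = pseudo_part_mask((4, 4), kps, labels=np.array([1, 2]))
```

Training a segmentation network on such masks and re-generating them from its predictions is the iterative loop the abstract refers to.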
Distantly Supervised Road Segmentation
We present an approach for road segmentation that only requires image-level
annotations at training time. We leverage distant supervision, which allows us
to train our model using images that are different from the target domain.
Using large publicly available image databases as distant supervisors, we
develop a simple method to automatically generate weak pixel-wise road masks.
These are used to iteratively train a fully convolutional neural network, which
produces our final segmentation model. We evaluate our method on the Cityscapes
dataset, where we compare it with a fully supervised approach. Further, we
discuss the trade-off between annotation cost and performance. Overall, our
distantly supervised approach achieves 93.8% of the performance of the fully
supervised approach, while using orders of magnitude less annotation work.
Comment: Accepted for the ICCV workshop CVRSUAD2017
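The iterative training loop described above, where weak pixel-wise masks bootstrap a model whose predictions then replace the masks, can be sketched with a toy "model" (here a pixel-wise consensus over images stands in for the fully convolutional network). All names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def refine_masks(weak_masks, rounds=3):
    """Toy iterative self-training on weak road masks.

    weak_masks: (n_images, h, w) binary arrays of noisy, automatically
    generated road masks. Each round "trains" a model (the pixel-wise
    mean across images, a stand-in for an FCN) and re-labels every
    image with the model's thresholded prediction.
    """
    masks = weak_masks.astype(float)
    for _ in range(rounds):
        model = masks.mean(axis=0)                  # consensus prediction
        pred = (model >= 0.5)[None]
        masks = np.repeat(pred, len(masks), axis=0).astype(float)
    return masks.astype(bool)

# 3 noisy 2x2 masks; pixels most images agree on survive refinement.
weak = np.array([[[1, 0], [1, 1]],
                 [[1, 0], [0, 1]],
                 [[1, 1], [1, 1]]])
refined = refine_masks(weak)
```

The point of the toy version is the structure of the loop: annotation noise that is inconsistent across images gets voted out, while the consistent road signal is retained.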