Mumford-Shah Loss Functional for Image Segmentation with Deep Learning
Recent state-of-the-art image segmentation algorithms are mostly based on
deep neural networks, thanks to their high performance and fast computation
time. However, these methods are usually trained in a supervised manner, which
requires a large number of high-quality ground-truth segmentation masks. On the
other hand, classical image segmentation approaches such as level-set methods
are formulated in a self-supervised manner by minimizing energy functions such
as the Mumford-Shah functional, so they are still useful for generating
segmentation masks without labels. Unfortunately, these algorithms are usually
computationally expensive and are often limited in semantic segmentation.
In this paper, we propose a novel loss function based on the Mumford-Shah
functional that can be used in deep-learning-based image segmentation with
little or no labeled data. This loss function is based on the observation that
the softmax layer of deep neural networks has a striking similarity to the
characteristic function in the Mumford-Shah functional. We show that the new
loss function enables semi-supervised and unsupervised segmentation. In
addition, our loss function can also be used as a regularization function to
enhance supervised semantic segmentation algorithms. Experimental results on
multiple datasets demonstrate the effectiveness of the proposed method.
Comment: Accepted for IEEE Transactions on Image Processing
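The idea in the abstract above, treating per-pixel softmax outputs as soft characteristic functions of the regions, can be sketched as follows. This is a minimal pure-Python illustration, assuming the piecewise-constant form of the Mumford-Shah functional (a data term around each region's mean intensity plus a total-variation penalty on each softmax map). The function name `mumford_shah_loss`, the `N x H x W` layout, and the weight `lam` are hypothetical and are not taken from the paper's code.

```python
def mumford_shah_loss(image, probs, lam=1.0, eps=1e-8):
    """Sketch of a Mumford-Shah-style loss (assumed piecewise-constant form).

    image: H x W nested list of pixel intensities.
    probs: N x H x W nested list of softmax outputs, one map per class.
    lam:   assumed weight of the total-variation (boundary length) term.
    """
    height, width = len(image), len(image[0])
    loss = 0.0
    for y in probs:  # one soft region map per class
        # Soft region mean: intensity average weighted by the softmax map.
        mass = sum(y[i][j] for i in range(height) for j in range(width))
        weighted = sum(image[i][j] * y[i][j]
                       for i in range(height) for j in range(width))
        mean = weighted / (mass + eps)
        for i in range(height):
            for j in range(width):
                # Data term: deviation from the region mean, weighted by membership.
                loss += (image[i][j] - mean) ** 2 * y[i][j]
                # Total-variation term via forward differences of the softmax map.
                if i + 1 < height:
                    loss += lam * abs(y[i + 1][j] - y[i][j])
                if j + 1 < width:
                    loss += lam * abs(y[i][j + 1] - y[i][j])
    return loss


# Toy 2 x 2 image: dark top row, bright bottom row.
toy_image = [[0.0, 0.0], [1.0, 1.0]]
# Segmentation that follows the intensity edge (top vs. bottom).
aligned = [[[1.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]]]
# Segmentation that cuts across the edge (left vs. right).
misaligned = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
print(mumford_shah_loss(toy_image, aligned), mumford_shah_loss(toy_image, misaligned))
```

No labels appear anywhere in the computation, which is why a loss of this shape can be used in unsupervised or semi-supervised training. In a real network the same quantities would be computed with a differentiable tensor library rather than nested lists.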
Unsupervised learning from video to detect foreground objects in single images
Unsupervised learning from visual data is one of the most difficult
challenges in computer vision, being a fundamental task for understanding how
visual recognition works. From a practical point of view, learning from
unsupervised visual input has immense value, as very large
quantities of unlabeled videos can be collected at low cost. In this paper, we
address the task of unsupervised learning to detect and segment foreground
objects in single images. We achieve our goal by training a student pathway,
consisting of a deep neural network. It learns to predict, from a single input
image (a video frame), the output that a teacher pathway performing unsupervised
object discovery in video produces for that frame. Our approach is
different from the published literature that performs unsupervised discovery in
videos or in collections of images at test time. We move the unsupervised
discovery phase to the training stage, while at test time we apply the
standard feed-forward processing along the student pathway. This has a dual
benefit: firstly, it allows in principle unlimited possibilities of learning
and generalization during training, while remaining very fast at testing.
Secondly, the student not only becomes able to detect in single images
significantly better than its unsupervised video discovery teacher, but it also
achieves state-of-the-art results on two important current benchmarks, the YouTube
Objects and Object Discovery datasets. Moreover, at test time, our system is at
least two orders of magnitude faster than previous methods.
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Large-scale labeled data are generally required to train deep neural networks
in order to obtain better performance in visual feature learning from images or
videos for computer vision applications. To avoid the extensive cost of collecting
and annotating large-scale datasets, as a subset of unsupervised learning
methods, self-supervised learning methods are proposed to learn general image
and video features from large-scale unlabeled data without using any
human-annotated labels. This paper provides an extensive review of deep
learning-based self-supervised general visual feature learning methods from
images or videos. First, the motivation, general pipeline, and terminologies of
this field are described. Then the common deep neural network architectures
that are used for self-supervised learning are summarized. Next, the main
components and evaluation metrics of self-supervised learning methods are
reviewed followed by the commonly used image and video datasets and the
existing self-supervised visual feature learning methods. Finally, quantitative
performance comparisons of the reviewed methods on benchmark datasets are
summarized and discussed for both image and video feature learning. The
paper concludes with a list of promising future directions for
self-supervised visual feature learning.
Sparse Autoencoder for Unsupervised Nucleus Detection and Representation in Histopathology Images
Histopathology images are crucial to the study of complex diseases such as
cancer. The histologic characteristics of nuclei play a key role in disease
diagnosis, prognosis and analysis. In this work, we propose a sparse
Convolutional Autoencoder (CAE) for fully unsupervised, simultaneous nucleus
detection and feature extraction in histopathology tissue images. Our CAE
detects and encodes nuclei in image patches in tissue images into sparse
feature maps that encode both the location and appearance of nuclei. Our CAE is
the first unsupervised detection network for computer vision applications. The
pretrained nucleus detection and feature extraction modules in our CAE can be
fine-tuned for supervised learning in an end-to-end fashion. We evaluate our
method on four datasets and reduce the errors of state-of-the-art methods by up to
42%. We are able to achieve comparable performance with only 5% of the
fully-supervised annotation cost.
Machine learning methods for histopathological image analysis
Abundant accumulation of digital histopathological images has led to the
increased demand for their analysis, such as computer-aided diagnosis using
machine learning techniques. However, digital pathological images and related
tasks have some issues to be considered. In this mini-review, we introduce the
application of digital pathological image analysis using machine learning
algorithms, address some problems specific to such analysis, and propose
possible solutions.
Comment: 23 pages, 4 figures
SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud
Earlier work demonstrates the promise of deep-learning-based approaches for
point cloud segmentation; however, these approaches need to be improved to be
practically useful. To this end, we introduce a new model SqueezeSegV2 that is
more robust to dropout noise in LiDAR point clouds. With improved model
structure, training loss, batch normalization and an additional input channel,
SqueezeSegV2 achieves significant accuracy improvement when trained on real
data. Training models for point cloud segmentation requires large amounts of
labeled point-cloud data, which is expensive to obtain. To sidestep the cost of
collection and annotation, simulators such as GTA-V can be used to create
unlimited amounts of labeled, synthetic data. However, due to domain shift,
models trained on synthetic data often do not generalize well to the real
world. We address this problem with a domain-adaptation training pipeline
consisting of three major components: 1) learned intensity rendering, 2)
geodesic correlation alignment, and 3) progressive domain calibration. When
trained on real data, our new model exhibits segmentation accuracy improvements
of 6.0-8.6% over the original SqueezeSeg. When training our new model on
synthetic data using the proposed domain adaptation pipeline, we nearly double
test accuracy on real-world data, from 29.0% to 57.4%. Our source code and
synthetic dataset will be open-sourced.
Comment: Bichen Wu, Xuanyu Zhou, and Sicheng Zhao contributed equally to this
paper
An Efficient Evolutionary Based Method For Image Segmentation
The goal of this paper is to present a new efficient image segmentation
method based on evolutionary computation, a model inspired by human
behavior. Based on this model, a four-layer process for image segmentation is
proposed using the split/merge approach. In the first layer, an image is split
into numerous regions using the watershed algorithm. In the second layer, a
co-evolutionary process is applied to form centers of final segments by
merging similar primary regions. In the third layer, a meta-heuristic process
uses two operators to connect the residual regions to their corresponding
determined centers. In the final layer, an evolutionary algorithm is used to
combine the resulting similar, neighboring regions. The layers of the
algorithm are fully independent, so for certain applications a
specific layer can be changed without having to change the other layers. Some
properties of this algorithm, such as the flexibility of its method, the ability to
use different feature vectors for segmentation (grayscale, color, texture,
etc.), the ability to control the uniformity and number of final segments using
free parameters, and the preservation of small regions, make it possible to apply
the algorithm to different applications. Moreover, the independence of each
region from other regions in the second layer, and the independence of centers
in the third layer, make parallel implementation possible, which increases the
algorithm's speed. The presented algorithm was tested on a standard
dataset (BSDS 300) of images, and the region boundaries were compared with
segmentation contours drawn by different people. Results show the efficiency of the
algorithm and its improvement over similar methods. For instance, in 70% of
tested images the results are better than those of the ACT algorithm, and in 100% of
tested images they are better than those of the VSP algorithm.
Comment: 17 pages
Unsupervised End-to-end Learning for Deformable Medical Image Registration
We propose a registration algorithm for 2D CT/MRI medical images with a new
unsupervised end-to-end strategy using convolutional neural networks. The
contributions of our algorithm are threefold: (1) We transplant traditional
image registration algorithms to an end-to-end convolutional neural network
framework, while maintaining the unsupervised nature of image registration
problems. The image-to-image integrated framework can simultaneously learn both
image features and the transformation matrix for registration. (2) Training with
additional unlabeled data can further improve the registration
performance by approximately 10%. (3) The registration speed is 100x faster
than traditional methods. The proposed network is easy to implement and can be
trained efficiently. Experiments demonstrate that our system achieves
state-of-the-art results on 2D brain registration and achieves comparable
results on 2D liver registration. It can be extended to register other organs
beyond the liver and brain, such as the kidney, lung, and heart.
Universal Semi-Supervised Semantic Segmentation
In recent years, the need for semantic segmentation has arisen across several
different applications and environments. However, the expense and redundancy of
annotation often limit the quantity of labels available for training in any
domain, while deployment is easier if a single model works well across domains.
In this paper, we pose the novel problem of universal semi-supervised semantic
segmentation and propose a solution framework, to meet the dual needs of lower
annotation and deployment costs. In contrast to counterparts such as
fine-tuning, joint training or unsupervised domain adaptation, universal
semi-supervised segmentation ensures that across all domains: (i) a single
model is deployed, (ii) unlabeled data is used, (iii) performance is improved,
(iv) only a few labels are needed and (v) label spaces may differ. To address
this, we minimize supervised as well as within and cross-domain unsupervised
losses, introducing a novel feature alignment objective based on pixel-aware
entropy regularization for the latter. We demonstrate quantitative advantages
over other approaches on several combinations of segmentation datasets across
different geographies (Germany, England, India) and environments (outdoors,
indoors), as well as qualitative insights on the aligned representations.
Comment: Accepted as poster presentation at ICCV 201
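The unsupervised losses mentioned in the abstract above build on entropy regularization. As a rough illustration only, the sketch below computes plain per-pixel entropy of class probabilities, which is the generic form of entropy minimization on unlabeled data. It is not the paper's pixel-aware feature alignment objective, and the names `pixel_entropy` and `entropy_loss` and the `H x W x C` layout are hypothetical.

```python
import math


def pixel_entropy(class_probs, eps=1e-12):
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(p * math.log(p + eps) for p in class_probs)


def entropy_loss(prob_map):
    """Mean per-pixel entropy over an H x W x C map of class probabilities.

    Minimizing this on unlabeled images pushes the model toward confident
    (low-entropy) predictions; no ground-truth labels are needed.
    """
    pixels = [pixel for row in prob_map for pixel in row]
    return sum(pixel_entropy(pixel) for pixel in pixels) / len(pixels)


# One confident pixel and one maximally uncertain pixel (two classes).
toy_map = [[[1.0, 0.0], [0.5, 0.5]]]
print(entropy_loss(toy_map))
```

A confident pixel contributes almost nothing to the loss, while a uniform distribution contributes the maximum, log(C) for C classes.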
Exploring Object Relation in Mean Teacher for Cross-Domain Detection
Rendering synthetic data (e.g., 3D CAD-rendered images) to generate
annotations for learning deep models in vision tasks has attracted increasing
attention in recent years. However, simply applying the models learnt on
synthetic images may lead to high generalization error on real images due to
domain shift. To address this issue, recent progress in cross-domain
recognition has featured the Mean Teacher, which directly simulates
unsupervised domain adaptation as semi-supervised learning. The domain gap is
thus naturally bridged with consistency regularization in a teacher-student
scheme. In this work, we advance this Mean Teacher paradigm to be applicable
to cross-domain detection. Specifically, we present Mean Teacher with Object
Relations (MTOR), which remolds Mean Teacher on the backbone of Faster
R-CNN by integrating object relations into the measure of consistency cost
between teacher and student modules. Technically, MTOR first learns
relational graphs that capture similarities between pairs of regions for
teacher and student respectively. The whole architecture is then optimized with
three consistency regularizations: 1) region-level consistency to align the
region-level predictions between teacher and student, 2) inter-graph
consistency for matching the graph structures between teacher and student, and
3) intra-graph consistency to enhance the similarity between regions of the same
class within the student's graph. Extensive experiments are conducted on
transfers across Cityscapes, Foggy Cityscapes, and SIM10k, and superior results
are reported in comparison with state-of-the-art approaches. More remarkably, we
obtain a new single-model record of 22.8% mAP on the Syn2Real detection
dataset.
Comment: CVPR 2019; the code and model of our MTOR are publicly available at:
https://github.com/caiqi/mean-teacher-cross-domain-detectio
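The teacher-student scheme that the abstract above builds on can be sketched in a few lines. In the standard Mean Teacher formulation, the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency cost penalizes disagreement between their predictions. The sketch below shows only that generic mechanism, with weights as a plain dictionary of floats; it does not implement MTOR's region-level or graph consistency terms, and the names `ema_update`, `consistency_cost`, and `alpha` are illustrative.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return teacher weights moved toward the student by an EMA step.

    teacher, student: dicts mapping parameter names to floats (toy stand-in
    for real weight tensors). alpha is the assumed EMA decay rate.
    """
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}


def consistency_cost(teacher_probs, student_probs):
    """Mean squared difference between teacher and student predictions."""
    squared = [(t - s) ** 2 for t, s in zip(teacher_probs, student_probs)]
    return sum(squared) / len(squared)


# Toy example: the teacher drifts slowly toward the student after each step.
teacher_weights = {"w": 1.0}
student_weights = {"w": 0.0}
teacher_weights = ema_update(teacher_weights, student_weights, alpha=0.9)
print(teacher_weights, consistency_cost([0.8, 0.2], [0.6, 0.4]))
```

Because only the student is updated by gradient descent and the teacher follows it through the EMA, the consistency cost can be computed on unlabeled target-domain images.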