15,969 research outputs found
Learning to Compare Image Patches via Convolutional Neural Networks
CVPR 2015. In this paper we show how to learn directly from image data (i.e., without resorting to manually designed features) a general similarity function for comparing image patches, a task of fundamental importance for many computer vision problems. To encode such a function, we opt for a CNN-based model that is trained to account for a wide variety of changes in image appearance. To that end, we explore and study multiple neural network architectures specifically adapted to this task. We show that such an approach can significantly outperform the state of the art on several problems and benchmark datasets.
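A minimal sketch of one such architecture, in the spirit of the 2-channel variant this line of work explores: the two patches are stacked as input channels and a single similarity score is regressed. The patch size and layer dimensions below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class TwoChannelNet(nn.Module):
    """Compare two 64x64 grayscale patches by stacking them as two
    input channels and regressing a single similarity score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 96, kernel_size=7, stride=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(96, 192, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(192, 256, kernel_size=3), nn.ReLU(),
        )
        self.decision = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),  # raw similarity score
        )

    def forward(self, patch_a, patch_b):
        # Stack the two patches along the channel axis: (B, 2, 64, 64).
        x = torch.cat([patch_a, patch_b], dim=1)
        return self.decision(self.features(x))

net = TwoChannelNet()
a = torch.randn(8, 1, 64, 64)  # a batch of patch pairs
b = torch.randn(8, 1, 64, 64)
scores = net(a, b)             # shape (8, 1)
```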
A critical analysis of self-supervision, or what we can learn from a single image
We look critically at popular self-supervision techniques for learning deep
convolutional neural networks without manual labels. We show that three
different and representative methods, BiGAN, RotNet and DeepCluster, can learn
the first few layers of a convolutional network from a single image as well as
using millions of images and manual labels, provided that strong data
augmentation is used. However, for deeper layers the gap with manual
supervision cannot be closed even if millions of unlabelled images are used for
training. We conclude that: (1) the weights of the early layers of deep
networks contain limited information about the statistics of natural images,
that (2) such low-level statistics can be learned through self-supervision just
as well as through strong supervision, and that (3) the low-level statistics
can be captured via synthetic transformations instead of using a large image
dataset.
Comment: Accepted paper at the International Conference on Learning Representations (ICLR) 2020.
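The "strong data augmentation" the result hinges on can be illustrated with a short sketch: aggressive cropping, flipping, color jitter, and rotation turn one source image into an effectively unlimited stream of training crops for the early layers. The transform parameters and file path below are illustrative assumptions, not the paper's exact recipe.

```python
import torch
from torchvision import transforms
from PIL import Image

# Aggressive augmentation: every call yields a different view of the
# single source image, so the dataset is effectively unbounded.
strong_aug = transforms.Compose([
    transforms.RandomResizedCrop(64, scale=(0.05, 0.5)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

source = Image.open("single_image.jpg").convert("RGB")  # hypothetical path
batch = torch.stack([strong_aug(source) for _ in range(32)])
print(batch.shape)  # torch.Size([32, 3, 64, 64])
```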
DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks
In this paper, we propose DeepCut, a method to obtain pixelwise object
segmentations given an image dataset labelled with bounding box annotations. It
extends the approach of the well-known GrabCut method to include machine
learning by training a neural network classifier from bounding box annotations.
We formulate the problem as an energy minimisation problem over a
densely-connected conditional random field and iteratively update the training
targets to obtain pixelwise object segmentations. Additionally, we propose
variants of the DeepCut method and compare them to a naive approach to CNN training under weak supervision. We test the method's applicability to brain and lung segmentation problems on a challenging fetal magnetic resonance dataset and obtain encouraging results in terms of accuracy.
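The iterative scheme described above can be sketched as a control-flow skeleton: train a classifier on the current pixel targets, then regularise its predictions with a densely-connected CRF to produce the next round's targets. The `train_cnn`, `predict`, and `crf_refine` components below are hypothetical caller-supplied callables standing in for the paper's models; only the alternation logic is shown.

```python
import numpy as np

def box_to_mask(shape, box):
    """Binary mask that is True inside the bounding box (y0, x0, y1, x1)."""
    mask = np.zeros(shape, dtype=bool)
    y0, x0, y1, x1 = box
    mask[y0:y1, x0:x1] = True
    return mask

def deepcut_iterations(images, boxes, train_cnn, predict, crf_refine, n_iters=5):
    """Sketch of the DeepCut-style loop: alternate CNN training and CRF
    regularisation. train_cnn / predict / crf_refine are hypothetical
    caller-supplied components, not the authors' implementation."""
    # Initial targets: box interior = foreground, exterior = background.
    targets = [box_to_mask(img.shape[:2], box) for img, box in zip(images, boxes)]
    model = None
    for _ in range(n_iters):
        model = train_cnn(images, targets)  # step 1: fit the classifier
        targets = [
            # Step 2: CRF-regularise the prediction, then clamp the
            # foreground back inside the annotated bounding box.
            crf_refine(img, predict(model, img)) & box_to_mask(img.shape[:2], box)
            for img, box in zip(images, boxes)
        ]
    return model, targets
```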
Lesion detection and Grading of Diabetic Retinopathy via Two-stages Deep Convolutional Neural Networks
We propose an automatic diabetic retinopathy (DR) analysis algorithm based on two-stage deep convolutional neural networks (DCNN). Compared to existing DCNN-based DR detection methods, the proposed algorithm has the following advantages: (1) Our method can point out the location and type of lesions in the fundus images and give the severity grade of DR. Moreover, since retinal lesions and DR severity appear at different scales in fundus images, the integration of local and global networks learns more complete and specific features for DR analysis. (2) By introducing an imbalanced weighting map, more attention is given to lesion patches for DR grading, which significantly improves the performance of the proposed algorithm. In this study, we label 12,206 lesion patches and re-annotate the DR grades of 23,595 fundus images from the Kaggle competition dataset under the guidance of clinical ophthalmologists. The experimental results show that our local lesion detection net achieves performance comparable to trained human observers, and the proposed imbalanced weighting scheme is also shown to significantly improve the capability of our DCNN-based DR grading algorithm.
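One way to read the "imbalanced weighting map" idea is as a per-patch weight on the grading loss, so patches containing detected lesions contribute more than healthy ones. A minimal PyTorch sketch; the weight value and shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def weighted_grading_loss(logits, labels, lesion_mask, lesion_weight=5.0):
    """Cross-entropy over image patches, up-weighting patches where the
    local detector found a lesion (weight value is an assumption).

    logits:      (N, num_grades) per-patch grade predictions
    labels:      (N,) ground-truth DR grades
    lesion_mask: (N,) 1.0 where a lesion was detected, else 0.0
    """
    per_patch = F.cross_entropy(logits, labels, reduction="none")
    weights = 1.0 + (lesion_weight - 1.0) * lesion_mask
    return (weights * per_patch).sum() / weights.sum()

logits = torch.randn(16, 5)                    # 5 DR severity grades
labels = torch.randint(0, 5, (16,))
lesion_mask = (torch.rand(16) > 0.7).float()   # stand-in detections
print(weighted_grading_loss(logits, labels, lesion_mask))
```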
Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries
With advanced image journaling tools, one can easily alter the semantic meaning of an image by exploiting manipulation techniques such as copy-clone, object splicing, and removal, which mislead viewers. At the same time, identifying these manipulations is a very challenging task because manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation localization architecture that utilizes resampling features, Long Short-Term Memory (LSTM) cells, and an encoder-decoder network to segment manipulated regions from non-manipulated ones. Resampling features are used to capture artifacts such as JPEG quality loss, upsampling, downsampling, rotation, and shearing. The proposed network exploits larger receptive fields (spatial maps) and frequency-domain correlation to analyze the discriminative characteristics between manipulated and non-manipulated regions by incorporating the encoder and LSTM networks. Finally, the decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. With the predicted mask provided by the final (softmax) layer of the proposed architecture, end-to-end training is performed to learn the network parameters through back-propagation using ground-truth masks. Furthermore, a large image splicing dataset is introduced to guide the training process. The proposed method is capable of localizing image manipulations at the pixel level with high precision, which is demonstrated through rigorous experimentation on three diverse datasets.
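A simplified sketch of how such a hybrid might be wired in PyTorch: a convolutional encoder over the image, an LSTM scanning a sequence of per-patch resampling features, and a transposed-convolution decoder producing a two-class (manipulated vs. pristine) soft mask. All layer sizes, and the assumption that resampling features arrive as a precomputed (B, H/4·W/4, 64) sequence, are illustrative rather than the paper's design.

```python
import torch
import torch.nn as nn

class TamperLocaliser(nn.Module):
    """Simplified sketch: conv encoder + LSTM over resampling features
    + decoder upsampling to a 2-class (manipulated/pristine) mask."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # The LSTM consumes a sequence of per-patch resampling features.
        self.lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),  # 2 classes
        )

    def forward(self, image, resampling_feats):
        # image: (B, 3, H, W); resampling_feats: (B, H/4 * W/4, 64)
        enc = self.encoder(image)              # (B, 64, H/4, W/4)
        seq, _ = self.lstm(resampling_feats)   # (B, H/4*W/4, 64)
        B, C, h, w = enc.shape
        freq = seq.transpose(1, 2).reshape(B, 64, h, w)
        fused = torch.cat([enc, freq], dim=1)  # fuse spatial + frequency cues
        return self.decoder(fused).softmax(dim=1)  # per-pixel class probs

model = TamperLocaliser()
img = torch.randn(2, 3, 64, 64)
feats = torch.randn(2, 16 * 16, 64)  # stand-in resampling features
mask = model(img, feats)             # (2, 2, 64, 64)
```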
Unsupervised feature learning by augmenting single images
When deep learning is applied to visual object recognition, data augmentation
is often used to generate additional training data without extra labeling cost.
It helps to reduce overfitting and increase the performance of the algorithm.
In this paper we investigate if it is possible to use data augmentation as the
main component of an unsupervised feature learning architecture. To that end we
sample a set of random image patches and declare each of them to be a separate
single-image surrogate class. We then extend these trivial one-element classes
by applying a variety of transformations to the initial 'seed' patches. Finally
we train a convolutional neural network to discriminate between these surrogate
classes. The feature representation learned by the network can then be used in
various vision tasks. We find that this simple feature learning algorithm is
surprisingly successful, achieving competitive classification results on
several popular vision datasets (STL-10, CIFAR-10, Caltech-101).
Comment: ICLR 2014 workshop track submission (7 pages, 4 figures, 1 table).
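The surrogate-class construction can be sketched directly: each seed patch defines its own class, and augmented copies of that patch serve as its training examples for the classifier. The transform choices and patch counts below are illustrative assumptions, not the paper's exact settings.

```python
import torch
from torchvision import transforms

# Each seed patch is its own surrogate class; augmented copies of the
# patch are that class's training examples.
augment = transforms.Compose([
    transforms.ToPILImage(),
    transforms.RandomAffine(degrees=20, translate=(0.2, 0.2),
                            scale=(0.7, 1.4)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.ToTensor(),
])

seed_patches = [torch.rand(3, 32, 32) for _ in range(100)]  # 100 classes

def surrogate_batch(batch_size=64):
    """Sample (augmented patch, surrogate class id) training pairs."""
    xs, ys = [], []
    for _ in range(batch_size):
        cls = torch.randint(len(seed_patches), (1,)).item()
        xs.append(augment(seed_patches[cls]))
        ys.append(cls)
    return torch.stack(xs), torch.tensor(ys)

x, y = surrogate_batch()
print(x.shape, y.shape)  # torch.Size([64, 3, 32, 32]) torch.Size([64])
```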