Dual Skipping Networks
Inspired by the recent neuroscience studies on the left-right asymmetry of
the human brain in processing low and high spatial frequency information, this
paper introduces a dual skipping network which carries out coarse-to-fine
object categorization. Such a network has two branches to simultaneously deal
with both coarse and fine-grained classification tasks. Specifically, we
propose a layer-skipping mechanism that learns a gating network to predict
which layers to skip in the testing stage. This layer-skipping mechanism endows
the network with good flexibility and capability in practice. Evaluations are
conducted on several widely used coarse-to-fine object categorization
benchmarks, and promising results are achieved by our proposed network model.
Comment: CVPR 2018 (poster); fix typ
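To make the layer-skipping idea concrete, the following is a minimal sketch of test-time skipping in a stack of residual blocks. It is not the paper's architecture: the linear gate, the 0.5 threshold, and the names `gate_probability` and `run_with_skipping` are all assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_probability(features, gate_weights, gate_bias):
    """Hypothetical linear gate: probability that a block should be executed."""
    score = sum(w * f for w, f in zip(gate_weights, features)) + gate_bias
    return sigmoid(score)


def run_with_skipping(features, blocks, threshold=0.5):
    """Run residual blocks, skipping any block whose gate is below threshold.

    Returns the final features and the indices of the executed blocks.
    """
    executed = []
    for index, block in enumerate(blocks):
        p = gate_probability(features, block["gate_weights"], block["gate_bias"])
        if p >= threshold:
            residual = block["transform"](features)
            features = [f + r for f, r in zip(features, residual)]
            executed.append(index)
        # A skipped block acts as an identity shortcut: features pass unchanged.
    return features, executed


# Toy blocks with hand-picked (not learned) gate parameters.
blocks = [
    {"transform": lambda x: [0.5 * v for v in x],
     "gate_weights": [1.0, 1.0], "gate_bias": 0.0},
    {"transform": lambda x: [v * v for v in x],
     "gate_weights": [-1.0, -1.0], "gate_bias": 0.0},
    {"transform": lambda x: [1.0 for _ in x],
     "gate_weights": [0.0, 0.0], "gate_bias": 2.0},
]
output, executed = run_with_skipping([1.0, 2.0], blocks)
print(executed, output)
```

In this toy run the second block's gate falls below the threshold, so only the first and third blocks contribute to the output.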
Fine-Grained Object Recognition and Zero-Shot Learning in Remote Sensing Imagery
Fine-grained object recognition that aims to identify the type of an object
among a large number of subcategories is an emerging application with the
increasing resolution that exposes new details in image data. Traditional fully
supervised algorithms fail to handle this problem where there is low
between-class variance and high within-class variance for the classes of
interest with small sample sizes. We study an even more extreme scenario named
zero-shot learning (ZSL) in which no training example exists for some of the
classes. ZSL aims to build a recognition model for new unseen categories by
relating them to seen classes that were previously learned. We establish this
relation by learning a compatibility function between image features extracted
via a convolutional neural network and auxiliary information that describes the
semantics of the classes of interest by using training samples from the seen
classes. Then, we show how knowledge transfer can be performed for the unseen
classes by maximizing this function during inference. We introduce a new data
set that contains 40 different types of street trees in 1-ft spatial resolution
aerial data, and evaluate the performance of this model with manually annotated
attributes, a natural language model, and a scientific taxonomy as auxiliary
information. The experiments show that the proposed model achieves 14.3%
recognition accuracy for the classes with no training examples, which is
significantly better than both a random-guess accuracy of 6.3% for 16 test
classes and three other ZSL algorithms.
Comment: G. Sumbul, R. G. Cinbis, S. Aksoy, "Fine-Grained Object Recognition
and Zero-Shot Learning in Remote Sensing Imagery", IEEE Transactions on
Geoscience and Remote Sensing (TGRS), in press, 201
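The inference step described in this abstract, maximizing a compatibility function over unseen classes, can be sketched as follows. The bilinear form x^T W a, the toy weights, and the class names are assumptions for illustration; the abstract does not state the exact form of the function. The last line reproduces the chance level for 16 test classes, 1/16 = 6.25%, which the abstract reports as 6.3%.

```python
def compatibility(image_features, class_embedding, weights):
    """Assumed bilinear score x^T W a between an image and a class embedding."""
    projected = [
        sum(image_features[i] * weights[i][j] for i in range(len(image_features)))
        for j in range(len(class_embedding))
    ]
    return sum(p * a for p, a in zip(projected, class_embedding))


def predict_unseen(image_features, unseen_embeddings, weights):
    """Pick the unseen class whose embedding maximizes the compatibility score."""
    return max(
        unseen_embeddings,
        key=lambda name: compatibility(image_features, unseen_embeddings[name], weights),
    )


# Hypothetical 3-D image features mapped to a 2-D attribute space.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
unseen = {"class_a": [1.0, 0.0], "class_b": [0.0, 1.0]}
prediction = predict_unseen([0.2, 0.9, 0.1], unseen, weights)

# Random-guess accuracy for 16 equally likely test classes.
chance = 1.0 / 16
print(prediction, chance)
```

In practice W would be learned from the seen classes only, and the class embeddings would come from attributes, a language model, or a taxonomy.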
Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN
Discriminative localization is essential for the fine-grained image
classification task, which aims to recognize hundreds of subcategories within
the same basic-level category. The key differences among subcategories are
subtle and local, and they are reflected in the discriminative regions of
objects. Existing
methods generally adopt a two-stage learning framework: The first stage is to
localize the discriminative regions of objects, and the second is to encode the
discriminative features for training classifiers. However, these methods
generally have two limitations: (1) Separating the two stages of learning is
time-consuming. (2) Depending on object and part annotations for
discriminative localization learning requires labor-intensive labeling.
It is highly challenging to address these two important limitations
simultaneously, and existing methods focus on only one of them. Therefore, this
paper proposes a discriminative localization approach via saliency-guided
Faster R-CNN that addresses both limitations at the same time. Our
main novelties and advantages are: (1) An end-to-end network based on Faster
R-CNN is designed to simultaneously localize discriminative regions and encode
discriminative features, which speeds up classification. (2)
Saliency-guided localization learning is proposed to localize the
discriminative region automatically, avoiding labor-intensive labeling. Both
are jointly employed to speed up classification and
eliminate dependence on object and part annotations. Compared with
state-of-the-art methods on the widely used CUB-200-2011 dataset, our approach
achieves both the best classification accuracy and efficiency.
Comment: 9 pages, to appear in ACM MM 201
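One simple way a saliency map can stand in for manual box annotations is to threshold it and take the bounding box of the salient cells. The sketch below shows only that step, under the assumption that a saliency map is already available. The function name `saliency_to_box` and the threshold value are hypothetical, and the abstract does not say this is how the paper derives its regions.

```python
def saliency_to_box(saliency, threshold):
    """Bounding box (row_min, col_min, row_max, col_max) of salient cells.

    Returns None when no cell reaches the threshold.
    """
    rows = [r for r, row in enumerate(saliency) for v in row if v >= threshold]
    cols = [c for row in saliency for c, v in enumerate(row) if v >= threshold]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Toy 4x4 saliency map with a salient 2x2 region in the middle.
saliency_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.9, 0.0],
    [0.0, 0.7, 0.6, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
box = saliency_to_box(saliency_map, threshold=0.5)
print(box)
```

A box obtained this way could serve as a pseudo ground-truth region for training a detector, which is the kind of annotation-free supervision the abstract alludes to.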
BiRA-Net: Bilinear Attention Net for Diabetic Retinopathy Grading
Diabetic retinopathy (DR) is a common retinal disease that leads to
blindness. For diagnosis purposes, DR image grading aims to provide automatic
DR grade classification, which is not addressed in conventional research
methods of binary DR image classification. Small objects in the eye images,
like lesions and microaneurysms, are essential to DR grading in medical
imaging, but they could easily be influenced by other objects. To address these
challenges, we propose a new deep learning architecture, called BiRA-Net, which
combines an attention model for feature extraction with a bilinear model for
fine-grained classification. Furthermore, to account for the distance between
different grades of different DR categories, we propose a new loss function,
called grading loss, which leads to improved training convergence of the
proposed approach. Experimental results are provided to demonstrate the
superior performance of the proposed approach.
Comment: Accepted at ICIP 201
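The abstract does not give the formula of the grading loss, so the sketch below is not the paper's definition. It only illustrates the general idea of a loss that accounts for the distance between grades: a cross-entropy term plus an assumed penalty equal to the expected absolute distance between the predicted and true grade. The name `distance_aware_loss` and the `weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distance_aware_loss(logits, true_grade, weight=1.0):
    """Cross-entropy plus an assumed expected-grade-distance penalty."""
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_grade])
    expected_distance = sum(p * abs(k - true_grade) for k, p in enumerate(probs))
    return cross_entropy + weight * expected_distance


# Five grades (0 to 4); the true grade is 0 in both cases.
near_miss = distance_aware_loss([0.0, 3.0, 0.0, 0.0, 0.0], true_grade=0)
far_miss = distance_aware_loss([0.0, 0.0, 0.0, 0.0, 3.0], true_grade=0)
print(near_miss < far_miss)
```

Both predictions have the same cross-entropy, but confusing grade 0 with grade 4 is penalized more than confusing it with grade 1, which plain cross-entropy cannot express.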
Local Temporal Bilinear Pooling for Fine-grained Action Parsing
Fine-grained temporal action parsing is important in many applications, such
as daily activity understanding, human motion analysis, surgical robotics and
others requiring subtle and precise operations in a long-term period. In this
paper we propose a novel bilinear pooling operation, which is used in
intermediate layers of a temporal convolutional encoder-decoder net. In
contrast to other work, our proposed bilinear pooling is learnable and hence
can capture more complex local statistics than the conventional counterpart. In
addition, we introduce exact lower-dimension representations of our bilinear
forms, so that the dimensionality is reduced with neither information loss nor
extra computation. We perform intensive experiments to quantitatively analyze
our model and show superior performance over other state-of-the-art work on
various datasets.
Comment: 11 pages, 2 figures. Cam.
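For reference, the conventional (non-learnable) counterpart mentioned in this abstract can be sketched as an average of outer products x_t x_t^T over a local temporal window. The window handling and function names below are assumptions. The upper-triangle step only illustrates that a symmetric d x d matrix can be stored in d(d+1)/2 values without loss; it is not necessarily the lower-dimensional representation the paper derives.

```python
def local_bilinear_pool(frames, center, half_window):
    """Average the outer products x_t x_t^T over a temporal window of frames."""
    start = max(0, center - half_window)
    stop = min(len(frames), center + half_window + 1)
    dim = len(frames[0])
    pooled = [[0.0] * dim for _ in range(dim)]
    for x in frames[start:stop]:
        for i in range(dim):
            for j in range(dim):
                pooled[i][j] += x[i] * x[j]
    count = stop - start
    return [[v / count for v in row] for row in pooled]


def upper_triangle(matrix):
    """Flatten the upper triangle (including the diagonal) of a square matrix."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(i, size)]


# Four frames of 2-D features; pool over frames 0..2 around the center frame 1.
frames = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [3.0, 0.0]]
pooled = local_bilinear_pool(frames, center=1, half_window=1)
compact = upper_triangle(pooled)
print(pooled, compact)
```

A learnable variant would replace the fixed averaging with trained parameters, which is the direction the abstract describes.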