406 research outputs found
Transferring CNNs to Multi-instance Multi-label Classification on Small Datasets
Image tagging is a well-known challenge in image processing. It is typically addressed through multi-instance multi-label (MIML) classification methodologies. Convolutional Neural Networks (CNNs) have great potential to perform well on MIML tasks, since multi-level convolution and max pooling coincide with the multi-instance setting, and the sharing of hidden representations may benefit multi-label modeling. However, CNNs usually require a large amount of carefully labeled data for training, which is hard to obtain in many real applications. In this paper, we propose a new approach for transferring deep networks pre-trained on ImageNet, such as VGG16, to small MIML tasks. We extract features from each group of the network layers and apply multiple binary classifiers to them for multi-label prediction. Moreover, we adopt an L1-norm regularized Logistic Regression (L1LR) to find the most effective features for learning the multi-label classifiers. Experimental results on the two most widely used and relatively small benchmark MIML image datasets demonstrate that the proposed approach can substantially outperform the state-of-the-art algorithms in terms of all popular performance metrics.
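The classification stage described in this abstract (one binary classifier per label, with an L1 penalty to keep only useful features) can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: the feature vectors are toy stand-ins for features extracted from CNN layers, the solver is plain proximal gradient descent, and the names `fit_l1_logreg`, `fit_multilabel`, `lam`, and `lr` are hypothetical.

```python
import math

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logreg(X, y, lam=0.1, lr=0.5, epochs=500):
    """Fit one L1-regularized logistic regression by proximal gradient descent.

    X is a list of feature vectors, y a list of 0/1 targets.
    The bias term is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the logistic loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def fit_multilabel(X, Y, **kwargs):
    """Train one independent binary classifier per label column of Y."""
    n_labels = len(Y[0])
    return [fit_l1_logreg(X, [row[k] for row in Y], **kwargs)
            for k in range(n_labels)]

def predict_multilabel(models, x):
    """Return a 0/1 decision for every label."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
            for w, b in models]
```

In this sketch, weights that the shrinkage step drives exactly to zero mark features the classifier for that label ignores, which is the feature-selection effect the L1 penalty is used for.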
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection
Multi-label image classification is a fundamental but challenging task
towards general visual understanding. Existing methods have found that region-level
cues (e.g., features from RoIs) can facilitate multi-label classification.
Nevertheless, such methods usually require laborious object-level annotations
(i.e., object labels and bounding boxes) for effective learning of the
object-level visual features. In this paper, we propose a novel and efficient
deep framework to boost multi-label classification by distilling knowledge from
a weakly-supervised detection task without bounding box annotations.
Specifically, given the image-level annotations, (1) we first develop a
weakly-supervised detection (WSD) model, and then (2) construct an end-to-end
multi-label image classification framework augmented by a knowledge
distillation module in which the WSD model guides the classification model
according to the class-level predictions for the whole image and the
object-level visual features for object RoIs. The WSD model is the teacher
model and the classification model is the student model. After this cross-task
knowledge distillation, the performance of the classification model is
significantly improved and the efficiency is maintained since the WSD model can
be safely discarded in the test phase. Extensive experiments on two large-scale
datasets (MS-COCO and NUS-WIDE) show that our framework outperforms
state-of-the-art methods in both classification performance and efficiency.
Comment: accepted by ACM Multimedia 2018, 9 pages, 4 figures, 5 tables
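As a rough illustration of the teacher-student setup in this abstract, the sketch below combines a supervised multi-label loss with two distillation terms, one on class-level predictions and one on features. The specific loss forms (binary cross-entropy and mean squared error), the weights `alpha` and `beta`, and the function names are assumptions made for illustration; the abstract does not specify them.

```python
import math

def bce(p, t, eps=1e-7):
    """Binary cross-entropy between a predicted probability p and a target t in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def distillation_loss(student_probs, teacher_probs, labels,
                      student_feat, teacher_feat, alpha=0.5, beta=0.1):
    """Student loss = supervised term + prediction-level and feature-level guidance.

    student_probs / teacher_probs: per-class probabilities for one image.
    labels: 0/1 image-level annotations.
    student_feat / teacher_feat: feature vectors of equal length (hypothetical
    stand-ins for RoI features).
    """
    n = len(labels)
    # Supervised term against the image-level ground truth.
    hard = sum(bce(p, y) for p, y in zip(student_probs, labels)) / n
    # Class-level guidance: match the teacher's soft predictions.
    soft = sum(bce(p, q) for p, q in zip(student_probs, teacher_probs)) / n
    # Feature-level guidance: stay close to the teacher's features.
    feat = sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)
    return hard + alpha * soft + beta * feat
```

Because the teacher only appears inside the training loss, it plays no role at inference time, which is consistent with the abstract's remark that the WSD model can be discarded in the test phase.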
Self Paced Deep Learning for Weakly Supervised Object Detection
In a weakly-supervised scenario object detectors need to be trained using
image-level annotation alone. Since bounding-box-level ground truth is not
available, most of the solutions proposed so far are based on an iterative,
Multiple Instance Learning framework in which the current classifier is used to
select the highest-confidence boxes in each image, which are treated as
pseudo-ground truth in the next training iteration. However, the errors of an
immature classifier can make the process drift, usually introducing many
false positives in the training dataset. To alleviate this problem, we propose
in this paper a training protocol based on the self-paced learning paradigm.
The main idea is to iteratively select a subset of images and boxes that are
the most reliable, and use them for training. While in the past few years
similar strategies have been adopted for SVMs and other classifiers, we are the
first to show that a self-paced approach can be used with deep-network-based
classifiers in an end-to-end training pipeline. The method we propose is built
on the fully-supervised Fast-RCNN architecture and can be applied to similar
architectures which represent the input image as a bag of boxes. We show
state-of-the-art results on Pascal VOC 2007, Pascal VOC 2010 and ILSVRC 2013.
On ILSVRC 2013 our results based on a low-capacity AlexNet network outperform
even those weakly-supervised approaches which are based on much higher-capacity
networks.
Comment: To appear at IEEE Transactions on PAM
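The selection step of a self-paced protocol like the one in this abstract might look like the following sketch. It assumes that reliability is measured by the top box score in each image and that the kept fraction grows linearly over iterations; both choices, and the names `select_reliable` and `keep_schedule`, are illustrative assumptions rather than the authors' exact procedure.

```python
import math

def select_reliable(scores_per_image, keep_fraction):
    """Pick the top-scoring box per image, then keep the most confident images.

    scores_per_image: dict mapping image_id -> list of (box, score) pairs.
    Returns a dict image_id -> box to be used as pseudo-ground truth.
    """
    best = []
    for image_id, boxes in scores_per_image.items():
        box, score = max(boxes, key=lambda bs: bs[1])
        best.append((score, image_id, box))
    # Most confident images first; sort on the score only.
    best.sort(key=lambda item: item[0], reverse=True)
    n_keep = max(1, math.ceil(keep_fraction * len(best)))
    return {image_id: box for _, image_id, box in best[:n_keep]}

def keep_schedule(iteration, n_iterations, start=0.5):
    """Linearly grow the kept fraction from `start` to 1.0 (assumed schedule)."""
    if n_iterations <= 1:
        return 1.0
    return start + (1.0 - start) * iteration / (n_iterations - 1)
```

Early iterations then train only on the images the current classifier is most sure about, and harder images are admitted as the classifier matures.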
Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction
A capsule is a group of neurons, whose activity vector represents the
instantiation parameters of a specific type of entity. In this paper, we
explore the capsule networks used for relation extraction in a multi-instance
multi-label learning framework and propose a novel neural approach based on
capsule networks with attention mechanisms. We evaluate our method on
different benchmarks and demonstrate that it improves the
precision of the predicted relations. In particular, we show that capsule
networks improve relation extraction for multiple entity pairs.
Comment: To be published in EMNLP 201
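To make the notion of a capsule's activity vector concrete, here is a small sketch of the squashing non-linearity commonly used in capsule networks with dynamic routing. It is included as general background: the abstract does not state which non-linearity the authors use, and the function name `squash` is an assumption.

```python
import math

def squash(s):
    """Scale vector s so its length lies in [0, 1) while keeping its direction.

    The output length is |s|^2 / (1 + |s|^2), so short vectors shrink toward
    zero and long vectors approach unit length.
    """
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [scale * x for x in s]
```

With this non-linearity, the length of a capsule's output can be read as the probability that the entity is present, and its direction as the instantiation parameters.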