Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection
Multi-label image classification is a fundamental but challenging task
towards general visual understanding. Existing methods have found that region-level
cues (e.g., features from RoIs) can facilitate multi-label classification.
Nevertheless, such methods usually require laborious object-level annotations
(i.e., object labels and bounding boxes) for effective learning of the
object-level visual features. In this paper, we propose a novel and efficient
deep framework to boost multi-label classification by distilling knowledge from
a weakly-supervised detection task without bounding box annotations.
Specifically, given the image-level annotations, (1) we first develop a
weakly-supervised detection (WSD) model, and then (2) construct an end-to-end
multi-label image classification framework augmented by a knowledge
distillation module that guides the classification model by the WSD model
according to the class-level predictions for the whole image and the
object-level visual features for object RoIs. The WSD model is the teacher
model and the classification model is the student model. After this cross-task
knowledge distillation, the performance of the classification model is
significantly improved and the efficiency is maintained since the WSD model can
be safely discarded in the test phase. Extensive experiments on two large-scale
datasets (MS-COCO and NUS-WIDE) show that our framework achieves superior
results over state-of-the-art methods in both performance and efficiency.
Comment: accepted by ACM Multimedia 2018, 9 pages, 4 figures, 5 tables
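The cross-task distillation described in this abstract can be illustrated with a minimal sketch. It assumes a per-class sigmoid output for the multi-label setting, a binary cross-entropy term against the image-level labels, a second cross-entropy term against the teacher's class-level predictions, and a mean-squared error between student and teacher RoI features. The function names, the weights `alpha` and `beta`, and the exact form of each term are illustrative assumptions, not the loss used in the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, t, eps=1e-7):
    # Binary cross-entropy between a predicted probability p and a target t in [0, 1].
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def distillation_loss(student_logits, teacher_logits, labels,
                      student_feats, teacher_feats, alpha=0.5, beta=0.1):
    """Hypothetical student loss: hard labels + teacher predictions + teacher features."""
    n = len(labels)
    # Supervision from the image-level ground-truth labels.
    hard = sum(bce(sigmoid(s), y) for s, y in zip(student_logits, labels)) / n
    # Class-level guidance: match the teacher's (WSD model's) soft predictions.
    soft = sum(bce(sigmoid(s), sigmoid(t))
               for s, t in zip(student_logits, teacher_logits)) / n
    # Feature-level guidance: match the teacher's RoI features (assumed same size).
    feat = sum((a - b) ** 2 for a, b in zip(student_feats, teacher_feats)) / len(student_feats)
    return hard + alpha * soft + beta * feat
```

Because the teacher only enters through the loss, it is needed during training alone, which is consistent with the abstract's remark that the WSD model can be discarded at test time.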
You've Got Two Teachers: Co-evolutionary Image and Report Distillation for Semi-supervised Anatomical Abnormality Detection in Chest X-ray
Chest X-ray (CXR) anatomical abnormality detection aims at localizing and
characterising cardiopulmonary radiological findings in the radiographs, which
can expedite clinical workflow and reduce observational oversights. Most
existing methods attempted this task in either fully supervised settings which
demanded costly mass per-abnormality annotations, or weakly supervised settings
which still lagged badly behind fully supervised methods in performance. In
this work, we propose a co-evolutionary image and report distillation (CEIRD)
framework, which approaches semi-supervised abnormality detection in CXR by
grounding the visual detection results with text-classified abnormalities from
paired radiology reports, and vice versa. Concretely, based on the classical
teacher-student pseudo label distillation (TSD) paradigm, we additionally
introduce an auxiliary report classification model, whose prediction is used
for report-guided pseudo detection label refinement (RPDLR) in the primary
vision detection task. Inversely, we also use the prediction of the vision
detection model for abnormality-guided pseudo classification label refinement
(APCLR) in the auxiliary report classification task, and propose a co-evolution
strategy where the vision and report models mutually promote each other with
RPDLR and APCLR performed alternately. In this way, we effectively
incorporate the weak supervision by reports into the semi-supervised TSD
pipeline. Besides the cross-modal pseudo label refinement, we further propose
an intra-image-modal self-adaptive non-maximum suppression, where the pseudo
detection labels generated by the teacher vision model are dynamically
rectified by high-confidence predictions from the student. Experimental results
on the public MIMIC-CXR benchmark demonstrate CEIRD's superior performance to
several up-to-date weakly and semi-supervised methods.
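The idea of report-guided pseudo detection label refinement can be sketched as a simple filter. The abstract does not give the actual rule, so this example assumes one plausible variant: teacher detections are kept only for abnormality classes that the report classifier also predicts, and those confirmed classes are accepted at a relaxed confidence threshold. The function name, dictionary keys, and threshold values are all hypothetical.

```python
def refine_pseudo_labels(detections, report_probs,
                         report_thresh=0.5, relaxed_thresh=0.4):
    """Keep teacher detections whose class is supported by the paired report.

    detections:   list of dicts with "label", "score", and "box" keys (assumed layout).
    report_probs: dict mapping class name to the report classifier's probability.
    """
    kept = []
    for det in detections:
        # A class counts as confirmed if the report classifier predicts it.
        confirmed = report_probs.get(det["label"], 0.0) >= report_thresh
        # Confirmed classes pass at a relaxed score; unconfirmed boxes are dropped.
        if confirmed and det["score"] >= relaxed_thresh:
            kept.append(det)
    return kept
```

The inverse direction (APCLR) could be sketched the same way, with detector outputs used to adjust the pseudo classification labels of the report model.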
The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework
In the context of label-efficient learning on video data, the distillation
method and the structural design of the teacher-student architecture have a
significant impact on knowledge distillation. However, the relationship between
these factors has been overlooked in previous research. To address this gap, we
propose a new weakly supervised learning framework for knowledge distillation
in video classification that is designed to improve the efficiency and accuracy
of the student model. Our approach leverages the concept of substage-based
learning to distill knowledge based on the combination of student substages and
the correlation of corresponding substages. We also employ the progressive
cascade training method to address the accuracy loss caused by the large
capacity gap between the teacher and the student. Additionally, we propose a
pseudo-label optimization strategy to improve the initial data label. To
optimize the loss functions of different distillation substages during the
training process, we introduce a new loss method based on feature distribution.
We conduct extensive experiments on both real and simulated data sets,
demonstrating that our proposed approach outperforms existing distillation
methods in terms of knowledge distillation for video classification tasks. Our
proposed substage-based distillation approach has the potential to inform
future research on label-efficient learning for video data.
Learning New Classes from Limited Data in Image Segmentation and Object Detection
The abstract is in the attachment.
WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection
We study weakly-supervised object detection (WSOD), which plays a vital
role in reducing the human effort spent on object-level annotations. Predominant
works integrate region proposal mechanisms with convolutional neural networks
(CNNs). Although CNNs are proficient in extracting discriminative local features,
measuring the likelihood that a bounding box contains a complete object
(i.e., "objectness") remains a major challenge. In this paper, we propose a
novel WSOD framework with Objectness Distillation (i.e., WSOD^2) by designing a
tailored training mechanism for weakly-supervised object detection. Multiple
regression targets are specifically determined by jointly considering bottom-up
(BU) and top-down (TD) objectness from low-level measurement and CNN
confidences with an adaptive linear combination. Because bounding box regression
helps a region proposal learn to approach its high-objectness regression target
during training, the deep objectness representation learned from
bottom-up evidence can be gradually distilled into the CNN by optimization. We
explore different adaptive training curves for BU/TD objectness, and show that
the proposed WSOD^2 can achieve state-of-the-art results.
Comment: Accepted as an ICCV 2019 poster paper
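The adaptive linear combination of bottom-up and top-down objectness can be illustrated as follows. This is a minimal sketch that assumes a linearly decaying weight on the bottom-up score, so that low-level evidence dominates early in training and CNN confidence dominates later. The abstract says several adaptive curves were explored; the linear schedule and the function names here are assumptions chosen for illustration.

```python
def bu_weight(step, total_steps):
    # Weight on bottom-up objectness: 1.0 at the start of training, 0.0 at the end.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - progress

def combined_objectness(bu, td, step, total_steps):
    """Blend bottom-up (bu) and top-down (td) objectness scores for one box."""
    a = bu_weight(step, total_steps)
    return a * bu + (1.0 - a) * td
```

For example, a box with bottom-up score 0.8 and top-down score 0.2 gets the blended score 0.8 at step 0 and 0.2 at the final step, with intermediate values in between.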