ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes
Exploiting synthetic data to learn deep models has attracted increasing
attention in recent years. However, the intrinsic domain difference between
synthetic and real images usually causes a significant performance drop when
the learned model is applied to real-world scenarios. This is mainly due to two
reasons: 1) the model overfits to synthetic images, leaving the convolutional
filters unable to extract informative representations from real images; 2)
there is a distribution difference between synthetic and real data, which is
also known as the domain adaptation problem. To address these issues, we
propose a new reality-oriented adaptation approach for urban-scene semantic
segmentation that learns from synthetic data. First, we propose a
target-guided distillation approach to learn the real image style, achieved by
training the segmentation model to imitate a pretrained real-style model on
real images.
Second, we further exploit the intrinsic spatial structure present in urban
scene images and propose a spatial-aware adaptation scheme to effectively
align the distributions of the two domains. These two modules can be
readily integrated with existing state-of-the-art semantic segmentation
networks to improve their generalizability when adapting from synthetic to real
urban scenes. We evaluate the proposed method on the Cityscapes dataset by
adapting from the GTAV and SYNTHIA datasets, and the results demonstrate the
effectiveness of our method.
Comment: Added experiments on SYNTHIA; CVPR 2018 camera-ready version.
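As a reading aid, here is a minimal sketch of what a target-guided distillation term could look like, assuming a frozen model pretrained on real images acts as the "real style" teacher; the names (target_guided_distillation_loss, backbone, lam) are hypothetical, and the paper's exact loss and architecture may differ:

import torch
import torch.nn.functional as F

def target_guided_distillation_loss(student_feats, teacher_feats):
    # Match the student's feature maps on real images to those of a frozen
    # teacher pretrained on real data; the teacher is never updated.
    return F.mse_loss(student_feats, teacher_feats.detach())

# Illustrative usage: the student is trained with its supervised segmentation
# loss on synthetic data plus this distillation term on unlabeled real images.
# with torch.no_grad():
#     t_feats = real_pretrained_teacher.backbone(real_imgs)
# s_feats = student.backbone(real_imgs)
# loss = seg_loss_synthetic + lam * target_guided_distillation_loss(s_feats, t_feats)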
BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation
Current knowledge distillation approaches in semantic segmentation tend to
adopt a holistic approach that treats all spatial locations equally. However,
for dense prediction, a student's predictions in edge regions are highly
uncertain due to contextual information leakage, and therefore require
knowledge with higher spatial sensitivity than body regions. To address this
challenge, this
paper proposes a novel approach called boundary-privileged knowledge
distillation (BPKD). BPKD distills the knowledge of the teacher model's body
and edges separately to the compact student model. Specifically, we employ two
distinct loss functions: (i) an edge loss, which aims to distinguish between
ambiguous classes at the pixel level in edge regions; and (ii) a body loss,
which applies shape constraints and selectively attends to inner semantic
regions. Our experiments demonstrate that the proposed BPKD method provides
extensive refinements and aggregation for edge and body regions. Additionally,
the method achieves state-of-the-art distillation performance for semantic
segmentation on three popular benchmark datasets, highlighting its
effectiveness and generalization ability. BPKD shows consistent improvements
across a diverse array of lightweight segmentation structures, including both
CNNs and transformers, underscoring its architecture-agnostic adaptability. The
code is available at https://github.com/AkideLiu/BPKD.
Comment: 17 pages, 9 figures, 9 tables.
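The edge/body split described in the abstract can be illustrated with a short, hedged sketch; this is not the authors' implementation (see the linked repository for that), and the mask construction, temperature, and KL weighting below are assumptions:

import torch
import torch.nn.functional as F

def edge_body_masks(labels, band=3):
    # Boundary band around class transitions in a (B, H, W) label map
    # (e.g. ground truth or the teacher's argmax); assumes labels in [0, C).
    onehot = F.one_hot(labels, num_classes=int(labels.max()) + 1)
    onehot = onehot.permute(0, 3, 1, 2).float()
    dilated = F.max_pool2d(onehot, band, stride=1, padding=band // 2)
    eroded = -F.max_pool2d(-onehot, band, stride=1, padding=band // 2)
    edge = ((dilated - eroded).sum(1) > 0).float()  # 1 on boundary pixels
    return edge, 1.0 - edge

def edge_body_distillation(student_logits, teacher_logits, labels,
                           T=2.0, w_edge=1.0, w_body=1.0):
    # Per-pixel KL divergence between softened teacher and student
    # predictions, split into separately weighted edge and body terms.
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    kl = (p_t * (p_t.clamp_min(1e-8).log() - log_p_s)).sum(1)
    edge, body = edge_body_masks(labels)
    return w_edge * (kl * edge).mean() + w_body * (kl * body).mean()

Weighting the edge term more heavily than the body term is one way to give boundary pixels the "privileged" treatment the title refers to.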
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection
Multi-label image classification is a fundamental but challenging task
towards general visual understanding. Existing methods have found that
region-level cues (e.g., features from RoIs) can facilitate multi-label
classification.
Nevertheless, such methods usually require laborious object-level annotations
(i.e., object labels and bounding boxes) for effective learning of the
object-level visual features. In this paper, we propose a novel and efficient
deep framework that boosts multi-label classification by distilling knowledge
from a weakly-supervised detection task, without requiring bounding-box
annotations.
Specifically, given the image-level annotations, (1) we first develop a
weakly-supervised detection (WSD) model, and then (2) construct an end-to-end
multi-label image classification framework augmented by a knowledge
distillation module that guides the classification model by the WSD model
according to the class-level predictions for the whole image and the
object-level visual features for object RoIs. The WSD model is the teacher
model and the classification model is the student model. After this cross-task
knowledge distillation, the performance of the classification model is
significantly improved and the efficiency is maintained since the WSD model can
be safely discarded in the test phase. Extensive experiments on two large-scale
datasets (MS-COCO and NUS-WIDE) show that our framework outperforms
state-of-the-art methods in both accuracy and efficiency.
Comment: Accepted by ACM Multimedia 2018; 9 pages, 4 figures, 5 tables.
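The cross-task distillation step can be sketched in a few lines; the loss form, the temperature, and the names below are illustrative assumptions rather than the paper's exact formulation, which also distills object-level RoI features:

import torch
import torch.nn.functional as F

def multilabel_distillation_loss(student_logits, teacher_logits, targets,
                                 T=2.0, lam=0.5):
    # Hard term: binary cross-entropy against image-level ground-truth labels.
    hard = F.binary_cross_entropy_with_logits(student_logits, targets)
    # Soft term: match the frozen WSD teacher's softened class-level
    # predictions; the teacher receives no gradient.
    soft = F.binary_cross_entropy_with_logits(
        student_logits / T, torch.sigmoid(teacher_logits.detach() / T))
    return hard + lam * soft

# At test time the WSD teacher is discarded; the student alone predicts labels,
# e.g. preds = torch.sigmoid(student_logits) > 0.5

Because the teacher is only consulted during training, inference cost is that of the student classifier alone, which is how the framework retains its efficiency.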