Classification Calibration for Long-tail Instance Segmentation
Remarkable progress has been made in object instance detection and
segmentation in recent years. However, existing state-of-the-art methods are
mostly evaluated with fairly balanced and class-limited benchmarks, such as
the Microsoft COCO dataset [8]. In this report, we investigate the performance drop
phenomenon of state-of-the-art two-stage instance segmentation models when
processing extreme long-tail training data based on the LVIS [5] dataset, and
find a major cause is the inaccurate classification of object proposals. Based
on this observation, we propose to calibrate the predictions of the classification
head to improve recognition performance for the tail classes. Without much
additional cost and modification of the detection model architecture, our
calibration method improves the performance of the baseline by a large margin
on the tail classes. Code will be available. Importantly, after the
submission, we find significant improvement can be further achieved by
modifying the calibration head, which we will update later.
Comment: This report presents our winning solution to the LVIS 2019 challenge
The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation
Most existing object instance detection and segmentation models only work
well on fairly balanced benchmarks where per-category training sample numbers
are comparable, such as COCO. They tend to suffer a performance drop on realistic
datasets that are usually long-tailed. This work aims to study and address such
open challenges. Specifically, we systematically investigate the performance drop
of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the
recent long-tail LVIS dataset, and unveil that a major cause is the inaccurate
classification of object proposals. Based on such an observation, we first
consider various techniques for improving long-tail classification performance
which indeed enhance instance segmentation results. We then propose a simple
calibration framework to more effectively alleviate classification head bias
with a bi-level class-balanced sampling approach. Without bells and whistles,
it significantly boosts the performance of instance segmentation for tail
classes on the recent LVIS dataset and our sampled COCO-LT dataset. Our
analysis provides useful insights for solving long-tail instance detection and
segmentation problems, and the straightforward SimCal method can serve
as a simple but strong baseline. With this method, we won the 2019 LVIS
challenge. Code and models are available at https://github.com/twangnh/SimCal.
Comment: LVIS 2019 challenge winner, performance significantly improved after
challenge submission, accepted at ECCV 202
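The abstract above names a bi-level class-balanced sampling approach without spelling it out. The sketch below shows one plausible reading, not the authors' implementation: first draw a class uniformly at random, then draw an image containing that class. The helper names (`build_class_index`, `bilevel_sample`) and the toy annotations are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def build_class_index(annotations):
    """Map each class label to the sorted list of image ids that contain it."""
    index = defaultdict(set)
    for image_id, labels in annotations.items():
        for label in labels:
            index[label].add(image_id)
    return {label: sorted(ids) for label, ids in index.items()}


def bilevel_sample(class_index, num_samples, rng):
    """Draw (class, image) pairs: a class uniformly first, then an image of that class."""
    classes = sorted(class_index)
    samples = []
    for _ in range(num_samples):
        label = rng.choice(classes)                # level 1: uniform over classes
        image_id = rng.choice(class_index[label])  # level 2: uniform over that class's images
        samples.append((label, image_id))
    return samples


# Toy long-tailed data (hypothetical): 99 images contain "person", 2 contain "unicycle".
annotations = {f"img{i}": ["person"] for i in range(98)}
annotations["img98"] = ["person", "unicycle"]
annotations["img99"] = ["unicycle"]

class_index = build_class_index(annotations)
samples = bilevel_sample(class_index, 1000, random.Random(0))
tail_fraction = sum(1 for label, _ in samples if label == "unicycle") / len(samples)
print(f"tail-class share of samples: {tail_fraction:.2f}")
```

Under this scheme the tail class is drawn about as often as the head class, even though it appears in only 2 of 100 images. That is the kind of rebalancing a classification head calibration step could rely on.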
Background Splitting: Finding Rare Classes in a Sea of Background
We focus on the real-world problem of training accurate deep models for image
classification of a small number of rare categories. In these scenarios, almost
all images belong to the background category in the dataset (>95% of the
dataset is background). We demonstrate that both standard fine-tuning
approaches and state-of-the-art approaches for training on imbalanced datasets
do not produce accurate deep models in the presence of this extreme imbalance.
Our key observation is that the extreme imbalance due to the background
category can be drastically reduced by leveraging visual knowledge from an
existing pre-trained model. Specifically, the background category is "split"
into smaller and more coherent pseudo-categories during training using a
pre-trained model. We incorporate background splitting into an image
classification model by adding an auxiliary loss that learns to mimic the
predictions of the existing, pre-trained image classification model. Note that
this process is automatic and requires no additional manual labels. The
auxiliary loss regularizes the feature representation of the shared network
trunk by requiring it to discriminate between previously homogeneous background
instances and reduces overfitting to the small number of rare category
positives. We also show that background splitting can be combined with other background
imbalance methods to further improve performance. We evaluate our method on a
modified version of the iNaturalist dataset where only a small subset of rare
category labels are available during training (all other images are labeled as
background). By jointly learning to recognize ImageNet categories and selected
iNaturalist categories, our approach yields performance that is 42.3 mAP points
higher than a fine-tuning baseline when 99.98% of the data is background, and
8.3 mAP points higher than SotA baselines when 98.30% of the data is
background.
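The auxiliary loss described in the abstract above can be sketched as a two-headed objective. This is a minimal illustration under stated assumptions, not the paper's formulation: the main head is trained with ordinary cross-entropy on the rare-category-plus-background label, and the auxiliary head is trained with soft-target cross-entropy against a pre-trained model's predicted distribution. The function names, the `aux_weight` parameter, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target_probs, logits))


def background_splitting_loss(main_logits, label, aux_logits, teacher_logits, aux_weight=1.0):
    """Main classification loss plus a weighted loss for mimicking the teacher.

    Assumption: the auxiliary term is soft-target cross-entropy and the two
    terms are combined with a fixed weight.
    """
    one_hot = [1.0 if i == label else 0.0 for i in range(len(main_logits))]
    main_loss = cross_entropy(one_hot, main_logits)
    aux_loss = cross_entropy(softmax(teacher_logits), aux_logits)
    return main_loss + aux_weight * aux_loss


# One background image (class 0 = background, classes 1-2 = rare categories).
main_logits = [2.0, 0.1, -1.0]
label = 0
# Logits over the pre-trained model's own categories (four here, for brevity).
teacher_logits = [1.0, 0.5, -0.5, 0.0]
aux_logits = [0.2, 0.1, 0.0, -0.3]

total = background_splitting_loss(main_logits, label, aux_logits, teacher_logits)
print(f"combined loss: {total:.4f}")
```

Because the teacher assigns different distributions to different background images, the auxiliary term gives the shared trunk a reason to tell those images apart, even though the main head labels them all as background.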