Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank
For many applications, the collection of labeled data is expensive and
laborious. The exploitation of unlabeled data during training is thus a
long-pursued objective
of machine learning. Self-supervised learning addresses this by positing an
auxiliary task (different, but related to the supervised task) for which data
is abundantly available. In this paper, we show how ranking can be used as a
proxy task for some regression problems. As another contribution, we propose an
efficient backpropagation technique for Siamese networks which prevents the
redundant computation introduced by the multi-branch network architecture. We
apply our framework to two regression problems: Image Quality Assessment (IQA)
and Crowd Counting. For both we show how to automatically generate ranked image
sets from unlabeled data. Our results show that networks trained to regress to
the ground truth targets for labeled data and to simultaneously learn to rank
unlabeled data obtain significantly better, state-of-the-art results for both
IQA and crowd counting. In addition, we show that measuring network uncertainty
on the self-supervised proxy task is a good measure of informativeness of
unlabeled data. This can be used to drive an algorithm for active learning and
we show that this reduces labeling effort by up to 50%.

Comment: Accepted at TPAMI. (Keywords: Learning from rankings, image quality
assessment, crowd counting, active learning). arXiv admin note: text overlap
with arXiv:1803.0309
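As a concrete illustration of the ranking proxy and of why the multi-branch (Siamese) computation need not be duplicated, here is a minimal PyTorch sketch; the backbone architecture, the `ranking_loss` helper, and the margin value are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn

# Minimal sketch: a single shared backbone scores every image in the ranked
# set with ONE forward pass; pairs are formed on the scores afterwards, so
# no branch of the "Siamese" network recomputes features redundantly.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64),
                         nn.ReLU(), nn.Linear(64, 1))

def ranking_loss(images, order, margin=0.5):
    """images: (N, 1, 32, 32); order[i] before order[j] means image order[i]
    should score higher (e.g. less distorted). Hypothetical helper."""
    scores = backbone(images).squeeze(1)            # one pass for all branches
    hi, lo = scores[order[:-1]], scores[order[1:]]  # consecutive ranked pairs
    # standard margin ranking loss: want hi > lo by at least `margin`
    return torch.relu(margin - (hi - lo)).mean()

# A ranked set can be generated from unlabeled data, e.g. by applying
# increasing synthetic distortion; the ordering below is assumed known.
images = torch.randn(4, 1, 32, 32)
order = torch.tensor([0, 1, 2, 3])
loss = ranking_loss(images, order)
loss.backward()   # gradients flow through the shared backbone once
```

Because all branch activations come from one batched forward pass, backpropagation also runs once over the shared weights, which is the efficiency the abstract refers to.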
Universal Representation Learning from Multiple Domains for Few-shot Classification
In this paper, we look at the problem of few-shot classification that aims to
learn a classifier for previously unseen classes and domains from few labeled
samples. Recent methods use adaptation networks for aligning their features to
new domains or select the relevant features from multiple domain-specific
feature extractors. In this work, we propose to learn a single set of universal
deep representations by distilling knowledge of multiple separately trained
networks after co-aligning their features with the help of adapters and
centered kernel alignment. We show that the universal representations can be
further refined for previously unseen domains by an efficient adaptation step
in a similar spirit to distance learning methods. We rigorously evaluate our
model in the recent Meta-Dataset benchmark and demonstrate that it
significantly outperforms the previous methods while being more efficient. Our
code will be available at https://github.com/VICO-UoE/URL.
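The centered kernel alignment (CKA) used to co-align features of the separately trained networks can be sketched in its linear form; the function below is a generic illustration of linear CKA, not the paper's full distillation pipeline:

```python
import numpy as np

# Linear Centered Kernel Alignment (CKA): a similarity score between two
# feature matrices computed for the same n inputs. Values near 1 mean the
# two representations are highly aligned; near 0 means unrelated.
def linear_cka(X, Y):
    """X: (n, d1), Y: (n, d2) features for the same n inputs."""
    X = X - X.mean(axis=0)                 # center each feature dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X) ** 2    # ||Y^T X||_F^2
    norm_x = np.linalg.norm(X.T @ X)       # ||X^T X||_F
    norm_y = np.linalg.norm(Y.T @ Y)
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16))
print(linear_cka(X, X))                              # identical -> ~1.0
print(linear_cka(X, rng.standard_normal((100, 32)))) # unrelated -> near 0
```

In a distillation setting, maximizing such an alignment between the universal student's features and each domain-specific teacher's features encourages a single representation that serves all domains.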
Learning Multiple Dense Prediction Tasks from Partially Annotated Data
Despite the recent advances in multi-task learning of dense prediction
problems, most methods rely on expensive labelled datasets. In this paper, we
present a label-efficient approach and look at the joint learning of multiple
dense prediction tasks on partially annotated data (i.e., not all task
labels are available for each image), which we call multi-task
partially-supervised learning. We propose a multi-task training procedure that
successfully leverages task relations to supervise its multi-task learning when
data is partially annotated. In particular, we learn to map each task pair to
a joint pairwise task space, which enables information to be shared between
them in a computationally efficient way through another network conditioned on
the task pair, while avoiding trivial cross-task relations by retaining
high-level information about the input image. We rigorously demonstrate that
our proposed
method effectively exploits the images with unlabelled tasks and outperforms
existing semi-supervised learning approaches and related methods on three
standard benchmarks.

Comment: CVPR 2022 (Keywords: multi-task partially-supervised learning). Code
will be available at https://github.com/VICO-UoE/MTPS
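The core of the partially-supervised setting, masking each task's loss to the images that actually carry that task's label, can be sketched as follows; the task heads, loss choice, and availability masks are illustrative assumptions, and the paper's cross-task regularizer is omitted:

```python
import torch
import torch.nn as nn

# Sketch of multi-task training on partially annotated data: each image has
# labels for only a subset of tasks, so each per-task loss is averaged only
# over the images where that task's annotation exists.
tasks = {"seg": nn.Linear(8, 3), "depth": nn.Linear(8, 1)}  # toy task heads
features = torch.randn(4, 8)                 # shared features for 4 images
labels = {"seg": torch.randn(4, 3), "depth": torch.randn(4, 1)}
# mask[t][i] == 1 iff image i is annotated for task t (assumed availability)
mask = {"seg": torch.tensor([1., 1., 0., 0.]),
        "depth": torch.tensor([0., 1., 1., 1.])}

total = 0.0
for t, head in tasks.items():
    per_image = ((head(features) - labels[t]) ** 2).mean(dim=1)  # (4,)
    m = mask[t]
    # average only over labelled images; unlabelled ones contribute nothing
    total = total + (per_image * m).sum() / m.sum()
print(float(total))
```

Images with unlabelled tasks still contribute through the tasks they do carry, which is what makes the setting label-efficient.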
Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment
Currently, most image quality assessment (IQA) models are supervised by the
MAE or MSE loss with empirically slow convergence. It is well-known that
normalization can facilitate fast convergence. Therefore, we explore
normalization in the design of loss functions for IQA. Specifically, we first
normalize the predicted quality scores and the corresponding subjective quality
scores. Then, the loss is defined based on the norm of the differences between
these normalized values. The resulting "Norm-in-Norm" loss encourages the IQA
model to make linear predictions with respect to subjective quality scores.
After training, least squares regression is applied to determine the linear
mapping from the predicted quality to the subjective quality. It is shown that
the new loss is closely connected with two common IQA performance criteria
(PLCC and RMSE). Through theoretical analysis, it is proved that the embedded
normalization makes the gradients of the loss function more stable and more
predictable, which is conducive to the faster convergence of the IQA model.
Furthermore, to experimentally verify the effectiveness of the proposed loss,
it is applied to solve a challenging problem: quality assessment of in-the-wild
images. Experiments on two relevant datasets (KonIQ-10k and CLIVE) show that,
compared to MAE or MSE loss, the new loss enables the IQA model to converge
about 10 times faster and the final model achieves better performance. The
proposed model also achieves state-of-the-art prediction performance on this
challenging problem. For reproducible scientific research, our code is publicly
available at https://github.com/lidq92/LinearityIQA.

Comment: Accepted by ACM MM 2020, with supplemental material
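The two-level normalization described above can be sketched directly from the abstract: center and norm-scale both score vectors, then take the norm of their difference. The exponents, the epsilon, and the scaling by the number of samples are assumptions; the paper's exact formulation may differ:

```python
import torch

# Sketch of the "Norm-in-Norm" idea: an inner normalization (centering and
# scaling each score vector to unit p-norm) nested inside an outer q-norm
# on the difference of the normalized vectors.
def norm_in_norm_loss(pred, mos, p=2, q=2):
    def normalize(s):
        s = s - s.mean()                  # remove the offset
        return s / (s.norm(p) + 1e-8)     # remove the scale
    return (normalize(pred) - normalize(mos)).norm(q) ** q / pred.numel()

pred = torch.tensor([1.0, 2.0, 3.0, 4.0])
mos = torch.tensor([10.0, 20.0, 30.0, 40.0])   # subjective quality scores
print(norm_in_norm_loss(pred, mos))   # perfectly linear prediction -> ~0
```

Because the normalization cancels any affine transform of the predictions, the loss is minimized by predictions that are merely linear in the subjective scores, which is why a final least-squares fit suffices to map predicted to subjective quality.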
Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection
In incremental learning, replaying stored samples from previous tasks
together with current task samples is one of the most efficient approaches to
address catastrophic forgetting. However, unlike incremental classification,
image replay has not been successfully applied to incremental object detection
(IOD). In this paper, we identify the overlooked problem of foreground shift as
the main reason for this. Foreground shift only occurs when replaying images of
previous tasks and refers to the fact that their background might contain
foreground objects of the current task. To overcome this problem, a novel and
efficient Augmented Box Replay (ABR) method is developed that only stores and
replays foreground objects and thereby circumvents the foreground shift
problem. In addition, we propose an innovative Attentive RoI Distillation loss
that uses spatial attention from region-of-interest (RoI) features to
constrain the current model to focus on the most important information from
the old model. ABR
significantly reduces forgetting of previous classes while maintaining high
plasticity in current classes. Moreover, it considerably reduces the storage
requirements when compared to standard image replay. Comprehensive experiments
on Pascal-VOC and COCO datasets support the state-of-the-art performance of our
model.
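The box-level replay idea can be illustrated with a minimal sketch: only cropped foreground boxes of old classes are stored, and during the new task they are pasted into current images, so replayed content can never contain a stale background hiding current-task objects. The helper name and the naive paste logic (fixed position, no blending) are assumptions for illustration, not the released ABR implementation:

```python
import numpy as np

# Sketch of box-level replay: store only the foreground crop of an old-class
# object and paste it into a current-task training image.
def augment_with_replay(image, stored_box_crop, top_left):
    """Paste a stored foreground crop into `image` at `top_left` (y, x)."""
    out = image.copy()                    # leave the original image intact
    y, x = top_left
    h, w = stored_box_crop.shape[:2]
    out[y:y + h, x:x + w] = stored_box_crop
    return out

image = np.zeros((64, 64, 3), dtype=np.uint8)            # current-task image
crop = np.full((16, 16, 3), 255, dtype=np.uint8)          # stored old object
mixed = augment_with_replay(image, crop, (10, 20))
print(mixed[10, 20], mixed[0, 0])   # pasted region vs. untouched background
```

Storing crops instead of full images is also what yields the storage savings the abstract mentions: only the box pixels, not entire frames, are retained per old-class instance.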