Representation Learning with Fine-grained Patterns
With the development of computational power and techniques for data
collection, deep learning demonstrates superior performance over most
existing algorithms on benchmark data sets. Many efforts have been devoted to
studying the mechanism of deep learning. One important observation is that deep
learning can learn discriminative patterns directly from raw data in a
task-dependent manner. Therefore, the representations obtained by deep learning
significantly outperform hand-crafted features. However, those patterns are
often learned from super-class labels due to the limited availability of
fine-grained labels, while fine-grained patterns are desired in many real-world
applications such as visual search in online shopping. To mitigate this
challenge, we propose an algorithm to learn fine-grained patterns
sufficiently when only super-class labels are available. The effectiveness of
our method is guaranteed by theoretical analysis. Extensive experiments on
real-world data sets demonstrate that the proposed method can significantly
improve performance on target tasks corresponding to fine-grained classes when
only super-class information is available for training.
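The abstract does not spell out the training procedure. As a hedged illustration only, the sketch below shows one common way to expose fine-grained structure when just super-class labels exist: cluster the current embeddings within each super-class into pseudo fine-grained classes and train against those pseudo labels. The function name, the clustering step, and all parameters are assumptions for illustration, not the authors' algorithm.

```python
# Hypothetical sketch: derive pseudo fine-grained labels from super-class
# labels by clustering embeddings within each super-class. Illustrative
# assumption only, not the algorithm proposed in the paper.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_fine_grained_labels(embeddings, super_labels, clusters_per_class=5):
    """Assign a pseudo fine-grained label to every sample.

    embeddings   : (N, D) array of current feature representations
    super_labels : (N,) array of super-class ids
    """
    fine_labels = np.zeros(len(super_labels), dtype=np.int64)
    offset = 0
    for c in np.unique(super_labels):
        idx = np.where(super_labels == c)[0]
        k = min(clusters_per_class, len(idx))
        km = KMeans(n_clusters=k, n_init=10).fit(embeddings[idx])
        # Keep pseudo-class ids disjoint across different super-classes.
        fine_labels[idx] = km.labels_ + offset
        offset += k
    return fine_labels
```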
SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning
In fisheye images, rich distinct distortion patterns are regularly
distributed in the image plane. These distortion patterns are independent of
the visual content and provide informative cues for rectification. To make the
best of such rectification cues, we introduce SimFIR, a simple framework for
fisheye image rectification based on self-supervised representation learning.
Technically, we first split a fisheye image into multiple patches and extract
their representations with a Vision Transformer (ViT). To learn fine-grained
distortion representations, we then associate different image patches with
their specific distortion patterns based on the fisheye model, and further
design a unified distortion-aware pretext task to learn them. The transfer
performance on the downstream rectification task is
remarkably boosted, which verifies the effectiveness of the learned
representations. Extensive experiments are conducted, and the quantitative and
qualitative results demonstrate the superiority of our method over the
state-of-the-art algorithms as well as its strong generalization ability on
real-world fisheye images. Comment: Accepted to ICCV 2023
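The abstract does not give the exact pretext formulation. Because fisheye distortion grows mainly with radial distance from the image center, one plausible distortion-aware pretext target is a per-patch radial ring index that a linear head on the ViT patch tokens must predict. The sketch below constructs such labels; the ring definition, function name, and parameters are assumptions, not the SimFIR specification.

```python
# Hedged sketch of a distortion-aware pretext target: label each patch by its
# radial ring, since fisheye distortion grows with distance from the image
# center. Illustrative assumption, not the exact SimFIR formulation.
import numpy as np

def patch_ring_labels(img_size=256, patch_size=32, num_rings=8):
    """Return a (num_patches,) array of ring indices used as pretext labels."""
    grid = img_size // patch_size
    cy = cx = (img_size - 1) / 2.0
    r_max = np.hypot(cy, cx)
    labels = []
    for i in range(grid):
        for j in range(grid):
            # Patch center in pixel coordinates.
            py = i * patch_size + patch_size / 2.0
            px = j * patch_size + patch_size / 2.0
            r = np.hypot(py - cy, px - cx)
            labels.append(min(int(num_rings * r / r_max), num_rings - 1))
    return np.asarray(labels, dtype=np.int64)
```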
GaitFM: Fine-grained Motion Representation for Gait Recognition
Gait recognition aims at identifying individual-specific walking patterns,
which is highly dependent on the observation of the different periodic
movements of each body part. However, most existing methods treat each part
equally and neglect the data redundancy due to the high sampling rate of gait
sequences. In this work, we propose a fine-grained motion representation
network (GaitFM) to improve gait recognition performance in three aspects.
First, a fine-grained part sequence learning (FPSL) module is designed to
explore part-independent spatio-temporal representations. Secondly, a
frame-wise compression strategy, called local motion aggregation (LMA), is used
to enhance motion variations. Finally, a weighted generalized mean pooling
(WGeM) layer works to adaptively keep more discriminative information in the
spatial downsampling. Experiments on two public datasets, CASIA-B and OUMVLP,
show that our approach reaches state-of-the-art performance. On the CASIA-B
dataset, our method achieves rank-1 accuracies of 98.0%, 95.7% and 87.9% for
normal walking, walking with a bag and walking with a coat, respectively. On
the OUMVLP dataset, our method achieves a rank-1 accuracy of 90.5%.
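The abstract names a weighted generalized mean pooling (WGeM) layer without giving its formula. Below is a minimal sketch of weighted GeM pooling, assuming softmax-normalized per-location weights and a learnable exponent p; the exact WGeM layer in GaitFM may differ.

```python
# Hedged sketch of weighted generalized mean (GeM) spatial pooling with
# learned per-location weights; the actual WGeM layer in GaitFM may differ.
import torch
import torch.nn as nn

class WeightedGeM(nn.Module):
    def __init__(self, channels, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))           # learnable pooling exponent
        self.weight = nn.Conv2d(channels, 1, kernel_size=1)  # per-location weight logits
        self.eps = eps

    def forward(self, x):                                # x: (B, C, H, W)
        w = torch.softmax(self.weight(x).flatten(2), dim=-1)   # (B, 1, H*W)
        xp = x.clamp(min=self.eps).pow(self.p).flatten(2)      # (B, C, H*W)
        pooled = (xp * w).sum(dim=-1)                           # weighted mean of x^p
        return pooled.pow(1.0 / self.p)                         # (B, C)
```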
Object Discovery From a Single Unlabeled Image by Mining Frequent Itemset With Multi-scale Features
The goal of our work is to discover dominant objects in a very general
setting where only a single unlabeled image is given. This is far more
challenging than typical co-localization or weakly-supervised localization tasks.
To tackle this problem, we propose a simple but effective pattern mining-based
method, called Object Location Mining (OLM), which exploits the advantages of
data mining and feature representation of pre-trained convolutional neural
networks (CNNs). Specifically, we first convert the feature maps from a
pre-trained CNN model into a set of transactions, and then discover frequent
patterns from the transaction database through pattern mining techniques. We
observe that those discovered patterns, i.e., co-occurrence highlighted
regions, typically hold appearance and spatial consistency. Motivated by this
observation, we can easily discover and localize possible objects by merging
relevant meaningful patterns. Extensive experiments on a variety of benchmarks
demonstrate that OLM achieves competitive localization performance compared
with the state-of-the-art methods. We also compare our approach with
unsupervised saliency detection methods and achieve competitive results on
seven benchmark datasets. Moreover, we conduct experiments on fine-grained
classification to show that our proposed method can accurately locate the
entire object and its parts, which significantly benefits the classification
results.
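The abstract describes converting CNN feature maps into transactions and mining frequent patterns from them. The sketch below illustrates that step under stated assumptions: each spatial position becomes a transaction containing the channels activated above their per-channel mean, and frequent channel sets are mined with Apriori via mlxtend. The thresholding rule, function name, and support value are illustrative, not OLM's actual settings.

```python
# Hedged sketch of the transaction-creation step: each spatial position of a
# CNN feature map becomes a "transaction" of the channels whose activation
# exceeds that channel's mean; frequent channel sets are then mined with
# Apriori. Threshold and min_support are illustrative assumptions.
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori

def mine_frequent_channel_sets(feature_map, top_k=20, min_support=0.3):
    """feature_map: (C, H, W) activations from a pre-trained CNN layer."""
    C, H, W = feature_map.shape
    flat = feature_map.reshape(C, -1).T                  # (H*W, C): one row per position
    # A channel is "present" at a position if its activation is above its mean.
    onehot = flat > flat.mean(axis=0, keepdims=True)
    df = pd.DataFrame(onehot, columns=[f"ch{c}" for c in range(C)])
    itemsets = apriori(df, min_support=min_support, use_colnames=True)
    return itemsets.sort_values("support", ascending=False).head(top_k)
```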