End-to-End Localization and Ranking for Relative Attributes
We propose an end-to-end deep convolutional network to simultaneously
localize and rank relative visual attributes, given only weakly-supervised
pairwise image comparisons. Unlike previous methods, our network jointly learns
the attribute's features, localization, and ranker. The localization module of
our network discovers the most informative image region for the attribute,
which is then used by the ranking module to learn a ranking model of the
attribute. Our end-to-end framework is also significantly faster than previous
methods. We show state-of-the-art ranking results on various relative attribute
datasets, and our qualitative localization results clearly demonstrate our
network's ability to learn meaningful image patches.
Comment: Appears in European Conference on Computer Vision (ECCV), 201
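The abstract says the ranker is trained only from weakly-supervised pairwise comparisons but does not spell out the loss. A common choice for this setting is a RankNet-style logistic loss over the difference of the two attribute scores; the sketch below is illustrative, not the paper's exact formulation.

```python
import numpy as np

def pairwise_ranking_loss(score_a, score_b, label):
    """RankNet-style logistic loss for one pairwise comparison.

    label = 1.0 means image A shows more of the attribute than image B,
    label = 0.0 means B shows more, label = 0.5 means they are similar.
    """
    # Probability that A outranks B, from the score difference.
    p = 1.0 / (1.0 + np.exp(-(score_a - score_b)))
    eps = 1e-12  # numerical safety for the logarithms
    return -(label * np.log(p + eps) + (1.0 - label) * np.log(1.0 - p + eps))

# A confidently correct ordering yields a small loss ...
low = pairwise_ranking_loss(3.0, 0.0, 1.0)
# ... while a confidently reversed ordering is penalized heavily.
high = pairwise_ranking_loss(0.0, 3.0, 1.0)
```

Because only the score difference enters the loss, the network needs no absolute attribute labels, which is what makes the pairwise supervision "weak".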
DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image
3D reconstruction from a single image is a key problem in multiple
applications ranging from robotic manipulation to augmented reality. Prior
methods have tackled this problem through generative models which predict 3D
reconstructions as voxels or point clouds. However, these methods can be
computationally expensive and miss fine details. We introduce a new
differentiable layer for 3D data deformation and use it in DeformNet to learn a
model for 3D reconstruction-through-deformation. DeformNet takes an image
input, searches the nearest shape template from a database, and deforms the
template to match the query image. We evaluate our approach on the ShapeNet
dataset and show that (a) the Free-Form Deformation layer is a powerful new
building block for deep learning models that manipulate 3D data; (b) DeformNet
combines this FFD layer with shape retrieval for smooth, detail-preserving 3D
reconstruction of qualitatively plausible point clouds with respect to a single
query image; and (c) DeformNet quantitatively matches or outperforms other
state-of-the-art 3D reconstruction methods by significant margins. For more
information, visit: https://deformnet-site.github.io/DeformNet-website/
Comment: 11 pages, 9 figures, NIPS
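Classical free-form deformation, which the paper's differentiable layer builds on, moves points by blending a lattice of control points with Bernstein polynomial weights. A minimal NumPy sketch of that underlying operation (not the paper's learned, differentiable implementation) looks like this:

```python
import numpy as np
from math import comb

def bernstein(n, i, t):
    # Bernstein basis polynomial B_{i,n}(t).
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))

def ffd(points, control):
    """Deform points in the unit cube with a Bernstein free-form deformation.

    points  : (N, 3) array with coordinates in [0, 1].
    control : (l+1, m+1, n+1, 3) lattice of control-point positions.
    """
    l, m, n = (s - 1 for s in control.shape[:3])
    out = np.zeros_like(points, dtype=float)
    for p_idx, (s, t, u) in enumerate(points):
        for i in range(l + 1):
            for j in range(m + 1):
                for k in range(n + 1):
                    w = bernstein(l, i, s) * bernstein(m, j, t) * bernstein(n, k, u)
                    out[p_idx] += w * control[i, j, k]
    return out

# A lattice whose control points sit at their rest positions is the identity map.
grid = np.stack(np.meshgrid(*[np.linspace(0.0, 1.0, 3)] * 3, indexing="ij"), axis=-1)
pts = np.array([[0.5, 0.5, 0.5], [0.25, 0.75, 0.1]])
identity = ffd(pts, grid)
# Translating every control point translates every deformed point the same way,
# because the Bernstein weights sum to one at each location.
shifted = ffd(pts, grid + np.array([0.1, 0.0, 0.0]))
```

Because the deformed position is a smooth polynomial in the control points, gradients flow through the lattice, which is what makes an FFD layer trainable end-to-end.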
Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization
Pedestrian attribute recognition has been an emerging research topic in the
area of video surveillance. To predict the existence of a particular attribute,
it is necessary to localize the regions related to the attribute. However, in
this task, region annotations are not available, and how to carve out these
attribute-related regions remains challenging. Existing methods apply
attribute-agnostic visual attention or heuristic body-part localization
mechanisms to enhance local feature representations, but neglect to use the
attributes themselves to define local feature areas. We propose a flexible
Attribute Localization Module (ALM) that adaptively discovers the most
discriminative regions and learns the regional features for each attribute at
multiple levels. Moreover, a feature pyramid architecture is introduced to
enhance attribute-specific localization at low levels with high-level
semantic guidance. The proposed framework does not require additional region
annotations and can be trained end-to-end with multi-level deep supervision.
Extensive experiments show that the proposed method achieves state-of-the-art
results on three pedestrian attribute datasets, including PETA, RAP, and
PA-100K.
Comment: Accepted by ICCV 201
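The core idea of attribute-specific localization can be illustrated with a generic attribute-conditioned spatial attention: score every spatial location of a feature map against a learned per-attribute vector, then pool features under the resulting attention map. This is a simplified sketch, not the paper's exact ALM; the `query` parameter stands in for whatever per-attribute parameters the module would learn.

```python
import numpy as np

def attribute_localized_feature(feat, query):
    """Attribute-conditioned spatial pooling (illustrative sketch).

    feat  : (C, H, W) convolutional feature map.
    query : (C,) learned vector for one attribute (hypothetical parameter).
    Returns the attended feature vector and the (H, W) attention map.
    """
    # Attribute-conditioned logit at every spatial location.
    logits = np.einsum("chw,c->hw", feat, query)
    # Softmax over all H*W locations -> a normalized attention map.
    e = np.exp(logits - logits.max())
    attn = e / e.sum()
    # Attention-weighted average of the local features.
    pooled = np.einsum("chw,hw->c", feat, attn)
    return pooled, attn

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 7, 7))
query = rng.normal(size=8)
pooled, attn = attribute_localized_feature(feat, query)
```

Because the attention map is produced from the attribute vector itself, no region annotations are needed: the image-level attribute labels alone supervise where the module attends.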
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetic
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold-weapon era. This paper extensively reviews 400+
object detection papers in the light of their technical evolution, spanning a
quarter century (from the 1990s to 2019). A number of topics are covered,
including the milestone detectors in history, detection datasets, metrics,
fundamental building blocks of the detection system, speed-up techniques, and
recent state-of-the-art detection methods. This paper also reviews some
important detection applications, such as pedestrian detection, face detection,
and text detection, and provides an in-depth analysis of their challenges as
well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication
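Among the metrics and building blocks such a survey covers, intersection over union (IoU) is the one constant across the whole period: it decides both when a predicted box counts as a match during evaluation and which candidates survive non-maximum suppression. A minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero: non-overlapping boxes have an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

half_overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0))  # two 2x2 boxes sharing a 1x2 strip
```

Detection benchmarks typically threshold this value (e.g. IoU >= 0.5 in PASCAL VOC) to decide whether a detection is a true positive.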
Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation
Generating precise class-aware pseudo ground-truths, a.k.a, class activation
maps (CAMs), is essential for weakly-supervised semantic segmentation. The
original CAM method usually produces incomplete and inaccurate localization
maps. To tackle this issue, this paper proposes an Expansion and Shrinkage
scheme based on offset learning in deformable convolution, to sequentially
improve the recall and precision of the located object in the two respective
stages. In the Expansion stage, an offset learning branch in a deformable
convolution layer, referred to as the "expansion sampler," seeks to sample
increasingly less discriminative object regions, driven by an inverse
supervision signal that maximizes the image-level classification loss. The more
complete object located in the Expansion stage is then gradually narrowed down
to the final object region during the Shrinkage stage. In the Shrinkage stage,
the offset learning branch of another deformable convolution layer, referred to
as the "shrinkage sampler," is introduced to exclude the false positive
background regions included in the Expansion stage, improving the precision of
the localization maps. We conduct extensive experiments on PASCAL VOC 2012 and
MS COCO 2014 to demonstrate the superiority of our method over other
state-of-the-art methods for weakly-supervised semantic segmentation. Code will
be made publicly available at https://github.com/TyroneLi/ESOL_WSSS.
Comment: Accepted by NeurIPS 2022
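The "original CAM method" that this paper improves on computes a localization map by projecting the classifier's weights for one class back onto the pre-pooling feature map. A minimal sketch of that baseline (not of the proposed Expansion and Shrinkage scheme):

```python
import numpy as np

def class_activation_map(feat, fc_weight):
    """Original CAM: weight each feature channel by the classifier weight
    for one class, sum over channels, and normalize to [0, 1].

    feat      : (C, H, W) feature map before global average pooling.
    fc_weight : (C,) classifier weights for the target class.
    """
    cam = np.einsum("chw,c->hw", feat, fc_weight)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

rng = np.random.default_rng(1)
feat = rng.normal(size=(4, 5, 5))
weights = rng.normal(size=4)
cam = class_activation_map(feat, weights)
```

Because the classifier only needs the most discriminative evidence to predict the label, such maps tend to be incomplete, which is exactly the recall problem the Expansion stage addresses.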
Fine-Grained Image Analysis with Deep Learning: A Survey
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem
in computer vision and pattern recognition, and underpins a diverse set of
real-world applications. The task of FGIA targets analyzing visual objects from
subordinate categories, e.g., species of birds or models of cars. The small
inter-class and large intra-class variation inherent to fine-grained image
analysis makes it a challenging problem. Capitalizing on advances in deep
learning, in recent years we have witnessed remarkable progress in deep
learning powered FGIA. In this paper we present a systematic survey of these
advances, where we attempt to re-define and broaden the field of FGIA by
consolidating two fundamental fine-grained research areas -- fine-grained image
recognition and fine-grained image retrieval. In addition, we review other
key issues of FGIA, such as publicly available benchmark datasets and related
domain-specific applications. We conclude by highlighting several research
directions and open problems which need further exploration from the community.
Comment: Accepted by IEEE TPAMI
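Of the two research areas the survey consolidates, fine-grained retrieval reduces at test time to nearest-neighbor search in a learned embedding space. The sketch below shows only that generic final step with cosine similarity; the fine-grained methods surveyed differ in how the embedding itself is learned, which is not shown here.

```python
import numpy as np

def retrieve(query, gallery, k=3):
    """Return indices of the k gallery embeddings most cosine-similar
    to the query embedding (generic retrieval step, illustrative only)."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q  # cosine similarity of the query to every gallery item
    return np.argsort(-sims)[:k]

# Toy 2-D embeddings: item 0 is nearly parallel to the query, item 1 nearly orthogonal.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([1.0, 0.1])
top = retrieve(query, gallery, k=2)
```

The small inter-class variation the abstract highlights is what makes this last step hard: embeddings of different subordinate categories can be nearly as close as embeddings of the same one.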