PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition
Unlike its image based counterpart, point cloud based retrieval for place
recognition has remained an unexplored and unsolved problem. This is largely
due to the difficulty in extracting local feature descriptors from a point
cloud that can subsequently be encoded into a global descriptor for the
retrieval task. In this paper, we propose PointNetVLAD, which leverages
the recent success of deep networks to solve point cloud based retrieval for
place recognition. Specifically, our PointNetVLAD is a combination/modification
of the existing PointNet and NetVLAD, which allows end-to-end training and
inference to extract the global descriptor from a given 3D point cloud.
Furthermore, we propose the "lazy triplet and quadruplet" loss functions that
can achieve more discriminative and generalizable global descriptors to tackle
the retrieval task. We create benchmark datasets for point cloud based
retrieval for place recognition, and the experimental results on these datasets
show the feasibility of our PointNetVLAD. Our code and links to the
benchmark dataset downloads are available on our project website:
http://github.com/mikacuy/pointnetvlad/
Comment: CVPR 2018, 11 pages, 10 figures
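The abstract names the "lazy triplet" loss without defining it. The sketch below assumes one common reading: the hinge term is evaluated against every negative and only the largest (hardest) one is kept, instead of summing over all negatives. The function names, the single-positive tuple, and the margin value are illustrative assumptions, not taken from the listing.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lazy_triplet_loss(anchor, positive, negatives, margin=0.5):
    """Hinge loss that keeps only the hardest negative (assumed 'lazy' form).

    anchor, positive: global descriptors (lists of floats)
    negatives: list of global descriptors of non-matching places
    margin: assumed constant; the abstract does not give a value
    """
    d_pos = euclidean(anchor, positive)
    # One hinge term per negative; the maximum is the only term that contributes.
    hinge_terms = [max(0.0, margin + d_pos - euclidean(anchor, n)) for n in negatives]
    return max(hinge_terms)
```

With an anchor at the origin, a positive at distance 1, and negatives at distances 3 and 1.2, only the closer negative violates the margin, so it alone determines the loss.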
Deep Shape Matching
We cast shape matching as metric learning with convolutional networks. We
break the end-to-end process of image representation into two parts. Firstly,
well established efficient methods are chosen to turn the images into edge
maps. Secondly, the network is trained with edge maps of landmark images, which
are automatically obtained by a structure-from-motion pipeline. The learned
representation is evaluated on a range of different tasks, providing
improvements on challenging cases of domain generalization, generic
sketch-based image retrieval or its fine-grained counterpart. In contrast to
other methods that learn a different model per task, object category, or
domain, we use the same network throughout all our experiments, achieving
state-of-the-art results in multiple benchmarks.
Comment: ECCV 201
SOE-Net: A Self-Attention and Orientation Encoding Network for Point Cloud based Place Recognition
We tackle the problem of place recognition from point cloud data and
introduce a self-attention and orientation encoding network (SOE-Net) that
fully explores the relationship between points and incorporates long-range
context into point-wise local descriptors. Local information of each point from
eight orientations is captured in a PointOE module, whereas long-range feature
dependencies among local descriptors are captured with a self-attention unit.
Moreover, we propose a novel loss function called Hard Positive Hard Negative
quadruplet loss (HPHN quadruplet), that achieves better performance than the
commonly used metric learning loss. Experiments on various benchmark datasets
demonstrate promising performance of the proposed network. It significantly
outperforms the current state-of-the-art approaches: the average recall at top
1 retrieval on the Oxford RobotCar dataset is improved by over 16%. Code and
the trained model will be made publicly available.
Comment: 10 pages, 7 figures, 6 tables
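The "average recall at top 1" metric quoted above can be illustrated with a small sketch. It assumes the usual retrieval convention: a query counts as a hit if at least one of its N nearest database descriptors is a true match. The function name and the data layout (lists of descriptors, one set of true match indices per query) are hypothetical.

```python
import math


def recall_at_n(query_descs, db_descs, true_matches, n=1):
    """Fraction of queries with at least one true match among the top-n results.

    query_descs: list of query descriptors
    db_descs: list of database descriptors
    true_matches: true_matches[i] is the set of database indices that
                  correspond to the same place as query i (assumed given)
    """
    hits = 0
    for qi, q in enumerate(query_descs):
        # Rank database entries by Euclidean distance to the query descriptor.
        ranked = sorted(range(len(db_descs)), key=lambda j: math.dist(q, db_descs[j]))
        if any(j in true_matches[qi] for j in ranked[:n]):
            hits += 1
    return hits / len(query_descs)
```

Averaging this value over several query/database splits would give an average recall figure; how the splits are formed is not stated in the abstract.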
In Defense of the Classification Loss for Person Re-Identification
Recent research on person re-identification has focused on two
trends. One is learning part-based local features to form more informative
feature descriptors. The other is designing effective metric learning loss
functions such as the triplet loss family. We argue that learning global
features with classification loss could achieve the same goal, even with a
simple and cost-effective architecture design. In this paper, we first explain
why the person re-id framework with standard classification loss usually has
inferior performance compared to metric learning. Based on that, we further
propose a person re-id framework featuring channel grouping and a multi-branch
strategy, which divides global features into multiple channel groups and learns
discriminative channel group features with multi-branch classification
layers. Extensive experiments show that our framework outperforms prior
state-of-the-art methods in terms of both accuracy and inference speed.
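As a rough illustration of channel grouping with multi-branch classification, the sketch below splits one global feature vector into equal channel groups and gives each group its own linear classifier with a softmax cross-entropy loss. The equal-size split, the bias-free linear branches, and the averaging of branch losses are assumptions made for the example; the abstract does not specify them.

```python
import math


def split_channels(features, num_groups):
    """Split a feature vector into num_groups contiguous, equal-size groups."""
    if len(features) % num_groups != 0:
        raise ValueError("feature length must be divisible by num_groups")
    size = len(features) // num_groups
    return [features[i * size:(i + 1) * size] for i in range(num_groups)]


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy of softmax(logits) against one label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multi_branch_loss(features, branch_weights, label):
    """Average classification loss over one linear branch per channel group.

    branch_weights[g] is a (num_classes x group_size) weight matrix for group g.
    """
    groups = split_channels(features, len(branch_weights))
    total = 0.0
    for group, weights in zip(groups, branch_weights):
        logits = [sum(w * x for w, x in zip(row, group)) for row in weights]
        total += softmax_cross_entropy(logits, label)
    return total / len(groups)
```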
Image-Based Geo-Localization Using Satellite Imagery
Localizing a query ground view image on a geo-referenced satellite map is
useful yet remains challenging due to the drastic change in viewpoint. To this
end, this paper extends our earlier work on the Cross-View Matching Network
(CVM-Net) for the ground-to-aerial image matching task, where traditional image
descriptors fail because of that viewpoint change. In particular, we show more
extensive experimental results and analyses of the network architecture on our
CVM-Net.
Furthermore, we propose a Markov localization framework that enforces the
temporal consistency between image frames to enhance the geo-localization
results in the case where a video stream of ground view images is available.
Experimental results show that our proposed Markov localization framework can
continuously localize the vehicle within a small error on our Singapore
dataset.
Comment: IJCV preprint
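Markov localization in its generic discrete form alternates a motion prediction with a measurement update. The sketch below shows one such step over a small set of candidate map cells. The state space, the transition matrix, and the idea of deriving the likelihood from image matching scores are assumptions for illustration; the abstract does not describe how the authors set these up.

```python
def markov_localization_step(belief, transition, likelihood):
    """One predict/update step of a discrete Bayes (Markov localization) filter.

    belief: prior probability of each candidate location (sums to 1)
    transition: transition[i][j] = P(next location j | current location i)
    likelihood: likelihood[j] = P(current observation | location j),
                assumed here to come from ground-to-aerial matching scores
    """
    n = len(belief)
    # Predict: propagate the belief through the motion model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: weight each location by how well it explains the observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero likelihood at every location")
    return [u / total for u in unnormalized]
```

Applying this step once per video frame is what lets consistent evidence accumulate over time, so a single ambiguous frame does not dominate the estimate.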
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification
Although the performance of person Re-Identification (ReID) has been
significantly boosted, many challenging issues in real scenarios have not been
fully investigated, e.g., the complex scenes and lighting variations, viewpoint
and pose changes, and the large number of identities in a camera network. To
facilitate the research towards conquering those issues, this paper contributes
a new dataset called MSMT17 with many important features, e.g., 1) the raw
videos are taken by a 15-camera network deployed in both indoor and outdoor
scenes, 2) the videos cover a long period of time and present complex lighting
variations, and 3) it currently contains the largest number of annotated
identities, i.e., 4,101 identities and 126,441 bounding boxes. We also observe
that a domain gap commonly exists between datasets, which causes a
severe performance drop when training and testing on different datasets. As a
result, available training data cannot be effectively leveraged for new
testing domains. To relieve the expensive costs of annotating new training
samples, we propose a Person Transfer Generative Adversarial Network (PTGAN) to
bridge the domain gap. Comprehensive experiments show that the domain gap could
be substantially narrowed by the PTGAN.
Comment: 10 pages, 9 figures; accepted in CVPR 201
Learning Local RGB-to-CAD Correspondences for Object Pose Estimation
We consider the problem of 3D object pose estimation. While much recent work
has focused on the RGB domain, the reliance on accurately annotated images
limits their generalizability and scalability. On the other hand, the easily
available CAD models of objects are rich sources of data, providing a large
number of synthetically rendered images. In this paper, we address a key
problem of existing methods, their need for expensive 3D pose annotations, by
proposing a new method that matches RGB images to CAD models for object pose
estimation. Our key innovations compared to existing work include removing the
need for either real-world textures for CAD models or explicit 3D pose
annotations for RGB images. We achieve this through a series of objectives that
learn how to select keypoints and enforce viewpoint and modality invariance
across RGB images and CAD model renderings. We conduct extensive experiments to
demonstrate that the proposed method can reliably estimate object pose in RGB
images, as well as generalize to object instances not seen during training.
Comment: 10 pages, 6 figures, 4 tables, ICCV 201
Place recognition survey: An update on deep learning approaches
Autonomous Vehicles (AV) are becoming more capable of navigating in complex
environments with dynamic and changing conditions. A key component that enables
these intelligent vehicles to overcome such conditions and become more
autonomous is the sophistication of the perception and localization systems. As
part of the localization system, place recognition has benefited from recent
developments in other perception tasks such as place categorization or object
recognition, namely with the emergence of deep learning (DL) frameworks. This
paper surveys recent approaches and methods used in place recognition,
particularly those based on deep learning. The contributions of this work are
twofold: surveying recent sensors such as 3D LiDARs and RADARs, applied in
place recognition; and categorizing the various DL-based place recognition
works into supervised, unsupervised, semi-supervised, parallel, and
hierarchical categories. First, this survey introduces key place recognition
concepts to give the reader context. Then, sensor characteristics are
addressed. This survey proceeds by elaborating on the various DL-based works,
presenting summaries for each framework. Some lessons learned from this survey
include: the importance of NetVLAD for supervised end-to-end learning; the
advantages of unsupervised approaches in place recognition, namely for
cross-domain applications; and the increasing tendency of recent works to seek
not only higher performance but also higher efficiency.
Comment: Under review in IEEE Transactions on Intelligent Vehicles. This work
was submitted on 13/01/2021 to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible. Upon acceptance of the article by IEEE, the preprint
article will be replaced with the accepted version
LiDAR-Based Place Recognition For Autonomous Driving: A Survey
LiDAR-based place recognition (LPR) plays a pivotal role in autonomous
driving, which assists Simultaneous Localization and Mapping (SLAM) systems in
reducing accumulated errors and achieving reliable localization. However,
existing reviews predominantly concentrate on visual place recognition (VPR)
methods. Despite the recent remarkable progress in LPR, to the best of our
knowledge, there is no dedicated systematic review in this area. This paper
bridges the gap by providing a comprehensive review of place recognition
methods employing LiDAR sensors, thus facilitating and encouraging further
research. We commence by delving into the problem formulation of place
recognition, exploring existing challenges, and describing relations to
previous surveys. Subsequently, we conduct an in-depth review of related
research, which offers detailed classifications, strengths and weaknesses, and
architectures. Finally, we summarize existing datasets, commonly used
evaluation metrics, and comprehensive evaluation results from various methods
on public datasets. This paper can serve as a valuable tutorial for newcomers
entering the field of place recognition and for researchers interested in
long-term robot localization. We pledge to maintain an up-to-date project on
our website https://github.com/ShiPC-AI/LPR-Survey.
Comment: 26 pages, 13 figures, 5 tables
Neural Signatures for Licence Plate Re-identification
The problem of vehicle licence plate re-identification is generally
considered as a one-shot image retrieval problem. The objective of this task is
to learn a feature representation (called a "signature") for licence plates.
Incoming licence plate images are converted to signatures and matched to a
previously collected template database through a distance measure. Then, the
input image is recognized as the template whose signature is "nearest" to the
input signature. For our problem, the template database is restricted to a
single signature per unique licence plate.
We measure the performance of deep convolutional net-based features adapted
from face recognition on this task. In addition, we also test a hybrid approach
combining the Fisher vector with a neural network-based embedding called "f2nn"
trained with the Triplet loss function. We find that the hybrid approach
performs comparably while providing computational benefits. The signature
generated by the hybrid approach also shows higher generalizability to datasets
more dissimilar to the training corpus.
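The matching step described in this abstract (convert the incoming plate to a signature, then pick the template whose signature is nearest) can be sketched as a nearest-neighbour lookup. The Euclidean distance, the dictionary layout, and the plate identifiers are assumptions for illustration; the abstract only says "a distance measure".

```python
import math


def match_signature(query_signature, templates):
    """Return the plate ID whose stored signature is nearest to the query.

    templates: dict mapping plate ID -> signature, with exactly one
               signature per unique plate as the abstract requires
    """
    if not templates:
        raise ValueError("template database is empty")
    return min(templates, key=lambda plate_id: math.dist(query_signature, templates[plate_id]))


# Hypothetical two-plate database with 2-D signatures.
example_templates = {"ABC123": [0.0, 1.0], "XYZ789": [1.0, 0.0]}
best_match = match_signature([0.1, 0.9], example_templates)
```

Because each plate has a single template, retrieval quality depends entirely on how well the learned signature separates different plates.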