Scene Graph Embeddings Using Relative Similarity Supervision
Scene graphs are a powerful structured representation of the underlying
content of images, and embeddings derived from them have been shown to be
useful in multiple downstream tasks. In this work, we employ a graph
convolutional network to exploit structure in scene graphs and produce image
embeddings useful for semantic image retrieval. Different from
classification-centric supervision traditionally available for learning image
representations, we address the task of learning from relative similarity
labels in a ranking context. Rooted within the contrastive learning paradigm,
we propose a novel loss function that operates on pairs of similar and
dissimilar images and imposes relative ordering between them in embedding
space. We demonstrate that this Ranking loss, coupled with an intuitive triple
sampling strategy, leads to robust representations that outperform well-known
contrastive losses on the retrieval task. In addition, we provide qualitative
evidence of how retrieved results that utilize structured scene information
capture the global context of the scene, different from visual similarity
search.
Comment: Accepted to AAAI 202
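The ranking loss described above is not given explicitly in the abstract; a minimal sketch of a margin-based loss that imposes relative ordering between a similar and a dissimilar image in embedding space (function name, distance choice, and margin value are all illustrative assumptions, not the paper's exact formulation) might look like:

```python
import numpy as np

def ranking_loss(anchor, positive, negative, margin=0.2):
    # Hedged sketch: penalize the anchor only when the dissimilar
    # (negative) embedding is not farther than the similar (positive)
    # embedding by at least `margin`. Euclidean distance and the 0.2
    # margin are illustrative choices, not taken from the paper.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

When the negative already lies well beyond the positive, the loss is zero; otherwise it grows linearly with the ordering violation, which is what drives the relative ordering in embedding space.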
Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis
We introduce a data-driven approach to complete partial 3D shapes through a
combination of volumetric deep neural networks and 3D shape synthesis. From a
partially-scanned input shape, our method first infers a low-resolution -- but
complete -- output. To this end, we introduce a 3D-Encoder-Predictor Network
(3D-EPN) which is composed of 3D convolutional layers. The network is trained
to predict and fill in missing data, and operates on an implicit surface
representation that encodes both known and unknown space. This allows us to
predict global structure in unknown areas at high accuracy. We then correlate
these intermediary results with 3D geometry from a shape database at test time.
In a final pass, we propose a patch-based 3D shape synthesis method that
imposes the 3D geometry from these retrieved shapes as constraints on the
coarsely-completed mesh. This synthesis process enables us to reconstruct
fine-scale detail and generate high-resolution output while respecting the
global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms
state-of-the-art completion methods, the main contribution of our work lies in
the combination of a data-driven shape predictor and analytic 3D shape
synthesis. In our results, we show extensive evaluations on a newly-introduced
shape completion benchmark for both real-world and synthetic data.
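The abstract's implicit surface representation "that encodes both known and unknown space" can be illustrated with a toy voxel grid (the grid size, value convention, and variable names below are invented for illustration and are not the paper's actual encoding):

```python
import numpy as np

# Toy stand-in for an implicit volumetric input that distinguishes
# known from unknown space, in the spirit of the 3D-EPN input:
#   +1 = known free space, -1 = known occupied surface, 0 = unknown.
grid = np.zeros((4, 4, 4), dtype=np.float32)  # start fully unknown
grid[:2] = 1.0                                # scanned free space
grid[2, 1:3, 1:3] = -1.0                      # scanned occupied voxels
unknown_mask = grid == 0.0                    # region the predictor must fill
```

Marking unknown space explicitly, rather than conflating it with free space, is what lets a predictor network fill in missing geometry only where the scan provided no evidence.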
Learning and Matching Multi-View Descriptors for Registration of Point Clouds
Critical to the registration of point clouds is the establishment of a set of
accurate correspondences between points in 3D space. The correspondence problem
is generally addressed by the design of discriminative 3D local descriptors on
the one hand, and the development of robust matching strategies on the other
hand. In this work, we first propose a multi-view local descriptor, which is
learned from the images of multiple views, for the description of 3D keypoints.
Then, we develop a robust matching approach that rejects outlier matches via
efficient belief-propagation inference on a graphical model. We demonstrate
that our approaches improve registration on public scanning and multi-view
stereo datasets, with superior performance verified by extensive comparisons
against a variety of descriptors and matching methods.
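The paper's outlier rejection runs belief propagation on a graphical model; a much-simplified stand-in that conveys the goal of keeping only consistent correspondences is mutual nearest-neighbor filtering, a common baseline (the function and variable names below are illustrative, not from the paper):

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    # Simplified stand-in for belief-propagation matching: keep a
    # correspondence (i, j) only if descriptor i's nearest neighbor
    # in B is j AND descriptor j's nearest neighbor in A is i.
    d = np.linalg.norm(desc_a[:, None] - desc_b[None, :], axis=-1)
    nn_ab = d.argmin(axis=1)  # best match in B for each descriptor in A
    nn_ba = d.argmin(axis=0)  # best match in A for each descriptor in B
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]
```

Belief propagation generalizes this idea by also enforcing geometric consistency among neighboring correspondences, rather than treating each match independently.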
Cross-View Image Matching for Geo-localization in Urban Environments
In this paper, we address the problem of cross-view image geo-localization.
Specifically, we aim to estimate the GPS location of a query street view image
by finding the matching images in a reference database of geo-tagged bird's eye
view images, or vice versa. To this end, we present a new framework for
cross-view image geo-localization by taking advantage of the tremendous success
of deep convolutional neural networks (CNNs) in image classification and object
detection. First, we employ the Faster R-CNN to detect buildings in the query
and reference images. Next, for each building in the query image, we retrieve
the nearest neighbors from the reference buildings using a Siamese network
trained on both positive matching image pairs and negative pairs. To find the
correct NN for each query building, we develop an efficient multiple nearest
neighbors matching method based on dominant sets. We evaluate the proposed
framework on a new dataset that consists of pairs of street view and bird's eye
view images. Experimental results show that the proposed method achieves better
geo-localization accuracy than other approaches and generalizes to images at
unseen locations.
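The retrieval step above, ranking reference buildings against a query building by embedding similarity, can be sketched as follows (the cosine-similarity metric, the default `k`, and all names are illustrative assumptions; the paper's Siamese network produces the embeddings but its exact retrieval metric is not specified here):

```python
import numpy as np

def retrieve_nearest(query_emb, ref_embs, k=3):
    # Hedged sketch: rank reference-building embeddings (e.g. from a
    # Siamese network) by cosine similarity to a query-building
    # embedding and return the indices of the top-k candidates.
    q = query_emb / np.linalg.norm(query_emb)
    r = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    sims = r @ q
    return np.argsort(-sims)[:k]
```

In the full framework, these per-building candidate lists would then be reconciled by the dominant-sets matching step to select one globally consistent set of correspondences.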