Multimodal Convolutional Neural Networks for Matching Image and Sentence
In this paper, we propose multimodal convolutional neural networks (m-CNNs)
for matching image and sentence. Our m-CNN provides an end-to-end framework
with convolutional architectures to exploit image representation, word
composition, and the matching relations between the two modalities. More
specifically, it consists of one image CNN encoding the image content, and one
matching CNN learning the joint representation of image and sentence. The
matching CNN composes words into different semantic fragments and learns the
inter-modal relations between the image and the composed fragments at different
levels, thus fully exploiting the matching relations between image and sentence.
Experimental results on benchmark databases of bidirectional image and sentence
retrieval demonstrate that the proposed m-CNNs can effectively capture the
information necessary for image and sentence matching. Specifically, our
proposed m-CNNs achieve state-of-the-art performance for bidirectional image
and sentence retrieval on the Flickr30K and Microsoft COCO databases.
Comment: Accepted by ICCV 201
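The abstract above describes encoding images and sentences and scoring how well they match, then ranking candidates in both retrieval directions. As a minimal sketch of that retrieval step only (using cosine similarity over precomputed embeddings as a stand-in for the paper's learned matching CNN, with illustrative function names):

```python
import numpy as np

def match_scores(img_emb, sent_emb):
    """Cosine-similarity matrix between image and sentence embeddings.

    A toy stand-in for a learned image-sentence matching score:
    rows index images, columns index sentences.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sent = sent_emb / np.linalg.norm(sent_emb, axis=1, keepdims=True)
    return img @ sent.T

def rank_candidates(scores, axis):
    """Bidirectional retrieval: rank sentences per image (axis=1)
    or images per sentence (axis=0), best match first."""
    return np.argsort(-scores, axis=axis)
```

Given embeddings where the i-th image and i-th sentence describe the same content, the top-ranked candidate in each direction should be the matching index.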
Efficient and Accurate Co-Visible Region Localization with Matching Key-Points Crop (MKPC): A Two-Stage Pipeline for Enhancing Image Matching Performance
Image matching is a classic and fundamental task in computer vision. In this
paper, under the hypothesis that the areas outside the co-visible regions carry
little information, we propose a matching key-points crop (MKPC) algorithm.
MKPC locates, proposes, and crops the critical regions, i.e. the co-visible
areas, with great efficiency and accuracy. Furthermore, building upon MKPC, we
propose a general two-stage pipeline for image matching that is compatible with
any image matching model or combination of models. We experimented with plugging
SuperPoint + SuperGlue into the two-stage pipeline; the results show that our
method enhances performance on outdoor pose estimation. Moreover, under fair
comparison conditions, our method outperforms the SOTA on the Image Matching
Challenge 2022 benchmark, which is currently the hardest outdoor benchmark for
image matching.
Comment: 9 pages with 6 figures. Many experiments have not yet been conducted,
the theoretical sections are rather concise, and the references are not
adequately comprehensive. This version of the paper is being released to make
this work public, and code will also be published soon. We will continue to
conduct additional experiments and periodically update the paper.
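The two-stage pipeline above runs a matcher once, crops to the region suggested by the matched key-points, and re-matches inside the crops. A minimal sketch of that flow (the padding scheme and function names are illustrative assumptions, not taken from the MKPC paper or its code):

```python
import numpy as np

def keypoint_crop(kpts, img_shape, pad=0.1):
    """Bounding box around matched key-points, expanded by `pad`
    (fraction of box size) and clipped to the image; a toy stand-in
    for MKPC's co-visible region proposal."""
    h, w = img_shape[:2]
    x0, y0 = kpts.min(axis=0)
    x1, y1 = kpts.max(axis=0)
    dx, dy = (x1 - x0) * pad, (y1 - y0) * pad
    x0, y0 = max(0, int(x0 - dx)), max(0, int(y0 - dy))
    x1, y1 = min(w, int(np.ceil(x1 + dx))), min(h, int(np.ceil(y1 + dy)))
    return x0, y0, x1, y1

def two_stage_match(match_fn, img_a, img_b):
    """Stage 1: coarse matching on full images; stage 2: re-match inside
    the cropped co-visible regions. `match_fn` is any matcher returning
    (kpts_a, kpts_b) as Nx2 arrays in pixel coordinates."""
    ka, kb = match_fn(img_a, img_b)
    box_a = keypoint_crop(ka, img_a.shape)
    box_b = keypoint_crop(kb, img_b.shape)
    crop_a = img_a[box_a[1]:box_a[3], box_a[0]:box_a[2]]
    crop_b = img_b[box_b[1]:box_b[3], box_b[0]:box_b[2]]
    ka2, kb2 = match_fn(crop_a, crop_b)
    # shift crop-frame coordinates back to the full-image frame
    return ka2 + np.array([box_a[0], box_a[1]]), kb2 + np.array([box_b[0], box_b[1]])
```

The point of the design is that stage 2 spends the matcher's capacity on the co-visible area only, which is where this pipeline claims its accuracy gains.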
Adversarial Training for Adverse Conditions: Robust Metric Localisation using Appearance Transfer
We present a method of improving visual place recognition and metric
localisation under very strong appearance change. We learn an invertible
generator that can transform the conditions of images, e.g. from day to
night or summer to winter. This image transforming filter is explicitly
designed to aid and abet feature-matching using a new loss based on SURF
detector and dense descriptor maps. A network is trained to output synthetic
images optimised for feature matching given only an input RGB image, and these
generated images are used to localize the robot against a previously built map
using traditional sparse matching approaches. We benchmark our results using
multiple traversals of the Oxford RobotCar Dataset over a year-long period,
using one traversal as a map and the other to localise. We show that this
method significantly improves place recognition and localisation under changing
and adverse conditions, while reducing the number of mapping runs needed to
successfully achieve reliable localisation.
Comment: Accepted at ICRA201
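The loss described above compares detector responses and dense descriptor maps between the synthesised image and a reference in the target condition. A toy sketch of such a feature-matching loss (plain MSE terms with an assumed weighting, not the paper's actual SURF-based formulation):

```python
import numpy as np

def feature_matching_loss(det_syn, det_ref, desc_syn, desc_ref, w_desc=1.0):
    """Toy stand-in for a detector/descriptor feature-matching loss:
    penalise differences between the detector response maps and the
    dense descriptor maps of a synthetic image and a reference image
    captured in the target condition."""
    det_term = np.mean((det_syn - det_ref) ** 2)
    desc_term = np.mean((desc_syn - desc_ref) ** 2)
    return float(det_term + w_desc * desc_term)
```

Training the generator against a loss of this shape pushes it to produce images whose features, not just pixels, line up with the target condition, which is what makes the outputs usable by a traditional sparse matcher.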