Packing and Padding: Coupled Multi-index for Accurate Image Retrieval
In Bag-of-Words (BoW) based image retrieval, the SIFT visual word has low
discriminative power, so false positive matches are prevalent. Apart from
the information loss during quantization, another cause is that the SIFT
feature only describes the local gradient distribution. To address this
problem, this paper proposes a coupled Multi-Index (c-MI) framework to perform
feature fusion at indexing level. Basically, complementary features are coupled
into a multi-dimensional inverted index. Each dimension of c-MI corresponds to
one kind of feature, and the retrieval process votes for images similar in both
SIFT and other feature spaces. Specifically, we fuse a local color feature
into c-MI. While the precision of visual matching is greatly
enhanced, we adopt Multiple Assignment to improve recall. The cooperation
of SIFT and color features significantly reduces the impact of false positive
matches.
Extensive experiments on several benchmark datasets demonstrate that c-MI
improves the retrieval accuracy significantly, while consuming only half of the
query time compared to the baseline. Importantly, we show that c-MI
complements many prior techniques. Combining these methods, we
obtain an mAP of 85.8% and an N-S score of 3.85 on the Holidays and Ukbench
datasets, respectively, which compare favorably with the state of the art.
Comment: 8 pages, 7 figures, 6 tables. Accepted to CVPR 201
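The coupling described in this abstract can be illustrated with a toy two-dimensional inverted index. This is only a minimal sketch, not the paper's implementation: it assumes every keypoint has already been quantized into a hypothetical `(sift_word, color_word)` pair, and the function names and sample data are invented for illustration. Multiple Assignment and any weighting scheme are omitted.

```python
from collections import defaultdict


def build_coupled_index(database):
    """Build a toy coupled index.

    database maps image_id -> list of (sift_word, color_word) pairs.
    Each index entry is keyed by BOTH words (assumed pre-quantized).
    """
    index = defaultdict(list)
    for image_id, features in database.items():
        for sift_word, color_word in features:
            index[(sift_word, color_word)].append(image_id)
    return index


def query(index, query_features):
    """Vote for images that match a query feature in both feature spaces."""
    votes = defaultdict(int)
    for key in query_features:
        for image_id in index.get(key, []):
            votes[image_id] += 1
    # Highest vote count first; ties broken by image id for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: img_b shares SIFT word 3 with the query but has a
# different color word, so that pair does not produce a vote.
database = {
    "img_a": [(3, 1), (7, 2)],
    "img_b": [(3, 5), (7, 2)],
    "img_c": [(9, 9)],
}
index = build_coupled_index(database)
ranking = query(index, [(3, 1), (7, 2)])
print(ranking)
```

In a SIFT-only index, `img_b` would receive two votes here; requiring agreement on the color word as well removes one of them, which is the false-positive filtering effect the abstract describes.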
PoseScript: Linking 3D Human Poses and Natural Language
Natural language plays a critical role in many computer vision applications,
such as image captioning, visual question answering, and cross-modal retrieval,
to provide fine-grained semantic information. Unfortunately, while human pose
is key to human understanding, current 3D human pose datasets lack detailed
language descriptions. To address this issue, we have introduced the PoseScript
dataset. This dataset pairs more than six thousand 3D human poses from AMASS
with rich human-annotated descriptions of the body parts and their spatial
relationships. Additionally, to increase the size of the dataset to a scale
that is compatible with data-hungry learning algorithms, we have proposed an
elaborate captioning process that generates automatic synthetic descriptions in
natural language from given 3D keypoints. This process extracts low-level pose
information, known as "posecodes", using a set of simple but generic rules on
the 3D keypoints. These posecodes are then combined into higher level textual
descriptions using syntactic rules. With automatic annotations, the amount of
available data significantly scales up (100k), making it possible to
effectively pretrain deep models for finetuning on human captions. To showcase
the potential of annotated poses, we present three multi-modal learning tasks
that utilize the PoseScript dataset. Firstly, we develop a pipeline that maps
3D poses and textual descriptions into a joint embedding space, allowing for
cross-modal retrieval of relevant poses from large-scale datasets. Secondly, we
establish a baseline for a text-conditioned model generating 3D poses. Thirdly,
we present a learned process for generating pose descriptions. These
applications demonstrate the versatility and usefulness of annotated poses in
various tasks and pave the way for future research in the field.
Comment: Extended version of the ECCV 2022 pape
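The idea of deriving low-level pose information from 3D keypoints with simple rules, then turning it into text, can be sketched as follows. This is an illustrative assumption rather than the PoseScript rule set: the angle thresholds, category labels, joint coordinates, and sentence template are all hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at joint b formed by 3D points a-b-c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.sqrt(sum(x * x for x in ba)) * math.sqrt(sum(x * x for x in bc))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cosine))


def angle_posecode(angle_deg):
    """Map an angle to a coarse category (thresholds are hypothetical)."""
    if angle_deg < 75:
        return "completely bent"
    if angle_deg < 135:
        return "partially bent"
    return "straight"


def describe(joint_name, code):
    """Turn one posecode into a sentence with a fixed, assumed template."""
    return f"The {joint_name} is {code}."


# Hypothetical keypoints for a shoulder, elbow, and wrist.
shoulder, elbow, wrist = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)
angle = joint_angle(shoulder, elbow, wrist)
sentence = describe("left elbow", angle_posecode(angle))
print(sentence)
```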
Pentagon-Match (PMatch): Identification of View-Invariant Planar Feature for Local Feature Matching-Based Homography Estimation
In computer vision, finding correct point correspondences among images plays
an important role in many applications, such as image stitching, image
retrieval, and visual localization. Most research focuses on
matching local features before a sampling method, such as RANSAC, is employed
to verify the initial matching results by repeatedly fitting a global
transformation among the images. However, incorrect matches may still exist.
Thus, a novel sampling scheme, Pentagon-Match (PMatch), is proposed in this
work to verify the correctness of initially matched keypoints using pentagons
randomly sampled from them. By ensuring that the shape and location of these
pentagons are view-invariant through various evaluations of the cross-ratio (CR),
incorrect keypoint matches can be identified easily with a homography estimated from
correctly matched pentagons. Experimental results show that highly accurate
estimation of homography can be obtained efficiently for planar scenes of the
HPatches dataset, based on keypoint matching results provided by LoFTR.
In addition, accurate outlier identification for the above matching results and
a possible extension of the approach to multi-plane scenes are also
demonstrated.
Comment: arXiv admin note: text overlap with arXiv:2211.0300
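The view-invariance that the cross-ratio provides can be checked numerically. The sketch below is a simplification: it uses four collinear points rather than the pentagon-based evaluations mentioned in the abstract, and the homography matrix and point coordinates are arbitrary values chosen for illustration.

```python
import math


def apply_homography(H, p):
    """Apply a 3x3 homography H (nested lists) to a 2D point p."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def cross_ratio(a, b, c, d):
    """Unsigned cross-ratio (|AC| * |BD|) / (|BC| * |AD|) of collinear points."""
    return (math.dist(a, c) * math.dist(b, d)) / (math.dist(b, c) * math.dist(a, d))


# Four points on the line y = x (arbitrary example values).
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (4.0, 4.0)]

# An arbitrary homography with a non-trivial projective part.
H = [
    [1.0, 0.2, 3.0],
    [0.1, 0.9, -1.0],
    [0.001, 0.002, 1.0],
]

cr_before = cross_ratio(*points)
cr_after = cross_ratio(*[apply_homography(H, p) for p in points])
print(cr_before, cr_after)
```

Both values agree up to floating-point error, even though distances and angles between the points change under the transformation.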
Efficient large-scale image search with a vocabulary tree
The task of searching and recognizing objects in images has become an important research topic in the area of image processing and computer vision. Looking for similar images in large datasets given an input query and responding as fast as possible is a very challenging task. In this work the Bag of Features approach is studied, and an implementation of the visual vocabulary tree method from Nistér and Stewénius is presented. Images are described using local invariant descriptor techniques and then indexed in a database using an inverted index for further queries. The descriptors are quantized according to a visual vocabulary, creating sparse vectors, which makes it possible to compute very efficiently, for each query, a similarity ranking of the indexed images. The performance of the method is analyzed by varying different factors, such as the parameters for the vocabulary tree construction, different local descriptor extraction techniques, and dimensionality reduction with PCA. It can be observed that the retrieval performance increases with a richer vocabulary and decays very slowly as the size of the dataset grows.
Fil: Uriza, Esteban. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Fil: Gómez Fernández, Francisco Roberto. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Fil: Rais, Martín. Escuela Normal Superior de Cachan; Franci
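The indexing and ranking step described in this abstract can be sketched with a small inverted index over visual words. This is a hedged illustration, not the implementation from the work: it assumes descriptors have already been quantized to integer visual-word ids (the vocabulary tree itself is not built here), it uses a plain unnormalized TF-IDF dot product as the similarity score, and all names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(db_words):
    """Build an inverted index and IDF weights.

    db_words maps image_id -> list of visual-word ids (assumed pre-quantized).
    """
    inverted = defaultdict(dict)
    for image_id, words in db_words.items():
        for word, count in Counter(words).items():
            inverted[word][image_id] = count
    n_images = len(db_words)
    idf = {w: math.log(n_images / len(postings)) for w, postings in inverted.items()}
    return inverted, idf


def rank(inverted, idf, query_words):
    """Rank indexed images by an unnormalized TF-IDF dot product."""
    scores = defaultdict(float)
    for word, q_count in Counter(query_words).items():
        if word not in inverted:
            continue  # words absent from the database contribute nothing
        weight = idf[word] ** 2  # IDF applied to both query and database vectors
        for image_id, d_count in inverted[word].items():
            scores[image_id] += q_count * d_count * weight
    return sorted(scores, key=lambda i: (-scores[i], i))


# Hypothetical database of three images.
db_words = {"a": [1, 1, 2], "b": [2, 3], "c": [3, 4, 4]}
inverted, idf = build_index(db_words)
ranking = rank(inverted, idf, [1, 2])
print(ranking)
```

Only the posting lists of words present in the query are visited, which is why sparse word vectors keep query cost low as the database grows.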