Memory vectors for similarity search in high-dimensional spaces
We study an indexing architecture to store and search in a database of
high-dimensional vectors from the perspective of statistical signal processing
and decision theory. This architecture is composed of several memory units,
each of which summarizes a fraction of the database by a single representative
vector. The potential similarity of the query to one of the vectors stored in
the memory unit is gauged by a simple correlation with the memory unit's
representative vector. This representative optimizes the test of the following
hypothesis: the query is independent from any vector in the memory unit vs. the
query is a simple perturbation of one of the stored vectors.
Compared to exhaustive search, our approach finds the most similar database
vectors significantly faster without a noticeable reduction in search quality.
Interestingly, the reduction of complexity is provably better in
high-dimensional spaces. We empirically demonstrate its practical interest in a
large-scale image search scenario with off-the-shelf state-of-the-art
descriptors.
Comment: Accepted to IEEE Transactions on Big Data
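The memory-unit idea in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible representative, the plain sum of the vectors in each unit; the abstract only states that the representative optimizes a hypothesis test, so this construction, the function names, and the parameters `unit_size` and `n_probe` are illustrative assumptions rather than the authors' method.

```python
import math
import random

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_memory_units(vectors, unit_size):
    """Split the database into units; summarize each by the sum of its vectors.

    Assumption: a plain sum stands in for the optimized representative.
    """
    dim = len(vectors[0])
    units = []
    for start in range(0, len(vectors), unit_size):
        ids = list(range(start, min(start + unit_size, len(vectors))))
        rep = [sum(vectors[i][d] for i in ids) for d in range(dim)]
        units.append((rep, ids))
    return units

def search(query, vectors, units, n_probe):
    """Rank units by correlation with their representative, then scan
    only the vectors stored in the n_probe best-scoring units."""
    ranked = sorted(units, key=lambda u: dot(query, u[0]), reverse=True)
    candidates = [i for _, ids in ranked[:n_probe] for i in ids]
    return max(candidates, key=lambda i: dot(query, vectors[i]))

rng = random.Random(0)
dim, n = 256, 200
database = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
units = build_memory_units(database, unit_size=10)

# The query is a small perturbation of stored vector 42.
query = normalize([x + 0.02 * rng.gauss(0, 1) for x in database[42]])
best = search(query, database, units, n_probe=3)
exhaustive = max(range(n), key=lambda i: dot(query, database[i]))
print(best, exhaustive)
```

In this toy setting the search computes 20 unit correlations plus 30 vector comparisons instead of 200, which is where the speedup over exhaustive search would come from.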
Strategies for Searching Video Content with Text Queries or Video Examples
The large number of user-generated videos uploaded to the Internet
every day has led to many commercial video search engines, which mainly rely on
text metadata for search. However, metadata is often lacking for user-generated
videos, so these videos are unsearchable by current search engines.
Content-based video retrieval (CBVR) tackles this metadata-scarcity
problem by directly analyzing the visual and audio streams of each video. CBVR
encompasses multiple research topics, including low-level feature design,
feature fusion, semantic detector training and video search/reranking. We
present novel strategies in these topics to enhance CBVR in both accuracy and
speed under different query inputs, including pure textual queries and query by
video examples. Our proposed strategies have been incorporated into our
submission for the TRECVID 2014 Multimedia Event Detection evaluation, where
our system outperformed other submissions in both text queries and video
example queries, thus demonstrating the effectiveness of our proposed
approaches.
Continual Learning in Open-vocabulary Classification with Complementary Memory Systems
We introduce a method for flexible continual learning in open-vocabulary
image classification, drawing inspiration from the complementary learning
systems observed in human cognition. We propose a "tree probe" method, an
adaptation of lazy learning principles, which enables fast learning from new
examples with accuracy competitive with batch-trained linear models. Further, we
propose a method to combine predictions from a CLIP zero-shot model and the
exemplar-based model, using the zero-shot estimated probability that a sample's
class is within any of the exemplar classes. We test in data-incremental,
class-incremental, and task-incremental settings, as well as the ability to perform
flexible inference on varying subsets of zero-shot and learned categories. Our
proposed method achieves a good balance of learning speed, target task
effectiveness, and zero-shot effectiveness.Comment: In revie
Efficient Match Pair Retrieval for Large-scale UAV Images via Graph Indexed Global Descriptor
SfM (Structure from Motion) has been extensively used for UAV (Unmanned
Aerial Vehicle) image orientation. Its efficiency is directly influenced by
feature matching. Although image retrieval has been extensively used for match
pair selection, it incurs high computational costs due to the large number of
local features and the large size of the codebook. Thus, this paper
proposes an efficient match pair retrieval method and implements an integrated
workflow for parallel SfM reconstruction. First, an individual codebook is
trained online by considering the redundancy of UAV images and local features,
which avoids the ambiguity of training codebooks from other datasets. Second,
local features of each image are aggregated into a single high-dimensional global
descriptor through the VLAD (Vector of Locally Aggregated Descriptors)
aggregation by using the trained codebook, which remarkably reduces the number
of features and the burden of nearest neighbor searching in image indexing.
Third, the global descriptors are indexed via an HNSW (Hierarchical Navigable
Small World) graph structure for nearest neighbor searching. Match
pairs are then retrieved by using an adaptive threshold selection strategy and
utilized to create a view graph for divide-and-conquer based parallel SfM
reconstruction. Finally, the performance of the proposed solution has been
verified using three large-scale UAV datasets. The test results demonstrate
that the proposed solution accelerates match pair retrieval with a speedup
ratio ranging from 36 to 108 and improves the efficiency of SfM reconstruction
with competitive accuracy in both relative and absolute orientation.
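The aggregation and retrieval steps described in the abstract above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the codebook is given rather than trained online, a brute-force cosine comparison stands in for the HNSW index, and a fixed `threshold` replaces the adaptive threshold selection. The function names `vlad` and `match_pairs` and the toy data are hypothetical.

```python
import math

def vlad(local_features, codebook):
    """Aggregate local features into one global descriptor:
    per-centroid sum of residuals, flattened and L2-normalized."""
    dim = len(codebook[0])
    residuals = [[0.0] * dim for _ in codebook]
    for f in local_features:
        # Assign the feature to its nearest centroid (squared Euclidean distance).
        k = min(range(len(codebook)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(f, codebook[j])))
        for d in range(dim):
            residuals[k][d] += f[d] - codebook[k][d]
    flat = [x for row in residuals for x in row]
    norm = math.sqrt(sum(x * x for x in flat)) or 1.0
    return [x / norm for x in flat]

def match_pairs(descriptors, threshold):
    """Brute-force stand-in for the graph index: keep image pairs whose
    cosine similarity reaches the (here fixed) threshold."""
    pairs = []
    for i in range(len(descriptors)):
        for j in range(i + 1, len(descriptors)):
            sim = sum(a * b for a, b in zip(descriptors[i], descriptors[j]))
            if sim >= threshold:
                pairs.append((i, j))
    return pairs

# Toy data: a 2-word codebook and three images with 2-D local features.
codebook = [[0.0, 0.0], [10.0, 10.0]]
images = [
    [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]],    # image 0
    [[1.0, 0.2], [0.0, 1.0], [11.0, 10.1]],    # image 1, similar to image 0
    [[-1.0, 0.0], [0.0, -1.0], [9.0, 10.0]],   # image 2, dissimilar
]
descriptors = [vlad(feats, codebook) for feats in images]
pairs = match_pairs(descriptors, threshold=0.9)
print(pairs)
```

Each image is reduced to a single vector of length (codebook size × feature dimension), so pair selection compares one descriptor per image instead of every local feature.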