Fast Approximate Nearest Neighbor Search with a Dynamic Exploration Graph using Continuous Refinement
For approximate nearest neighbor search, graph-based algorithms have been shown to
offer the best trade-off between accuracy and search time. We propose the
Dynamic Exploration Graph (DEG) which significantly outperforms existing
algorithms in terms of search and exploration efficiency by combining two new
ideas: First, a single undirected even regular graph is incrementally built by
partially replacing existing edges to integrate new vertices and to update old
neighborhoods at the same time. Secondly, an edge optimization algorithm is
used to continuously improve the quality of the graph. Combining this ongoing
refinement with the graph construction process leads to a well-organized graph
structure at all times, resulting in: (1) increased search efficiency, (2)
predictable index size, (3) guaranteed connectivity and therefore reachability
of all vertices, and (4) a dynamic graph structure. In addition, we investigate
how well existing graph-based search systems can handle indexed queries where
the seed vertex of a search is the query itself. Such exploration tasks,
despite their good starting point, are not necessarily easy. High efficiency in
approximate nearest neighbor search (ANNS) does not automatically imply good
performance in exploratory search. Extensive experiments show that our new
Dynamic Exploration Graph outperforms existing algorithms significantly for
indexed and unindexed queries.
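To make the search setting in the abstract above concrete, here is a minimal sketch of a generic best-first search over a neighbor graph. It is not the DEG construction or edge optimization algorithm; the function name `graph_search`, the `beam` parameter, and the toy ring graph are assumptions made for illustration. For an indexed (exploration) query, `seed` would be the vertex of the query itself; for an unindexed query, it is some other entry vertex.

```python
import heapq
import math


def graph_search(points, neighbors, query, seed, k=1, beam=4):
    """Generic best-first search on a neighbor graph (illustrative only).

    points:    list of coordinate tuples, one per vertex
    neighbors: dict mapping vertex id -> list of adjacent vertex ids
    query:     coordinate tuple to search for
    seed:      vertex id where the search starts
    beam:      how many best vertices to keep while searching (assumed name)
    """
    d0 = math.dist(points[seed], query)
    visited = {seed}
    candidates = [(d0, seed)]   # min-heap: vertices still to expand
    results = [(-d0, seed)]     # max-heap via negated distance: best found so far

    while candidates:
        d, v = heapq.heappop(candidates)
        # Stop once the closest unexpanded vertex is worse than the worst kept result.
        if len(results) >= beam and d > -results[0][0]:
            break
        for u in neighbors[v]:
            if u in visited:
                continue
            visited.add(u)
            du = math.dist(points[u], query)
            if len(results) < beam or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > beam:
                    heapq.heappop(results)  # drop the current worst result

    # Return (distance, vertex) pairs, closest first.
    return sorted((-nd, v) for nd, v in results)[:k]


# Toy example: five points on a line, connected as a ring (every vertex has degree 2).
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
nearest = graph_search(points, ring, query=(2.9,), seed=0, k=1, beam=2)
print(nearest[0][1])
```

The ring is used only because it is the smallest connected graph in which every vertex has the same even degree; a real index would use a much higher degree and a construction procedure such as the one the abstract describes.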
Efficient data structures for model-free data-driven computational mechanics
The data-driven computing paradigm initially introduced by Kirchdoerfer & Ortiz (2016) enables finite element computations in solid mechanics to be performed directly from material data sets, without an explicit material model. In terms of computational effort, the most challenging task is the projection of admissible states at material points onto their closest states in the material data set. In this study, we compare and develop several possible data structures for solving the nearest-neighbor problem. We show that approximate nearest-neighbor (ANN) algorithms can accelerate material data searches by several orders of magnitude relative to exact searching algorithms. The approximations are suggested by, and adapted to, the structure of the data-driven iterative solver and result in no significant loss of solution accuracy. We assess the performance of the ANN algorithm with respect to material data set size with the aid of a 3D elasticity test case. We show that computations on a single processor with up to one billion material data points are feasible within a few seconds of execution time, with a speedup of more than 10⁶ with respect to exact k-d trees.
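The projection step named in the abstract above, mapping each material-point state to its closest entry in the material data set, can be sketched as an exact linear scan. This is only the brute-force baseline that ANN structures aim to beat, not the data structures studied in the paper. The function name `project_to_data`, the plain Euclidean metric, and the two-component (strain, stress) states are assumptions made for illustration.

```python
import math


def project_to_data(states, data):
    """Return, for each state, the index of its closest data point.

    Exact linear scan with cost O(len(states) * len(data)).
    Euclidean distance is an assumption; a solver may use a different norm.
    """
    return [
        min(range(len(data)), key=lambda i: math.dist(state, data[i]))
        for state in states
    ]


# Hypothetical (strain, stress) data set and two material-point states.
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
states = [(0.9, 2.1), (1.8, 3.5)]
print(project_to_data(states, data))
```

Because this scan is repeated for every material point in every solver iteration, its cost grows with the product of mesh size and data set size, which is why the choice of search structure dominates the run time.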
Survey of Vector Database Management Systems
There are now over 20 commercial vector database management systems (VDBMSs),
all produced within the past five years. But embedding-based retrieval has been
studied for over ten years, and similarity search a staggering half century and
more. Driving this shift from algorithms to systems are new data-intensive
applications, notably large language models, that demand vast stores of
unstructured data coupled with reliable, secure, fast, and scalable query
processing capability. A variety of new data management techniques now exist
for addressing these needs; however, there is no comprehensive survey to
thoroughly review these techniques and systems. We start by identifying five
main obstacles to vector data management, namely vagueness of semantic
similarity, large size of vectors, high cost of similarity comparison, lack of
natural partitioning that can be used for indexing, and difficulty of
efficiently answering hybrid queries that require both attributes and vectors.
Overcoming these obstacles has led to new approaches to query processing,
storage and indexing, and query optimization and execution. For query
processing, a variety of similarity scores and query types are now well
understood; for storage and indexing, techniques include vector compression,
namely quantization, and partitioning based on randomization, learning
partitioning, and navigable partitioning; for query optimization and execution,
we describe new operators for hybrid queries, as well as techniques for plan
enumeration, plan selection, and hardware accelerated execution. These
techniques lead to a variety of VDBMSs across a spectrum of design and runtime
characteristics, including native systems specialized for vectors and extended
systems that incorporate vector capabilities into existing systems. We then
discuss benchmarks, and finally we outline research challenges and point the
direction for future work.
Comment: 25 page
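A hybrid query, as described in the abstract above, combines an attribute condition with a vector similarity condition. One simple strategy is pre-filtering: apply the attribute predicate first, then rank the surviving items by distance to the query vector. The sketch below is illustrative only; the function name `hybrid_query`, the item layout (`id`, `vec`, `attrs`), and the Euclidean metric are assumptions, not the API of any particular VDBMS.

```python
import math


def hybrid_query(items, query_vec, predicate, k=2):
    """Pre-filtering hybrid query (illustrative sketch).

    items:     list of dicts with keys "id", "vec", "attrs" (assumed layout)
    predicate: function taking the attrs dict and returning True or False
    Returns the ids of the k closest items that satisfy the predicate.
    """
    survivors = [item for item in items if predicate(item["attrs"])]
    survivors.sort(key=lambda item: math.dist(item["vec"], query_vec))
    return [item["id"] for item in survivors[:k]]


items = [
    {"id": "a", "vec": (0.0, 0.0), "attrs": {"lang": "en"}},
    {"id": "b", "vec": (0.1, 0.0), "attrs": {"lang": "de"}},
    {"id": "c", "vec": (1.0, 1.0), "attrs": {"lang": "en"}},
]
print(hybrid_query(items, (0.0, 0.1), lambda attrs: attrs["lang"] == "en"))
```

Pre-filtering is exact but scans every item that passes the predicate; filtering after an index lookup is cheaper when the predicate is not selective, at the risk of returning fewer than k results.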