28,129 research outputs found
HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces
Nearest neighbor searching of large databases in high-dimensional spaces is
inherently difficult due to the curse of dimensionality. A flavor of
approximation is, therefore, necessary to practically solve the problem of
nearest neighbor search. In this paper, we propose a novel yet simple indexing
scheme, HD-Index, to solve the problem of approximate k-nearest neighbor
queries in massive high-dimensional databases. HD-Index consists of a set of
novel hierarchical structures called RDB-trees built on Hilbert keys of
database objects. The leaves of the RDB-trees store distances of database
objects to reference objects, thereby allowing efficient pruning using distance
filters. In addition to triangular inequality, we also use Ptolemaic inequality
to produce better lower bounds. Experiments on massive (up to billion scale)
high-dimensional (up to 1000+) datasets show that HD-Index is effective,
efficient, and scalable.Comment: PVLDB 11(8):906-919, 201
An Inflationary Fixed Point Operator in XQuery
We introduce a controlled form of recursion in XQuery, inflationary fixed
points, familiar in the context of relational databases. This imposes
restrictions on the expressible types of recursion, but we show that
inflationary fixed points nevertheless are sufficiently versatile to capture a
wide range of interesting use cases, including the semantics of Regular XPath
and its core transitive closure construct.
While the optimization of general user-defined recursive functions in XQuery
appears elusive, we will describe how inflationary fixed points can be
efficiently evaluated, provided that the recursive XQuery expressions exhibit a
distributivity property. We show how distributivity can be assessed both,
syntactically and algebraically, and provide experimental evidence that XQuery
processors can substantially benefit during inflationary fixed point
evaluation.Comment: 11 pages, 10 figures, 2 table
Hashing as Tie-Aware Learning to Rank
Hashing, or learning binary embeddings of data, is frequently used in nearest
neighbor retrieval. In this paper, we develop learning to rank formulations for
hashing, aimed at directly optimizing ranking-based evaluation metrics such as
Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We
first observe that the integer-valued Hamming distance often leads to tied
rankings, and propose to use tie-aware versions of AP and NDCG to evaluate
hashing for retrieval. Then, to optimize tie-aware ranking metrics, we derive
their continuous relaxations, and perform gradient-based optimization with deep
neural networks. Our results establish the new state-of-the-art for image
retrieval by Hamming ranking in common benchmarks.Comment: 15 pages, 3 figures. IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 201
An Analytical Study of Large SPARQL Query Logs
With the adoption of RDF as the data model for Linked Data and the Semantic
Web, query specification from end- users has become more and more common in
SPARQL end- points. In this paper, we conduct an in-depth analytical study of
the queries formulated by end-users and harvested from large and up-to-date
query logs from a wide variety of RDF data sources. As opposed to previous
studies, ours is the first assessment on a voluminous query corpus, span- ning
over several years and covering many representative SPARQL endpoints. Apart
from the syntactical structure of the queries, that exhibits already
interesting results on this generalized corpus, we drill deeper in the
structural char- acteristics related to the graph- and hypergraph represen-
tation of queries. We outline the most common shapes of queries when visually
displayed as pseudographs, and char- acterize their (hyper-)tree width.
Moreover, we analyze the evolution of queries over time, by introducing the
novel con- cept of a streak, i.e., a sequence of queries that appear as
subsequent modifications of a seed query. Our study offers several fresh
insights on the already rich query features of real SPARQL queries formulated
by real users, and brings us to draw a number of conclusions and pinpoint
future di- rections for SPARQL query evaluation, query optimization, tuning,
and benchmarking
- …