Optimal Joins Using Compact Data Structures
Worst-case optimal join algorithms have gained a lot of attention in the database literature. We now have several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, implementing these algorithms often requires an enhanced indexing structure: to achieve optimality we must either build completely new indexes or populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending extra storage space that may be non-negligible.
We show that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need for extra storage. Our representation is a compact quadtree for the static indexes, and a dynamic quadtree sharing subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, and show that the running time of this algorithm is worst-case optimal in data complexity. Remarkably, we can extend our framework to evaluate more expressive queries from relational algebra by introducing a lazy version of qdags (lqdags). Once again, we can show that the running time of our algorithms is worst-case optimal.
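To make the "relations as point sets in a grid" idea concrete, here is a minimal sketch in Python. It is an illustration only, not the compact quadtree or the qdag from the paper: it uses plain nested tuples rather than a succinct encoding, and the names `build`, `contains`, and `intersect` are hypothetical. It stores a binary relation over a power-of-two grid and intersects two such relations while pruning empty quadrants.

```python
def build(points, size, x0=0, y0=0):
    """Pointer-based quadtree over a size x size grid (size is a power of two).

    Returns None for an empty region, True for an occupied single cell,
    and a 4-tuple of child quadrants otherwise.
    """
    if not points:
        return None
    if size == 1:
        return True
    half = size // 2
    quads = [[], [], [], []]
    for (x, y) in points:
        idx = (2 if x >= x0 + half else 0) + (1 if y >= y0 + half else 0)
        quads[idx].append((x, y))
    return tuple(
        build(quads[i], half,
              x0 + (half if i >= 2 else 0),
              y0 + (half if i % 2 else 0))
        for i in range(4)
    )


def contains(node, size, x, y):
    """Membership test: descend one quadrant per level."""
    while node is not None and size > 1:
        half = size // 2
        idx = (2 if x >= half else 0) + (1 if y >= half else 0)
        node = node[idx]
        x, y, size = x % half, y % half, half
    return node is True


def intersect(a, b):
    """Intersect two quadtrees built over the same grid size.

    A quadrant that is empty on either side is skipped without
    visiting the other side's subtree.
    """
    if a is None or b is None:
        return None
    if a is True:  # same grid size, so b is also a single occupied cell
        return True
    kids = tuple(intersect(ca, cb) for ca, cb in zip(a, b))
    return kids if any(k is not None for k in kids) else None


R = build([(0, 0), (1, 2), (3, 3)], 4)
S = build([(1, 2), (2, 2), (3, 3)], 4)
both = intersect(R, S)
print([(x, y) for x in range(4) for y in range(4) if contains(both, 4, x, y)])
```

Joining two relations over the same attributes reduces to this intersection; the paper's algorithms handle relations over different attribute sets, which this sketch does not attempt.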
Labeling Schemes with Queries
We study the question of "how robust are the known lower bounds of labeling
schemes when one increases the number of consulted labels". Let $f$ be a
function on pairs of vertices. An $f$-labeling scheme for a family of graphs
$\mathcal{F}$ labels the vertices of all graphs in $\mathcal{F}$ such that for
every graph $G \in \mathcal{F}$ and every two vertices $u, v \in G$, the value
$f(u,v)$ can be inferred by merely inspecting the labels of $u$ and $v$.
This paper introduces a natural generalization: the notion of $f$-labeling
schemes with queries, in which the value $f(u,v)$ can be inferred by inspecting
not only the labels of $u$ and $v$ but possibly the labels of some additional
vertices. We show that inspecting the label of a single additional vertex (one
query) enables us to reduce the label size of many labeling schemes
significantly.
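As a concrete instance of a labeling scheme without queries, the following Python sketch uses the classic DFS-interval labels for ancestry in a rooted tree. This is a textbook example chosen for illustration, not a scheme taken from the paper; the function names and the `children` adjacency format are assumptions. The point is that `is_ancestor` sees only the two labels, never the tree.

```python
def interval_labels(children, root):
    """Label each vertex with its (enter, exit) DFS interval.

    `children` maps a vertex to the list of its children (assumed format).
    """
    labels = {}
    clock = 0
    stack = [(root, False)]
    while stack:
        v, finished = stack.pop()
        if finished:
            labels[v] = (labels[v][0], clock)  # close the interval
            continue
        labels[v] = (clock, None)
        clock += 1
        stack.append((v, True))
        for c in reversed(children.get(v, [])):
            stack.append((c, False))
    return labels


def is_ancestor(label_u, label_v):
    """Decide whether u is an ancestor of v from the two labels alone.

    A vertex counts as its own ancestor under this convention.
    """
    return label_u[0] <= label_v[0] and label_v[1] <= label_u[1]


tree = {0: [1, 2], 1: [3]}
labels = interval_labels(tree, 0)
print(is_ancestor(labels[1], labels[3]), is_ancestor(labels[2], labels[3]))
```

A scheme with one query, in the sense of the abstract, would additionally be allowed to read the label of a third vertex chosen from the first two labels.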
Old Techniques for New Join Algorithms: A Case Study in RDF Processing
Recently there has been significant interest in designing specialized RDF
engines, as traditional query processing mechanisms incur orders of magnitude
performance gaps on many RDF workloads. At the same time researchers have
released new worst-case optimal join algorithms which can be asymptotically
better than the join algorithms in traditional engines. In this paper we apply
worst-case optimal join algorithms to a standard RDF workload, the LUBM
benchmark, for the first time. We do so using two worst-case optimal engines:
(1) LogicBlox, a commercial database engine, and (2) EmptyHeaded, our prototype
research engine with enhanced worst-case optimal join algorithms. We show that
without any added optimizations both LogicBlox and EmptyHeaded outperform two
state-of-the-art specialized RDF engines, RDF-3X and TripleBit, by up to 6x on
cyclic join queries, the queries where traditional optimizers are suboptimal. On
the remaining, less complex queries in the LUBM benchmark, we show that three
classic query optimization techniques enable EmptyHeaded to compete with RDF
engines, even when there is no asymptotic advantage to the worst-case optimal
approach. We validate that our design has merit as EmptyHeaded outperforms
MonetDB by three orders of magnitude and LogicBlox by two orders of magnitude,
while remaining within an order of magnitude of RDF-3X and TripleBit.
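The cyclic queries mentioned above are typified by the triangle query Q(a, b, c) = R(a, b), S(b, c), T(a, c). The Python sketch below shows the attribute-at-a-time evaluation style that worst-case optimal joins use, binding one variable at a time by intersecting the candidate sets from every relation that mentions it. It is an illustrative sketch only, not the LogicBlox or EmptyHeaded implementation; `index`, `common`, and `triangles` are hypothetical names.

```python
from collections import defaultdict


def index(pairs):
    """Hash index: first attribute -> set of second attributes."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return dict(idx)


def common(xs, ys):
    """Intersect two collections by scanning the smaller and probing the larger."""
    if len(xs) > len(ys):
        xs, ys = ys, xs
    return [v for v in xs if v in ys]


def triangles(R, S, T):
    """Evaluate Q(a, b, c) = R(a, b), S(b, c), T(a, c) one variable at a time."""
    r, s, t = index(R), index(S), index(T)
    out = []
    for a in common(r, t):                 # a must appear in both R and T
        for b in common(r[a], s):          # b must extend R(a, .) and start S
            for c in common(s[b], t[a]):   # c must close the triangle
                out.append((a, b, c))
    return sorted(out)


R = [(1, 2), (1, 3), (2, 3)]
S = [(2, 3), (3, 1)]
T = [(1, 3), (2, 1)]
print(triangles(R, S, T))
```

A pairwise plan would first materialize the join of two relations, which can be much larger than the final result on cyclic queries; intersecting per variable avoids that intermediate result.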
Deductive Optimization of Relational Data Storage
Optimizing the physical storage and the retrieval of data are two key
database management problems. In this paper, we propose a language that can
express a wide range of physical database layouts, going well beyond the row-
and column-based methods that are widely used in database management systems.
We use deductive synthesis to turn a high-level relational representation of a
database query into a highly optimized low-level implementation which operates
on a specialized layout of the dataset. We build a compiler for this language
and conduct experiments using a popular database benchmark, which show that
the performance of these specialized queries is competitive with a
state-of-the-art in-memory compiled database system.
On Efficient Distributed Construction of Near Optimal Routing Schemes
Given a distributed network represented by a weighted undirected graph
on vertices, and a parameter , we devise a distributed
algorithm that computes a routing scheme in
rounds, where is the hop-diameter of the network. The running time matches
the lower bound of rounds (which holds for any
scheme with polynomial stretch), up to lower order terms. The routing tables
are of size , the labels are of size , and
every packet is routed on a path suffering stretch at most . Our
construction nearly matches the state-of-the-art for routing schemes built in a
centralized sequential manner. The previous best algorithms for building
routing tables in a distributed small-messages model were by [LP13] (STOC
2013) and [LP15] (PODC 2015). The former has similar properties but
suffers from substantially larger routing tables of size ,
while the latter has sub-optimal running time of
Physics-inspired Performance Evaluation of a Structured Peer-to-Peer Overlay Network
In the majority of structured peer-to-peer overlay networks a graph
with a desirable topology is constructed. In most cases, the graph is
maintained by a periodic activity performed by each node in the graph
to preserve the desirable structure in the face of the continuous change
of the set of nodes. The interaction of the autonomous periodic
activities of the nodes renders the performance analysis of such
systems complex and simulation of scales of interest can be
prohibitive. Physicists, however, are accustomed to dealing with
scale by characterizing a system using intensive variables,
i.e. variables that are size independent. The approach has proved its
usefulness when applied to satisfiability theory. This
work is the first attempt to apply it in the area of distributed
systems. The contribution of this paper is two-fold. First, we
describe a methodology to be used for analyzing the performance of
large scale distributed systems. Second, we show how we applied the
methodology to find an intensive variable that describes the
characteristic behavior of the Chord overlay network, namely, the
ratio of the magnitude of perturbation of the network (joins/failures)
to the magnitude of periodic stabilization of the network.