
    Optimal Joins Using Compact Data Structures

    Worst-case optimal join algorithms have gained a lot of attention in the database literature. Several algorithms that are optimal in the worst case are now known, and many of them have been implemented and validated in practice. However, implementing these algorithms often requires an enhanced indexing structure: to achieve optimality we either need to build completely new indexes, or we must populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending an extra amount of storage space that may be non-negligible. We show that optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need for extra storage. Our representation is a compact quadtree for the static indexes, and a dynamic quadtree sharing subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, and show that its running time is worst-case optimal in data complexity. Remarkably, we can extend our framework to evaluate more expressive queries from relational algebra by introducing a lazy version of qdags (lqdags). Once again, we can show that the running time of our algorithms is worst-case optimal.
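
    To make the point-set view concrete, here is a minimal sketch (not the paper's compact implementation) of treating a binary relation as points on a 2^h x 2^h grid and storing it as a quadtree. The function name and the pointer-based representation are illustrative assumptions; the paper works with a compact encoding.

```python
# Minimal sketch: a binary relation R(A, B) as a point set on a 2^h x 2^h grid,
# stored as a quadtree. Illustrative only; names and representation are assumed.

def build_quadtree(points, x0, y0, size):
    """Recursively partition the grid; a leaf is 1 (cell occupied) or 0 (empty)."""
    if not points:
        return 0                      # empty quadrant
    if size == 1:
        return 1                      # single occupied cell
    half = size // 2
    quads = [[], [], [], []]          # index 0: x-low/y-low, 1: x-high/y-low,
    for (x, y) in points:             #       2: x-low/y-high, 3: x-high/y-high
        idx = (1 if x >= x0 + half else 0) + (2 if y >= y0 + half else 0)
        quads[idx].append((x, y))
    return [
        build_quadtree(quads[0], x0, y0, half),
        build_quadtree(quads[1], x0 + half, y0, half),
        build_quadtree(quads[2], x0, y0 + half, half),
        build_quadtree(quads[3], x0 + half, y0 + half, half),
    ]

# A relation R(A, B) with attribute values in [0, 4) becomes a point set:
R = [(0, 1), (2, 3), (3, 3)]
tree = build_quadtree(R, 0, 0, 4)
```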

    Labeling Schemes with Queries

    We study the question of "how robust are the known lower bounds of labeling schemes when one increases the number of consulted labels". Let $f$ be a function on pairs of vertices. An $f$-labeling scheme for a family of graphs $\mathcal{F}$ labels the vertices of all graphs in $\mathcal{F}$ such that for every graph $G\in\mathcal{F}$ and every two vertices $u,v\in G$, the value $f(u,v)$ can be inferred by merely inspecting the labels of $u$ and $v$. This paper introduces a natural generalization: the notion of $f$-labeling schemes with queries, in which the value $f(u,v)$ can be inferred by inspecting not only the labels of $u$ and $v$ but possibly the labels of some additional vertices. We show that inspecting the label of a single additional vertex (one query) enables us to reduce the label size of many labeling schemes significantly.
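
    As a minimal illustration of what an $f$-labeling scheme (without queries) looks like, the sketch below is an assumed example rather than one taken from the paper: vertices of a rooted tree are labeled with DFS intervals so that $f(u,v)$ = "is $u$ an ancestor of $v$?" can be decided from the two labels alone.

```python
# Assumed example of an f-labeling scheme: ancestry in a rooted tree.
# Each vertex gets its DFS entry/exit interval as its label; f(u, v) is
# decided from the two labels alone, without access to the tree.

def label_tree(children, root):
    labels, clock = {}, [0]
    def dfs(v):
        start = clock[0]; clock[0] += 1
        for c in children.get(v, []):
            dfs(c)
        labels[v] = (start, clock[0]); clock[0] += 1
    dfs(root)
    return labels

def is_ancestor(label_u, label_v):
    # u is an ancestor of v iff v's interval nests inside u's interval.
    return label_u[0] <= label_v[0] and label_v[1] <= label_u[1]

children = {"r": ["a", "b"], "a": ["c"]}
L = label_tree(children, "r")
print(is_ancestor(L["r"], L["c"]))   # True
print(is_ancestor(L["b"], L["c"]))   # False
```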

    Old Techniques for New Join Algorithms: A Case Study in RDF Processing

    Recently there has been significant interest in designing specialized RDF engines, as traditional query processing mechanisms incur orders-of-magnitude performance gaps on many RDF workloads. At the same time, researchers have released new worst-case optimal join algorithms which can be asymptotically better than the join algorithms in traditional engines. In this paper we apply worst-case optimal join algorithms to a standard RDF workload, the LUBM benchmark, for the first time. We do so using two worst-case optimal engines: (1) LogicBlox, a commercial database engine, and (2) EmptyHeaded, our prototype research engine with enhanced worst-case optimal join algorithms. We show that without any added optimizations both LogicBlox and EmptyHeaded outperform two state-of-the-art specialized RDF engines, RDF-3X and TripleBit, by up to 6x on cyclic join queries, the queries where traditional optimizers are suboptimal. On the remaining, less complex queries in the LUBM benchmark, we show that three classic query optimization techniques enable EmptyHeaded to compete with RDF engines, even when there is no asymptotic advantage to the worst-case optimal approach. We validate that our design has merit as EmptyHeaded outperforms MonetDB by three orders of magnitude and LogicBlox by two orders of magnitude, while remaining within an order of magnitude of RDF-3X and TripleBit.
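
    For readers unfamiliar with the approach, the sketch below illustrates the variable-at-a-time idea behind worst-case optimal joins on the triangle query. It is a generic toy illustration, not the EmptyHeaded or LogicBlox implementation, and all names in it are assumptions.

```python
# Toy sketch of a variable-at-a-time join for the triangle query
# Q(a, b, c) :- R(a, b), S(b, c), T(c, a). Variables are bound one at a time
# and candidate sets are intersected, the core idea behind worst-case optimal joins.

from collections import defaultdict

def index_by_first(edges):
    idx = defaultdict(set)
    for x, y in edges:
        idx[x].add(y)
    return idx

def triangles(R, S, T):
    Ri, Si = index_by_first(R), index_by_first(S)
    T_rev = defaultdict(set)
    for c, a in T:
        T_rev[a].add(c)                               # T indexed by its second column
    out = []
    for a in Ri:                                      # bind a from R's first column
        for b in Ri[a]:                               # bind b consistent with R(a, b)
            for c in Si.get(b, set()) & T_rev.get(a, set()):   # intersect S and T candidates
                out.append((a, b, c))
    return out

R = [(1, 2), (1, 3)]
S = [(2, 3), (3, 1)]
T = [(3, 1)]
print(triangles(R, S, T))   # [(1, 2, 3)]
```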

    Deductive Optimization of Relational Data Storage

    Optimizing the physical storage and retrieval of data are two key database management problems. In this paper, we propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management systems. We use deductive synthesis to turn a high-level relational representation of a database query into a highly optimized low-level implementation which operates on a specialized layout of the dataset. We build a compiler for this language and conduct experiments using a popular database benchmark, which show that the performance of these specialized queries is competitive with a state-of-the-art in-memory compiled database system.
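
    As a toy illustration of why physical layout is a degree of freedom worth synthesizing over, the sketch below (an assumed example, unrelated to the paper's layout language) materializes the same logical relation in a row-oriented and a column-oriented layout and runs the same aggregate over both.

```python
# Illustrative only: two physical layouts of one logical relation.
# A layout language can express both, plus many specialized hybrids.

rows = [
    {"id": 1, "name": "a", "price": 10},
    {"id": 2, "name": "b", "price": 20},
]

# Row layout: tuples stored contiguously, good for whole-tuple access.
row_store = [(r["id"], r["name"], r["price"]) for r in rows]

# Column layout: each attribute stored contiguously, good for single-column scans.
col_store = {
    "id": [r["id"] for r in rows],
    "name": [r["name"] for r in rows],
    "price": [r["price"] for r in rows],
}

# The same query, SUM(price), against each layout:
print(sum(t[2] for t in row_store))      # 30
print(sum(col_store["price"]))           # 30
```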

    On Efficient Distributed Construction of Near Optimal Routing Schemes

    Given a distributed network represented by a weighted undirected graph $G=(V,E)$ on $n$ vertices, and a parameter $k$, we devise a distributed algorithm that computes a routing scheme in $(n^{1/2+1/k}+D)\cdot n^{o(1)}$ rounds, where $D$ is the hop-diameter of the network. The running time matches the lower bound of $\tilde{\Omega}(n^{1/2}+D)$ rounds (which holds for any scheme with polynomial stretch), up to lower-order terms. The routing tables are of size $\tilde{O}(n^{1/k})$, the labels are of size $O(k\log^2 n)$, and every packet is routed on a path suffering stretch at most $4k-5+o(1)$. Our construction nearly matches the state-of-the-art for routing schemes built in a centralized sequential manner. The previous best algorithms for building routing tables in a distributed small-messages model were those of [LP13] (STOC 2013) and [LP15] (PODC 2015). The former has similar properties but suffers from substantially larger routing tables of size $O(n^{1/2+1/k})$, while the latter has a sub-optimal running time of $\tilde{O}(\min\{(nD)^{1/2}\cdot n^{1/k},\, n^{2/3+2/(3k)}+D\})$.
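
    As a worked instance of the stated tradeoff (our own arithmetic, not taken from the abstract), plugging in $k=2$ gives:

```latex
% Worked instance of the parameter tradeoff for k = 2 (illustrative only):
\begin{align*}
\text{rounds}     &: (n^{1/2+1/2}+D)\cdot n^{o(1)} = (n+D)\cdot n^{o(1)},\\
\text{table size} &: \tilde{O}(n^{1/2}), \qquad \text{label size}: O(\log^2 n),\\
\text{stretch}    &: 4\cdot 2 - 5 + o(1) = 3 + o(1).
\end{align*}
```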

    Physics-inspired Performance Evaluation of a Structured Peer-to-Peer Overlay Network

    In the majority of structured peer-to-peer overlay networks, a graph with a desirable topology is constructed. In most cases, the graph is maintained by a periodic activity performed by each node in the graph to preserve the desirable structure in the face of the continuous change of the set of nodes. The interaction of the autonomous periodic activities of the nodes renders the performance analysis of such systems complex, and simulation at the scales of interest can be prohibitive. Physicists, however, are accustomed to dealing with scale by characterizing a system using intensive variables, i.e., variables that are size independent. The approach has proved its usefulness when applied to satisfiability theory. This work is the first attempt to apply it in the area of distributed systems. The contribution of this paper is two-fold. First, we describe a methodology to be used for analyzing the performance of large-scale distributed systems. Second, we show how we applied the methodology to find an intensive variable that describes the characteristic behavior of the Chord overlay network, namely, the ratio of the magnitude of perturbation of the network (joins/failures) to the magnitude of periodic stabilization of the network.
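
    A minimal sketch of the kind of intensive variable described above, assuming a simple rate model with hypothetical parameter names (the paper's actual model may differ):

```python
# Illustrative sketch: the ratio of the perturbation rate of the overlay
# (joins/failures per unit time) to its periodic-stabilization rate.
# Being a ratio of two per-node rates, it does not depend on network size.

def intensive_ratio(join_rate, failure_rate, stabilization_interval):
    """Perturbation events per stabilization round, per node (hypothetical model)."""
    perturbation_rate = join_rate + failure_rate        # events per second
    stabilization_rate = 1.0 / stabilization_interval   # rounds per second
    return perturbation_rate / stabilization_rate

# Example: 0.02 joins/s and 0.03 failures/s per node, stabilization every 30 s.
print(intensive_ratio(0.02, 0.03, 30.0))   # 1.5 perturbations per round
```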