Quantum Communication Complexity of Distributed Set Joins

Author: Jeffery Stacey
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 41st International Symposium on Mathematical Foundations of Computer Science (MFCS 2016)
Publication date: 01/01/2016

Computing set joins of two inputs is a common task in database theory. Recently, Van Gucht, Williams, Woodruff and Zhang [PODS 2015] considered the complexity of such problems in the natural model of (classical) two-party communication complexity and obtained tight bounds for the complexity of several important distributed set joins. In this paper we initiate the study of the quantum communication complexity of distributed set joins. We design a quantum protocol for distributed Boolean matrix multiplication, which corresponds to computing the composition join of two databases, showing that the product of two $n \times n$ Boolean matrices, each owned by one of two respective parties, can be computed with $\widetilde{O}(\sqrt{n}\,\ell^{3/4})$ qubits of communication, where $\ell$ denotes the number of non-zero entries of the product. Since Van Gucht et al. showed that the classical communication complexity of this problem is $\widetilde{\Theta}(n\sqrt{\ell})$, our quantum algorithm outperforms classical protocols whenever the output matrix is sparse. We also show a quantum lower bound and a matching classical upper bound on the communication complexity of distributed matrix multiplication over $\mathbb{F}_2$. Besides their applications to database theory, the communication complexity of set joins is interesting due to its connections to direct product theorems in communication complexity. In this work we also introduce a notion of all-pairs product theorem, and relate this notion to standard direct product theorems in communication complexity.
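
The correspondence between Boolean matrix multiplication and the composition join mentioned in this abstract can be illustrated with a small, purely local (non-distributed, non-quantum) sketch. The function names and the example matrices below are illustrative assumptions, not taken from the paper; the sketch only checks that the two views produce the same output and counts its non-zero entries.

```python
# Illustrative sketch only: a local check that the Boolean matrix product
# matches the composition join of the corresponding relations.
# All names here are hypothetical and not from the paper.

def boolean_product(a, b):
    """Boolean product of two n x n 0/1 matrices given as lists of lists."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def to_relation(m):
    """View a 0/1 matrix as the set of index pairs (i, j) with a non-zero entry."""
    return {(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v}

def composition_join(r, s):
    """Return {(x, z) : there is some y with (x, y) in r and (y, z) in s}."""
    successors = {}
    for y, z in s:
        successors.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r for z in successors.get(y, ())}

A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]
B = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

product_pairs = to_relation(boolean_product(A, B))
join_pairs = composition_join(to_relation(A), to_relation(B))
num_nonzero = len(product_pairs)  # the number of non-zero output entries
```

Here `product_pairs` and `join_pairs` are both `{(0, 1), (1, 2)}`, so the output has two non-zero entries.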

Dagstuhl Research Online Publication Server

Selectivity estimation on set containment search

Author: K Tzoumas
R Baeza-Yates
R Jampani
S Helmer
S Melnik
S Suri
X Wang
Z Bar-Yossef
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

In this paper, we study the problem of selectivity estimation on set containment search. Given a query record Q and a record dataset S, we aim to accurately and efficiently estimate the selectivity of set containment search of query Q over S. The problem has many important applications in commercial fields and scientific studies. To the best of our knowledge, this is the first work to study this important problem. We first extend existing distinct value estimation techniques to solve this problem and develop IL-GKMV, an approach based on inverted lists and the G-KMV sketch. Our analysis shows that the performance of IL-GKMV degrades as the vocabulary size increases. Motivated by the limitations of existing techniques and the inherent challenges of the problem, we develop effective and efficient sampling approaches and propose OT-Sampling, a sampling approach based on an ordered trie structure. OT-Sampling partitions records based on element frequency and occurrence patterns and is significantly more accurate than the simple random sampling method and IL-GKMV. To further enhance performance, a divide-and-conquer based sampling approach, DC-Sampling, is presented with an inclusion/exclusion prefix to explore the pruning opportunities. We theoretically analyse the proposed techniques regarding various accuracy estimators. Our comprehensive experiments on 6 real datasets verify the effectiveness and efficiency of our proposed techniques.
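
As a rough illustration of the simple random sampling baseline that this abstract compares against, the sketch below estimates selectivity by scaling up the hit count in a uniform sample. It is not IL-GKMV, OT-Sampling, or DC-Sampling. It also assumes that a record matches when it contains every element of Q; the abstract does not state the containment direction, so that choice, along with all names and data, is an assumption.

```python
import random

# Hypothetical sketch of a simple-random-sampling selectivity estimator.
# Assumption: a record "matches" the query when it is a superset of the query.

def exact_selectivity(query, records):
    """Count the records that contain every element of the query."""
    q = frozenset(query)
    return sum(1 for rec in records if q <= rec)

def sampled_selectivity(query, records, sample_size, seed=0):
    """Estimate the match count from a uniform sample drawn without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)
    q = frozenset(query)
    hits = sum(1 for rec in sample if q <= rec)
    # Scale the sample hit rate up to the size of the full dataset.
    return hits * len(records) / sample_size

records = [frozenset("abc"), frozenset("ab"), frozenset("bc"), frozenset("a")]
query = {"a", "b"}

exact = exact_selectivity(query, records)
# With the sample size equal to the dataset size, the estimate is exact.
full_sample_estimate = sampled_selectivity(query, records, sample_size=len(records))
```

With smaller samples the estimate varies with the seed, which is the kind of inaccuracy the abstract's more refined sampling schemes aim to reduce.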

Crossref

OPUS - University of Technology Sydney

The PCP-like Theorem for Sub-linear Time Inapproximability

Author: Li Jianzhong
Ma Hengzhao
Publication date: 01/01/2024

In this paper we propose the PCP-like theorem for sub-linear time inapproximability. Abboud et al. have devised the distributed PCP framework for proving sub-quadratic time inapproximability. Here we try to go further in this direction. Starting from SETH, we first find a problem denoted as Ext-$k$-SAT, which cannot be computed in linear time, then devise an efficient MA-like protocol for this problem. To use this protocol to prove the sub-linear time inapproximability of other problems, we devise a new kind of reduction denoted as Ext-reduction, which is different from existing reduction techniques. We also define two new hardness classes, the problems in which can be computed in linear time but cannot be efficiently approximated in sub-linear time. Some problems are shown to be in the newly defined hardness classes. Comment: arXiv admin note: substantial text overlap with arXiv:2011.0232

arXiv.org e-Print Archive

Laws for rewriting queries containing division operators

Author: Mangold Christoph
Rantzau Ralf
Publication date: 08/07/2013

Relational division, also known as small divide, is a derived operator of the relational algebra that realizes a many-to-one set containment test, where a set is represented as a group of tuples: Small divide discovers which sets in a dividend relation contain all elements of the set stored in a divisor relation. The great divide operator extends small divide by realizing many-to-many set containment tests. It is also similar to the set containment join operator for schemas that are not in first normal form. Neither small nor great divide has been implemented in commercial relational database systems although the operators solve important problems and many efficient algorithms for them exist. We present algebraic laws that allow rewriting expressions containing small or great divide, illustrate their importance for query optimization, and discuss the use of great divide for frequent itemset discovery, an important data mining primitive. A recent theoretic result shows that small divide must be implemented by special purpose algorithms and not be simulated by pure relational algebra expressions to achieve efficiency. Consequently, an efficient implementation requires that the optimizer treats small divide as a first-class operator and possesses powerful algebraic laws for query rewriting
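
The two operators described in this abstract can be sketched over in-memory sets of tuples. This is a naive illustration of their semantics under an assumed encoding (each relation is a set of `(group, element)` pairs), not one of the efficient algorithms or rewrite laws the paper refers to; all names and data are hypothetical.

```python
# Naive sketch of small divide and great divide over sets of tuples.
# Encoding assumption: a relation is a set of (group_id, element) pairs.

def group_elements(relation):
    """Map each group id to the set of elements stored for it."""
    groups = {}
    for group_id, element in relation:
        groups.setdefault(group_id, set()).add(element)
    return groups

def small_divide(dividend, divisor):
    """Return the dividend groups that contain all elements of the divisor set."""
    wanted = set(divisor)
    return {g for g, elems in group_elements(dividend).items() if wanted <= elems}

def great_divide(dividend, divisor):
    """Return the (dividend group, divisor group) pairs where the first contains the second."""
    left = group_elements(dividend)
    right = group_elements(divisor)
    return {
        (g, d)
        for g, g_elems in left.items()
        for d, d_elems in right.items()
        if d_elems <= g_elems
    }

dividend = {("s1", "a"), ("s1", "b"), ("s2", "a"),
            ("s3", "a"), ("s3", "b"), ("s3", "c")}

small_result = small_divide(dividend, {"a", "b"})
great_result = great_divide(dividend, {("d1", "a"), ("d1", "b"), ("d2", "c")})
```

Here `small_result` is `{"s1", "s3"}` and `great_result` is `{("s1", "d1"), ("s3", "d1"), ("s3", "d2")}`. The nested loop in `great_divide` compares every pair of groups, which is the kind of cost that special-purpose algorithms aim to avoid.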

An Algebraic Approach to XQuery Optimization

Author: May Norman
Publication venue: Universität Mannheim
Publication date: 01/01/2007

As more data is stored in XML and more applications need to process this data, XML query optimization becomes performance critical. While optimization techniques for relational databases have been developed over the last thirty years, the optimization of XML queries poses new challenges. Query optimizers for XQuery, the standard query language for XML data, need to consider both document order and sequence order. Nevertheless, algebraic optimization proved powerful in query optimizers in relational and object oriented databases. Thus, this dissertation presents an algebraic approach to XQuery optimization. In this thesis, an algebra over sequences is presented that allows for a simple translation of XQuery into this algebra. The formal definitions of the operators in this algebra allow us to reason formally about algebraic optimizations. This thesis leverages the power of this formalism when unnesting nested XQuery expressions. In almost all cases unnesting nested queries in XQuery reduces query execution times from hours to seconds or milliseconds. Moreover, this dissertation presents three basic algebraic patterns of nested queries. For every basic pattern a decision tree is developed to select the most effective unnesting equivalence for a given query. Query unnesting extends the search space that can be considered during cost-based optimization of XQuery. As a result, substantially more efficient query execution plans may be detected. This thesis presents two more important cases where the number of plan alternatives leads to substantially shorter query execution times: join ordering and reordering location steps in path expressions. Our algebraic framework detects cases where document order or sequence order is destroyed. However, state-of-the-art techniques for order optimization in cost-based query optimizers have efficient mechanisms to repair order in these cases. 
The results obtained for query unnesting and cost-based optimization of XQuery underline the need for an algebraic approach to XQuery optimization for efficient XML query processing. Moreover, they are applicable to optimization in relational databases where order semantics are considered

MAnnheim DOCument Server

Adaptive algorithms for set containment joins

Author: Hector Garcia-Molina
Helmer S.
Hirai J.
Melnik S.
Ramasamy K.
Sergey Melnik
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref