Near-Quadratic Lower Bounds for Two-Pass Graph Streaming Algorithms
We prove that any two-pass graph streaming algorithm for the $s$-$t$
reachability problem in $n$-vertex directed graphs requires near-quadratic
space of $n^{2-o(1)}$ bits. As a corollary, we also obtain near-quadratic space
lower bounds for several other fundamental problems including maximum bipartite
matching and (approximate) shortest path in undirected graphs.
Our results collectively imply that a wide range of graph problems admit
essentially no non-trivial streaming algorithm even when two passes over the
input are allowed. Prior to our work, such impossibility results were only known
for single-pass streaming algorithms, and the best two-pass lower bounds only
ruled out $o(n^{7/6})$ space algorithms, leaving open a large gap between
(trivial) upper bounds and lower bounds.
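To make the "trivial" upper bound concrete, here is a minimal sketch of a single-pass algorithm that simply stores the whole graph as a bit matrix (about $n^2$ bits) and then runs breadth-first search. The function name `st_reachable_one_pass` and the edge-stream format are assumptions made for illustration and do not come from the paper.

```python
from collections import deque


def st_reachable_one_pass(n, edge_stream, s, t):
    """Decide s-t reachability by storing every edge, then running BFS.

    Uses an n x n adjacency matrix, i.e. roughly n^2 bits of state.
    """
    adj = [[False] * n for _ in range(n)]
    for u, v in edge_stream:  # the single pass over the stream
        adj[u][v] = True

    seen = [False] * n
    seen[s] = True
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in range(n):
            if adj[u][v] and not seen[v]:
                seen[v] = True
                queue.append(v)
    return False
```

The lower bound in the abstract says that, up to lower-order factors, a second pass does not allow substantially less memory than this store-everything approach.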
The Weisfeiler-Leman dimension of conjunctive queries
A graph parameter is a function $f$ on graphs with the property that, for any pair of isomorphic graphs $G_1$
and $G_2$, $f(G_1) = f(G_2)$. The Weisfeiler–Leman (WL) dimension of $f$ is the minimum $k$ such that, if $G_1$ and $G_2$
are indistinguishable by the $k$-dimensional WL-algorithm, then $f(G_1) = f(G_2)$. The WL-dimension of $f$ is $\infty$
if no such $k$ exists. We study the WL-dimension of graph parameters characterised by the number of answers
from a fixed conjunctive query to the graph. Given a conjunctive query $\varphi$, we quantify the WL-dimension of
the function that maps every graph $G$ to the number of answers of $\varphi$ in $G$.
The works of Dvořák (J. Graph Theory 2010), Dell, Grohe, and Rattan (ICALP 2018), and Neuen (arXiv 2023)
have answered this question for full conjunctive queries, which are conjunctive queries without existentially
quantified variables. For such queries $\varphi$, the WL-dimension is equal to the treewidth of the Gaifman graph
of $\varphi$.
In this work, we give a characterisation that applies to all conjunctive queries. Given any conjunctive
query $\varphi$, we prove that its WL-dimension is equal to the semantic extension width $\mathrm{sew}(\varphi)$, a novel width
measure that can be thought of as a combination of the treewidth of $\varphi$ and its quantified star size, an invariant
introduced by Durand and Mengel (ICDT 2013) describing how the existentially quantified variables of $\varphi$ are
connected with the free variables. Using the recently established equivalence between the WL-algorithm and
higher-order Graph Neural Networks (GNNs) due to Morris et al. (AAAI 2019), we obtain as a consequence
that the function counting answers to a conjunctive query $\varphi$ cannot be computed by GNNs of order smaller
than $\mathrm{sew}(\varphi)$.
The majority of the paper is concerned with establishing a lower bound on the WL-dimension of a query.
Given any conjunctive query $\varphi$ with semantic extension width $k$, we consider a graph $F$ of treewidth $k$
obtained from the Gaifman graph of $\varphi$ by repeatedly cloning the vertices corresponding to existentially
quantified variables. Using a modification due to Fürer (ICALP 2001) of the Cai-Fürer-Immerman construction
(Combinatorica 1992), we then obtain a pair of graphs $\chi(F)$ and $\hat{\chi}(F)$ that are indistinguishable by the $(k-1)$-
dimensional WL-algorithm since $F$ has treewidth $k$. Finally, in the technical heart of the paper, we show
that $\varphi$ has a different number of answers in $\chi(F)$ and $\hat{\chi}(F)$. Thus, $\varphi$ can distinguish two graphs that cannot be
distinguished by the $(k-1)$-dimensional WL-algorithm, so the WL-dimension of $\varphi$ is at least $k$.
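As a small illustration of what a WL-dimension lower bound means, the sketch below runs colour refinement (the 1-dimensional WL-algorithm) on a 6-cycle and on two disjoint triangles. The two graphs receive the same colour histogram, yet the full conjunctive query asking for a triangle has a different number of answers in each, so its answer count cannot have WL-dimension 1. The helper names `wl1_indistinguishable` and `count_triangle_answers` and the adjacency-dict encoding are assumptions made for this example; this is not the Cai-Fürer-Immerman construction used in the paper.

```python
from collections import Counter
from itertools import permutations


def wl1_indistinguishable(g1, g2):
    """Return True if colour refinement cannot tell g1 and g2 apart."""
    # Refine on the disjoint union so colours are comparable across graphs.
    adj = {(i, v): [(i, u) for u in nbrs]
           for i, g in enumerate((g1, g2)) for v, nbrs in g.items()}
    colour = {v: 0 for v in adj}
    while True:
        sig = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v])))
               for v in adj}
        palette = {s: c for c, s in enumerate(sorted(set(sig.values())))}
        new_colour = {v: palette[sig[v]] for v in adj}
        # Refinement only splits classes, so an equal class count means stable.
        if len(set(new_colour.values())) == len(set(colour.values())):
            break
        colour = new_colour
    hist = [Counter(c for (i, _), c in colour.items() if i == side)
            for side in (0, 1)]
    return hist[0] == hist[1]


def count_triangle_answers(g):
    """Answers to the full query E(x,y) AND E(y,z) AND E(z,x) in a simple graph."""
    return sum(1 for x, y, z in permutations(g, 3)
               if y in g[x] and z in g[y] and x in g[z])


cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
```

Here `wl1_indistinguishable(cycle6, two_triangles)` holds while the triangle counts differ, which matches the abstract's statement that a full query has WL-dimension equal to the treewidth of its Gaifman graph (2 for a triangle).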
Degree Sequence Bound for Join Cardinality Estimation
Recent work has demonstrated the catastrophic effects of poor cardinality
estimates on query processing time. In particular, underestimating query
cardinality can result in overly optimistic query plans which take orders of
magnitude longer to complete than those generated with the true cardinality.
Cardinality bounding avoids this pitfall by computing a strict upper bound on
the query's output size using statistics about the database such as table sizes
and degrees, i.e. value frequencies. In this paper, we extend this line of work
by proving a novel bound called the Degree Sequence Bound which takes into
account the full degree sequences and the max tuple multiplicity. This bound
improves upon previous work incorporating degree constraints which focused on
the maximum degree rather than the degree sequence. Further, we describe how to
practically compute this bound using a learned approximation of the true degree
sequences.
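The difference between a maximum-degree bound and a bound based on full degree sequences can be sketched for the simplest case, a single equi-join of two relations on one column. The true output size is the sum over join values of the product of their frequencies, and by the rearrangement inequality this is at most the sum of products of the two degree sequences sorted in descending order. The code below is an illustrative simplification under that assumption. It ignores tuple multiplicities and multi-way joins, and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter


def degree_sequence(column):
    """Value frequencies of a join column, sorted in descending order."""
    return sorted(Counter(column).values(), reverse=True)


def max_degree_bound(r_col, s_col):
    """Upper bound that uses only table sizes and maximum degrees."""
    return min(len(r_col) * max(Counter(s_col).values()),
               len(s_col) * max(Counter(r_col).values()))


def degree_sequence_bound(r_col, s_col):
    """Upper bound pairing the i-th largest degrees of both columns."""
    return sum(a * b for a, b in
               zip(degree_sequence(r_col), degree_sequence(s_col)))


def true_join_size(r_col, s_col):
    """Exact size of the equi-join on the two columns."""
    r, s = Counter(r_col), Counter(s_col)
    return sum(r[v] * s[v] for v in r)


r_y = [1, 1, 1, 2, 3]   # join column of R
s_y = [1, 2, 2, 2, 4]   # join column of S
```

On this toy data the true join size is 6, the degree-sequence bound is 11, and the maximum-degree bound is 15, so both are valid upper bounds and the degree-sequence bound is the tighter one.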
Answering UCQs under Updates and in the Presence of Integrity Constraints
We investigate the query evaluation problem for fixed queries over
fully dynamic databases where tuples can be inserted or deleted.
The task is to design a dynamic data structure that can immediately
report the new result of a fixed query after every database update.
We consider unions of conjunctive queries (UCQs) and focus on three query evaluation tasks:
testing (decide whether an input tuple belongs to the query result),
enumeration (enumerate, without repetition, all tuples in the query result),
and counting (output the number of tuples in the query result).
We identify three increasingly restrictive classes of UCQs which we
call t-hierarchical, q-hierarchical, and exhaustively q-hierarchical UCQs.
Our main results provide the following dichotomies:
If the query's homomorphic core is t-hierarchical (q-hierarchical,
exhaustively q-hierarchical), then the testing (enumeration, counting)
problem can be solved with constant update time and constant testing time (delay, counting time). Otherwise, it cannot be solved with sublinear update time and sublinear testing time (delay, counting time), unless the OV-conjecture and/or the OMv-conjecture fails.
We also study the complexity of query evaluation in the dynamic setting in the presence of integrity constraints, and we obtain similar dichotomy results for the special case of small domain constraints (i.e., constraints which state that
all values in a particular column of a relation belong to a fixed domain of constant size).
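To illustrate the kind of dynamic data structure the abstract has in mind, here is a minimal sketch for one very simple fixed query, $\varphi(x, y) = R(x) \wedge E(x, y)$, supporting the counting task under single-tuple insertions and deletions. The class name `DynamicCounter` and its methods are hypothetical, and the constant update time is in expectation, assuming hash-based sets and dictionaries. It is not the general construction from the paper.

```python
from collections import defaultdict


class DynamicCounter:
    """Maintains the number of pairs (x, y) with R(x) and E(x, y)."""

    def __init__(self):
        self.in_r = set()                   # current contents of R
        self.edges = set()                  # current contents of E
        self.out_degree = defaultdict(int)  # number of y with E(x, y)
        self.count = 0                      # current number of answers

    def insert_r(self, x):
        if x not in self.in_r:
            self.in_r.add(x)
            self.count += self.out_degree[x]

    def delete_r(self, x):
        if x in self.in_r:
            self.in_r.remove(x)
            self.count -= self.out_degree[x]

    def insert_e(self, x, y):
        if (x, y) not in self.edges:
            self.edges.add((x, y))
            self.out_degree[x] += 1
            if x in self.in_r:
                self.count += 1

    def delete_e(self, x, y):
        if (x, y) in self.edges:
            self.edges.remove((x, y))
            self.out_degree[x] -= 1
            if x in self.in_r:
                self.count -= 1
```

Each update touches a fixed number of hash-table entries and `count` is always ready to be reported. The dichotomies in the abstract characterise the queries for which constant update time of this kind is achievable.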