148 research outputs found

    Near-Quadratic Lower Bounds for Two-Pass Graph Streaming Algorithms

    Full text link
    We prove that any two-pass graph streaming algorithm for the ss-tt reachability problem in nn-vertex directed graphs requires near-quadratic space of n2o(1)n^{2-o(1)} bits. As a corollary, we also obtain near-quadratic space lower bounds for several other fundamental problems including maximum bipartite matching and (approximate) shortest path in undirected graphs. Our results collectively imply that a wide range of graph problems admit essentially no non-trivial streaming algorithm even when two passes over the input is allowed. Prior to our work, such impossibility results were only known for single-pass streaming algorithms, and the best two-pass lower bounds only ruled out o(n7/6)o(n^{7/6}) space algorithms, leaving open a large gap between (trivial) upper bounds and lower bounds

    Finding heavy hitters from lossy or noisy data

    Get PDF
    Abstract. Motivated by Dvir et al. and Wigderson and Yehudayoff [3

    Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings

    Get PDF
    We consider the following oblivious sketching problem: given epsilon in (0,1/3) and n >= d/epsilon^2, design a distribution D over R^{k * nd} and a function f: R^k * R^{nd} -> R}, so that for any n * d matrix A, Pr_{S sim D} [(1-epsilon) |A|_{op} = 2/3, where |A|_{op} = sup_{x:|x|_2 = 1} |Ax|_2 is the operator norm of A and S(A) denotes S * A, interpreting A as a vector in R^{nd}. We show a tight lower bound of k = Omega(d^2/epsilon^2) for this problem. Previously, Nelson and Nguyen (ICALP, 2014) considered the problem of finding a distribution D over R^{k * n} such that for any n * d matrix A, Pr_{S sim D}[forall x, (1-epsilon)|Ax|_2 = 2/3, which is called an oblivious subspace embedding (OSE). Our result considerably strengthens theirs, as it (1) applies only to estimating the operator norm, which can be estimated given any OSE, and (2) applies to distributions over general linear operators S which treat A as a vector and compute S(A), rather than the restricted class of linear operators corresponding to matrix multiplication. Our technique also implies the first tight bounds for approximating the Schatten p-norm for even integers p via general linear sketches, improving the previous lower bound from k = Omega(n^{2-6/p}) [Regev, 2014] to k = Omega(n^{2-4/p}). Importantly, for sketching the operator norm up to a factor of alpha, where alpha - 1 = Omega(1), we obtain a tight k = Omega(n^2/alpha^4) bound, matching the upper bound of Andoni and Nguyen (SODA, 2013), and improving the previous k = Omega(n^2/alpha^6) lower bound. Finally, we also obtain the first lower bounds for approximating Ky Fan norms

    Asymptotically Optimal Bounds for Estimating H-Index in Sublinear Time with Applications to Subgraph Counting

    Get PDF
    The hh-index is a metric used to measure the impact of a user in a publication setting, such as a member of a social network with many highly liked posts or a researcher in an academic domain with many highly cited publications. Specifically, the hh-index of a user is the largest integer hh such that at least hh publications of the user have at least hh units of positive feedback. We design an algorithm that, given query access to the nn publications of a user and each publication's corresponding positive feedback number, outputs a (1±ε)(1\pm \varepsilon)-approximation of the hh-index of this user with probability at least 1δ1-\delta in time O(nln(1/δ)ε2h), O(\frac{n \cdot \ln{(1/\delta)}}{\varepsilon^2 \cdot h}), where hh is the actual hh-index which is unknown to the algorithm a-priori. We then design a novel lower bound technique that allows us to prove that this bound is in fact asymptotically optimal for this problem in all parameters n,h,ε,n,h,\varepsilon, and δ\delta. Our work is one of the first in sublinear time algorithms that addresses obtaining asymptotically optimal bounds, especially in terms of the error and confidence parameters. As such, we focus on designing novel techniques for this task. In particular, our lower bound technique seems quite general -- to showcase this, we also use our approach to prove an asymptotically optimal lower bound for the problem of estimating the number of triangles in a graph in sublinear time, which now is also optimal in the error and confidence parameters. This result improves upon prior lower bounds of Eden, Levi, Ron, and Seshadhri (FOCS'15) for this problem, as well as multiple follow-ups that extended this lower bound to other subgraph counting problems.Comment: Full version of the paper accepted to APPROX 202

    Approximating CSPs with Outliers

    Get PDF

    Asymptotically Optimal Bounds for Estimating H-Index in Sublinear Time with Applications to Subgraph Counting

    Full text link
    The hh-index is a metric used to measure the impact of a user in a publication setting, such as a member of a social network with many highly liked posts or a researcher in an academic domain with many highly cited publications. Specifically, the hh-index of a user is the largest integer hh such that at least hh publications of the user have at least hh units of positive feedback. We design an algorithm that, given query access to the nn publications of a user and each publication's corresponding positive feedback number, outputs a (1±ε)(1\pm \varepsilon)-approximation of the hh-index of this user with probability at least 1δ1-\delta in time O(nln(1/δ)ε2h), O(\frac{n \cdot \ln{(1/\delta)}}{\varepsilon^2 \cdot h}), where hh is the actual hh-index which is unknown to the algorithm a-priori. We then design a novel lower bound technique that allows us to prove that this bound is in fact asymptotically optimal for this problem in all parameters n,h,ε,n,h,\varepsilon, and δ\delta. Our work is one of the first in sublinear time algorithms that addresses obtaining asymptotically optimal bounds, especially in terms of the error and confidence parameters. As such, we focus on designing novel techniques for this task. In particular, our lower bound technique seems quite general -- to showcase this, we also use our approach to prove an asymptotically optimal lower bound for the problem of estimating the number of triangles in a graph in sublinear time, which now is also optimal in the error and confidence parameters. This result improves upon prior lower bounds of Eden, Levi, Ron, and Seshadhri (FOCS'15) for this problem, as well as multiple follow-ups that extended this lower bound to other subgraph counting problems.Comment: Full version of the paper accepted to APPROX 202

    Tight Bounds for Graph Problems in Insertion Streams

    Get PDF
    Despite the large amount of work on solving graph problems in the data stream model, there do not exist tight space bounds for almost any of them, even in a stream with only edge insertions. For example, for testing connectivity, the upper bound is O(n * log(n)) bits, while the lower bound is only Omega(n) bits. We remedy this situation by providing the first tight Omega(n * log(n)) space lower bounds for randomized algorithms which succeed with constant probability in a stream of edge insertions for a number of graph problems. Our lower bounds apply to testing bipartiteness, connectivity, cycle-freeness, whether a graph is Eulerian, planarity, H-minor freeness, finding a minimum spanning tree of a connected graph, and testing if the diameter of a sparse graph is constant. We also give the first Omega(n * k * log(n)) space lower bounds for deterministic algorithms for k-edge connectivity and k-vertex connectivity; these are optimal in light of known deterministic upper bounds (for k-vertex connectivity we also need to allow edge duplications, which known upper bounds allow). Finally, we give an Omega(n * log^2(n)) lower bound for randomized algorithms approximating the minimum cut up to a constant factor with constant probability in a graph with integer weights between 1 and n, presented as a stream of insertions and deletions to its edges. This lower bound also holds for cut sparsifiers, and gives the first separation of maintaining a sparsifier in the data stream model versus the offline model
    corecore