76 research outputs found

    Pairwise Independent Random Walks Can Be Slightly Unbounded

    A family of problems that have been studied in the context of various streaming algorithms are generalizations of the fact that the expected maximum distance of a 4-wise independent random walk on a line over n steps is O(sqrt{n}). For small values of k, there exist k-wise independent random walks that can be stored in much less space than storing n random bits, so these properties are often useful for lowering space bounds. In this paper, we show that for all of these examples, 4-wise independence is required by demonstrating a pairwise independent random walk with steps uniform in +/- 1 and expected maximum distance Omega(sqrt{n} lg n) from the origin. We also show that this bound is tight for the first and second moment, i.e. the expected maximum square distance of a 2-wise independent random walk is always O(n lg^2 n). Also, for any even k >= 4, we show that the kth moment of the maximum distance of any k-wise independent random walk is O(n^{k/2}). The previous two results generalize to random walks tracking insertion-only streams, and provide higher moment bounds than currently known. We also prove a generalization of Kolmogorov's maximal inequality by showing an asymptotically equivalent statement that requires only 4-wise independent random variables with bounded second moments, which also generalizes a result of Blasiok
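
    The remark that k-wise independent walks need far less space than n random bits can be illustrated with a minimal sketch. It uses the textbook parity construction of pairwise independent +/- 1 steps from a seed of about lg n bits; this is an assumed stand-in, not the construction from the paper, and it is not claimed to reach the Omega(sqrt{n} lg n) bound. The names `step` and `max_distance` are hypothetical.

```python
import random


def step(i: int, seed: int) -> int:
    """Step i (i >= 1) of the walk: +1 or -1 from the parity of the bits of i & seed.

    For a uniformly random seed covering all bits of i, each step is uniform
    in {+1, -1} and any two distinct steps are independent (standard parity
    construction; an assumption of this sketch, not taken from the paper).
    """
    return -1 if bin(i & seed).count("1") % 2 else 1


def max_distance(n: int, seed: int) -> int:
    """Maximum distance from the origin reached during the first n steps."""
    position, farthest = 0, 0
    for i in range(1, n + 1):
        position += step(i, seed)
        farthest = max(farthest, abs(position))
    return farthest


if __name__ == "__main__":
    n = 1 << 10
    rng = random.Random(0)
    # The seed has n.bit_length() bits, versus n bits for a fully random walk.
    seed = rng.getrandbits(n.bit_length())
    print(max_distance(n, seed))
```

    The whole walk is determined by the short seed, so it can be regenerated step by step instead of being stored.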

    Better and Simpler Lower Bounds for Differentially Private Statistical Estimation

    We provide improved lower bounds for two well-known high-dimensional private estimation tasks. First, we prove that for estimating the covariance of a Gaussian up to spectral error $\alpha$ with approximate differential privacy, one needs $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha \varepsilon} + \frac{d}{\alpha^2}\right)$ samples for any $\alpha \le O(1)$, which is tight up to logarithmic factors. This improves over previous work which established this for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than previous work. Next, we prove that for estimating the mean of a heavy-tailed distribution with bounded $k$th moments with approximate differential privacy, one needs $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)} \varepsilon} + \frac{d}{\alpha^2}\right)$ samples. This matches known upper bounds and improves over the best known lower bound for this problem, which only holds for pure differential privacy, or when $k = 2$. Our techniques follow the method of fingerprinting and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation utilizes a Bayesian approach to show that, under an Inverse Wishart prior distribution for the covariance matrix, no private estimator can be accurate even in expectation, without sufficiently many samples. Comment: 23 page

    Optimal Time-Backlog Tradeoffs for the Variable-Processor Cup Game

    The \emph{$p$-processor cup game} is a classic and widely studied scheduling problem that captures the setting in which a $p$-processor machine must assign tasks to processors over time in order to ensure that no individual task ever falls too far behind. The problem is formalized as a multi-round game in which two players, a filler (who assigns work to tasks) and an emptier (who schedules tasks), compete. The emptier's goal is to minimize backlog, which is the maximum amount of outstanding work for any task. Recently, Kuszmaul and Westover (ITCS, 2021) proposed the \emph{variable-processor cup game}, which considers the same problem, except that the amount of resources available to the players (i.e., the number $p$ of processors) fluctuates between rounds of the game. They showed that this seemingly small modification fundamentally changes the dynamics of the game: whereas the optimal backlog in the fixed $p$-processor game is $\Theta(\log n)$, independent of $p$, the optimal backlog in the variable-processor game is $\Theta(n)$. The latter result was only known to apply to games with \emph{exponentially many} rounds, however, and it has remained an open question what the optimal tradeoff between time and backlog is for shorter games. This paper establishes a tight trade-off curve between time and backlog in the variable-processor cup game. Importantly, we prove that for a game consisting of $t$ rounds, the optimal backlog is $\Theta(n)$ if and only if $t \ge \Omega(n^3)$. Our techniques also allow us to resolve several other open questions concerning how the variable-processor cup game behaves in beyond-worst-case-analysis settings. Comment: 40 pages, published in International Conference on Automata, Languages, and Programming (ICALP), 2022. Abstract abridged for arXiv submission: see paper for full abstract. Updated to acknowledge additional fundin

    A faster and simpler algorithm for learning shallow networks

    We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. [CDG+23] recently gave the first algorithm for this problem to run in $\text{poly}(d,1/\varepsilon)$ time when $k = O(1)$, where $\varepsilon$ is the target error. More precisely, their algorithm runs in time $(d/\varepsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d/\varepsilon)^{O(k^2)}$. Comment: 14 page

    Improved Diversity Maximization Algorithms for Matching and Pseudoforest

    In this work we consider the diversity maximization problem, where given a data set $X$ of $n$ elements, and a parameter $k$, the goal is to pick a subset of $X$ of size $k$ maximizing a certain diversity measure. [CH01] defined a variety of diversity measures based on pairwise distances between the points. A constant factor approximation algorithm was known for all those diversity measures except ``remote-matching'', where only an $O(\log k)$ approximation was known. In this work we present an $O(1)$ approximation for this remaining notion. Further, we consider these notions from the perspective of composable coresets. [IMMM14] provided composable coresets with a constant factor approximation for all but ``remote-pseudoforest'' and ``remote-matching'', for which they again obtained only an $O(\log k)$ approximation. Here we also close the gap up to constants and present a constant factor composable coreset algorithm for these two notions. For remote-matching, our coreset has size only $O(k)$, and for remote-pseudoforest, our coreset has size $O(k^{1+\varepsilon})$ for any $\varepsilon > 0$, for an $O(1/\varepsilon)$-approximate coreset. Comment: 27 pages, 1 table. Accepted to APPROX, 202

    Circular Trace Reconstruction

    Trace reconstruction is the problem of learning an unknown string $x$ from independent traces of $x$, where traces are generated by independently deleting each bit of $x$ with some deletion probability $q$. In this paper, we initiate the study of Circular trace reconstruction, where the unknown string $x$ is circular and traces are now rotated by a random cyclic shift. Trace reconstruction is related to many computational biology problems studying DNA, which is a primary motivation for this problem as well, as many types of DNA are known to be circular. Our main results are as follows. First, we prove that we can reconstruct arbitrary circular strings of length $n$ using $\exp\big(\tilde{O}(n^{1/3})\big)$ traces for any constant deletion probability $q$, as long as $n$ is prime or the product of two primes. For $n$ of this form, this nearly matches what was the best known bound of $\exp\big(O(n^{1/3})\big)$ for standard trace reconstruction when this paper was initially released. We note, however, that Chase very recently improved the standard trace reconstruction bound to $\exp\big(\tilde{O}(n^{1/5})\big)$. Next, we prove that we can reconstruct random circular strings with high probability using $n^{O(1)}$ traces for any constant deletion probability $q$. Finally, we prove a lower bound of $\tilde{\Omega}(n^3)$ traces for arbitrary circular strings, which is greater than the best known lower bound of $\tilde{\Omega}(n^{3/2})$ in standard trace reconstruction. Comment: 25 pages, 1 figure. To appear in Innovations in Theoretical Computer Science (ITCS), 202
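
    The trace model described in this abstract can be sketched as a small sampler. This is a minimal illustration under stated assumptions: the cyclic shift is taken to be uniform over all n rotations and is applied before the deletions, and the function name `circular_trace` is hypothetical rather than taken from the paper.

```python
import random


def circular_trace(x: str, q: float, rng: random.Random) -> str:
    """Sample one trace of the circular string x.

    Assumed model: rotate x by a uniformly random cyclic shift, then delete
    each symbol independently with probability q.
    """
    if not x:
        return ""
    shift = rng.randrange(len(x))
    rotated = x[shift:] + x[:shift]
    # Keep a symbol with probability 1 - q (rng.random() lies in [0, 1)).
    return "".join(symbol for symbol in rotated if rng.random() >= q)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(circular_trace("0110100111", 0.3, rng))
```

    A reconstruction algorithm would receive many such traces and try to recover x up to rotation, since the shift hides the starting position.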