    Testing non-uniform k-wise independent distributions over product spaces (extended abstract)

    A distribution $D$ over $\Sigma_1 \times \cdots \times \Sigma_n$ is called (non-uniform) k-wise independent if for any set of k indices $\{i_1, \ldots, i_k\}$ and for any $z_1 \cdots z_k \in \Sigma_{i_1} \times \cdots \times \Sigma_{i_k}$, $\Pr_{X \sim D}[X_{i_1} \cdots X_{i_k} = z_1 \cdots z_k] = \Pr_{X \sim D}[X_{i_1} = z_1] \cdots \Pr_{X \sim D}[X_{i_k} = z_k]$. We study the problem of testing (non-uniform) k-wise independent distributions over product spaces. For the uniform case we show an upper bound on the distance from a distribution $D$ to the set of k-wise independent distributions in terms of the sum of Fourier coefficients of $D$ at vectors of weight at most k. Such a bound was previously known only for the binary field. For the non-uniform case, we give a new characterization of k-wise independent distributions and further show that this characterization is robust. These results greatly generalize those of Alon et al. [1] on uniform k-wise independence over the binary field to non-uniform k-wise independence over product spaces. Our results yield natural testing algorithms for k-wise independence with time and sample complexity sublinear in the support size when k is a constant. The main technical tools employed include discrete Fourier transforms and the theory of linear systems of congruences. Funding: National Science Foundation (U.S.) (NSF grant 0514771); National Science Foundation (U.S.) (grant 0728645); National Science Foundation (U.S.) (Grant 0732334); Marie Curie International Reintegration Grants (Grant PIRG03-GA-2008-231077); Israel Science Foundation (Grant 1147/09); Israel Science Foundation (Grant 1675/09); Massachusetts Institute of Technology (Akamai Presidential Fellowship)
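The definition above can be checked directly when a distribution is small enough to write down explicitly. The following is a minimal sketch under that assumption, not the sublinear tester the abstract describes; the function names and the two example distributions (`xor_dist`, `biased`) are hypothetical.

```python
from itertools import combinations, product

def marginal(dist, idx):
    """Marginal of dist (dict: outcome tuple -> probability) on coordinates idx."""
    out = {}
    for x, p in dist.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def is_k_wise_independent(dist, k, tol=1e-9):
    """Brute-force check of the (non-uniform) k-wise independence definition."""
    n = len(next(iter(dist)))
    singles = [marginal(dist, (i,)) for i in range(n)]  # keys are 1-tuples
    for idx in combinations(range(n), k):
        joint = marginal(dist, idx)
        # Enumerate every combination of observed symbols, including
        # combinations whose joint probability is zero.
        for parts in product(*(singles[i] for i in idx)):
            z = tuple(part[0] for part in parts)
            expected = 1.0
            for i, part in zip(idx, parts):
                expected *= singles[i][part]
            if abs(joint.get(z, 0.0) - expected) > tol:
                return False
    return True

# Uniform over (a, b, a XOR b): pairwise independent, but not 3-wise.
xor_dist = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

# A non-uniform product distribution over {0,1} x {0,1,2}.
p, q = (0.3, 0.7), (0.2, 0.5, 0.3)
biased = {(a, b): p[a] * q[b] for a in range(2) for b in range(3)}

print(is_k_wise_independent(xor_dist, 2))  # True
print(is_k_wise_independent(xor_dist, 3))  # False
print(is_k_wise_independent(biased, 2))    # True
```

The check enumerates the whole support, so its cost grows with the support size; the point of the testing results is to avoid exactly that.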

    Testing k-wise independent distributions

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 119-123). A probability distribution over $\{0, 1\}^n$ is k-wise independent if its restriction to any k coordinates is uniform. More generally, a discrete distribution $D$ over $\Sigma_1 \times \cdots \times \Sigma_n$ is called (non-uniform) k-wise independent if for any subset of k indices $\{i_1, \ldots, i_k\}$ and for any $z_1 \in \Sigma_{i_1}, \ldots, z_k \in \Sigma_{i_k}$, $\Pr_{X \sim D}[X_{i_1} \cdots X_{i_k} = z_1 \cdots z_k] = \Pr_{X \sim D}[X_{i_1} = z_1] \cdots \Pr_{X \sim D}[X_{i_k} = z_k]$. k-wise independent distributions look random "locally" to an observer of only k coordinates, even though they may be far from random "globally". Because of this key feature, k-wise independent distributions are important concepts in probability, complexity, and algorithm design. In this thesis, we study the problem of testing (non-uniform) k-wise independent distributions over product spaces. For the problem of distinguishing k-wise independent distributions supported on the Boolean cube from those that are $\delta$-far in statistical distance from any k-wise independent distribution, we upper bound the number of required samples by $O(n^k/\delta^2)$ and lower bound it by $\Omega(n^{(k-1)/2}/\delta)$ (these bounds hold for constant k, and essentially the same bounds hold for general k). To achieve these bounds, we use novel Fourier analysis techniques to relate a distribution's statistical distance from k-wise independence to its biases, a measure of the parity imbalance it induces on a set of variables. The relationships we derive are tighter than previously known, and may be of independent interest. We then generalize our results to distributions over larger domains. For the uniform case we show an upper bound on the distance from a distribution $D$ to the set of k-wise independent distributions in terms of the sum of Fourier coefficients of $D$ at vectors of weight at most k. For the non-uniform case, we give a new characterization of k-wise independent distributions and further show, based on our results for the uniform case, that this characterization is robust. Our results yield natural testing algorithms for k-wise independence with time and sample complexity sublinear in the support size of the distribution when k is a constant. The main technical tools employed include discrete Fourier transforms and the theory of linear systems of congruences. By Ning Xie. Ph.D.
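To make the notion of bias concrete: for a distribution over the Boolean cube, the bias on a nonempty index set S is the expectation of $(-1)^{\sum_{i \in S} x_i}$, and a distribution is (uniformly) k-wise independent exactly when every bias on a set of size 1 to k is zero. The sketch below computes biases exactly from an explicitly given distribution; it is an illustration with hypothetical names, not the sample-based tester analyzed in the thesis.

```python
from itertools import combinations

def bias(dist, subset):
    """E[(-1)^(sum of x_i for i in subset)] under dist (dict: bit tuple -> prob)."""
    return sum(p * (-1) ** sum(x[i] for i in subset) for x, p in dist.items())

def max_bias_up_to(dist, k):
    """Largest absolute bias over all nonempty index sets of size at most k."""
    n = len(next(iter(dist)))
    return max(
        abs(bias(dist, s))
        for size in range(1, k + 1)
        for s in combinations(range(n), size)
    )

# Uniform over (a, b, a XOR b): every bias on 1 or 2 coordinates vanishes,
# but the parity of all three coordinates is always even.
xor_dist = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

print(max_bias_up_to(xor_dist, 2))  # 0.0
print(max_bias_up_to(xor_dist, 3))  # 1.0
```

A tester would estimate these biases from samples rather than compute them exactly; the Fourier-analytic results are what connect their size to statistical distance from k-wise independence.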

    Deterministic parallel algorithms for bilinear objective functions

    Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low independence. A series of papers, beginning with work by Luby (1988), showed that in many cases these techniques can be combined to give deterministic parallel (NC) algorithms for a variety of combinatorial optimization problems, with low time and processor complexity. We extend and generalize a technique of Luby for efficiently handling bilinear objective functions. One noteworthy application is an NC algorithm for maximal independent set. On a graph $G$ with $m$ edges and $n$ vertices, this takes $\tilde O(\log^2 n)$ time and $(m + n) n^{o(1)}$ processors, nearly matching the best randomized parallel algorithms. Other applications include reduced processor counts for algorithms of Berger (1997) for maximum acyclic subgraph and Gale-Berlekamp switching games. This bilinear factorization also gives better algorithms for problems involving discrepancy. An important application of this is to automata-fooling probability spaces, which are the basis of a notable derandomization technique of Sivakumar (2002). Our method leads to large reductions in processor complexity for a number of derandomization algorithms based on automata-fooling, including set discrepancy and the Johnson-Lindenstrauss Lemma.
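As a textbook illustration of derandomizing with a probability space of low independence (not the bilinear technique of this paper), consider max-cut: if vertex sides are pairwise independent uniform bits, each edge is cut with probability 1/2, so some point of the space cuts at least half the edges. The sketch below builds such a space from a short seed and searches it exhaustively; the function name and the example graph are hypothetical, and the graph is assumed to have no self-loops.

```python
def pairwise_space_cut(n, edges):
    """Search a pairwise-independent sample space for a cut of size >= m/2.

    Vertex v gets the nonzero label v + 1; its side for a given seed is the
    parity of (seed AND label). Over a uniform seed these bits are uniform
    and pairwise independent, because distinct nonzero labels have a
    nonzero XOR.
    """
    t = max(1, n.bit_length())  # enough bits to hold labels 1..n
    best_side, best_size = None, -1
    for seed in range(1 << t):  # fewer than 2n + 2 seeds in total
        side = [bin(seed & (v + 1)).count("1") % 2 for v in range(n)]
        size = sum(side[u] != side[v] for u, v in edges)
        if size > best_size:
            best_side, best_size = side, size
    return best_side, best_size

# Example: a 5-cycle, where at least 3 of the 5 edges must be cut.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
side, size = pairwise_space_cut(5, cycle)
print(side, size)
```

Each seed can be evaluated independently, which is what makes small sample spaces attractive for parallel algorithms; the processor count then depends on the size of the space and the cost of evaluating the objective.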

    On choosing and bounding probability metrics

    When studying convergence of measures, an important issue is the choice of probability metric. In this review, we provide a summary and some new results concerning bounds among ten important probability metrics/distances that are used by statisticians and probabilists. We focus on these metrics because they are either well-known, commonly used, or admit practical bounding techniques. We summarize these relationships in a handy reference diagram, and also give examples to show how rates of convergence can depend on the metric chosen. Comment: To appear, International Statistical Review. Related work at http://www.math.hmc.edu/~su/papers.htm
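Bounds of this kind can be checked numerically on small examples. The sketch below computes three common distances between finite distributions and verifies two standard inequalities: $d_H^2/2 \le d_{TV} \le d_H$, with the Hellinger distance taken as $d_H = (\sum_i (\sqrt{p_i} - \sqrt{q_i})^2)^{1/2}$, and Pinsker's inequality $d_{TV} \le \sqrt{d_{KL}/2}$. The normalizations are assumptions chosen for this example, and the vectors `p` and `q` are hypothetical.

```python
import math

def total_variation(p, q):
    """Half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """Hellinger distance without the 1/sqrt(2) factor, so it lies in [0, sqrt(2)]."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def kl_divergence(p, q):
    """Relative entropy in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

tv = total_variation(p, q)
dh = hellinger(p, q)
kl = kl_divergence(p, q)

print(dh ** 2 / 2 <= tv <= dh)     # True
print(tv <= math.sqrt(kl / 2))     # True
```

Because the three quantities differ in size on the same pair of distributions, a sequence that converges quickly in one distance can converge more slowly in another.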

    A directed isoperimetric inequality with application to Bregman near neighbor lower bounds

    Bregman divergences $D_\phi$ are a class of divergences parametrized by a convex function $\phi$ and include well known distance functions like $\ell_2^2$ and the Kullback-Leibler divergence. There has been extensive research on algorithms for problems like clustering and near neighbor search with respect to Bregman divergences; in all cases, the algorithms depend not just on the data size $n$ and dimensionality $d$, but also on a structure constant $\mu \ge 1$ that depends solely on $\phi$ and can grow without bound independently. In this paper, we provide the first evidence that this dependence on $\mu$ might be intrinsic. We focus on the problem of approximate near neighbor search for Bregman divergences. We show that under the cell probe model, any non-adaptive data structure (like locality-sensitive hashing) for $c$-approximate near-neighbor search that admits $r$ probes must use space $\Omega(n^{1 + \frac{\mu}{c r}})$. In contrast, for LSH under $\ell_1$ the best bound is $\Omega(n^{1 + \frac{1}{c r}})$. Our new tool is a directed variant of the standard boolean noise operator. We show that a generalization of the Bonami-Beckner hypercontractivity inequality exists "in expectation" or upon restriction to certain subsets of the Hamming cube, and that this is sufficient to prove the desired isoperimetric inequality that we use in our data structure lower bound. We also present a structural result reducing the Hamming cube to a Bregman cube. This structure allows us to obtain lower bounds for problems under Bregman divergences from their $\ell_1$ analog. In particular, we get a (weaker) lower bound for approximate near neighbor search of the form $\Omega(n^{1 + \frac{1}{c r}})$ for an $r$-query non-adaptive data structure, and new cell probe lower bounds for a number of other near neighbor questions in Bregman space. Comment: 27 pages
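For readers unfamiliar with the definition, a Bregman divergence is $D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y \rangle$. The sketch below evaluates it for the two generators mentioned in the abstract, with the gradients supplied by hand; the function names and test vectors are hypothetical, and nothing here relates to the lower-bound construction itself.

```python
import math

def bregman(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))

def sq_norm(v):
    return sum(t * t for t in v)

def sq_norm_grad(v):
    return [2 * t for t in v]

def neg_entropy(v):
    # Assumes strictly positive coordinates.
    return sum(t * math.log(t) for t in v)

def neg_entropy_grad(v):
    return [math.log(t) + 1 for t in v]

x = [0.5, 0.3, 0.2]
y = [0.2, 0.3, 0.5]

# phi = squared Euclidean norm gives the squared l2 distance.
d_sq = bregman(sq_norm, sq_norm_grad, x, y)
# phi = negative entropy gives the Kullback-Leibler divergence when
# x and y are probability vectors.
d_kl = bregman(neg_entropy, neg_entropy_grad, x, y)

print(d_sq, d_kl)
```

For probability vectors the negative-entropy case reduces to $\sum_i x_i \log(x_i / y_i)$ because the linear terms $-\sum_i x_i + \sum_i y_i$ cancel.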