4 research outputs found

    Low Discrepancy Sets Yield Approximate Min-Wise Independent Permutation Families

    No full text
    Motivated by a problem of filtering near-duplicate Web documents, Broder, Charikar, Frieze & Mitzenmacher defined the following notion of $\epsilon$-approximate min-wise independent permutation families. A multiset $F$ of permutations of $\{0, 1, \ldots, n-1\}$ is such a family if for all $K \subseteq \{0, 1, \ldots, n-1\}$ and any $x \in K$, a permutation $\pi$ chosen uniformly at random from $F$ satisfies $\left| \Pr[\min\{\pi(K)\} = \pi(x)] - \frac{1}{|K|} \right| \le \frac{\epsilon}{|K|}$. We show connections of such families with low discrepancy sets for geometric rectangles, and give explicit constructions of such families $F$ of size $n^{O(\sqrt{\log n})}$ for $\epsilon = 1/n^{\Theta(1)}$, improving upon the previously best-known bound of Indyk. We also present polynomial-size constructions when the min-wise condition is required only for $|K| \le 2^{O(\log^{2/3} n)}$, with $\epsilon \ge 2^{-O(\log^{2/3} n)}$. Keywords: Combinatorial problems; min-wise independent permutations; information retrieval; document filtering; pseudorandom permutations; explicit constructions
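The min-wise condition in this abstract can be checked by brute force on tiny examples. The sketch below is only an illustration and is not taken from the paper: the function name `minwise_epsilon` and the two example families are assumptions. For a family of permutations it computes the smallest ε such that, for every nonempty subset K and every x in K, the probability that x attains the minimum of π(K) is within ε/|K| of 1/|K|.

```python
from fractions import Fraction
from itertools import combinations, permutations


def minwise_epsilon(family, n):
    """Smallest eps for which `family` (a list of permutations of range(n),
    each given as a tuple pi with pi[i] the image of i) satisfies
    |Pr[min pi(K) = pi(x)] - 1/|K|| <= eps/|K| for all K and all x in K.
    Hypothetical helper; exhaustive, so only usable for very small n."""
    worst = Fraction(0)
    for size in range(1, n + 1):
        for K in combinations(range(n), size):
            for x in K:
                # Count permutations under which x is the minimizer on K.
                hits = sum(1 for pi in family if min(pi[y] for y in K) == pi[x])
                prob = Fraction(hits, len(family))
                # Multiplying the condition by |K| gives | |K|*Pr - 1 | <= eps.
                worst = max(worst, abs(size * prob - 1))
    return worst


n = 4
# All n! permutations: exactly min-wise independent by symmetry.
full = list(permutations(range(n)))
# Only the n cyclic shifts: a much smaller family, assumed here for contrast.
shifts = [tuple((i + s) % n for i in range(n)) for s in range(n)]

eps_full = minwise_epsilon(full, n)
eps_shifts = minwise_epsilon(shifts, n)
print(eps_full, eps_shifts)
```

The full symmetric group gives ε = 0, while the small family of cyclic shifts gives a strictly positive ε. This is the size-versus-accuracy trade-off that explicit constructions of such families aim to improve.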

    Set-Codes with Small Intersections and Small Discrepancies

    Full text link
    We are concerned with the problem of designing large families of subsets over a common labeled ground set that have small pairwise intersections and the property that the maximum discrepancy of the label values within each of the sets is less than or equal to one. Our results, based on transversal designs, factorizations of packings and Latin rectangles, show that by jointly constructing the sets and labeling scheme, one can achieve optimal family sizes for many parameter choices. Probabilistic arguments akin to those used for pseudorandom generators lead to significantly suboptimal results when compared to the proposed combinatorial methods. The design problem considered is motivated by applications in molecular data storage and theoretical computer science.