Low Discrepancy Sets Yield Approximate Min-Wise Independent Permutation Families
Motivated by a problem of filtering near-duplicate Web documents, Broder, Charikar, Frieze & Mitzenmacher defined the following notion of $\epsilon$-approximate min-wise independent permutation families. A multiset $F$ of permutations of $\{0, 1, \ldots, n-1\}$ is such a family if for all $K \subseteq \{0, 1, \ldots, n-1\}$ and any $x \in K$, a permutation $\pi$ chosen uniformly at random from $F$ satisfies $\left| \Pr[\min\{\pi(K)\} = \pi(x)] - \frac{1}{|K|} \right| \le \frac{\epsilon}{|K|}$. We show connections of such families with low discrepancy sets for geometric rectangles, and give explicit constructions of such families $F$ of size $n^{O(\sqrt{\log n})}$ for $\epsilon = 1/n^{\Theta(1)}$, improving upon the previously best-known bound of Indyk. We also present polynomial-size constructions when the min-wise condition is required only for $|K| \le 2^{O(\log^{2/3} n)}$, with $\epsilon \ge 2^{-O(\log^{2/3} n)}$.
Keywords: Combinatorial problems; min-wise independent permutations; information retrieval; document filtering; pseudorandom permutations; explicit constructions
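The definition above can be checked by brute force on very small ground sets. The following is a minimal sketch, not the construction from the paper: the function name `minwise_epsilon` and the two example families are illustrative assumptions. It returns the smallest $\epsilon$ for which a given family satisfies the min-wise condition, using the fact that $|p - 1/|K|| \le \epsilon/|K|$ is equivalent to $\epsilon \ge \big||K| \cdot p - 1\big|$.

```python
from fractions import Fraction
from itertools import combinations, permutations


def minwise_epsilon(family, n):
    """Smallest eps for which `family` is eps-approximate min-wise independent.

    `family` is a list of permutations of range(n), each given as a tuple
    `pi` where pi[x] is the image of x. Illustrative brute force only: the
    running time is exponential in n.
    """
    worst = Fraction(0)
    size = len(family)
    for k in range(1, n + 1):
        for K in combinations(range(n), k):
            # For each x in K, count the permutations with min(pi(K)) == pi(x).
            counts = dict.fromkeys(K, 0)
            for pi in family:
                argmin = min(K, key=lambda x: pi[x])
                counts[argmin] += 1
            for x in K:
                prob = Fraction(counts[x], size)
                # |prob - 1/k| <= eps/k  is equivalent to  eps >= |k*prob - 1|
                worst = max(worst, abs(k * prob - 1))
    return worst


n = 4
# All n! permutations: exactly min-wise independent, so eps is 0.
full_family = list(permutations(range(n)))
# Hypothetical small family: the n cyclic shifts x -> (x + s) mod n.
cyclic_family = [tuple((x + s) % n for x in range(n)) for s in range(n)]

print(minwise_epsilon(full_family, n))
print(minwise_epsilon(cyclic_family, n))
```

The full symmetric group is exactly min-wise independent, while the cyclic-shift family is far smaller but has a strictly positive deviation; the abstract concerns explicit families that sit between these extremes.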
Set-Codes with Small Intersections and Small Discrepancies
We are concerned with the problem of designing large families of subsets over
a common labeled ground set that have small pairwise intersections and the
property that the maximum discrepancy of the label values within each of the
sets is less than or equal to one. Our results, based on transversal designs,
factorizations of packings and Latin rectangles, show that by jointly
constructing the sets and labeling scheme, one can achieve optimal family sizes
for many parameter choices. Probabilistic arguments akin to those used for
pseudorandom generators lead to significantly suboptimal results when compared
to the proposed combinatorial methods. The design problem considered is
motivated by applications in molecular data storage and theoretical computer
science.
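The two constraints described above can be sketched as a small checker. The abstract does not spell out the labeling, so this sketch assumes labels in $\{+1, -1\}$ and takes the discrepancy of a set to be the absolute value of the sum of its labels; the function name `is_valid_set_code`, the intersection bound parameter, and the example family are hypothetical.

```python
from itertools import combinations


def is_valid_set_code(family, labels, max_intersection, max_discrepancy=1):
    """Check a family of subsets against both design constraints.

    Assumptions (not stated in the abstract): `labels` maps each ground-set
    element to +1 or -1, and the discrepancy of a set is |sum of its labels|.
    """
    # Discrepancy constraint: labels within each set must nearly cancel.
    for subset in family:
        if abs(sum(labels[e] for e in subset)) > max_discrepancy:
            return False
    # Intersection constraint: every pair of sets shares few elements.
    for a, b in combinations(family, 2):
        if len(a & b) > max_intersection:
            return False
    return True


# Hypothetical example on the ground set {0, ..., 5} with alternating labels.
labels = {0: +1, 1: -1, 2: +1, 3: -1, 4: +1, 5: -1}
family = [frozenset({0, 1, 2}), frozenset({2, 3, 4}), frozenset({4, 5, 0})]

print(is_valid_set_code(family, labels, max_intersection=1))
```

In this toy family each set has label sum 1 and every pair of sets meets in a single element. The joint design of sets and labels that the abstract describes is about maximizing the family size under such constraints, which this checker does not attempt.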