A random hash function h is ε-minwise if for any set S,
∣S∣=n, and element x∈S, Pr[h(x)=minh(S)]=(1±ε)/n.
Minwise hash functions with low bias ε have widespread applications
within similarity estimation.
Hashing from a universe [u], the twisted tabulation hashing of
P\v{a}tra\c{s}cu and Thorup [SODA'13] makes c=O(1) lookups in tables of size
u1/c. Twisted tabulation was invented to get good concentration for
hashing based sampling. Here we show that twisted tabulation yields O~(1/u1/c)-minwise hashing.
In the classic independence paradigm of Wegman and Carter [FOCS'79] O~(1/u1/c)-minwise hashing requires Ω(logu)-independence [Indyk
SODA'99]. P\v{a}tra\c{s}cu and Thorup [STOC'11] had shown that simple
tabulation, using same space and lookups yields O~(1/n1/c)-minwise
independence, which is good for large sets, but useless for small sets. Our
analysis uses some of the same methods, but is much cleaner bypassing a
complicated induction argument.Comment: To appear in Proceedings of SWAT 201