This paper presents a new protocol for solving the private heavy-hitters
problem. In this problem, there are many clients and a small set of
data-collection servers. Each client holds a private bitstring. The servers
want to recover the set of all popular strings, without learning anything else
about any client's string. A web-browser vendor, for instance, can use our
protocol to figure out which homepages are popular, without learning any user's
homepage. We also consider the simpler private subset-histogram problem, in
which the servers want to count how many clients hold strings in a particular
set without revealing this set to the clients.
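
As a point of reference, the two problems can be written down as plain, non-private computations. The sketch below is only a statement of what the servers are meant to learn, not a protocol. The function names are hypothetical, and two readings are assumptions: a string counts as "popular" when at least `threshold` clients hold it, and the subset-histogram output is one count per string in the servers' set.

```python
from collections import Counter


def heavy_hitters_plain(strings, threshold):
    """Return every string held by at least `threshold` clients (assumed definition)."""
    counts = Counter(strings)
    return {s for s, c in counts.items() if c >= threshold}


def subset_histogram_plain(strings, subset):
    """Return, for each string in the servers' set, how many clients hold it."""
    counts = Counter(strings)
    return {s: counts[s] for s in subset}
```

A private protocol should reveal to the servers only these outputs, and nothing further about any individual client's string.
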
Our protocols use two data-collection servers and, in a protocol run, each
client sends only a single message to the servers. Our protocols protect
client privacy against arbitrary misbehavior by one of the servers, and our
approach requires neither public-key cryptography (except for secure channels)
nor general-purpose multiparty computation. Instead, we rely on incremental
distributed point functions, a new cryptographic tool that allows a client to
succinctly secret-share the labels on the nodes of an exponentially large
binary tree, provided that the tree has a single non-zero path. Along the way,
we develop new general tools for providing malicious security in applications
of distributed point functions.
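
To make the tree picture concrete, here is a minimal sketch of the functionality that secret-shared node labels provide. It is deliberately naive and is not the incremental distributed point function construction: each client additively shares the label of every tree node explicitly, so the message grows exponentially in the string length, whereas the tool described above achieves the same sharing succinctly. The names `share_string`, `heavy_hitters`, and `MODULUS`, the use of a count threshold, and the level-by-level prefix search are all illustrative assumptions, and the checks needed for malicious security are omitted.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical modulus for additive shares


def share_string(bits):
    """Additively share the labels of all prefix-tree nodes for one client string.

    The label of a node is 1 if the node's prefix is a prefix of `bits` and 0
    otherwise, so the tree has a single non-zero root-to-leaf path. This naive
    version enumerates every node and is exponential in len(bits).
    """
    share_a, share_b = {}, {}
    for level in range(1, len(bits) + 1):
        for node in range(2**level):
            prefix = format(node, f"0{level}b")
            value = 1 if bits.startswith(prefix) else 0
            r = secrets.randbelow(MODULUS)
            share_a[prefix] = r                      # sent to server A
            share_b[prefix] = (value - r) % MODULUS  # sent to server B
    return share_a, share_b


def heavy_hitters(shares_a, shares_b, n_bits, threshold):
    """Find strings held by at least `threshold` clients via prefix search."""
    candidates = [""]
    for _ in range(n_bits):
        survivors = []
        for prefix in candidates:
            for bit in "01":
                child = prefix + bit
                # Each server sums its own shares locally.
                sum_a = sum(s[child] for s in shares_a) % MODULUS
                sum_b = sum(s[child] for s in shares_b) % MODULUS
                # Only the aggregate count for this prefix is reconstructed.
                count = (sum_a + sum_b) % MODULUS
                if count >= threshold:
                    survivors.append(child)
        candidates = survivors
    return candidates
```

In this sketch each server on its own sees only uniformly random values, and the servers jointly reconstruct only per-prefix counts. Pruning prefixes that fall below the threshold keeps the search from visiting the whole tree.
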
In an experimental evaluation with two servers on opposite sides of the U.S.,
the servers can find the 200 most popular strings among a set of 400,000
client-held 256-bit strings in 54 minutes. Our protocols are highly
parallelizable. We estimate that with 20 physical machines per logical server,
our protocols could compute heavy hitters over ten million clients in just over
one hour of computation.
Comment: To appear in IEEE Security & Privacy 202