Where distributed agents must share voluminous set membership information,
Bloom filters provide a compact, though lossy, way for them to do so. Numerous
recent networking papers have examined the trade-offs between the bandwidth
consumed by the transmission of Bloom filters, and the error rate, which takes
the form of false positives, and which rises the more the filters are
compressed. In this paper, we introduce the retouched Bloom filter (RBF), an
extension that makes the Bloom filter more flexible by permitting the removal
of selected false positives at the expense of generating random false
negatives. We analytically show that RBFs created through a random process
maintain an overall error rate, expressed as a combination of the false
positive rate and the false negative rate, that is equal to the false positive
rate of the corresponding Bloom filters. We further provide some simple
heuristics and improved algorithms that decrease the false positive rate more
than than the corresponding increase in the false negative rate, when creating
RBFs. Finally, we demonstrate the advantages of an RBF over a Bloom filter in a
distributed network topology measurement application, where information about
large stop sets must be shared among route tracing monitors.Comment: This is a new version of the technical reports with improved
algorithms and theorical analysis of algorithm