We describe a network-based data-leak detection (DLD)
technique, the main feature of which is that the detection
does not require the data owner to reveal the content of the
sensitive data. Instead, only a small amount of specialized
digests are needed. Our technique – referred to as the fuzzy
fingerprint – can be used to detect accidental data leaks due
to human errors or application flaws. The privacy-preserving
feature of our algorithms minimizes the exposure of sensitive
data and enables the data owner to safely delegate the
detection to others.We describe how cloud providers can offer
their customers data-leak detection as an add-on service
with strong privacy guarantees.
We perform extensive experimental evaluation on the privacy,
efficiency, accuracy and noise tolerance of our techniques.
Our evaluation results under various data-leak scenarios
and setups show that our method can support accurate
detection with very small number of false alarms, even
when the presentation of the data has been transformed. It
also indicates that the detection accuracy does not degrade
when partial digests are used. We further provide a quantifiable
method to measure the privacy guarantee offered by our
fuzzy fingerprint framework