The current state of the industry’s methods of collecting background data reflecting diagnostic and usage information are often opaque and require users to place a lot of trust in the entity receiving the data. For vendors, having a centralized database of potentially sensitive data is a privacy protection headache and a potential liability should a breach of that database occur. Unfortunately, high profile privacy failures are not uncommon, so many individuals and companies are understandably skeptical and choose not to contribute any information. It is a shame, since the data could be used for improving reliability, or getting stronger security, or for valuable academic research into real-world usage patterns.
We propose, implement and evaluate a framework for non-realtime anonymous data collection, aggregation for analysis, and feedback. Departing from the usual “trusted core” approach, we aim to maintain reporters’ anonymity even if the centralized part of the system is compromised. We design a peer-to-peer mix network and its protocol that are tuned to the properties of background diagnostic traffic. Our system delivers data to a centralized repository while maintaining (i) source anonymity, (ii) privacy in transit, and (iii) the ability to provide analysis feedback back to the source. By removing the core’s ability to identify the source of data and to track users over time, we drastically reduce its attractiveness as a potential attack target and allow vendors to make concrete and verifiable privacy and anonymity claims