11,416 research outputs found
Differential Privacy in Metric Spaces: Numerical, Categorical and Functional Data Under the One Roof
We study Differential Privacy in the abstract setting of Probability on
metric spaces. Numerical, categorical and functional data can be handled in a
uniform manner in this setting. We demonstrate how mechanisms based on data
sanitisation and those that rely on adding noise to query responses fit within
this framework. We prove that once the sanitisation is differentially private,
then so is the query response for any query. We show how to construct
sanitisations for high-dimensional databases using simple 1-dimensional
mechanisms. We also provide lower bounds on the expected error for
differentially private sanitisations in the general metric space setting.
Finally, we consider the question of sufficient sets for differential privacy
and show that for relaxed differential privacy, any algebra generating the
Borel -algebra is a sufficient set for relaxed differential privacy.Comment: 18 Page
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data
- …