33 research outputs found
Recommended from our members
New Data Protection Abstractions for Emerging Mobile and Big Data Workloads
Two recent shifts in computing are challenging the effectiveness of traditional approaches to data protection. Emerging machine learning workloads have complex access patterns and unique leakage characteristics that are not well supported by existing protection approaches. Second, mobile operating systems do not provide sufficient support for fine grained data protection tools forcing users to rely on individual applications to correctly manage and protect data. My thesis is that these emerging workloads have unique characteristics that we can leverage to build new, more effective data protection abstractions.
This dissertation presents two new data protection systems for machine learning work-loads and a new system for fine grained data management and protection on mobile devices. First is Sage, a differentially private machine learning platform addressing the two primary challenges of differential privacy: running out of budget and the privacy utility tradeoff. The second system, Pyramid, is the first selective data system. Pyramid leverages count featurization to reduce the amount of data exposed while training classification models by two orders of magnitude. The final system, Pebbles, provides users with logical data objects as a new fine grained data management and protection primitive allowing data management at a higher level of abstraction. Pebbles, leverages high level storage abstractions in mobile operating systems to discover user recognizable application level data objects in unmodified mobile applications
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data
XRay: Enhancing the Web's Transparency with Differential Correlation
Today's Web services - such as Google, Amazon, and Facebook - leverage user
data for varied purposes, including personalizing recommendations, targeting
advertisements, and adjusting prices. At present, users have little insight
into how their data is being used. Hence, they cannot make informed choices
about the services they choose. To increase transparency, we developed XRay,
the first fine-grained, robust, and scalable personal data tracking system for
the Web. XRay predicts which data in an arbitrary Web account (such as emails,
searches, or viewed products) is being used to target which outputs (such as
ads, recommended products, or prices). XRay's core functions are service
agnostic and easy to instantiate for new services, and they can track data
within and across services. To make predictions independent of the audited
service, XRay relies on the following insight: by comparing outputs from
different accounts with similar, but not identical, subsets of data, one can
pinpoint targeting through correlation. We show both theoretically, and through
experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision
and recall by correlating data from a surprisingly small number of extra
accounts.Comment: Extended version of a paper presented at the 23rd USENIX Security
Symposium (USENIX Security 14
ACMiner: Extraction and Analysis of Authorization Checks in Android's Middleware
Billions of users rely on the security of the Android platform to protect
phones, tablets, and many different types of consumer electronics. While
Android's permission model is well studied, the enforcement of the protection
policy has received relatively little attention. Much of this enforcement is
spread across system services, taking the form of hard-coded checks within
their implementations. In this paper, we propose Authorization Check Miner
(ACMiner), a framework for evaluating the correctness of Android's access
control enforcement through consistency analysis of authorization checks.
ACMiner combines program and text analysis techniques to generate a rich set of
authorization checks, mines the corresponding protection policy for each
service entry point, and uses association rule mining at a service granularity
to identify inconsistencies that may correspond to vulnerabilities. We used
ACMiner to study the AOSP version of Android 7.1.1 to identify 28
vulnerabilities relating to missing authorization checks. In doing so, we
demonstrate ACMiner's ability to help domain experts process thousands of
authorization checks scattered across millions of lines of code