### Influence in Classification via Cooperative Game Theory

A dataset has been classified by some unknown classifier into two types of
points. What were the most important factors in determining the classification
outcome? In this work, we use an axiomatic approach to uniquely
characterize an influence measure: a function that, given a set of classified
points, outputs a value for each feature corresponding to its influence in
determining the classification outcome. We show that our influence measure
takes on an intuitive form when the unknown classifier is linear. Finally, we
use our influence measure to analyze the effects of user profiling on
Google's online display advertising. Comment: accepted to IJCAI 201
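
To make "a value for each feature" concrete, here is a minimal Python sketch. It is a hypothetical proxy and is not the axiomatically characterized measure from the paper. It assumes binary features and a dataset given as a dict from points to labels. For each feature i, it counts the points whose neighbour (the same point with bit i flipped) is also in the dataset and received a different label. The names `pivotality_counts` and `Point` are illustrative.

```python
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[int, ...]  # a binary feature vector (assumption: features are 0/1)


def pivotality_counts(labelled: Dict[Point, int]) -> List[int]:
    """Hypothetical influence proxy: for each feature i, count the points x whose
    neighbour (x with bit i flipped) is also in the dataset with a different label."""
    n = len(next(iter(labelled)))
    counts = [0] * n
    for x, label in labelled.items():
        for i in range(n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if flipped in labelled and labelled[flipped] != label:
                counts[i] += 1
    return counts


# Usage: label every point of {0,1}^3 with a classifier that only reads feature 0.
data = {x: x[0] for x in product((0, 1), repeat=3)}
print(pivotality_counts(data))  # [8, 0, 0]
```

The proxy uses only the labelled points and never queries the classifier. This matches the setting above, where the classifier is unknown.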

### A Logical Method for Policy Enforcement over Evolving Audit Logs

We present an iterative algorithm for enforcing policies represented in a
first-order logic, which can, in particular, express all transmission-related
clauses in the HIPAA Privacy Rule. The logic has three features that raise
challenges for enforcement: uninterpreted predicates (used to model
subjective concepts in privacy policies), real-time temporal properties, and
quantification over infinite domains (such as the set of messages containing
personal information). The algorithm operates over audit logs that are
inherently incomplete and evolve over time. In each iteration, the algorithm
provably checks as much of the policy as possible over the current log and
outputs a residual policy that can only be checked when the log is extended
with additional information. We prove correctness and termination properties of
the algorithm. While these results are developed in a general form, accounting
for many different sources of incompleteness in audit logs, we also prove that
for the special case of logs that maintain a complete record of all relevant
actions, the algorithm effectively enforces all safety and co-safety
properties. The algorithm can significantly help automate enforcement of
policies derived from the HIPAA Privacy Rule. Comment: Carnegie Mellon University CyLab Technical Report. 51 pages
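
The "check what you can, output a residual" step can be illustrated with a minimal Python sketch. The sketch is propositional only. It assumes a three-valued log in which an atom maps to True, to False, or is absent (unknown). The paper's algorithm additionally handles first-order quantification over infinite domains, uninterpreted predicates, and real-time temporal operators, none of which appear here. The names `reduce_policy` and `Formula`, and the example policy, are hypothetical.

```python
from typing import Dict, Union

# A formula is a bool (already decided) or a tuple:
# ("atom", name), ("not", phi), ("and", phi, psi), ("or", phi, psi).
Formula = Union[bool, tuple]


def reduce_policy(phi: Formula, log: Dict[str, bool]) -> Formula:
    """Evaluate phi as far as the log allows. Atoms missing from the log stay
    symbolic, so the result is True, False, or a residual formula."""
    if isinstance(phi, bool):
        return phi
    op = phi[0]
    if op == "atom":
        return log.get(phi[1], phi)  # unknown atom: keep it in the residual
    if op == "not":
        inner = reduce_policy(phi[1], log)
        return (not inner) if isinstance(inner, bool) else ("not", inner)
    if op in ("and", "or"):
        left = reduce_policy(phi[1], log)
        right = reduce_policy(phi[2], log)
        absorbing = op == "or"  # True decides an "or", False decides an "and"
        if left is absorbing or right is absorbing:
            return absorbing
        if isinstance(left, bool):  # left is the neutral element here
            return right
        if isinstance(right, bool):
            return left
        return (op, left, right)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical policy: a disclosure requires consent or a treatment purpose.
policy = ("or", ("not", ("atom", "disclose")),
          ("or", ("atom", "consent"), ("atom", "treatment")))
residual = reduce_policy(policy, {"disclose": True, "consent": False})
print(residual)  # ('atom', 'treatment')
# Once the log is extended with the missing fact, the residual is decided.
print(reduce_policy(residual, {"treatment": True}))  # True
```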

### A Methodology for Information Flow Experiments

Information flow analysis has largely ignored the setting where the analyst
has neither control over nor a complete model of the analyzed system. We
formalize such limited information flow analyses and study an instance of them:
detecting the usage of data by websites. We prove that these are problems of
causal inference. Leveraging this connection, we push beyond traditional
information flow analysis to provide a systematic methodology based on
experimental science and statistical analysis. Our methodology allows us to
systematize prior studies in the area, viewing them as instances of a general
approach. Our systematic study leads to practical advice for improving work on
detecting data usage, a previously unformalized area. We illustrate these
concepts with a series of experiments collecting data on the use of information
by websites, which we then analyze statistically.
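
Treating data-usage detection as causal inference suggests a randomized experiment. In such an experiment, some agents receive a treatment (for example, a hypothetical visit to a particular site), a control group does not, and the two groups are compared on some measured outcome. A standard test for such experiments is the permutation test, sketched below in Python. The choice of difference in means as the test statistic, the group setup, and the function name are assumptions, not necessarily the paper's exact procedure.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(treatment: Sequence[float], control: Sequence[float],
                     rounds: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference in group means.

    Under the null hypothesis (the treatment has no effect on what the website
    shows), group labels are exchangeable. The observed difference is therefore
    compared with differences from random relabellings of the pooled data."""
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    k = len(treatment)
    at_least_as_extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (rounds + 1)


# Hypothetical measurements, e.g. ads of one category shown to each browser agent.
p = permutation_test([9, 10, 11, 12, 13, 14], [1, 2, 3, 4, 5, 6])
print(p)
```

Because the test only relies on the random assignment of agents to groups, it needs no model of the website. This mirrors the limited-knowledge setting described above.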

### Formal Verification of Differential Privacy for Interactive Systems

Differential privacy is a promising approach to privacy-preserving data
analysis with a well-developed theory for functions. Despite recent work on
implementing systems that aim to provide differential privacy, the problem of
formally verifying that these systems satisfy differential privacy has not been
adequately addressed. This paper presents the first results towards automated
verification of source code for differentially private interactive systems. We
develop a formal probabilistic automaton model of differential privacy for
systems by adapting prior work on differential privacy for functions. The main
technical result of the paper is a sound proof technique based on a form of
probabilistic bisimulation relation for proving that a system modeled as a
probabilistic automaton satisfies differential privacy. The novelty lies in the
way we track quantitative privacy leakage bounds using a relation family
instead of a single relation. We illustrate the proof technique on a
representative automaton motivated by PINQ, an implemented system that is
intended to provide differential privacy. To make our proof technique easier to
apply to realistic systems, we prove a form of refinement theorem and apply it
to show that a refinement of the abstract PINQ automaton also satisfies our
differential privacy definition. Finally, we begin the process of automating
our proof technique by providing an algorithm for mechanically checking a
restricted class of relations from the proof technique. Comment: 65 pages with 1 figure
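
The function-level definition that the paper adapts requires Pr[M(d1) = o] <= e^epsilon * Pr[M(d2) = o] for every pair of neighbouring inputs d1, d2 and every output o. For a finite, discrete mechanism this can be checked by brute force, as in the Python sketch below. The sketch does not implement the paper's bisimulation-based proof technique for probabilistic automata. The names and the randomized-response example are illustrative assumptions.

```python
import math
from typing import Callable, Dict, Hashable, Iterable, Tuple

Distribution = Dict[Hashable, float]  # output -> probability


def satisfies_pure_dp(mechanism: Callable[[Hashable], Distribution],
                      neighbours: Iterable[Tuple[Hashable, Hashable]],
                      epsilon: float) -> bool:
    """Check Pr[M(d1) = o] <= e^epsilon * Pr[M(d2) = o] in both directions,
    for every neighbouring pair and every output o of a discrete mechanism."""
    bound = math.exp(epsilon)
    for d1, d2 in neighbours:
        p, q = mechanism(d1), mechanism(d2)
        for out in set(p) | set(q):
            a, b = p.get(out, 0.0), q.get(out, 0.0)
            if a > bound * b or b > bound * a:
                return False
    return True


def randomized_response(bit: int, p_truth: float = 0.75) -> Distribution:
    """Report the true bit with probability p_truth, otherwise flip it."""
    return {bit: p_truth, 1 - bit: 1 - p_truth}


# The one-bit databases 0 and 1 are neighbours; the worst ratio is 0.75 / 0.25 = 3.
print(satisfies_pure_dp(randomized_response, [(0, 1)], epsilon=1.1))  # True
print(satisfies_pure_dp(randomized_response, [(0, 1)], epsilon=1.0))  # False, e < 3
```

Brute-force enumeration like this only works for tiny finite mechanisms. Interactive systems have unbounded interaction sequences, which is why a relational proof technique such as the one described above is needed.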

### Differentially Private Data Analysis of Social Networks via Restricted Sensitivity

We introduce the notion of restricted sensitivity as an alternative to global
and smooth sensitivity to improve accuracy in differentially private data
analysis. The definition of restricted sensitivity is similar to that of global
sensitivity except that instead of quantifying over all possible datasets, we
take advantage of any beliefs about the dataset that a querier may have, to
quantify over a restricted class of datasets. Specifically, given a query f and
a hypothesis H about the structure of a dataset D, we show generically how to
transform f into a new query f_H whose global sensitivity (over all datasets
including those that do not satisfy H) matches the restricted sensitivity of
the query f. Moreover, if the belief of the querier is correct (i.e., D is in
H), then f_H(D) = f(D). If the belief is incorrect, then f_H(D) may be
inaccurate.

We demonstrate the usefulness of this notion by considering the task of
answering queries about social networks, which we model as a combination of
a graph and a labeling of its vertices. In particular, while our generic
procedure is computationally inefficient, for the specific definition of H as
graphs of bounded degree, we exhibit efficient ways of constructing f_H using
different projection-based techniques. We then analyze two important query
classes: subgraph counting queries (e.g., number of triangles) and local
profile queries (e.g., number of people who know a spy and a computer scientist
who know each other). We demonstrate that the restricted sensitivity of such
queries can be significantly lower than their smooth sensitivity. Thus, using
restricted sensitivity, we can maintain privacy whether or not D is in H, while
providing more accurate results when H holds.
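
The Python sketch below illustrates the idea of building f_H by first projecting a graph into H (graphs of bounded degree k) and then applying f (here, triangle counting). The naive truncation used is a hypothetical stand-in, not one of the paper's projection-based techniques. It shares two properties with them: its output is always in H, and graphs already in H pass through unchanged, so f_H(D) = f(D) whenever D is in H. No claim is made about its global sensitivity, and no privacy noise is added. Graphs are assumed to be simple (no self-loops), and all names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import FrozenSet, Set

Edge = FrozenSet[int]  # an undirected edge {u, v}; assumes u != v


def edge(u: int, v: int) -> Edge:
    return frozenset((u, v))


def truncate_to_degree(edges: Set[Edge], k: int) -> Set[Edge]:
    """Naive projection into H = {graphs with maximum degree <= k}: drop every
    edge incident to a vertex whose degree exceeds k. Graphs in H are unchanged."""
    degree = Counter(v for e in edges for v in e)
    return {e for e in edges if all(degree[v] <= k for v in e)}


def triangle_count(edges: Set[Edge]) -> int:
    """Count triangles by checking every vertex triple (fine for small graphs)."""
    adjacency = defaultdict(set)
    for e in edges:
        u, v = tuple(e)
        adjacency[u].add(v)
        adjacency[v].add(u)
    return sum(1 for a, b, c in combinations(sorted(adjacency), 3)
               if b in adjacency[a] and c in adjacency[a] and c in adjacency[b])


def triangle_count_h(edges: Set[Edge], k: int) -> int:
    """f_H: triangle counting composed with the degree-k projection."""
    return triangle_count(truncate_to_degree(edges, k))


# A triangle 1-2-3 with a pendant edge 3-4; vertex 3 has degree 3.
g = {edge(1, 2), edge(2, 3), edge(1, 3), edge(3, 4)}
print(triangle_count_h(g, 3))  # 1: g is in H, so f_H(g) = f(g)
print(triangle_count_h(g, 2))  # 0: g is not in H, and the answer is inaccurate
```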

### Towards Human Computable Passwords

An interesting challenge for the cryptography community is to design
authentication protocols that are so simple that a human can execute them
without relying on a fully trusted computer. We propose several candidate
authentication protocols for a setting in which the human user can only receive
assistance from a semi-trusted computer: one that stores information
and performs computations correctly but does not provide confidentiality. Our
schemes use a semi-trusted computer to store and display public challenges
$C_i\in[n]^k$. The human user memorizes a random secret mapping
$\sigma:[n]\rightarrow\mathbb{Z}_d$ and authenticates by computing responses
$f(\sigma(C_i))$ to a sequence of public challenges where
$f:\mathbb{Z}_d^k\rightarrow\mathbb{Z}_d$ is a function that is easy for the
human to evaluate. We prove that any statistical adversary needs to sample
$m=\tilde{\Omega}(n^{s(f)})$ challenge-response pairs to recover $\sigma$, for
a security parameter $s(f)$ that depends on two key properties of $f$. To
obtain our results, we apply the general hypercontractivity theorem to lower
bound the statistical dimension of the distribution over challenge-response
pairs induced by $f$ and $\sigma$. Our lower bounds apply to arbitrary
functions $f$ (not just to functions that are easy for a human to evaluate),
and generalize recent results of Feldman et al. As an application, we propose a
family of human computable password functions $f_{k_1,k_2}$ in which the user
needs to perform $2k_1+2k_2+1$ primitive operations (e.g., adding two digits or
remembering $\sigma(i)$), and we show that $s(f) = \min\{k_1+1, (k_2+1)/2\}$.
For these schemes, we prove that forging passwords is equivalent to recovering
the secret mapping. Thus, our human computable password schemes can maintain
strong security guarantees even after an adversary has observed the user log in
to many different accounts. Comment: Fixed bug in definition of $Q^{f,j}$ and modified proofs accordingly
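
The challenge-response flow described above can be sketched in Python. In the sketch, the user holds $\sigma:[n]\rightarrow\mathbb{Z}_d$, the semi-trusted computer shows public challenges $C_i$, and the password for an account is the sequence of responses $f(\sigma(C_i))$. The concrete $f$ used here, the sum mod $d$, is a hypothetical placeholder and not the paper's $f_{k_1,k_2}$. Because it is linear over $\mathbb{Z}_d$, it is open to linear-algebra attacks and should not be taken as secure. Indices are 0-based rather than drawn from $[n] = \{1,\dots,n\}$, and all function names are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Challenge = Tuple[int, ...]  # an element of [n]^k, using 0-based indices


def sample_secret(n: int, d: int, rng: random.Random) -> List[int]:
    """The memorized mapping sigma: [n] -> Z_d, stored as a list."""
    return [rng.randrange(d) for _ in range(n)]


def sample_challenge(n: int, k: int, rng: random.Random) -> Challenge:
    """A public challenge C: k indices into [n], chosen uniformly at random."""
    return tuple(rng.randrange(n) for _ in range(k))


def respond(sigma: Sequence[int], challenge: Challenge,
            f: Callable[[List[int]], int]) -> int:
    """Compute f(sigma(C)): look up sigma at each index of C, then apply f."""
    return f([sigma[i] for i in challenge])


def placeholder_f(xs: List[int], d: int = 10) -> int:
    """Hypothetical stand-in for f: the sum mod d. It is linear over Z_d, so it
    is vulnerable to linear algebra; it is NOT the paper's f_{k1,k2}."""
    return sum(xs) % d


def password(sigma: Sequence[int], challenges: Sequence[Challenge],
             f: Callable[[List[int]], int] = placeholder_f) -> Tuple[int, ...]:
    """A per-account password: the responses to that account's public challenges."""
    return tuple(respond(sigma, c, f) for c in challenges)


rng = random.Random(42)
sigma = sample_secret(n=100, d=10, rng=rng)
account_challenges = [sample_challenge(n=100, k=5, rng=rng) for _ in range(10)]
print(password(sigma, account_challenges))
```

Each login requires the user to look up $k$ values of $\sigma$ and evaluate $f$ once per challenge. This cost is what the operation count $2k_1+2k_2+1$ above measures for the paper's own functions.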

- …