Proving Differential Privacy with Shadow Execution
Recent work on formal verification of differential privacy shows a trend
toward usability and expressiveness -- generating a correctness proof of
sophisticated algorithms while minimizing the annotation burden on programmers.
Sometimes, combining those two requires substantial changes to program logics:
one recent paper is able to verify Report Noisy Max automatically, but it
involves a complex verification system using customized program logics and
verifiers.
In this paper, we propose a new proof technique, called shadow execution, and
embed it into a language called ShadowDP. ShadowDP uses shadow execution to
generate proofs of differential privacy with very few programmer annotations
and without relying on customized logics and verifiers. In addition to
verifying Report Noisy Max, we show that it can verify a new variant of Sparse
Vector that reports the gap between some noisy query answers and the noisy
threshold. Moreover, ShadowDP reduces the complexity of verification: for all
of the algorithms we have evaluated, type checking and verification together
take at most 3 seconds, while prior work takes minutes on the same algorithms.
Comment: 23 pages, 12 figures, PLDI'1
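For readers unfamiliar with Report Noisy Max, the algorithm named in the abstract above, the following is a minimal Python sketch of its textbook form, not ShadowDP code. The function names, the assumption that every query has sensitivity 1, and the conservative noise scale of 2/epsilon are choices made for this illustration only.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def report_noisy_max(answers, epsilon, rng=None):
    """Return the index of the largest noisy query answer.

    Assumes each query has sensitivity 1 (an assumption of this sketch).
    """
    rng = rng or random.Random()
    scale = 2.0 / epsilon  # conservative noise scale, chosen for illustration
    noisy = [a + laplace(scale, rng) for a in answers]
    # Only the index is released; the noisy values themselves stay hidden.
    return max(range(len(noisy)), key=noisy.__getitem__)
```

Calling `report_noisy_max([12.0, 40.0, 37.0], epsilon=0.5)` returns an index between 0 and 2. Smaller values of `epsilon` add more noise, so the true maximum is reported less reliably.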
Preserving Statistical Validity in Adaptive Data Analysis
A great deal of effort has been devoted to reducing the risk of spurious
scientific discoveries, from the use of sophisticated validation techniques, to
deep statistical methods for controlling the false discovery rate in multiple
hypothesis testing. However, there is a fundamental disconnect between the
theoretical results and the practice of data analysis: the theory of
statistical inference assumes a fixed collection of hypotheses to be tested, or
learning algorithms to be applied, selected non-adaptively before the data are
gathered, whereas in practice data is shared and reused with hypotheses and new
analyses being generated on the basis of data exploration and the outcomes of
previous analyses.
In this work we initiate a principled study of how to guarantee the validity
of statistical inference in adaptive data analysis. As an instance of this
problem, we propose and investigate the question of estimating the expectations
of adaptively chosen functions on an unknown distribution given random
samples.
We show that, surprisingly, there is a way to accurately estimate a number of
expectations that is exponential in the number of samples, even if the
functions are chosen adaptively.
This gives an exponential improvement over standard empirical estimators that
are limited to a linear number of estimates. Our result follows from a general
technique that counter-intuitively involves actively perturbing and
coordinating the estimates, using techniques developed for privacy
preservation. We give additional applications of this technique to our
question.
Comment: Updated related work with recent development
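The idea of perturbing estimates with a privacy-preserving technique can be illustrated with a short Python sketch. It assumes the Laplace mechanism as the perturbation and functions that map into [0, 1]. Both are assumptions of this example, since the abstract does not say which mechanism the paper uses, and the names `laplace` and `noisy_expectation` are hypothetical.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_expectation(samples, f, epsilon, rng=None):
    """Estimate E[f(x)] from samples, for f mapping into [0, 1].

    Laplace noise is an assumed stand-in for the paper's perturbation step.
    """
    rng = rng or random.Random()
    n = len(samples)
    empirical = sum(f(x) for x in samples) / n
    # Replacing one sample moves the empirical mean by at most 1/n,
    # so the noise scale is 1 / (n * epsilon).
    return empirical + laplace(1.0 / (n * epsilon), rng)
```

An analyst would call `noisy_expectation` once per adaptively chosen function `f`, seeing only the perturbed answers rather than the exact empirical means.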