Differentially Private ANOVA Testing
Modern society generates an incredible amount of data about individuals, and
releasing summary statistics about this data in a manner that provably protects
individual privacy would offer a valuable resource for researchers in many
fields. We present the first algorithm for analysis of variance (ANOVA) that
preserves differential privacy, allowing this important statistical test to be
conducted (and the results released) on databases of sensitive information. In
addition to our private algorithm for the F test statistic, we show a rigorous
way to compute p-values that accounts for the added noise needed to preserve
privacy. Finally, we present experimental results quantifying the statistical
power of this differentially private version of the test, finding that a sample
of several thousand observations is frequently enough to detect variation
between groups. The differentially private ANOVA algorithm is a promising
approach for releasing a common test statistic that is valuable across the
sciences and social sciences.
Comment: Accepted, camera-ready version presented at the 1st International
Conference on Data Intelligence and Security (ICDIS) 201
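As a rough illustration of the idea in this abstract (a noisy F statistic plus a p-value that accounts for the noise), the sketch below perturbs the two ANOVA sums of squares with Laplace noise and estimates a p-value by simulating the same noisy statistic under a null model. It is a minimal sketch under stated assumptions, not the authors' algorithm: the data are assumed to lie in [0, 1], the group sizes are treated as public, the privacy budget is split evenly, the sensitivity bounds `sens_ssa` and `sens_sse` are hypothetical caller-supplied values, and the null model (a clipped normal with standard deviation `null_std`) is an illustrative choice.

```python
import numpy as np


def sums_of_squares(groups):
    """Return (SSA, SSE): between-group and within-group sums of squares."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float(ssa), float(sse)


def f_statistic(ssa, sse, n_total, n_groups):
    """Classical one-way ANOVA F ratio."""
    return (ssa / (n_groups - 1)) / (sse / (n_total - n_groups))


def private_f(groups, epsilon, sens_ssa, sens_sse, rng):
    """Noisy F statistic via the Laplace mechanism (illustrative only).

    sens_ssa and sens_sse are ASSUMED sensitivity bounds supplied by the
    caller; valid bounds must be derived separately. Each sum of squares
    receives half of the privacy budget epsilon.
    """
    ssa, sse = sums_of_squares(groups)
    noisy_ssa = ssa + rng.laplace(scale=sens_ssa / (epsilon / 2))
    noisy_sse = sse + rng.laplace(scale=sens_sse / (epsilon / 2))
    if noisy_sse <= 0:
        # Post-processing choice for this sketch: treat a non-positive
        # noisy denominator as an arbitrarily large ratio.
        return float("inf")
    n_total = sum(len(g) for g in groups)
    return f_statistic(noisy_ssa, noisy_sse, n_total, len(groups))


def monte_carlo_p_value(observed_f, group_sizes, epsilon, sens_ssa, sens_sse,
                        null_std, rng, n_sim=1000):
    """Estimate a p-value by simulating the noisy statistic under the null.

    The null model (all groups drawn from one normal distribution clipped
    to [0, 1]) is an assumption of this sketch.
    """
    at_least_as_extreme = 0
    for _ in range(n_sim):
        sim = [np.clip(rng.normal(0.5, null_std, size=n), 0.0, 1.0)
               for n in group_sizes]
        if private_f(sim, epsilon, sens_ssa, sens_sse, rng) >= observed_f:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (n_sim + 1)
```

Because the simulated null statistics receive the same noise as the observed one, the reference distribution reflects both sampling variability and the privacy noise, which is the point the abstract makes about computing p-values rigorously.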
Differentially Private Nonparametric Hypothesis Testing
Hypothesis tests are a crucial statistical tool for data mining and are the
workhorse of scientific research in many fields. Here we study differentially
private tests of independence between a categorical and a continuous variable.
We take as our starting point traditional nonparametric tests, which make no
distributional assumptions (e.g., normality) about the data. We
present private analogues of the Kruskal-Wallis, Mann-Whitney, and Wilcoxon
signed-rank tests, as well as the parametric one-sample t-test. These tests use
novel test statistics developed specifically for the private setting. We
compare our tests to prior work, both on parametric and nonparametric tests. We
find that in all cases our new nonparametric tests achieve large improvements
in statistical power, even when the assumptions of parametric tests are met
- …
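For orientation, the classical (non-private) Kruskal-Wallis statistic, one of the traditional tests this abstract takes as a starting point, can be sketched as below. This is not the novel private test statistic the abstract refers to, whose definition does not appear in this listing. The function name is hypothetical, and the sketch assumes no tied values.

```python
import numpy as np


def kruskal_wallis_h(groups):
    """Classical Kruskal-Wallis H statistic, assuming no tied values.

    H = 12 / (N (N + 1)) * sum_i n_i * (mean_rank_i - (N + 1) / 2) ** 2
    """
    values = np.concatenate(groups)
    n_total = len(values)
    # Rank all observations jointly; ranks run from 1 to N.
    ranks = np.argsort(np.argsort(values)) + 1
    expected_mean_rank = (n_total + 1) / 2
    h = 0.0
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        h += len(g) * (group_ranks.mean() - expected_mean_rank) ** 2
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * h
```

Because the statistic depends on the data only through ranks, it needs no distributional assumption such as normality.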
