Revealing Network Structure, Confidentially: Improved Rates for Node-Private Graphon Estimation
Motivated by growing concerns over ensuring privacy on social networks, we
develop new algorithms and impossibility results for fitting complex
statistical models to network data subject to rigorous privacy guarantees. We
consider the so-called node-differentially private algorithms, which compute
information about a graph or network while provably revealing almost no
information about the presence or absence of a particular node in the graph.
We provide new algorithms for node-differentially private estimation for a
popular and expressive family of network models: stochastic block models and
their generalization, graphons. Our algorithms improve on prior work, reducing
their error quadratically and matching, in many regimes, the optimal nonprivate
algorithm. We also show that for the simplest random graph models (G(n,p) and
G(n,m)), node-private algorithms can be qualitatively more accurate than for
more complex models, converging at a strictly faster rate. This result uses a
new extension lemma for differentially private algorithms that we hope will be
broadly useful.
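As a minimal illustration of node differential privacy (not the graphon estimator from the paper), the sketch below releases a graph's edge count with the Laplace mechanism. Removing one node deletes at most n - 1 incident edges, so that is the node sensitivity; all function names here are illustrative.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via inverse-CDF."""
    u = random.uniform(-0.5, 0.5)
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def node_private_edge_count(n: int, edges: set, epsilon: float) -> float:
    """Release |E| with epsilon node-differential privacy.

    Removing a single node deletes at most n - 1 incident edges, so the
    node sensitivity of the edge count is n - 1 (much larger than the
    edge sensitivity of 1, which is why node privacy is harder).
    """
    sensitivity = n - 1
    return len(edges) + laplace_noise(sensitivity / epsilon)
```

The large sensitivity gap between node- and edge-level privacy is exactly what makes the error rates studied in the paper nontrivial.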
Differentially Private Nonparametric Hypothesis Testing
Hypothesis tests are a crucial statistical tool for data mining and are the
workhorse of scientific research in many fields. Here we study differentially
private tests of independence between a categorical and a continuous variable.
We take as our starting point traditional nonparametric tests, which require no
distributional assumption (e.g., normality) about the data distribution. We
present private analogues of the Kruskal-Wallis, Mann-Whitney, and Wilcoxon
signed-rank tests, as well as the parametric one-sample t-test. These tests use
novel test statistics developed specifically for the private setting. We
compare our tests to prior work, both on parametric and nonparametric tests. We
find that in all cases our new nonparametric tests achieve large improvements
in statistical power, even when the assumptions of parametric tests are met.
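To make the setting concrete, here is a naive Laplace-noised Mann-Whitney U statistic; this is a generic baseline, not one of the tailored private statistics the paper develops, and the function names are ours.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via inverse-CDF."""
    u = random.uniform(-0.5, 0.5)
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def private_mann_whitney_u(xs, ys, epsilon: float) -> float:
    """Naive epsilon-DP Mann-Whitney U: count pairs with x > y, then
    add Laplace noise. Changing one entry of xs flips at most len(ys)
    comparisons (and vice versa), so the sensitivity of the count is
    max(len(xs), len(ys)).
    """
    u_stat = sum(1 for x in xs for y in ys if x > y)
    sensitivity = max(len(xs), len(ys))
    return u_stat + laplace_noise(sensitivity / epsilon)
```

The noise scale grows with the sample size here, which hints at why purpose-built private test statistics can achieve much better power.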
Adversarial Random Forests for Density Estimation and Generative Modeling
We propose methods for density estimation and
data synthesis using a novel form of unsupervised
random forests. Inspired by generative adversarial
networks, we implement a recursive procedure in
which trees gradually learn structural properties
of the data through alternating rounds of generation and discrimination. The method is provably
consistent under minimal assumptions. Unlike
classic tree-based alternatives, our approach provides smooth (un)conditional densities and allows
for fully synthetic data generation. We achieve
comparable or superior performance to state-of-the-art probabilistic circuits and deep learning
models on various tabular data benchmarks while
executing about two orders of magnitude faster
on average. An accompanying R package, arf,
is available on CRAN.
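Only the initial generation round of the alternating scheme described above is easy to show compactly: draw each column independently from its empirical marginal, which matches the marginals but destroys cross-column dependence for the discriminator forest (omitted here) to detect. The function name is ours, not from the arf package.

```python
import random

def independent_marginal_sample(rows, n_samples):
    """Round-0 generator in an adversarial-forest scheme: sample each
    column independently from its empirical marginal. Marginals match
    the real data, but dependence between columns is broken, which the
    discriminator learns to exploit in later rounds (not shown).
    """
    columns = list(zip(*rows))            # column-wise view of the data
    return [tuple(random.choice(col) for col in columns)
            for _ in range(n_samples)]
```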
Data Distillation: A Survey
The popularity of deep learning has led to the curation of a vast number of
massive and multifarious datasets. Despite having close-to-human performance on
individual tasks, training parameter-hungry models on large datasets poses
multi-faceted problems such as (a) high model-training time; (b) slow research
iteration; and (c) poor eco-sustainability. As an alternative, data
distillation approaches aim to synthesize terse data summaries, which can serve
as effective drop-in replacements of the original dataset for scenarios like
model training, inference, architecture search, etc. In this survey, we present
a formal framework for data distillation, along with providing a detailed
taxonomy of existing approaches. Additionally, we cover data distillation
approaches for different data modalities, namely images, graphs, and user-item
interactions (recommender systems), while also identifying current challenges
and future research directions.
Comment: Accepted at TMLR '23. 21 pages, 4 figures.
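As a toy instance of a "terse data summary" in the survey's sense, the sketch below compresses a 1-D dataset to k centroids with Lloyd's algorithm; this is a coreset-style baseline, far simpler than the gradient- or trajectory-matching distillation methods the survey actually covers.

```python
import random

def distill_to_centroids(points, k, iters=10):
    """Toy data summary: replace a 1-D dataset with k centroids found
    by Lloyd's (k-means) algorithm. Illustrative baseline only, not a
    modern data-distillation method.
    """
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids
```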
Multi-dimensional Rankings, Program Termination, and Complexity Bounds of Flowchart Programs
Proving the termination of a flowchart program can be done by exhibiting a ranking function, i.e., a function from the program states to a well-founded set, which strictly decreases at each program step. A standard method to automatically generate such a function is to compute invariants for each program point and to search for a ranking in a restricted class of functions that can be handled with linear programming techniques. Previous algorithms based on affine rankings either are applicable only to simple loops (i.e., single-node flowcharts) and rely on enumeration, or are not complete in the sense that they are not guaranteed to find a ranking in the class of functions they consider, if one exists. Our first contribution is to propose an efficient algorithm to compute ranking functions: It can handle flowcharts of arbitrary structure, the class of candidate rankings it explores is larger, and our method, although greedy, is provably complete. Our second contribution is to show how to use the ranking functions we generate to get upper bounds for the computational complexity (number of transitions) of the source program. This estimate is a polynomial, which means that we can handle programs with more than linear complexity. We applied the method on a collection of test cases from the literature. We also show the links and differences with previous techniques based on the insertion of counters.
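The ranking-function certificate can be checked mechanically on a one-loop flowchart. Below, rho(x) = x is an affine ranking for `while x > 0: x -= 1`: it is nonnegative and strictly decreases at each transition, and its initial value bounds the number of transitions, which is the complexity-bound use of rankings (a toy check, not the paper's synthesis algorithm).

```python
def run_loop(x0):
    """One-loop flowchart: while x > 0: x -= 1.
    Returns the trace of program states visited."""
    trace = [x0]
    x = x0
    while x > 0:
        x -= 1
        trace.append(x)
    return trace

def is_ranking(trace, rho):
    """Check that rho maps every state into the naturals and strictly
    decreases at every transition -- the certificate of termination."""
    values = [rho(s) for s in trace]
    return (all(v >= 0 for v in values)
            and all(a > b for a, b in zip(values, values[1:])))
```

Because rho strictly decreases in the naturals, the number of transitions is at most rho applied to the initial state, giving the complexity bound directly.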