
### Ergodicity of Random Walks on Random DFA

Given a DFA we consider the random walk that starts at the initial state and
at each time step moves to a new state by taking a random transition from the
current state. This paper shows that for typical DFA this random walk induces
an ergodic Markov chain. The notion of typical DFA is formalized by showing
that ergodicity holds with high probability when a DFA is sampled uniformly at
random from the set of all automata with a fixed number of states. We also show
the same result applies to DFA obtained by minimizing typical DFA.
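
As a rough illustration of the setup above, the following sketch samples a DFA by choosing every transition target uniformly and independently, then builds the transition matrix of the induced random walk. The helper names (`sample_dfa`, `walk_matrix`, `reachable`) are hypothetical, accepting states are ignored, and this is not the paper's construction or proof technique.

```python
import random
from fractions import Fraction


def sample_dfa(n_states, alphabet_size, rng):
    """Assumed sampling model: each transition target is uniform and independent."""
    return [[rng.randrange(n_states) for _ in range(alphabet_size)]
            for _ in range(n_states)]


def walk_matrix(delta):
    """Transition matrix of the walk that picks a uniformly random symbol at each step."""
    n, k = len(delta), len(delta[0])
    P = [[Fraction(0)] * n for _ in range(n)]
    for q, targets in enumerate(delta):
        for t in targets:
            P[q][t] += Fraction(1, k)
    return P


def reachable(delta, start=0):
    """States the walk can visit when started at the initial state `start`."""
    seen, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for t in delta[q]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


rng = random.Random(0)
delta = sample_dfa(n_states=6, alphabet_size=2, rng=rng)
P = walk_matrix(delta)
```

Ergodicity would then be checked on the chain restricted to the states returned by `reachable`; that check is omitted here.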

### Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising

The Gaussian mechanism is an essential building block used in a multitude of
differentially private data analysis algorithms. In this paper we revisit the
Gaussian mechanism and show that the original analysis has several important
limitations. Our analysis reveals that the variance formula for the original
mechanism is far from tight in the high privacy regime ($\varepsilon \to 0$)
and it cannot be extended to the low privacy regime ($\varepsilon \to \infty$).
We address these limitations by developing an optimal Gaussian mechanism whose
variance is calibrated directly using the Gaussian cumulative distribution function
instead of a tail bound approximation. We also propose to equip the Gaussian
mechanism with a post-processing step based on adaptive estimation techniques
by leveraging that the distribution of the perturbation is known. Our
experiments show that analytical calibration removes at least a third of the
variance of the noise compared to the classical Gaussian mechanism, and that
denoising dramatically improves the accuracy of the Gaussian mechanism in the
high-dimensional regime.

Comment: To appear at the 35th International Conference on Machine Learning (ICML), 201
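
To make the contrast between the two calibrations concrete, the sketch below compares the classical noise scale $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (stated for $\varepsilon < 1$) with a scale found by bisection on the exact condition $\Phi(\Delta/2\sigma - \varepsilon\sigma/\Delta) - e^{\varepsilon}\,\Phi(-\Delta/2\sigma - \varepsilon\sigma/\Delta) \le \delta$, where $\Phi$ is the standard Gaussian CDF. The function names and the plain bisection are assumptions made for illustration; the paper's own numerical procedure may differ.

```python
import math


def phi(x):
    """Standard Gaussian CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def classical_sigma(eps, delta, sens=1.0):
    """Classical tail-bound calibration (intended for eps < 1)."""
    return sens * math.sqrt(2.0 * math.log(1.25 / delta)) / eps


def delta_of_sigma(sigma, eps, sens=1.0):
    """Smallest delta for which noise scale sigma gives (eps, delta)-DP."""
    a = sens / (2.0 * sigma)
    b = eps * sigma / sens
    return phi(a - b) - math.exp(eps) * phi(-a - b)


def analytic_sigma(eps, delta, sens=1.0, iters=200):
    """Bisection for the smallest sigma meeting the exact CDF condition."""
    lo, hi = 0.0, sens
    while delta_of_sigma(hi, eps, sens) > delta:
        hi *= 2.0  # grow the bracket until hi is private enough
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delta_of_sigma(mid, eps, sens) > delta:
            lo = mid
        else:
            hi = mid
    return hi


eps, delta = 0.5, 1e-5
s_classical = classical_sigma(eps, delta)
s_analytic = analytic_sigma(eps, delta)
```

The bisection relies on `delta_of_sigma` being decreasing in `sigma`, so the returned value is the upper end of a bracket that always satisfies the privacy condition.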

### Differentially Private Policy Evaluation

We present the first differentially private algorithms for reinforcement
learning, which apply to the task of evaluating a fixed policy. We establish
two approaches for achieving differential privacy, provide a theoretical
analysis of the privacy and utility of the two algorithms, and show promising
results on simple empirical examples.

### Singular value automata and approximate minimization

The present paper uses spectral theory of linear operators to construct
approximately minimal realizations of weighted languages. Our new contributions
are: (i) a new algorithm for the SVD of infinite Hankel matrices
based on their representation in terms of weighted automata, (ii) a new
canonical form for weighted automata arising from the SVD of their corresponding
Hankel matrix and (iii) an algorithm to construct approximate minimizations of
given weighted automata by truncating the canonical form. We give bounds on the
quality of our approximation.

### A Canonical Form for Weighted Automata and Applications to Approximate Minimization

We study the problem of constructing approximations to a weighted automaton.
Weighted finite automata (WFA) are closely related to the theory of rational
series. A rational series is a function from strings to real numbers that can
be computed by a finite WFA. Among others, this includes probability
distributions generated by hidden Markov models and probabilistic automata. The
relationship between rational series and WFA is analogous to the relationship
between regular languages and ordinary automata. Associated with such rational
series are infinite matrices called Hankel matrices which play a fundamental
role in the theory of minimal WFA. Our contributions are: (1) an effective
procedure for computing the singular value decomposition (SVD) of such infinite
Hankel matrices based on their representation in terms of finite WFA; (2) a new
canonical form for finite WFA based on this SVD; and (3) an
algorithm to construct approximate minimizations of a given WFA. The goal of
our approximate minimization algorithm is to start from a minimal WFA and
produce a smaller WFA that is close to the given one in a certain sense. The
desired size of the approximating automaton is given as input. We give bounds
describing how well the approximation emulates the behavior of the original
WFA.
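
The link between a WFA and its Hankel matrix can be sketched on a finite block: the entry indexed by prefix $u$ and suffix $v$ is $f(uv)$, and the rank of any such block is at most the number of states. The two-state automaton and all helper names below are made up for illustration, and exact rational arithmetic stands in for the SVD machinery of the paper.

```python
from fractions import Fraction as F
from itertools import product


def wfa_value(alpha, A, beta, word):
    """f(word) = alpha^T A[w1] ... A[wn] beta for a weighted automaton."""
    vec = list(alpha)
    n = len(vec)
    for sym in word:
        M = A[sym]
        vec = [sum(vec[i] * M[i][j] for i in range(n)) for j in range(n)]
    return sum(v * b for v, b in zip(vec, beta))


def words_up_to(alphabet, max_len):
    """All words over `alphabet` with length at most `max_len`."""
    return ["".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]


def hankel_block(alpha, A, beta, prefixes, suffixes):
    """Finite sub-block of the Hankel matrix: H[u][v] = f(u + v)."""
    return [[wfa_value(alpha, A, beta, u + v) for v in suffixes] for u in prefixes]


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [list(r) for r in matrix]
    rk = 0
    for c in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][c] / rows[rk][c]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


# Hypothetical 2-state WFA over the alphabet {a, b}.
alpha = [F(1), F(0)]
beta = [F(1, 2), F(1, 4)]
A = {"a": [[F(1, 4), F(1, 4)], [F(0), F(1, 4)]],
     "b": [[F(0), F(1, 4)], [F(1, 4), F(0)]]}
basis = words_up_to("ab", 2)
H = hankel_block(alpha, A, beta, basis, basis)
```

Approximate minimization in the sense of the abstract would truncate small singular values of the full infinite matrix; the block here only illustrates the rank bound.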

### Subsampled Rényi Differential Privacy and Analytical Moments Accountant

We study the problem of subsampling in differential privacy (DP), a question
that is the centerpiece behind many successful differentially private machine
learning algorithms. Specifically, we provide a tight upper bound on the
Rényi Differential Privacy (RDP) (Mironov, 2017) parameters for algorithms
that (1) subsample the dataset and then (2) apply a randomized mechanism M
to the subsample, in terms of the RDP parameters of M and the subsampling
probability parameter. Our results generalize the moments accounting technique,
developed by Abadi et al. (2016) for the Gaussian mechanism, to any subsampled
RDP mechanism.
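
For orientation only, the sketch below evaluates the classical amplification-by-subsampling bound for $(\varepsilon, \delta)$-DP, not the RDP bound of this paper: running an $(\varepsilon, \delta)$-DP mechanism on a subsample drawn with sampling probability $q$ gives $(\log(1 + q(e^{\varepsilon} - 1)),\, q\delta)$-DP. The function names are hypothetical.

```python
import math


def amplified_epsilon(eps, q):
    """Classical subsampling bound: log(1 + q * (exp(eps) - 1))."""
    return math.log1p(q * math.expm1(eps))


def amplified_delta(delta, q):
    """Under the same bound, the delta parameter scales linearly with q."""
    return q * delta


eps_small_q = amplified_epsilon(1.0, 0.01)  # mechanism with eps = 1, sampling rate 1%
```

For small $q$ the amplified parameter is roughly $q(e^{\varepsilon} - 1)$, which is why subsampling is attractive in iterative training algorithms.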

### Diameter and Stationary Distribution of Random $r$-out Digraphs

Let $D(n,r)$ be a random $r$-out regular directed multigraph on the set of
vertices $\{1,\ldots,n\}$. In this work, we establish that for every $r \ge 2$,
there exists $\eta_r>0$ such that
$\text{diam}(D(n,r))=(1+\eta_r+o(1))\log_r{n}$. Our techniques also allow us to
bound some extremal quantities related to the stationary distribution of a
simple random walk on $D(n,r)$. In particular, we determine the asymptotic
behaviour of $\pi_{\max}$ and $\pi_{\min}$, the maximum and the minimum values
of the stationary distribution. We show that with high probability $\pi_{\max}
= n^{-1+o(1)}$ and $\pi_{\min}=n^{-(1+\eta_r)+o(1)}$. Our proof shows that the
vertices with $\pi(v)$ close to $\pi_{\min}$ lie at the top of "narrow, slippery
towers"; such vertices are also responsible for increasing the diameter from
$(1+o(1))\log_r n$ to $(1+\eta_r+o(1))\log_r{n}$.

Comment: 31 pages

### Privacy Amplification by Mixing and Diffusion Mechanisms

A fundamental result in differential privacy states that the privacy
guarantees of a mechanism are preserved by any post-processing of its output.
In this paper we investigate under what conditions stochastic post-processing
can amplify the privacy of a mechanism. By interpreting post-processing as the
application of a Markov operator, we first give a series of amplification
results in terms of uniform mixing properties of the Markov process defined by
said operator. Next we provide amplification bounds in terms of coupling
arguments which can be applied in cases where uniform mixing is not available.
Finally, we introduce a new family of mechanisms based on diffusion processes
which are closed under post-processing, and analyze their privacy via a novel
heat flow argument. On the applied side, we generalize the analysis of "privacy
amplification by iteration" in Noisy SGD and show it admits an exponential
improvement in the strongly convex case, and study a mechanism based on the
Ornstein-Uhlenbeck diffusion process which contains the Gaussian mechanism with
optimal post-processing on bounded inputs as a special case.

### The Privacy Blanket of the Shuffle Model

This work studies differential privacy in the context of the recently
proposed shuffle model. Unlike in the local model, where the server collecting
privatized data from users can track back an input to a specific user, in the
shuffle model users submit their privatized inputs to a server anonymously.
This setup yields a trust model which sits in between the classical curator and
local models for differential privacy. The shuffle model is the core idea in
the Encode, Shuffle, Analyze (ESA) model introduced by Bittau et al. (SOSP
2017). Recent work by Cheu et al. (EUROCRYPT 2019) analyzes the differential
privacy properties of the shuffle model and shows that in some cases shuffled
protocols provide strictly better accuracy than local protocols. Additionally,
Erlingsson et al. (SODA 2019) provide a privacy amplification bound quantifying
the level of curator differential privacy achieved by the shuffle model in
terms of the local differential privacy of the randomizer used by each user. In
this context, we make three contributions. First, we provide an optimal
single-message protocol for summation of real numbers in the shuffle model. Our
protocol is very simple and has better accuracy and communication than the
protocols for this same problem proposed by Cheu et al. Optimality of this
protocol follows from our second contribution, a new lower bound for the
accuracy of private protocols for summation of real numbers in the shuffle
model. The third contribution is a new amplification bound for analyzing the
privacy of protocols in the shuffle model in terms of the privacy provided by
the corresponding local randomizer. Our amplification bound generalizes the
results by Erlingsson et al. to a wider range of parameters, and provides a
whole family of methods to analyze privacy amplification in the shuffle model.

### Privacy-preserving Active Learning on Sensitive Data for User Intent Classification

Active learning holds the promise of significantly reducing data annotation costs
while maintaining reasonable model performance. However, it requires sending
data to annotators for labeling. This presents a possible privacy leak when the
training set includes sensitive user data. In this paper, we describe an
approach for carrying out privacy-preserving active learning with quantifiable
guarantees. We evaluate our approach by showing the tradeoff between privacy,
utility and annotation budget on a binary classification task in an active
learning setting.

Comment: To appear at PAL: Privacy-Enhancing Artificial Intelligence and
Language Technologies as part of the AAAI Spring Symposium Series (AAAI-SSS
2019).
