Differentially Private Empirical Risk Minimization
Privacy-preserving machine learning algorithms are crucial for the
increasingly common setting in which personal data, such as medical or
financial records, are analyzed. We provide general techniques to produce
privacy-preserving approximations of classifiers learned via (regularized)
empirical risk minimization (ERM). These algorithms are private under the
ε-differential privacy definition due to Dwork et al. (2006). First, we
apply the output perturbation ideas of Dwork et al. (2006) to ERM
classification. Then we propose a new method, objective perturbation, for
privacy-preserving machine learning algorithm design. This method entails
perturbing the objective function before optimizing over classifiers. If the
loss and regularizer satisfy certain convexity and differentiability criteria,
we prove theoretical results showing that our algorithms preserve privacy, and
provide generalization bounds for linear and nonlinear kernels. We further
present a privacy-preserving technique for tuning the parameters in general
machine learning algorithms, thereby providing end-to-end privacy guarantees
for the training process. We apply these results to produce privacy-preserving
analogues of regularized logistic regression and support vector machines. We
obtain encouraging results from evaluating their performance on real
demographic and benchmark data sets. Our results show that both theoretically
and empirically, objective perturbation is superior to the previous
state-of-the-art, output perturbation, in managing the inherent tradeoff
between privacy and learning performance.

Comment: 40 pages, 7 figures, accepted to the Journal of Machine Learning
Research
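As a rough illustration of the output perturbation idea described above, here is a minimal sketch, not the paper's exact construction: for an L2-regularized ERM objective that is λ-strongly convex, the minimizer has L2 sensitivity on the order of 2/(nλ), so the trained weights can be released after adding noise whose norm follows a Gamma distribution with scale 2/(nλε) and whose direction is uniform. The function name and argument layout are hypothetical.

```python
import numpy as np

def output_perturbation(w_nonprivate, n, lam, eps, rng):
    """Release a privatized copy of trained ERM weights.

    Adds noise calibrated to the L2 sensitivity 2/(n*lam) of a
    lam-strongly-convex regularized ERM objective: a uniformly random
    direction scaled by a Gamma(d, 2/(n*lam*eps))-distributed norm.
    """
    d = w_nonprivate.shape[0]
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)          # uniform on the sphere
    norm = rng.gamma(shape=d, scale=2.0 / (n * lam * eps))
    return w_nonprivate + norm * direction
```

Note how the noise scale shrinks as the sample size n, the regularization strength λ, or the privacy budget ε grows, which is the privacy/learning tradeoff the abstract refers to.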
Empirical risk minimization in inverse problems
We study estimation of a multivariate function f when the observations are
available from the function Af, where A is a known linear operator. Both the
Gaussian white noise model and density estimation are studied. We define an
L2-empirical risk functional which is used to define a δ-net minimizer and a
dense empirical risk minimizer.
Upper bounds for the mean integrated squared error of the estimators are given.
The upper bounds show how the difficulty of the estimation depends on the
operator through the norm of the adjoint of the inverse of the operator and on
the underlying function class through the entropy of the class. Corresponding
lower bounds are also derived. As examples, we consider convolution operators
and the Radon transform. In these examples, the estimators achieve the optimal
rates of convergence. Furthermore, a new type of oracle inequality is given for
inverse problems in additive models.

Comment: Published at http://dx.doi.org/10.1214/09-AOS726 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
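To make the setting concrete, here is one standard way such an L2-empirical risk can be derived in the density estimation case, written in assumed notation rather than quoted from the paper. Since ||g − f||² = ||g||² − 2⟨g, f⟩ + ||f||², and ⟨g, f⟩ = ⟨(A⁻¹)*g, Af⟩ = E[((A⁻¹)*g)(X)] when X is drawn from the density Af, the only unknown term can be estimated from the sample, which is exactly where the adjoint of the inverse of the operator enters the difficulty of the problem:

```latex
% Sketch in assumed notation: X_1,\dots,X_n drawn from the density Af.
\gamma_n(g) \;=\; \|g\|_{L_2}^2
  \;-\; \frac{2}{n}\sum_{i=1}^{n}\bigl((A^{-1})^{*}g\bigr)(X_i),
\qquad
\hat f \;\in\; \operatorname*{arg\,min}_{g \in \mathcal{G}} \; \gamma_n(g).
```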
On concentration for (regularized) empirical risk minimization
Rates of convergence for empirical risk minimizers have been well studied in
the literature. In this paper, we aim to provide a complementary set of
results, in particular by showing that after normalization, the risk of the
empirical minimizer concentrates on a single point. Such results have been
established by Chatterjee (2014) for constrained estimators in the
normal sequence model. We first generalize and sharpen this result to
regularized least squares with convex penalties, making use of a "direct"
argument based on Borell's theorem. We then study generalizations to other loss
functions, including the negative log-likelihood for exponential families
combined with a strictly convex regularization penalty. The results in this
general setting are based on more "indirect" arguments as well as on
concentration inequalities for maxima of empirical processes.

Comment: 27 pages
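For context, the normal-sequence-model result being generalized can be stated loosely as follows, in a paraphrase with assumed notation: for observations y = θ* + ε with ε ~ N(0, σ²Iₙ) and the least squares estimator constrained to a closed convex set K, the error norm ||θ̂ − θ*|| concentrates around the deterministic maximizer t* of a Gaussian-width functional:

```latex
% Paraphrase of the Chatterjee (2014)-type statement; notation assumed.
\hat\theta \;=\; \operatorname*{arg\,min}_{\theta \in K} \|y - \theta\|^2,
\qquad
t_* \;=\; \operatorname*{arg\,max}_{t \ge 0}
  \Bigl( \, \mathbb{E}\sup_{\substack{\theta \in K \\ \|\theta - \theta^*\| \le t}}
  \langle \varepsilon,\, \theta - \theta^* \rangle \;-\; \tfrac{t^2}{2} \Bigr),
```

so that, with high probability, ||θ̂ − θ*|| = t*(1 + o(1)); concentration of the (normalized) risk on a single point is the analogous phenomenon studied here for regularized estimators.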
Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning
Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk.

Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotated data).
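In the supervised case, empirical risk minimization with the log-loss for a probabilistic grammar reduces to relative-frequency (count-and-normalize) estimation of the rule probabilities. A minimal sketch of that reduction, with a hypothetical function name and input format:

```python
from collections import Counter, defaultdict

def mle_rule_probs(derivations):
    """Supervised log-loss ERM for a probabilistic grammar.

    `derivations` is a list of derivations, each a list of (lhs, rhs)
    rule applications (hypothetical format). Minimizing empirical
    log-loss over rule probabilities, subject to the probabilities of
    rules sharing a left-hand side summing to one, yields the
    relative-frequency estimate count(rule) / count(lhs).
    """
    counts = Counter(rule for d in derivations for rule in d)
    lhs_totals = defaultdict(float)
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
```

The NP-hardness result quoted above concerns the unsupervised setting, where derivations are latent and no such closed form exists, motivating the EM-like approximation.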
Explainable Empirical Risk Minimization
The widespread use of modern machine learning methods in decision making
crucially depends on their interpretability or explainability. The human users
(decision makers) of machine learning methods are often not only interested in
getting accurate predictions or projections. Rather, as a decision-maker, the
user also needs a convincing answer (or explanation) to the question of why a
particular prediction was delivered. Explainable machine learning might be a
legal requirement when used for decision making with an immediate effect on the
health of human beings. As an example consider the computer vision of a
self-driving car whose predictions are used to decide whether to stop the car. We
have recently proposed an information-theoretic approach to construct
personalized explanations for predictions obtained from ML. This method was
model-agnostic and only required some training samples of the model to be
explained along with a user feedback signal. This paper uses an
information-theoretic measure for the quality of an explanation to learn
predictors that are intrinsically explainable to a specific user. Our approach
is not restricted to a particular hypothesis space, such as linear maps or
shallow decision trees, whose predictor maps are considered as explainable by
definition. Rather, we regularize an arbitrary hypothesis space using a
personalized measure for the explainability of a particular predictor
- …
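As a hypothetical illustration of regularizing an arbitrary hypothesis space toward explainability: the paper's actual measure is information-theoretic and personalized to a user, but the structure of the resulting ERM problem can be sketched with a simple stand-in penalty that pulls the learned predictor toward a user-supplied reference predictor u that the user already understands. All names below are illustrative, not from the paper.

```python
import numpy as np

def explainable_erm(X, y, u, lam):
    """Sketch of explainability-regularized least squares.

    Minimizes ||X w - y||^2 + lam * ||w - u||^2, where the squared
    distance to a user-provided reference predictor u stands in for a
    personalized explainability penalty. Solved in closed form via the
    normal equations (X^T X + lam I) w = X^T y + lam u.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * u
    return np.linalg.solve(A, b)
```

As λ grows, the learned predictor collapses onto the user-explainable reference; as λ → 0, it reverts to ordinary ERM, mirroring the explainability/accuracy tradeoff the abstract describes.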