
    Differentially Private Empirical Risk Minimization

    Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the $\epsilon$-differential privacy definition due to Dwork et al. (2006). First, we apply the output perturbation ideas of Dwork et al. (2006) to ERM classification. Then we propose a new method, objective perturbation, for privacy-preserving machine learning algorithm design. This method entails perturbing the objective function before optimizing over classifiers. If the loss and regularizer satisfy certain convexity and differentiability criteria, we prove theoretical results showing that our algorithms preserve privacy, and provide generalization bounds for linear and nonlinear kernels. We further present a privacy-preserving technique for tuning the parameters in general machine learning algorithms, thereby providing end-to-end privacy guarantees for the training process. We apply these results to produce privacy-preserving analogues of regularized logistic regression and support vector machines. We obtain encouraging results from evaluating their performance on real demographic and benchmark data sets. Our results show that, both theoretically and empirically, objective perturbation is superior to the previous state-of-the-art, output perturbation, in managing the inherent tradeoff between privacy and learning performance. Comment: 40 pages, 7 figures, accepted to the Journal of Machine Learning Researc
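The output perturbation idea mentioned in this abstract can be illustrated with a minimal sketch: train a regularized classifier, then add random noise to the learned weight vector. The abstract does not give the noise calibration, so the details below are assumptions based on how the method is commonly stated: feature vectors satisfy $\|x_i\| \le 1$, the loss is 1-Lipschitz (logistic loss here), and the noise has density proportional to $\exp(-\beta\|b\|)$ with $\beta = n\lambda\epsilon/2$. The function names (`train_logreg`, `sample_noise`, `output_perturbation`) are hypothetical, and plain gradient descent stands in for an exact minimizer, which the formal guarantee would require.

```python
import numpy as np

def train_logreg(X, y, lam, steps=500, lr=0.5):
    """Minimize mean logistic loss + (lam/2)*||w||^2 by gradient descent.

    X: (n, d) features, assumed to have row norms <= 1.
    y: (n,) labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # d/dw log(1 + exp(-m)) = -y * x / (1 + exp(m))
        coeff = y / (1.0 + np.exp(margins))
        grad = -(X * coeff[:, None]).mean(axis=0) + lam * w
        w -= lr * grad
    return w

def sample_noise(d, beta, rng):
    """Draw b in R^d with density proportional to exp(-beta * ||b||).

    The direction is uniform on the unit sphere and the norm follows a
    Gamma(shape=d, scale=1/beta) distribution.
    """
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    norm = rng.gamma(shape=d, scale=1.0 / beta)
    return norm * direction

def output_perturbation(X, y, lam, epsilon, rng):
    """Return the trained weights plus calibrated noise (assumed calibration)."""
    n, d = X.shape
    w = train_logreg(X, y, lam)
    beta = n * lam * epsilon / 2.0  # assumed: sensitivity of the minimizer is 2/(n*lam)
    return w + sample_noise(d, beta, rng)
```

Under these assumptions the noise scale shrinks as the sample size, the regularization strength, or the privacy budget grows, which is one way to see the tradeoff between privacy and learning performance that the abstract refers to.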

    Empirical risk minimization in inverse problems

    We study estimation of a multivariate function $f:\mathbf{R}^d\to\mathbf{R}$ when the observations are available from the function $Af$, where $A$ is a known linear operator. Both the Gaussian white noise model and density estimation are studied. We define an $L_2$-empirical risk functional which is used to define a $\delta$-net minimizer and a dense empirical risk minimizer. Upper bounds for the mean integrated squared error of the estimators are given. The upper bounds show how the difficulty of the estimation depends on the operator through the norm of the adjoint of the inverse of the operator and on the underlying function class through the entropy of the class. Corresponding lower bounds are also derived. As examples, we consider convolution operators and the Radon transform. In these examples, the estimators achieve the optimal rates of convergence. Furthermore, a new type of oracle inequality is given for inverse problems in additive models. Comment: Published at http://dx.doi.org/10.1214/09-AOS726 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    On concentration for (regularized) empirical risk minimization

    Rates of convergence for empirical risk minimizers have been well studied in the literature. In this paper, we aim to provide a complementary set of results, in particular by showing that after normalization, the risk of the empirical minimizer concentrates on a single point. Such results have been established by Chatterjee (2014) for constrained estimators in the normal sequence model. We first generalize and sharpen this result to regularized least squares with convex penalties, making use of a "direct" argument based on Borell's theorem. We then study generalizations to other loss functions, including the negative log-likelihood for exponential families combined with a strictly convex regularization penalty. The results in this general setting are based on more "indirect" arguments as well as on concentration inequalities for maxima of empirical processes. Comment: 27 page

    Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning

    Probabilistic grammars are generative statistical models that are useful for compositional and sequential structures. They are used ubiquitously in computational linguistics. We present a framework, reminiscent of structural risk minimization, for empirical risk minimization of probabilistic grammars using the log-loss. We derive sample complexity bounds in this framework that apply both to the supervised setting and the unsupervised setting. By making assumptions about the underlying distribution that are appropriate for natural language scenarios, we are able to derive distribution-dependent sample complexity bounds for probabilistic grammars. We also give simple algorithms for carrying out empirical risk minimization using this framework in both the supervised and unsupervised settings. In the unsupervised case, we show that the problem of minimizing empirical risk is NP-hard. We therefore suggest an approximate algorithm, similar to expectation-maximization, to minimize the empirical risk. Learning from data is central to contemporary computational linguistics. It is common in such learning to estimate a model in a parametric family using the maximum likelihood principle. This principle applies in the supervised case (i.e., using annotate

    Explainable Empirical Risk Minimization

    The widespread use of modern machine learning methods in decision making crucially depends on their interpretability or explainability. The human users (decision makers) of machine learning methods are often not only interested in getting accurate predictions or projections. Rather, as a decision-maker, the user also needs a convincing answer (or explanation) to the question of why a particular prediction was delivered. Explainable machine learning might be a legal requirement when used for decision making with an immediate effect on the health of human beings. As an example, consider the computer vision of a self-driving car whose predictions are used to decide whether to stop the car. We have recently proposed an information-theoretic approach to construct personalized explanations for predictions obtained from ML. This method was model-agnostic and only required some training samples of the model to be explained along with a user feedback signal. This paper uses an information-theoretic measure for the quality of an explanation to learn predictors that are intrinsically explainable to a specific user. Our approach is not restricted to a particular hypothesis space, such as linear maps or shallow decision trees, whose predictor maps are considered as explainable by definition. Rather, we regularize an arbitrary hypothesis space using a personalized measure for the explainability of a particular predictor