Near-Optimal Algorithms for Differentially-Private Principal Components
Principal components analysis (PCA) is a standard tool for identifying good
low-dimensional approximations to data in high dimension. Many data sets of
interest contain private or sensitive information about individuals. Algorithms
which operate on such data should be sensitive to the privacy risks in
publishing their outputs. Differential privacy is a framework for developing
tradeoffs between privacy and the utility of these outputs. In this paper we
investigate the theory and empirical performance of differentially private
approximations to PCA and propose a new method which explicitly optimizes the
utility of the output. We show that the sample complexity of the proposed
method differs from the existing procedure in the scaling with the data
dimension, and that our method is nearly optimal in terms of this scaling. We
furthermore illustrate our results, showing that on real data there is a large
performance gap between the existing method and our method.
Comment: 37 pages, 8 figures; final version to appear in the Journal of
Machine Learning Research; preliminary version was at NIPS 201
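The paper's own algorithm is not reproduced here, but the simplest private-PCA baseline it compares against can be sketched: perturb the empirical second-moment matrix with symmetric Gaussian noise, then take its top-k eigenvectors. The function name, norm assumption, and noise calibration below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def dp_pca_input_perturbation(X, k, epsilon, delta, seed=None):
    """Illustrative (epsilon, delta)-DP PCA via input perturbation.
    Assumes each row of X has L2 norm <= 1, so replacing one of the n rows
    changes the second-moment matrix by at most 2/n in Frobenius norm."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = X.T @ X / n                                    # empirical second-moment matrix
    sensitivity = 2.0 / n
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon
    E = rng.normal(scale=sigma, size=(d, d))
    E = (E + E.T) / np.sqrt(2)                         # symmetrize the noise
    eigvals, eigvecs = np.linalg.eigh(A + E)           # orthonormal eigenvectors
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top]                             # top-k principal directions
```

The output is an orthonormal d-by-k matrix; utility degrades as epsilon shrinks, which is the tradeoff the abstract refers to.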
Learning from Data with Heterogeneous Noise using SGD
We consider learning from data of variable quality that may be obtained from
different heterogeneous sources. Addressing learning from heterogeneous data in
its full generality is a challenging problem. In this paper, we adopt instead a
model in which data is observed through heterogeneous noise, where the noise
level reflects the quality of the data source. We study how to use stochastic
gradient algorithms to learn in this model. Our study is motivated by two
concrete examples where this problem arises naturally: learning with local
differential privacy based on data from multiple sources with different privacy
requirements, and learning from data with labels of variable quality.
The main contribution of this paper is to identify how heterogeneous noise
impacts performance. We show that given two datasets with heterogeneous noise,
the order in which to use them in standard SGD depends on the learning rate. We
propose a method for changing the learning rate as a function of the
heterogeneity, and prove new regret bounds for our method in two cases of
interest. Experiments on real data show that when the noise level is low to
moderate, our method outperforms both using a single learning rate and using
only the less noisy of the two datasets.
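The ordering question above can be made concrete with a toy two-phase SGD loop: the per-phase step sizes and which dataset goes first are exactly the knobs the paper analyzes. The schedule below is a hypothetical stand-in, not the schedule the paper proves regret bounds for.

```python
import numpy as np

def sgd_two_phase(grad, w0, data_clean, data_noisy,
                  lr_clean, lr_noisy, noisy_first=True):
    """Sketch of SGD over two datasets with different noise levels,
    processed in sequence with separate learning rates per phase."""
    w = np.asarray(w0, dtype=float)
    phases = [(data_noisy, lr_noisy), (data_clean, lr_clean)]
    if not noisy_first:
        phases.reverse()
    for data, lr in phases:
        for x, y in data:
            w = w - lr * grad(w, x, y)     # one stochastic gradient step
    return w
```

For example, with a least-squares gradient `grad(w, x, y) = (w @ x - y) * x`, running both phases over noiseless data recovers the true linear model.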
Auditing: Active Learning with Outcome-Dependent Query Costs
We propose a learning setting in which unlabeled data is free, and the cost
of a label depends on its value, which is not known in advance. We study binary
classification in an extreme case, where the algorithm only pays for negative
labels. We are motivated by applications such as fraud detection, in which
investigating an honest transaction should be avoided if possible. We term the
setting auditing, and consider the auditing complexity of an algorithm: the
number of negative labels the algorithm requires in order to learn a hypothesis
with low relative error. We design auditing algorithms for simple hypothesis
classes (thresholds and rectangles), and show that with these algorithms, the
auditing complexity can be significantly lower than the active label
complexity. We also discuss a general competitive approach for auditing and
possible modifications to the framework.
Comment: Corrections in section
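For the threshold class, the idea can be sketched in the realizable case: query points from largest to smallest, collect positive labels for free, and stop at the first (paid) negative, which pins down the threshold. This is an illustrative sketch under that assumption, not the paper's general algorithm.

```python
def audit_threshold(points, query_label):
    """Learn a threshold over the reals, paying only for negative labels.
    Assumes realizable data: points at or above the true threshold are
    positive. Returns the smallest positive point seen (the learned
    threshold) and the number of paid negative queries."""
    smallest_positive = None
    paid_negatives = 0
    for p in sorted(points, reverse=True):
        if query_label(p):
            smallest_positive = p          # positive labels cost nothing
        else:
            paid_negatives += 1            # the single paid negative label
            break
    return smallest_positive, paid_negatives
```

Under this assumption the auditing complexity is at most one negative label, however many points are queried, illustrating how auditing complexity can fall below active label complexity.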
Generalized Opinion Dynamics from Local Optimization Rules
We study generalizations of the Hegselmann-Krause (HK) model for opinion
dynamics, incorporating features and parameters that are natural components of
observed social systems. The first generalization is one where the strength of
influence depends on the distance of the agents' opinions. Under this setup, we
identify conditions under which the opinions converge in finite time, and
provide a qualitative characterization of the equilibrium. We interpret the HK
model opinion update rule as a quadratic cost-minimization rule. This enables a
second generalization: a family of update rules which possess different
equilibrium properties. Subsequently, we investigate models in which an
external force can act strategically to influence the agents' updates. We consider
cases where this external force can introduce additional agents and cases where
they can modify the cost structures for other agents. We describe and analyze
some strategies through which such modulation may be possible in an
order-optimal manner. Our simulations demonstrate that generalized dynamics
differ qualitatively and quantitatively from traditional HK dynamics.
Comment: 20 pages, under review
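For reference, the classical HK update that these generalizations modify replaces each opinion with the average of all opinions within a confidence radius eps; equivalently, each agent minimizes the quadratic cost sum over neighbors of (x - x_j)^2, whose minimizer is exactly that average. A minimal simulation sketch (function names are illustrative):

```python
import numpy as np

def hk_step(x, eps):
    """One synchronous Hegselmann-Krause update: each agent moves to the
    average of all opinions within distance eps of its own."""
    x = np.asarray(x, dtype=float)
    near = np.abs(x[:, None] - x[None, :]) <= eps   # confidence neighborhoods
    return near @ x / near.sum(axis=1)

def hk_run(x0, eps, max_steps=1000, tol=1e-9):
    """Iterate until opinions stop moving; HK converges in finite time."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        nxt = hk_step(x, eps)
        if np.max(np.abs(nxt - x)) < tol:
            return nxt
        x = nxt
    return x
```

For instance, opinions [0.0, 0.1, 0.2, 0.9, 1.0] with eps = 0.15 split into two clusters, at 0.1 and 0.95; distance-dependent influence weights of the kind the abstract describes would replace the 0/1 neighborhood mask with a decaying weight function.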