Perceptions and Truth: A Mechanism Design Approach to Crowd-Sourcing Reputation
We consider a distributed multi-user system where individual entities possess
observations or perceptions of one another, while the truth is known only to
the entities themselves, who may have an interest in withholding or distorting
it. We ask whether the system as a whole can arrive at the correct perception
or assessment of every user, referred to as its reputation, by incentivizing
users to participate in a collective effort without violating their private
information or self-interest. Two
specific applications, online shopping and network reputation, are provided to
motivate our study and interpret the results. In this paper we investigate
this problem using a mechanism-design approach. We introduce a number of
utility models representing users' strategic behavior, each consisting of one
or both of a truth element and an image element, reflecting the user's desire
to obtain an accurate view of others and an inflated image of itself. For
each model, we either design a mechanism that achieves the optimal performance
(the solution to the corresponding centralized problem) or present
individually rational sub-optimal solutions. In the latter case, we
demonstrate that even when the centralized solution is not achievable, under a
simple punish-reward mechanism a user not only has an incentive to participate
and provide information, but this information also improves system
performance.
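As a toy illustration of the punish-reward idea (the update rule and numbers
below are purely illustrative, not the paper's mechanism): reporters whose
scores track the weighted consensus keep their credibility, while persistent
outliers are gradually discounted.

```python
import numpy as np

def punish_reward_round(reports, weights, lr=0.1):
    """One round of a toy punish-reward reputation update.

    reports: scores each user submitted about a target.
    weights: current credibility weight of each reporter.
    Reporters near the weighted consensus are rewarded; outliers
    are punished, dampening strategic inflation over time.
    """
    consensus = np.average(reports, weights=weights)
    error = np.abs(reports - consensus)
    weights = weights * np.exp(-lr * error)   # illustrative rule
    return consensus, weights / weights.sum()

reports = np.array([4.0, 4.2, 3.9, 9.5])   # one inflated report
weights = np.ones(4) / 4
for _ in range(5):
    consensus, weights = punish_reward_round(reports, weights)
print(consensus, weights)  # the outlier's weight decays
```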
How To Backdoor Federated Learning
Federated learning enables thousands of participants to construct a deep
learning model without sharing their private training data with each other. For
example, multiple smartphones can jointly train a next-word predictor for
keyboards without revealing what individual users type. We demonstrate that any
participant in federated learning can introduce hidden backdoor functionality
into the joint global model, e.g., to ensure that an image classifier assigns
an attacker-chosen label to images with certain features, or that a word
predictor completes certain sentences with an attacker-chosen word.
We design and evaluate a new model-poisoning methodology based on model
replacement. An attacker selected in a single round of federated learning can
cause the global model to immediately reach 100% accuracy on the backdoor task.
We evaluate the attack under different assumptions for the standard
federated-learning tasks and show that it greatly outperforms data poisoning.
Our generic constrain-and-scale technique also evades anomaly detection-based
defenses by incorporating the evasion into the attacker's loss function during
training.
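A minimal sketch of the model-replacement arithmetic, assuming plain
federated averaging with n participants and server learning rate eta (all
names and numbers are illustrative):

```python
import numpy as np

# Toy model replacement under plain FedAvg: the server updates
# G <- G + (eta/n) * sum(deltas) over the submitted weight deltas.
n, eta = 10, 1.0
G = np.zeros(3)                     # current global weights
X = np.array([5.0, -2.0, 1.0])      # attacker's backdoored weights

honest = [np.random.normal(0, 0.01, 3) for _ in range(n - 1)]
# Scaling by n/eta makes the averaged update land (almost) on X;
# near convergence the honest deltas are small and roughly cancel.
attacker = (n / eta) * (X - G)

G_next = G + (eta / n) * np.sum(honest + [attacker], axis=0)
print(np.allclose(G_next, X, atol=0.05))  # ~True: model replaced
```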
Differentially Private Hierarchical Count-of-Counts Histograms
We consider the problem of privately releasing a class of queries that we
call hierarchical count-of-counts histograms. Count-of-counts histograms
partition the rows of an input table into groups (e.g., group of people in the
same household), and for every integer j report the number of groups of size j.
Hierarchical count-of-counts queries report count-of-counts histograms at
different granularities, according to a hierarchy defined on an attribute of
the input data (e.g., the geographical location of a household at the
national, state, and county levels). In this paper, we introduce this problem
along with appropriate error metrics, and propose a differentially private
solution that generates count-of-counts histograms that are consistent across
all levels of the hierarchy.
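To make the query concrete, here is a minimal non-private sketch of a single
count-of-counts histogram over a toy table (column names illustrative); the
paper's problem is releasing such histograms privately and consistently at
every level of the hierarchy.

```python
from collections import Counter

# Rows of an input table: (person_id, household_id)
rows = [(1, "h1"), (2, "h1"), (3, "h2"), (4, "h3"), (5, "h3"), (6, "h3")]

group_sizes = Counter(hh for _, hh in rows)      # household -> size
count_of_counts = Counter(group_sizes.values())  # size j -> #groups of size j

print(dict(count_of_counts))  # {2: 1, 1: 1, 3: 1}
```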
Privacy-Preserving Multiparty Learning For Logistic Regression
In recent years, machine learning techniques have become widely used in
numerous applications, such as weather forecasting, financial data analysis,
spam filtering, and medical prediction. Meanwhile, massive data generated
from multiple sources further improve the performance of machine learning
tools. However, data sharing across multiple sources raises privacy concerns,
since sensitive information may be leaked in the process. In this paper, we
propose a framework that enables multiple parties to collaboratively and
accurately train a learning model over distributed datasets while
guaranteeing the privacy of the data sources. Specifically, we consider the
logistic regression model and propose two approaches for perturbing the
objective function to preserve ε-differential privacy. The proposed solutions
are tested on real datasets, including Bank Marketing and Credit Card Default
prediction. Experimental results demonstrate that the proposed multiparty
learning framework is highly efficient and accurate.
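As a rough illustration of objective perturbation for ε-differential privacy,
here is a sketch following the well-known Chaudhuri et al. recipe (an
assumption for illustration; not necessarily either of the paper's two
perturbation approaches):

```python
import numpy as np
from scipy.optimize import minimize

def private_logreg(X, y, eps, lam=0.01):
    """Sketch of objective perturbation for logistic regression.

    Adds a random linear term (b @ w) / n to the regularized loss,
    with uniform direction and norm ~ Gamma(d, 2/eps). Assumes
    feature rows with norm <= 1 and labels in {-1, +1}; the
    regularization conditions the full method requires are omitted.
    """
    n, d = X.shape
    u = np.random.normal(size=d)
    b = (u / np.linalg.norm(u)) * np.random.gamma(shape=d, scale=2.0 / eps)

    def obj(w):
        loss = np.mean(np.log1p(np.exp(-y * (X @ w))))
        return loss + 0.5 * lam * w @ w + (b @ w) / n

    return minimize(obj, np.zeros(d)).x

# Toy usage: 200 samples, 5 features, eps = 1.0
X = np.random.randn(200, 5) / 5
y = np.sign(X @ np.ones(5) + 1e-9)
w_priv = private_logreg(X, y, eps=1.0)
```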
Towards Federated Learning at Scale: System Design
Federated Learning is a distributed machine learning approach which enables
model training on a large corpus of decentralized data. We have built a
scalable production system for Federated Learning in the domain of mobile
devices, based on TensorFlow. In this paper, we describe the resulting
high-level design, sketch some of the challenges and their solutions, and touch
upon the open problems and future directions.
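The loop such a system orchestrates is, at its core, federated averaging; a
minimal single-machine sketch on a toy least-squares task (the production
system layers device scheduling, secure aggregation, pace steering, and
failure handling on top):

```python
import numpy as np

def local_sgd(w, data, lr=0.1, epochs=1):
    # Each client refines the global weights on its own data only.
    X, y = data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # least-squares toy task
        w = w - lr * grad
    return w

def federated_round(w_global, clients):
    # Server broadcasts weights, clients train locally, server averages.
    updates = [local_sgd(w_global.copy(), c) for c in clients]
    return np.mean(updates, axis=0)

clients = [(np.random.randn(20, 3), np.random.randn(20)) for _ in range(5)]
w = np.zeros(3)
for _ in range(10):
    w = federated_round(w, clients)
```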
Engineering Methods for Differentially Private Histograms: Efficiency Beyond Utility
Publishing histograms with ε-differential privacy has been studied
extensively in the literature. Existing schemes aim at maximizing the utility
of the published data, while previous experimental evaluations analyze the
privacy/utility trade-off. In this paper we provide the first experimental
evaluation of differentially private methods that goes beyond utility, also
emphasizing another important aspect, namely efficiency. Towards this end, we
first observe that all existing schemes are composed of a small set of
common blocks. We then optimize and choose the best implementation for each
block, determine the combinations of blocks that capture the entire literature,
and propose novel block combinations. We qualitatively assess the quality of
the schemes based on the skyline of efficiency and utility, i.e., based on
whether a method is dominated on both aspects or not. Using exhaustive
experiments on four real datasets with different characteristics, we conclude
that there are always trade-offs in terms of utility and efficiency. We
demonstrate that the schemes derived from our novel block combinations provide
the best trade-offs for time critical applications. Our work can serve as a
guide to help practitioners engineer a differentially private histogram scheme
depending on their application requirements.
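The skyline criterion is standard Pareto dominance over the two axes; a small
sketch of the pruning rule with illustrative metric names (lower error and
lower runtime are both better):

```python
def skyline(schemes):
    """Keep schemes not dominated on both error (utility) and
    time (efficiency); lower is better for both metrics."""
    def dominated(a, b):
        return (b["error"] <= a["error"] and b["time"] <= a["time"]
                and (b["error"] < a["error"] or b["time"] < a["time"]))
    return [a for a in schemes
            if not any(dominated(a, b) for b in schemes if b is not a)]

schemes = [
    {"name": "A", "error": 0.10, "time": 5.0},
    {"name": "B", "error": 0.08, "time": 9.0},
    {"name": "C", "error": 0.12, "time": 6.0},  # dominated by A
]
print([s["name"] for s in skyline(schemes)])  # ['A', 'B']
```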
Quantifying Privacy in Nuclear Warhead Authentication Protocols
International verification of nuclear warheads is a practical problem in
which the protection of secret warhead information is of paramount importance.
We propose a measure that would enable a weapon owner to evaluate the privacy
of a proposed protocol in a technology-neutral fashion. We show the problem is
reducible to `natural' and `corrective' learning. The natural learning can be
computed without assumptions about the inspector, while the corrective learning
accounts for the inspector's prior knowledge. The natural learning provides the
warhead owner a useful lower bound on the information leaked by the proposed
protocol. Using numerical examples, we demonstrate that the proposed measure
correlates better with the accuracy of a maximum a posteriori probability
estimate than alternative measures.
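As a toy numerical illustration of why the prior matters (entirely
illustrative; this is not the paper's measure), the same protocol output can
flip the inspector's maximum a posteriori estimate depending on the prior:

```python
import numpy as np

# Binary secret attribute s; the protocol emits a noisy reading r.
lik = np.array([[0.7, 0.3],   # P(r=0|s=0), P(r=1|s=0)
                [0.4, 0.6]])  # P(r=0|s=1), P(r=1|s=1)

def map_estimate(prior, r):
    post = prior * lik[:, r]
    return int(np.argmax(post / post.sum()))

# "Natural" view: uniform prior, depends only on the protocol.
print(map_estimate(np.array([0.5, 0.5]), r=1))   # -> 1
# "Corrective" view: the inspector's prior shifts the estimate.
print(map_estimate(np.array([0.9, 0.1]), r=1))   # -> 0
```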
Learning Privately from Multiparty Data
Learning a classifier from private data collected by multiple parties is an
important problem that has many potential applications. How can we build an
accurate and differentially private global classifier by combining
locally-trained classifiers from different parties, without access to any
party's private data? We propose to transfer the `knowledge' of the local
classifier ensemble by first creating labeled data from auxiliary unlabeled
data, and then training a global ε-differentially private classifier. We
show that majority voting is too sensitive and therefore propose a new risk
weighted by class probabilities estimated from the ensemble. Relative to a
non-private solution, our private solution has a generalization error bounded
by O(ε^{-2} M^{-2}), where M is the number of parties. This allows strong
privacy without performance loss when M is large, such as in
crowdsensing applications. We demonstrate the performance of our method with
realistic tasks of activity recognition, network intrusion detection, and
malicious URL detection.
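A rough sketch of the knowledge-transfer step (sklearn-style classifiers with
predict_proba are assumed; the noise placement and scale are illustrative,
and the class-probability averaging only echoes the abstract's weighted-risk
idea):

```python
import numpy as np

def label_auxiliary(local_clfs, X_aux, eps=1.0):
    """Label auxiliary unlabeled data with the local ensemble.

    Averages class-probability estimates across the M parties
    (replacing one party's classifier moves each averaged entry by
    at most 1/M, unlike brittle majority votes), then adds Laplace
    noise before taking the argmax. The noisy labels would then be
    used to train the final global classifier.
    """
    M = len(local_clfs)
    probs = np.mean([c.predict_proba(X_aux) for c in local_clfs], axis=0)
    noisy = probs + np.random.laplace(scale=1.0 / (M * eps), size=probs.shape)
    return noisy.argmax(axis=1)
```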
Distributed Differentially Private Computation of Functions with Correlated Noise
Many applications of machine learning, such as human health research, involve
processing private or sensitive information. Privacy concerns may pose
significant hurdles to collaboration in scenarios where multiple sites hold
data and the goal is to estimate properties jointly across all
datasets. Differentially private decentralized algorithms can provide strong
privacy guarantees. However, the accuracy of the joint estimates may be poor
when the datasets at each site are small. This paper proposes a new framework,
Correlation Assisted Private Estimation (CAPE), for designing
privacy-preserving decentralized algorithms with better accuracy guarantees in
an honest-but-curious model. CAPE can be used in conjunction with the
functional mechanism for statistical and machine learning optimization
problems. A tighter characterization of the functional mechanism is provided
that allows CAPE to match, in the decentralized setting, the performance of a
centralized algorithm with access to all datasets. Empirical results on regression
and neural network problems for both synthetic and real datasets show that
differentially private methods can be competitive with non-private algorithms
in many scenarios of interest.
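The correlated-noise idea at the heart of CAPE can be sketched as follows (a
minimal centralized simulation; in an actual honest-but-curious deployment
the zero-sum noise terms would be generated jointly, e.g., via secure
aggregation, and the scales would be set by the privacy analysis):

```python
import numpy as np

def cape_noise(num_sites, scale_corr=1.0, scale_local=0.1):
    """Correlated noise: e_i sum to zero across sites, g_i is local.
    Each site releases x_i + e_i + g_i; the average of the releases
    carries only the small g-noise, since the e_i cancel."""
    e = np.random.normal(0, scale_corr, num_sites)
    e -= e.mean()                      # enforce sum(e) == 0
    g = np.random.normal(0, scale_local, num_sites)
    return e, g

sites = np.array([1.0, 1.2, 0.9, 1.1])   # local statistics x_i
e, g = cape_noise(len(sites))
releases = sites + e + g
print(releases.mean(), sites.mean())      # close: the e_i cancel
```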
Defending Non-Bayesian Learning against Adversarial Attacks
This paper addresses the problem of non-Bayesian learning over multi-agent
networks, where agents repeatedly collect partially informative observations
about an unknown state of the world, and try to collaboratively learn the true
state. We focus on the impact of adversarial agents on the performance of
consensus-based non-Bayesian learning, where non-faulty agents combine local
learning updates with consensus primitives. In particular, we consider the
scenario where an unknown subset of agents suffers Byzantine faults; agents
suffering Byzantine faults may behave arbitrarily. Two different learning
rules are proposed …
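The abstract is truncated at this point, but the consensus-plus-update
structure it refers to is standard in non-Bayesian learning; below is a
minimal sketch of one such step over a complete network, with trimming as an
illustrative stand-in for the paper's Byzantine defenses:

```python
import numpy as np

def learning_step(beliefs, likelihoods, trim=1):
    """One consensus + likelihood update over a complete network.

    beliefs: (agents, states) current belief vectors (rows sum to 1).
    likelihoods: (agents, states) likelihood of each agent's latest
    observation under each hypothesized state.
    Before averaging log-beliefs (a geometric-mean consensus), the
    `trim` largest and smallest entries per state are dropped, a
    stand-in for Byzantine-robust filtering (rule is illustrative).
    """
    logb = np.log(beliefs)
    trimmed = np.sort(logb, axis=0)[trim:-trim or None]
    consensus = np.exp(trimmed.mean(axis=0))
    updated = consensus * likelihoods
    return updated / updated.sum(axis=1, keepdims=True)

beliefs = np.full((5, 2), 0.5)        # 5 agents, 2 candidate states
lik = np.tile([0.6, 0.4], (5, 1))     # observations favor state 0
print(learning_step(beliefs, lik)[0]) # beliefs shift toward state 0
```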