19 research outputs found
A Hybrid Approach to Privacy-Preserving Federated Learning
Federated learning facilitates the collaborative training of models without
the sharing of raw data. However, recent attacks demonstrate that simply
maintaining data locality during training processes does not provide sufficient
privacy guarantees. Rather, we need a federated learning system capable of
preventing inference over both the messages exchanged during training and the
final trained model while ensuring the resulting model also has acceptable
predictive accuracy. Existing federated learning approaches either use secure multiparty computation (SMC), which is vulnerable to inference, or differential privacy, which can lead to low accuracy given a large number of parties each holding relatively little data. In this paper, we present an alternative
approach that utilizes both differential privacy and SMC to balance these
trade-offs. Combining differential privacy with secure multiparty computation
enables us to reduce the growth of noise injection as the number of parties
increases without sacrificing privacy while maintaining a pre-defined rate of
trust. Our system is therefore a scalable approach that protects against
inference threats and produces models with high accuracy. Additionally, our
system can be used to train a variety of machine learning models, which we
validate with experimental results on 3 different machine learning algorithms.
Our experiments demonstrate that our approach outperforms state-of-the-art solutions.
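As a rough illustration of the noise-reduction argument, consider the minimal Python sketch below (not the authors' implementation; the Gaussian calibration, the trust_rate parameter, and the plain sum standing in for the SMC protocol are all illustrative assumptions): because secure aggregation reveals only the sum of the parties' updates, each honest party can contribute a fraction of the noise that local perturbation alone would require.

    import numpy as np

    def gaussian_sigma(epsilon, delta, sensitivity):
        # Standard Gaussian-mechanism noise scale for (epsilon, delta)-DP.
        return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

    def local_update_with_dp(grad, epsilon, delta, sensitivity,
                             n_parties, trust_rate):
        # Without SMC, each party must add the full noise sigma, so the
        # aggregate's noise variance grows linearly with n_parties. With
        # SMC, only the sum is revealed; assuming at least
        # trust_rate * n_parties parties are honest, each honest party
        # contributes only a 1/honest share of the total noise variance.
        honest = max(1, int(np.ceil(trust_rate * n_parties)))
        sigma = gaussian_sigma(epsilon, delta, sensitivity) / np.sqrt(honest)
        return grad + np.random.normal(0.0, sigma, size=grad.shape)

    # The plain sum below stands in for the secure-aggregation protocol.
    grads = [np.random.randn(10) for _ in range(50)]
    noisy = [local_update_with_dp(g, epsilon=1.0, delta=1e-5, sensitivity=1.0,
                                  n_parties=50, trust_rate=0.8) for g in grads]
    aggregate = np.sum(noisy, axis=0)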
A Novel Privacy-Preserved Recommender System Framework based on Federated Learning
Recommender systems (RS) are currently an effective way to address information overload. To predict users' next click behavior, an RS needs to collect users' personal information and behavior to achieve a comprehensive and profound perception of user preferences. However, these centrally collected data are
privacy-sensitive, and any leakage may cause severe problems to both users and
service providers. This paper proposes a novel privacy-preserved recommender system framework (PPRSF) that applies the federated learning paradigm to enable the recommendation algorithm to be trained and to carry out inference without centrally collecting users' private data. The PPRSF not only reduces the privacy leakage risk and satisfies legal and regulatory requirements, but also allows various recommendation algorithms to be applied.
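One common way to realize such a framework is federated matrix factorization, where user embeddings never leave the device and only item-embedding gradients are shared; the sketch below is a generic design, not necessarily the exact PPRSF architecture, and all names in it are illustrative.

    import numpy as np

    DIM, N_ITEMS, LR = 8, 100, 0.05

    def client_update(item_emb, user_emb, ratings):
        # ratings: list of (item_id, value) pairs that stay on the device.
        item_grad = np.zeros_like(item_emb)
        for item, r in ratings:
            err = r - user_emb @ item_emb[item]
            user_emb += LR * err * item_emb[item]  # private, local-only update
            item_grad[item] += err * user_emb      # the only quantity shared
        return item_grad, user_emb

    # One server round: aggregate item gradients, never touch user vectors.
    item_emb = np.random.randn(N_ITEMS, DIM) * 0.1
    user_embs = [np.random.randn(DIM) * 0.1 for _ in range(3)]
    data = [[(np.random.randint(N_ITEMS), 5 * np.random.rand())] for _ in range(3)]

    total_grad = np.zeros_like(item_emb)
    for u in range(3):
        g, user_embs[u] = client_update(item_emb, user_embs[u], data[u])
        total_grad += g
    item_emb += LR * total_grad / 3  # federated averaging of item updates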
Differentially Private ERM Based on Data Perturbation
In this paper, after observing that different training data instances affect
the machine learning model to different extents, we attempt to improve the
performance of differentially private empirical risk minimization (DP-ERM) from
a new perspective. Specifically, we measure the contributions of various
training data instances on the final machine learning model, and select some of
them to add random noise. Considering that the key of our method is to measure
each data instance separately, we propose a new 'data perturbation'-based (DB) paradigm for DP-ERM: adding random noise to the original training data and achieving (ε, δ)-differential privacy on the final machine learning model, along with privacy preservation on the original data. By
introducing the Influence Function (IF), we quantitatively measure the impact
of the training data on the final model. Theoretical and experimental results
show that our proposed DBDP-ERM paradigm enhances model performance significantly.
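A heavily simplified Python sketch of the data-perturbation idea follows; the gradient-norm influence proxy and the fixed noise scale sigma are illustrative stand-ins for the paper's influence-function analysis and (ε, δ) calibration.

    import numpy as np

    def influence_proxy(X, y, w):
        # Per-instance gradient norm of the logistic loss: a crude stand-in
        # for the paper's Influence Function measurements.
        margins = y * (X @ w)
        residual = -y / (1.0 + np.exp(margins))  # d(loss)/d(margin)
        return np.abs(residual) * np.linalg.norm(X, axis=1)

    def perturb_influential(X, y, w, sigma, top_frac=0.3):
        # Add Gaussian noise only to the most influential instances, then
        # hand the perturbed data to any off-the-shelf ERM solver.
        scores = influence_proxy(X, y, w)
        k = int(top_frac * len(X))
        idx = np.argsort(scores)[-k:]
        X_noisy = X.copy()
        X_noisy[idx] += np.random.normal(0.0, sigma, size=(k, X.shape[1]))
        return X_noisy

    X = np.random.randn(200, 5)
    y = np.sign(np.random.randn(200))
    X_private = perturb_influential(X, y, np.zeros(5), sigma=0.5)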
Security and privacy for data mining of RFID-enabled product supply chains
The e-pedigree used for verifying the authenticity of products in RFID-enabled product supply chains plays a very important role in product anti-counterfeiting and risk management, but it is also vulnerable to malicious attacks and privacy leakage. While radio frequency identification (RFID) technology bears merits such as automatic wireless identification without direct line of sight, its security has been one of the main concerns in recent research, with threats such as tag data tampering and cloning. Moreover, privacy leakage among the partners along the supply chain may lead to complete compromise of the whole system, in which case all authenticated products could be replaced by counterfeit ones. Quite different from other conventional databases, datasets in supply chain scenarios are temporally correlated, and every party in the system can only be semi-trusted. In this paper, a system that incorporates the merits of both secure multi-party computation and differential privacy is proposed to address these security and privacy issues, focusing on vulnerability analysis of data mining over distributed EPCIS e-pedigree datasets with temporal relations under multiple range and aggregate queries in typical supply chain scenarios, together with the related algorithms. Theoretical analysis shows that our proposed system meets our design goals, while some remaining problems are left for future research.
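For intuition, the kind of query the system must protect can be sketched as a differentially private range count over event timestamps; the Laplace mechanism and the even budget split across overlapping queries below are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def dp_range_count(timestamps, t_start, t_end, epsilon):
        # A count query has sensitivity 1, so Laplace(1/epsilon) noise
        # gives epsilon-differential privacy for this single query.
        true_count = sum(t_start <= t < t_end for t in timestamps)
        return true_count + np.random.laplace(0.0, 1.0 / epsilon)

    # Overlapping range queries over temporally correlated events must
    # share the budget: q queries at epsilon/q compose to epsilon total.
    events = np.random.uniform(0, 100, size=1000)  # shipment event times
    total_eps, n_queries = 1.0, 5
    answers = [dp_range_count(events, 10 * i, 10 * i + 20, total_eps / n_queries)
               for i in range(n_queries)]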
Classification with Partially Private Features
In this paper, we consider differentially private classification when some
features are sensitive, while the rest of the features and the label are not.
We adapt the definition of differential privacy naturally to this setting. Our
main contribution is a novel adaptation of AdaBoost that is not only provably
differentially private, but also significantly outperforms a natural benchmark
that assumes the entire data of the individual is sensitive in the experiments.
As a surprising observation, we show that boosting randomly generated
classifiers suffices to achieve high accuracy. Our approach easily adapts to
the classical setting where all the features are sensitive, providing an
alternate algorithm for differentially private linear classification with a
much simpler privacy proof and comparable or higher accuracy than
differentially private logistic regression on real-world datasets.
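The core idea can be sketched as follows (a simplified illustration; the Laplace calibration of the noisy error estimates is an assumption, not the paper's exact mechanism): draw random hyperplanes without looking at the data, score each one with a noisy weighted error so the sensitive features influence the model only through noisy aggregates, and combine them with standard AdaBoost weights.

    import numpy as np

    def random_stump(dim):
        # A classifier generated without looking at the data at all.
        w = np.random.randn(dim)
        return lambda X: np.sign(X @ w)

    def private_boost(X, y, rounds=20, eps_per_round=0.5):
        n, d = X.shape
        sample_w = np.ones(n) / n
        ensemble = []
        for _ in range(rounds):
            h = random_stump(d)
            pred = h(X)
            # The data enters only through this noisy weighted error.
            err = np.sum(sample_w * (pred != y))
            err = np.clip(err + np.random.laplace(0, 1 / (n * eps_per_round)),
                          1e-6, 1 - 1e-6)
            alpha = 0.5 * np.log((1 - err) / err)  # standard AdaBoost weight
            sample_w *= np.exp(-alpha * y * pred)
            sample_w /= sample_w.sum()
            ensemble.append((alpha, h))
        return lambda X: np.sign(sum(a * h(X) for a, h in ensemble))

    X = np.random.randn(300, 4)
    y = np.sign(X[:, 0] + 0.3 * np.random.randn(300))
    clf = private_boost(X, y)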
Effective and Efficient Federated Tree Learning on Hybrid Data
Federated learning has emerged as a promising distributed learning paradigm
that facilitates collaborative learning among multiple parties without
transferring raw data. However, most existing federated learning studies focus
on either horizontal or vertical data settings, where the data of different
parties are assumed to be from the same feature or sample space. In practice, a
common scenario is the hybrid data setting, where data from different parties
may differ both in the features and samples. To address this, we propose
HybridTree, a novel federated learning approach that enables federated tree
learning on hybrid data. We observe the existence of consistent split rules in
trees. With the help of these split rules, we theoretically show that the
knowledge of parties can be incorporated into the lower layers of a tree. Based
on our theoretical analysis, we propose a layer-level solution that does not
need frequent communication traffic to train a tree. Our experiments
demonstrate that HybridTree can achieve comparable accuracy to the centralized
setting with low computational and communication overhead. HybridTree can
achieve up to an 8x speedup compared with the other baselines.
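A conceptual sketch of the layer-level idea follows (the Node structure and helper are illustrative assumptions, not the authors' code): the host trains the upper layers on its own data, and each collaborating party's split rules are attached as whole lower layers under the host's leaves, so communication happens once per layer rather than once per node.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        feature: Optional[int] = None      # split feature index
        threshold: Optional[float] = None  # split threshold
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        value: Optional[float] = None      # leaf prediction

    def attach_party_layer(leaf, split_rule, left_value, right_value):
        # Replace a host leaf with one guest party's split rule, growing
        # the tree by a whole layer in a single batched exchange.
        leaf.feature, leaf.threshold = split_rule
        leaf.left = Node(value=left_value)
        leaf.right = Node(value=right_value)
        leaf.value = None

    # One round of communication per layer rather than per split:
    host_leaf = Node(value=0.0)
    attach_party_layer(host_leaf, split_rule=(7, 0.42),
                       left_value=-0.1, right_value=0.3)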
Data Analytics with Differential Privacy
Differential privacy is the state-of-the-art definition for privacy,
guaranteeing that any analysis performed on a sensitive dataset leaks no
information about the individuals whose data are contained therein. In this
thesis, we develop differentially private algorithms to analyze distributed and
streaming data. In the distributed model, we consider the particular problem of
learning -- in a distributed fashion -- a global model of the data, that can
subsequently be used for arbitrary analyses. We build upon PrivBayes, a
differentially private method that approximates the high-dimensional
distribution of a centralized dataset as a product of low-order distributions,
utilizing a Bayesian Network model. We examine three novel approaches to
learning a global Bayesian Network from distributed data, while offering the
differential privacy guarantee to all local datasets. Our work includes a
detailed theoretical analysis of the distributed, differentially private
entropy estimator which we use in one of our algorithms, as well as a detailed
experimental evaluation, using both synthetic and real-world data. In the
streaming model, we focus on the problem of estimating the density of a stream
of users, which expresses the fraction of all users that actually appear in the
stream. We offer one of the strongest privacy guarantees for the streaming
model, user-level pan-privacy, which ensures that the privacy of any user is
protected, even against an adversary that observes the internal state of the
algorithm. We provide a detailed analysis of an existing, sampling-based
algorithm for the problem and propose two novel modifications that
significantly improve it, both theoretically and experimentally, by optimally
using all the allocated "privacy budget."

Comment: Diploma Thesis, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 201
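The sampling-based estimator this line of work builds on (in the spirit of Dwork et al.'s pan-private density estimator; the parameterization below is illustrative) can be sketched as follows: sample a set of users, keep one randomized-response bit per sampled user, and re-draw a user's bit from a slightly biased coin whenever that user appears, so even the internal state reveals little about any individual.

    import random

    class PanPrivateDensity:
        def __init__(self, sampled_users, epsilon):
            self.p = 0.5 + epsilon / 4  # bias of the "seen" coin
            # The state starts from the same distribution that unseen
            # users keep, so the internal state alone reveals nothing.
            self.bits = {u: random.random() < 0.5 for u in sampled_users}

        def process(self, user):
            if user in self.bits:  # re-draw from the biased coin
                self.bits[user] = random.random() < self.p

        def estimate(self):
            frac_ones = sum(self.bits.values()) / len(self.bits)
            # Invert randomized response: E[frac] = 0.5 + density * (p - 0.5).
            return (frac_ones - 0.5) / (self.p - 0.5)

    sketch = PanPrivateDensity(range(1000), epsilon=0.5)
    for user in (random.randrange(2000) for _ in range(5000)):
        sketch.process(user)
    density_estimate = sketch.estimate()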