FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models
In a vertical federated learning (VFL) system consisting of a central server
and many distributed clients, the training data are vertically partitioned such
that different features are privately stored on different clients. The problem
of split VFL is to train a model split between the server and the clients. This
paper aims to address two major challenges in split VFL: 1) performance
degradation due to straggling clients during training; and 2) data and model
privacy leakage from clients' uploaded data embeddings. We propose FedVS to
simultaneously address these two challenges. The key idea of FedVS is to design
secret sharing schemes for the local data and models, such that
information-theoretic privacy against colluding clients and a curious server is
guaranteed, and the aggregation of all clients' embeddings is reconstructed
losslessly by decrypting computation shares from the non-straggling clients.
Extensive experiments on various types of VFL datasets (including tabular, CV,
and multi-view) demonstrate the universal advantages of FedVS in straggler
mitigation and privacy protection over baseline protocols.Comment: This paper was accepted to ICML 202
Preservation of Patient Level Privacy: Federated Classification and Calibration Models
With the launch of the Precision Medicine Initiative in the United States by the National Institutes of Health, and the emergence of large volumes of electronic health records, there are many opportunities to improve clinical decision support systems. A large number of samples is needed to build predictive models with adequate discrimination and calibration. However, protecting patient privacy is also an important concern. Patient data are typically kept in localized silos, and consolidating datasets from different healthcare systems is difficult. Federated learning allows a global model to be trained by amassing intermediate calculations from local medical systems; the knowledge learned from the data can be transferred and aggregated to achieve better performance than that of individual local models, yielding more accurate predictions. Two types of measures assess how well a model performs: discrimination and calibration. While most papers report discrimination measures, calibration is often neglected even though it is a critical evaluation metric. In this dissertation, I present a novel way to build classifiers and calibration models in a federated manner, and I show how to evaluate and improve model calibration in this setting. Federated modeling enables the accumulation of knowledge that would otherwise remain locked behind local medical systems.
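One concrete way such federated calibration can work is to recalibrate a shared risk model using only per-site aggregates, in the style of a "calibration-in-the-large" intercept correction. The sketch below is an illustrative assumption, not necessarily the dissertation's method; the function names and toy data are invented.

```python
import math

def site_summary(y_true, p_pred):
    """Per-site aggregates shared with the server: events, count, summed logit."""
    logits = [math.log(p / (1 - p)) for p in p_pred]
    return sum(y_true), len(y_true), sum(logits)

def global_intercept(summaries):
    """Intercept shift aligning the pooled mean predicted logit with the observed log-odds."""
    events = sum(s[0] for s in summaries)
    n = sum(s[1] for s in summaries)
    mean_logit = sum(s[2] for s in summaries) / n
    return math.log(events / (n - events)) - mean_logit

# two hospitals: outcome labels and the shared model's predicted risks (toy data)
sites = [
    ([1, 0, 0, 1], [0.9, 0.4, 0.3, 0.6]),
    ([0, 1, 0, 0], [0.2, 0.7, 0.1, 0.3]),
]
delta = global_intercept([site_summary(y, p) for y, p in sites])
# recalibrate a new prediction of 0.6 with the global correction
corrected = 1 / (1 + math.exp(-(math.log(0.6 / 0.4) + delta)))
```

Only three numbers per site cross the network, so individual patient records never leave their silo, which is exactly the patient-level privacy property the abstract describes.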
Functional encryption based approaches for practical privacy-preserving machine learning
Machine learning (ML) is increasingly being used in a wide variety of application domains. However, deploying ML solutions poses a significant challenge because of increasing privacy concerns and the requirements imposed by privacy-related regulations. To tackle serious privacy concerns in ML-based applications, significant recent research efforts have focused on developing privacy-preserving ML (PPML) approaches by integrating into the ML pipeline existing anonymization mechanisms or emerging privacy protection approaches such as differential privacy, secure computation, and other architectural frameworks. While promising, existing secure computation based approaches have significant computational efficiency issues and are therefore not practical.
In this dissertation, we address several challenges related to PPML and propose practical secure computation based approaches to solve them. We consider both two-tier cloud-based and three-tier hybrid cloud-edge based PPML architectures, and address both emerging deep learning models and federated learning approaches. The proposed approaches enable us to outsource data or update a locally trained model in a privacy-preserving manner by employing computation over encrypted datasets or local models. Our proposed secure computation solutions are based on functional encryption (FE) techniques. Evaluation shows that they are efficient, more practical than existing approaches, and provide strong privacy guarantees. We also address issues related to the trustworthiness of various entities within the proposed PPML infrastructures, including a third-party authority (TPA), which plays a critical role in the proposed FE-based PPML solutions, and cloud service providers. To ensure that such entities can be trusted, we propose a transparency and accountability framework using blockchain. We show that the proposed transparency framework is effective and guarantees security properties, and experimental evaluation shows that it is efficient.
HybridAlpha: An Efficient Approach for Privacy-Preserving Federated Learning
Federated learning has emerged as a promising approach for collaborative and
privacy-preserving learning. Participants in a federated learning process
cooperatively train a model by exchanging model parameters instead of the
actual training data, which they might want to keep private. However, parameter
interaction and the resulting model still might disclose information about the
training data used. To address these privacy concerns, several approaches have
been proposed based on differential privacy and secure multiparty computation
(SMC), among others. They often result in large communication overhead and slow
training time. In this paper, we propose HybridAlpha, an approach for
privacy-preserving federated learning employing an SMC protocol based on
functional encryption. This protocol is simple, efficient and resilient to
participants dropping out. We evaluate our approach regarding the training time
and data volume exchanged using a federated learning process to train a CNN on
the MNIST data set. Evaluation against existing crypto-based SMC solutions
shows that HybridAlpha can reduce the training time by 68% and data transfer
volume by 92% on average while providing the same model performance and privacy
guarantees as the existing solutions.
Comment: 12 pages, AISec 201
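HybridAlpha's actual protocol builds on functional encryption; as a rough stand-in for the property the aggregator relies on (it learns only the sum of the clients' updates, nothing about any individual update), the sketch below uses pairwise cancelling masks, a classic secure-aggregation technique and a deliberately different mechanism. Seeds, moduli, and values are illustrative.

```python
import random

P = 2**31 - 1  # modulus for masked arithmetic (assumption)

def masked_update(i, update, seeds, n):
    """Client i masks its update; each pairwise mask enters once with + and once with -."""
    m = update
    for j in range(n):
        if j == i:
            continue
        # both clients of a pair derive the same mask from a shared seed
        mask = random.Random(seeds[min(i, j)][max(i, j)]).randrange(P)
        m = (m + (mask if i < j else -mask)) % P
    return m

n = 3
updates = [5, 9, 12]
# illustrative pre-agreed pairwise seeds (in practice, from a key exchange)
seeds = {i: {j: 1000 * i + j for j in range(n)} for i in range(n)}
agg = sum(masked_update(i, u, seeds, n) for i, u in enumerate(updates)) % P
print(agg)  # 26: masks cancel, the server sees only the sum 5 + 9 + 12
```

Note this naive masking scheme is exactly what breaks when a participant drops out mid-round; resilience to dropouts is the advantage the abstract claims for the FE-based protocol.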
A Privacy-Preserving Outsourced Data Model in Cloud Environment
Nowadays, more and more machine learning services, such as medical
diagnosis, online fraud detection, and email spam filtering, are provided
through cloud computing. The cloud service provider collects data from
various owners to train machine learning systems and run classification
tasks in the cloud environment. However, multiple data owners may not fully
trust a cloud platform operated by a third party. Data security and privacy
problems are therefore among the critical hindrances to using machine
learning tools, particularly with multiple data owners. In addition,
unauthorized entities may observe statistical properties of the input data
and infer the machine learning model's parameters. Therefore, a
privacy-preserving model is proposed that protects the privacy of the data
without compromising machine learning efficiency. To protect the data
owners' data, epsilon-differential privacy is used, and fog nodes are
employed to address the lower bandwidth and latency in the proposed scheme.
Noise produced by the epsilon-differential privacy mechanism is added to the
data at the data owner's site, so the owners' data are protected before they
leave. Fog nodes collect the noise-added data from the data owners and then
transfer it to the cloud platform for storage, computation, and
classification tasks.
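The owner-side noising step described above can be sketched with a plain Laplace mechanism, where the noise scale is sensitivity divided by epsilon. Function names and parameter values below are illustrative assumptions, not taken from the paper.

```python
import math
import random

def add_dp_noise(value, sensitivity, epsilon):
    """epsilon-DP Laplace mechanism: perturb `value` with Laplace(0, sensitivity/epsilon)."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return value + noise

random.seed(0)
# the owner perturbs each record locally, before anything reaches a fog node
noisy = [add_dp_noise(x, sensitivity=1.0, epsilon=0.5) for x in [3.2, 4.1, 5.0]]
```

Because the noise is injected at the owner's site, neither the fog nodes nor the cloud ever see the raw values, matching the trust model in the abstract.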
Vertical Federated Learning
Vertical Federated Learning (VFL) is a federated learning setting where
multiple parties with different features about the same set of users jointly
train machine learning models without exposing their raw data or model
parameters. Motivated by the rapid growth in VFL research and real-world
applications, we provide a comprehensive review of the concept and algorithms
of VFL, as well as current advances and challenges in various aspects,
including effectiveness, efficiency, and privacy. We provide an exhaustive
categorization for VFL settings and privacy-preserving protocols and
comprehensively analyze the privacy attacks and defense strategies for each
protocol. We then propose a unified framework, termed VFLow, which
considers the VFL problem under communication, computation, privacy, and
effectiveness constraints. Finally, we review the most recent advances in
industrial applications, highlighting open challenges and future directions for
VFL.
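The core split-model computation that much of the surveyed VFL work builds on can be sketched as follows: each party computes a partial score over its own feature columns, and only those partial scores, never the raw features, are combined. Feature names, weights, and the choice of a linear model are illustrative assumptions.

```python
import math

def partial_score(features, weights):
    """One party's local contribution: a dot product over its own columns only."""
    return sum(f * w for f, w in zip(features, weights))

# party A holds two feature columns, party B holds one (all values illustrative)
a_feats, a_weights = [0.5, 1.2], [0.3, -0.1]
b_feats, b_weights = [2.0], [0.4]

# only the scalar partial scores leave each party; raw features never do
logit = partial_score(a_feats, a_weights) + partial_score(b_feats, b_weights)
prob = 1 / (1 + math.exp(-logit))  # server-side sigmoid on the combined logit
```

The privacy attacks the survey catalogs target exactly these exchanged intermediate values (here, the partial scores), which is why the protocols it reviews add encryption or perturbation on top of this basic flow.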