Selective Knowledge Sharing for Privacy-Preserving Federated Distillation without A Good Teacher
While federated learning is promising for privacy-preserving collaborative
learning without revealing local data, it remains vulnerable to white-box
attacks and struggles to adapt to heterogeneous clients. Federated distillation
(FD), built upon knowledge distillation--an effective technique for
transferring knowledge from a teacher model to student models--emerges as an
alternative paradigm, which provides enhanced privacy guarantees and addresses
model heterogeneity. Nevertheless, challenges arise from variations in local
data distributions and the absence of a well-trained teacher model, which lead
to misleading and ambiguous knowledge sharing that significantly degrades model
performance. To address these issues, this paper proposes a selective knowledge
sharing mechanism for FD, termed Selective-FD. It includes client-side
selectors and a server-side selector to accurately identify knowledge from
local and ensemble predictions, respectively. Empirical studies,
backed by theoretical insights, demonstrate that our approach enhances the
generalization capabilities of the FD framework and consistently outperforms
baseline methods. This study presents a promising direction for effective
knowledge transfer in privacy-preserving collaborative learning.
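The two-stage selection described above (client-side selectors on local predictions, a server-side selector on the ensemble) can be caricatured with confidence-based filters. This is a minimal sketch with hypothetical entropy and agreement thresholds, not the paper's actual selector design:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (natural log)."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def client_side_select(local_probs, threshold):
    """Client-side selector (illustrative): share only the local
    predictions the client is confident about (low entropy)."""
    return [p for p in local_probs if entropy(p) < threshold]

def server_side_select(client_preds, agree_frac=0.5):
    """Server-side selector (illustrative): average per-sample
    predictions across clients, but keep only samples on which at
    least agree_frac of the clients agree on the argmax class."""
    kept = []
    for preds in client_preds:  # preds: prob vectors for one sample
        preds = np.stack(preds)
        votes = preds.argmax(axis=1)
        top = np.bincount(votes).argmax()
        if (votes == top).mean() >= agree_frac:
            kept.append(preds.mean(axis=0))
    return kept
```

The intuition matches the abstract: ambiguous local knowledge is filtered before it is shared, and the server additionally discards ensemble predictions on which clients disagree too strongly.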
Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking
Deep neural networks have become a primary tool for solving problems in many
fields. They are also used for addressing information retrieval problems and
show strong performance in several tasks. Training these models requires large,
representative datasets and for most IR tasks, such data contains sensitive
information from users. Privacy and confidentiality concerns prevent many data
owners from sharing the data, thus today the research community can only
benefit from research on large-scale datasets in a limited manner. In this
paper, we discuss privacy preserving mimic learning, i.e., using predictions
from a privacy preserving trained model instead of labels from the original
sensitive training data as a supervision signal. We present the results of
preliminary experiments in which we apply the idea of mimic learning and
privacy preserving mimic learning for the task of document re-ranking as one of
the core IR tasks. This research is a step toward laying the ground for
enabling researchers from data-rich environments to share knowledge learned
from actual users' data, which should facilitate research collaborations.
Comment: SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 7--11, 2017, Shinjuku, Tokyo, Japan
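The mimic-learning idea (train a student on a privately trained teacher's predictions instead of on the sensitive labels) can be sketched end to end in a few lines. Everything below is illustrative: a toy logistic-regression teacher and student on synthetic data, not the paper's re-ranking models:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, lr=0.5, steps=300):
    """Tiny logistic regression trained by gradient descent;
    works with hard (0/1) or soft probability targets."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

# --- data owner's side: sensitive data never leaves ---
X_private = rng.normal(size=(200, 2))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(float)
teacher_w, teacher_b = train_logreg(X_private, y_private)

# --- shared side: teacher labels a PUBLIC unlabeled set ---
X_public = rng.normal(size=(200, 2))
soft_labels = 1 / (1 + np.exp(-(X_public @ teacher_w + teacher_b)))

# --- researcher's side: student mimics the teacher's predictions ---
student_w, student_b = train_logreg(X_public, soft_labels)
```

Only the teacher's predictions on public data cross the privacy boundary; the student never sees `X_private` or `y_private`, yet learns a very similar decision boundary.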
pFedES: Model Heterogeneous Personalized Federated Learning with Feature Extractor Sharing
As a privacy-preserving collaborative machine learning paradigm, federated
learning (FL) has attracted significant interest from academia and the industry
alike. To allow each data owner (a.k.a. an FL client) to train a heterogeneous
and personalized local model based on its local data distribution, system
resources and requirements on model structure, the field of model-heterogeneous
personalized federated learning (MHPFL) has emerged. Existing MHPFL approaches
either rely on the availability of a public dataset with special
characteristics to facilitate knowledge transfer, incur high computation and
communication costs, or face potential model leakage risks. To address these
limitations, we propose a model-heterogeneous personalized Federated learning
approach based on feature Extractor Sharing (pFedES). It incorporates a small
homogeneous feature extractor into each client's heterogeneous local model.
Clients train the extractor and the local model via the proposed iterative
learning method to enable the exchange of globally generalized knowledge and
locally personalized knowledge. The
small local homogeneous extractors produced after local training are uploaded
to the FL server for aggregation, which facilitates knowledge sharing among
clients. We theoretically prove that pFedES can converge over wall-to-wall
time. Extensive experiments on two real-world datasets against six
state-of-the-art methods demonstrate that pFedES builds the most accurate
model, while incurring low communication and computation costs. Compared with
the best-performing baseline, it achieves 1.61% higher test accuracy, while
reducing communication and computation costs by 99.6% and 82.9%, respectively.
Comment: 12 pages, 10 figures. arXiv admin note: text overlap with arXiv:2310.1328
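The aggregation step the abstract describes (clients upload only the small homogeneous feature extractors, which the server combines) can be sketched as FedAvg-style parameter averaging. The layer shapes and weights below are made up for illustration and are not the paper's architecture:

```python
import numpy as np

def average_extractors(extractors):
    """Server-side aggregation (illustrative): element-wise mean of
    each parameter tensor of the homogeneous extractors across clients."""
    keys = extractors[0].keys()
    return {k: np.mean([e[k] for e in extractors], axis=0) for k in keys}

# Each client keeps its heterogeneous personalized model privately;
# only the small homogeneous extractor (here: one linear layer,
# hypothetical shapes) is uploaded to the server.
client_extractors = [
    {"W": np.ones((4, 8)) * 1.0, "b": np.zeros(8)},
    {"W": np.ones((4, 8)) * 3.0, "b": np.ones(8)},
]
global_extractor = average_extractors(client_extractors)
```

Because only the tiny extractor crosses the network, communication cost stays low while the averaged extractor still carries globally generalized knowledge back to each client.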
Protecting privacy of users in brain-computer interface applications
Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on large amounts of personal data for training and inference. Among the most intimate data sources exploited is electroencephalogram (EEG) data, which is so rich in information that application developers can easily gain knowledge beyond the professed scope from unprotected EEG signals, including passwords, ATM PINs, and other intimate data. The challenge we address is how to engage in meaningful ML with EEG data while protecting the privacy of users. Hence, we propose cryptographic protocols based on secure multiparty computation (SMC) to perform linear regression over EEG signals from many users in a fully privacy-preserving (PP) fashion, i.e., such that no individual's EEG signals are revealed to anyone else. To illustrate the potential of our secure framework, we show how it allows estimating the drowsiness of drivers from their EEG signals as would be possible in the unencrypted case, and at a very reasonable computational cost. Our solution is the first application of commodity-based SMC to EEG data, as well as the largest documented experiment of secret sharing-based SMC in general, namely, with 15 players involved in all the computations.
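The secret-sharing building block behind such SMC protocols can be illustrated with additive shares: each user's input is split so that no single player learns anything from the shares it holds, yet aggregates over all users can still be computed. This sketch shows only a secure sum over integer-encoded values; full linear regression additionally needs secure multiplication (e.g., via commodity-based preprocessing, as in the paper):

```python
import random

P = 2**61 - 1  # large prime modulus for the shares (illustrative choice)

def share(secret, n_players):
    """Split an integer secret into n_players additive shares mod P.
    Any subset of fewer than n_players shares is uniformly random."""
    shares = [random.randrange(P) for _ in range(n_players - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares mod P."""
    return sum(shares) % P

# Each user secret-shares a (fixed-point encoded) EEG feature among
# three players; the players see only shares, yet the sum of all
# users' inputs can be reconstructed from the players' local sums.
users = [17, 42, 5]
per_player = [share(u, 3) for u in users]                 # one row per user
player_sums = [sum(col) % P for col in zip(*per_player)]  # each player, locally
total = reconstruct(player_sums)                          # 17 + 42 + 5 = 64
```

Because addition commutes with the sharing, each player only ever adds up the shares it received, and only the final aggregate is opened.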
Privacy-Friendly Collaboration for Cyber Threat Mitigation
Sharing of security data across organizational boundaries has often been
advocated as a promising way to enhance cyber threat mitigation. However,
collaborative security faces a number of important challenges, including
privacy, trust, and liability concerns with the potential disclosure of
sensitive data. In this paper, we focus on data sharing for predictive
blacklisting, i.e., forecasting attack sources based on past attack
information. We propose a novel privacy-enhanced data sharing approach in which
organizations estimate collaboration benefits without disclosing their
datasets, organize into coalitions of allied organizations, and securely share
data within these coalitions. We study how different partner selection
strategies affect prediction accuracy by experimenting on a real-world dataset
of 2 billion IP addresses and observe up to a 105% prediction improvement.
Comment: This paper has been withdrawn as it has been superseded by arXiv:1502.0533
Efficient and Privacy-Preserving Ride Sharing Organization for Transferable and Non-Transferable Services
Ride-sharing allows multiple persons to share their trips together in one
vehicle instead of using multiple vehicles. This can reduce the number of
vehicles in the street, which consequently can reduce air pollution, traffic
congestion and transportation cost. However, a ride-sharing organization
requires passengers to report sensitive location information about their trips
to a trip organizing server (TOS) which creates a serious privacy issue. In
addition, existing ride-sharing schemes are inflexible, i.e., they require a
driver and a rider to have exactly the same trip to share a ride. Moreover,
they are not scalable, i.e., they are inefficient when applied to large
geographic areas.
In this paper, we propose two efficient privacy-preserving ride-sharing
organization schemes for Non-transferable Ride-sharing Services (NRS) and
Transferable Ride-sharing Services (TRS). In the NRS scheme, a rider can share
a ride from its source to its destination with only one driver, whereas in the
TRS scheme, a rider can transfer between multiple drivers while en route until
reaching the destination. In both schemes, the ride-sharing area is divided into
a number of small geographic areas, called cells, and each cell has a unique
identifier. Each driver/rider should encrypt his trip's data and send an
encrypted ride-sharing offer/request to the TOS. In the NRS scheme, Bloom filters
are used to compactly represent the trip information before encryption. Then,
the TOS can measure the similarity between the encrypted trips data to organize
shared rides without revealing either the users' identities or the location
information. In the TRS scheme, drivers report their encrypted routes, and then the
TOS builds an encrypted directed graph that is passed to a modified version of
Dijkstra's shortest path algorithm to search for an optimal path of rides that
can achieve a set of preferences defined by the riders.
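The Bloom-filter step can be sketched in plaintext: a trip becomes a bit vector over its cell identifiers, and a TOS-style similarity test reduces to bit operations on two filters. In the actual NRS scheme the filters are encrypted before being sent; the class, parameters, and cell IDs below are illustrative only:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over string items (e.g. cell identifiers)."""

    def __init__(self, n_bits=256, n_hashes=3):
        self.n_bits, self.n_hashes, self.bits = n_bits, n_hashes, 0

    def _positions(self, item):
        # Derive n_hashes bit positions from salted SHA-256 digests.
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def similarity(self, other):
        """Jaccard-style score over the set bits of two filters."""
        inter = bin(self.bits & other.bits).count("1")
        union = bin(self.bits | other.bits).count("1")
        return inter / union if union else 1.0

# A driver's route and a rider's request, each as a set of cell IDs.
driver, rider = BloomFilter(), BloomFilter()
for cell in ["c12", "c13", "c14", "c15"]:
    driver.add(cell)
for cell in ["c13", "c14"]:
    rider.add(cell)
```

A high similarity score indicates overlapping trips without either party ever transmitting raw cell identifiers, which is the compact representation the matching step builds on.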