
    Selective Knowledge Sharing for Privacy-Preserving Federated Distillation without A Good Teacher

    While federated learning is promising for privacy-preserving collaborative learning without revealing local data, it remains vulnerable to white-box attacks and struggles to adapt to heterogeneous clients. Federated distillation (FD), built upon knowledge distillation, an effective technique for transferring knowledge from a teacher model to student models, emerges as an alternative paradigm that provides enhanced privacy guarantees and addresses model heterogeneity. Nevertheless, challenges arise from variations in local data distributions and the absence of a well-trained teacher model, leading to misleading and ambiguous knowledge sharing that significantly degrades model performance. To address these issues, this paper proposes a selective knowledge sharing mechanism for FD, termed Selective-FD. It includes client-side selectors and a server-side selector to accurately identify knowledge from local and ensemble predictions, respectively. Empirical studies, backed by theoretical insights, demonstrate that our approach enhances the generalization capabilities of the FD framework and consistently outperforms baseline methods. This study presents a promising direction for effective knowledge transfer in privacy-preserving collaborative learning.
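    The abstract does not spell out the selectors' criteria, so the following is a minimal sketch using entropy thresholding as one plausible stand-in: a client-side selector keeps only confident (low-entropy) local predictions, and a server-side selector keeps ensemble predictions only where enough clients agree. All function names and thresholds are illustrative, not Selective-FD's actual mechanism.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy of each row of a (n_samples, n_classes) array."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def client_selector(local_preds, threshold=0.5):
    """Client-side selector: flag only confident (low-entropy) predictions."""
    mask = entropy(local_preds) < threshold
    return local_preds, mask

def server_selector(all_preds, all_masks, agreement=0.6):
    """Server-side selector: average client predictions per sample, keeping
    only samples where enough clients contributed a confident prediction."""
    all_preds = np.stack(all_preds)                 # (n_clients, n_samples, n_classes)
    all_masks = np.stack(all_masks).astype(float)   # (n_clients, n_samples)
    counts = all_masks.sum(axis=0)                  # confident clients per sample
    weighted = (all_preds * all_masks[:, :, None]).sum(axis=0)
    keep = counts / len(all_preds) >= agreement     # counts > 0 wherever keep is True
    ensemble = np.zeros_like(weighted)
    ensemble[keep] = weighted[keep] / counts[keep, None]
    return ensemble, keep

# Tiny demo: three clients, two samples, two classes; the ambiguous
# second sample is filtered out on both sides.
preds = [np.array([[0.95, 0.05], [0.5, 0.5]])] * 3
masks = [client_selector(p)[1] for p in preds]
print(server_selector(preds, masks))
```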

    Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking

    Deep neural networks have become a primary tool for solving problems in many fields. They are also used for addressing information retrieval problems and show strong performance in several tasks. Training these models requires large, representative datasets, and for most IR tasks such data contains sensitive information from users. Privacy and confidentiality concerns prevent many data owners from sharing the data, so today the research community can only benefit from research on large-scale datasets in a limited manner. In this paper, we discuss privacy-preserving mimic learning, i.e., using predictions from a privacy-preserving trained model instead of labels from the original sensitive training data as a supervision signal. We present the results of preliminary experiments in which we apply the idea of mimic learning and privacy-preserving mimic learning to the task of document re-ranking, one of the core IR tasks. This research is a step toward laying the ground for enabling researchers from data-rich environments to share knowledge learned from actual users' data, which should facilitate research collaborations. Comment: SIGIR 2017 Workshop on Neural Information Retrieval (Neu-IR'17), August 7-11, 2017, Shinjuku, Tokyo, Japan
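    A minimal sketch of the mimic-learning idea as described above: the teacher is trained on sensitive data that never leaves the owner, and only its predictions on a public pool supervise a student trained elsewhere. Simple scikit-learn classifiers and synthetic data stand in for the paper's neural rankers and real query logs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Sensitive training data stays with the data owner.
X_private = rng.normal(size=(1000, 20))
y_private = (X_private[:, 0] + X_private[:, 1] > 0).astype(int)
teacher = LogisticRegression().fit(X_private, y_private)

# Public/unlabeled pool: only the teacher's *predictions* leave the owner.
X_public = rng.normal(size=(5000, 20))
mimic_labels = teacher.predict(X_public)   # supervision signal for others

# A researcher elsewhere trains a student on the mimicked labels alone.
student = LogisticRegression().fit(X_public, mimic_labels)
print("student/teacher agreement:",
      (student.predict(X_public) == mimic_labels).mean())
```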

    pFedES: Model Heterogeneous Personalized Federated Learning with Feature Extractor Sharing

    As a privacy-preserving collaborative machine learning paradigm, federated learning (FL) has attracted significant interest from academia and industry alike. To allow each data owner (a.k.a. FL client) to train a heterogeneous and personalized local model based on its local data distribution, system resources, and requirements on model structure, the field of model-heterogeneous personalized federated learning (MHPFL) has emerged. Existing MHPFL approaches either rely on the availability of a public dataset with special characteristics to facilitate knowledge transfer, incur high computation and communication costs, or face potential model leakage risks. To address these limitations, we propose a model-heterogeneous personalized Federated learning approach based on feature Extractor Sharing (pFedES). It incorporates a small homogeneous feature extractor into each client's heterogeneous local model. Clients train the two via the proposed iterative learning method to exchange global generalized knowledge and local personalized knowledge. The small homogeneous extractors produced after local training are uploaded to the FL server for aggregation, which facilitates knowledge sharing among clients. We theoretically prove that pFedES converges over wall-to-wall time. Extensive experiments on two real-world datasets against six state-of-the-art methods demonstrate that pFedES builds the most accurate model while incurring low communication and computation costs. Compared with the best-performing baseline, it achieves 1.61% higher test accuracy while reducing communication and computation costs by 99.6% and 82.9%, respectively. Comment: 12 pages, 10 figures. arXiv admin note: text overlap with arXiv:2310.1328
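    A hedged sketch of the extractor-sharing idea, assuming a FedAvg-style weighted average of the small homogeneous extractors on the server while each client keeps its own heterogeneous model private. The module shape, weights, and aggregation rule are illustrative, not pFedES's exact procedure.

```python
import copy
import torch
import torch.nn as nn

class SmallExtractor(nn.Module):
    """Homogeneous across all clients, so its weights can be averaged."""
    def __init__(self, in_dim=32, out_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
    def forward(self, x):
        return self.net(x)

def aggregate_extractors(state_dicts, weights):
    """Server-side weighted average of the uploaded extractor parameters."""
    total = sum(weights)
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = sum(w * sd[key] for sd, w in zip(state_dicts, weights)) / total
    return avg

# Three clients upload only their small extractors (weights ~ sample counts);
# the heterogeneous local models never leave the clients.
clients = [SmallExtractor() for _ in range(3)]
global_sd = aggregate_extractors([c.state_dict() for c in clients],
                                 weights=[100, 200, 50])
for c in clients:
    c.load_state_dict(global_sd)  # redistribute the shared knowledge
```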

    Protecting privacy of users in brain-computer interface applications

    Machine learning (ML) is revolutionizing research and industry. Many ML applications rely on the use of large amounts of personal data for training and inference. Among the most intimate data sources exploited is electroencephalogram (EEG) data, which is so rich with information that application developers can easily gain knowledge beyond the professed scope from unprotected EEG signals, including passwords, ATM PINs, and other intimate data. The challenge we address is how to engage in meaningful ML with EEG data while protecting the privacy of users. Hence, we propose cryptographic protocols based on secure multiparty computation (SMC) to perform linear regression over EEG signals from many users in a fully privacy-preserving (PP) fashion, i.e., such that each individual's EEG signals are not revealed to anyone else. To illustrate the potential of our secure framework, we show how it allows estimating the drowsiness of drivers from their EEG signals as would be possible in the unencrypted case, and at a very reasonable computational cost. Our solution is the first application of commodity-based SMC to EEG data, as well as the largest documented experiment of secret sharing-based SMC in general, namely with 15 players involved in all the computations.
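    The core primitive behind secret-sharing-based SMC is easy to show in miniature. Below is a toy additive secret sharing over a prime field that lets 15 players compute a sum without any player seeing another's value; the paper's full commodity-based protocol for linear regression needs more machinery (e.g., pre-distributed correlated randomness for multiplications), which this sketch deliberately omits.

```python
import random

P = 2**61 - 1  # a Mersenne prime used as the field modulus

def share(secret, n_players):
    """Split `secret` into n additive shares; any n-1 shares reveal nothing."""
    shares = [random.randrange(P) for _ in range(n_players - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Each of 15 players secret-shares one EEG-derived value...
values = [random.randrange(1000) for _ in range(15)]
all_shares = [share(v, 15) for v in values]
# ...each player locally adds the shares it holds (one per column),
# and only the per-player sums are combined at the end:
local_sums = [sum(col) % P for col in zip(*all_shares)]
assert reconstruct(local_sums) == sum(values) % P
print("private sum:", reconstruct(local_sums))
```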

    Privacy-Friendly Collaboration for Cyber Threat Mitigation

    Sharing of security data across organizational boundaries has often been advocated as a promising way to enhance cyber threat mitigation. However, collaborative security faces a number of important challenges, including privacy, trust, and liability concerns with the potential disclosure of sensitive data. In this paper, we focus on data sharing for predictive blacklisting, i.e., forecasting attack sources based on past attack information. We propose a novel privacy-enhanced data sharing approach in which organizations estimate collaboration benefits without disclosing their datasets, organize into coalitions of allied organizations, and securely share data within these coalitions. We study how different partner selection strategies affect prediction accuracy by experimenting on a real-world dataset of 2 billion IP addresses and observe up to a 105% prediction improvement. Comment: This paper has been withdrawn as it has been superseded by arXiv:1502.0533
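    The abstract does not say how collaboration benefits are estimated without disclosure, so the sketch below uses MinHash fingerprints as a hypothetical stand-in: two organizations exchange small sketches of their attacker-IP sets and estimate overlap (a rough proxy for how useful a coalition would be) without exchanging the sets themselves.

```python
import hashlib

def minhash(items, num_hashes=128):
    """One minimum hash value per seeded hash function; a compact set sketch."""
    sketch = []
    for seed in range(num_hashes):
        sketch.append(min(
            int(hashlib.sha1(f"{seed}:{x}".encode()).hexdigest(), 16)
            for x in items))
    return sketch

def jaccard_estimate(s1, s2):
    """Fraction of matching minima estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)

# Two organizations' attacker-IP sets (synthetic, overlapping by design).
org_a = {f"10.0.{i}.{j}" for i in range(20) for j in range(10)}
org_b = {f"10.0.{i}.{j}" for i in range(10, 30) for j in range(10)}
est = jaccard_estimate(minhash(org_a), minhash(org_b))
print(f"estimated Jaccard overlap: {est:.2f}  (true: 1/3)")
```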

    Efficient and Privacy-Preserving Ride Sharing Organization for Transferable and Non-Transferable Services

    Ride-sharing allows multiple persons to share their trips together in one vehicle instead of using multiple vehicles. This can reduce the number of vehicles on the street, which consequently can reduce air pollution, traffic congestion, and transportation cost. However, a ride-sharing organization requires passengers to report sensitive location information about their trips to a trip organizing server (TOS), which creates a serious privacy issue. In addition, existing ride-sharing schemes are non-flexible, i.e., they require a driver and a rider to have exactly the same trip to share a ride. Moreover, they are non-scalable, i.e., inefficient if applied to large geographic areas. In this paper, we propose two efficient privacy-preserving ride-sharing organization schemes for Non-transferable Ride-sharing Services (NRS) and Transferable Ride-sharing Services (TRS). In the NRS scheme, a rider can share a ride from source to destination with only one driver, whereas, in the TRS scheme, a rider can transfer between multiple drivers while en route until he reaches his destination. In both schemes, the ride-sharing area is divided into a number of small geographic areas, called cells, and each cell has a unique identifier. Each driver/rider encrypts his trip's data and sends an encrypted ride-sharing offer/request to the TOS. In the NRS scheme, Bloom filters are used to compactly represent the trip information before encryption. The TOS can then measure the similarity between the encrypted trips' data to organize shared rides without revealing either the users' identities or the location information. In the TRS scheme, drivers report their encrypted routes, and then the TOS builds an encrypted directed graph that is passed to a modified version of Dijkstra's shortest path algorithm to search for an optimal path of rides satisfying a set of preferences defined by the riders.
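    A minimal sketch of the Bloom-filter trip encoding mentioned for the NRS scheme: each party inserts the cell identifiers along its route into a filter, and overlapping set bits give the TOS a similarity signal. In the actual scheme the filters are encrypted before upload and matching happens over ciphertexts; here the comparison is in the clear for brevity, and the filter parameters and cell IDs are illustrative.

```python
import hashlib

class BloomFilter:
    def __init__(self, size=256, num_hashes=4):
        self.size, self.num_hashes = size, num_hashes
        self.bits = [0] * size
    def _positions(self, item):
        # Derive num_hashes positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            h = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.size
    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1
    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

# Driver and rider each encode the cell IDs along their routes.
driver, rider = BloomFilter(), BloomFilter()
for cell in ["C12", "C13", "C14", "C15"]:
    driver.add(cell)
for cell in ["C13", "C14"]:
    rider.add(cell)

# Similarity proxy the TOS could compute (in the clear for this demo):
overlap = sum(a & b for a, b in zip(driver.bits, rider.bits))
print("overlapping set bits:", overlap)
```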