1,603 research outputs found

    Vertical Federated Learning: A Structured Literature Review

    Full text link
    Federated Learning (FL) has emerged as a promising distributed learning paradigm with an added advantage of data privacy. With the growing interest in having collaboration among data owners, FL has gained significant attention of organizations. The idea of FL is to enable collaborating participants train machine learning (ML) models on decentralized data without breaching privacy. In simpler words, federated learning is the approach of ``bringing the model to the data, instead of bringing the data to the mode''. Federated learning, when applied to data which is partitioned vertically across participants, is able to build a complete ML model by combining local models trained only using the data with distinct features at the local sites. This architecture of FL is referred to as vertical federated learning (VFL), which differs from the conventional FL on horizontally partitioned data. As VFL is different from conventional FL, it comes with its own issues and challenges. In this paper, we present a structured literature review discussing the state-of-the-art approaches in VFL. Additionally, the literature review highlights the existing solutions to challenges in VFL and provides potential research directions in this domain

    Vertical Federated Learning:A Structured Literature Review

    Get PDF
    Federated Learning (FL) has emerged as a promising distributed learning paradigm with an added advantage of data privacy. With the growing interest in having collaboration among data owners, FL has gained significant attention of organizations. The idea of FL is to enable collaborating participants train machine learning (ML) models on decentralized data without breaching privacy. In simpler words, federated learning is the approach of ``bringing the model to the data, instead of bringing the data to the mode''. Federated learning, when applied to data which is partitioned vertically across participants, is able to build a complete ML model by combining local models trained only using the data with distinct features at the local sites. This architecture of FL is referred to as vertical federated learning (VFL), which differs from the conventional FL on horizontally partitioned data. As VFL is different from conventional FL, it comes with its own issues and challenges. In this paper, we present a structured literature review discussing the state-of-the-art approaches in VFL. Additionally, the literature review highlights the existing solutions to challenges in VFL and provides potential research directions in this domain

    Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

    Full text link
    The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of ``selective revelation'' and their confidentiality implications.Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Decomposable Naive Bayes Classifier for Partitioned Data

    Get PDF
    Most learning algorithms are designed to work on a single dataset. However, with the growth of networks, data is increasingly distributed over many databases in many different geographical sites. These databases cannot be moved to other network sites due to security, size, privacy, or data ownership consideration. In this paper, we propose two decomposable versions of Naive Bayes Classifier for horizontally and vertically partitioned data. The goal of our algorithms is to achieve the learning objectives for any data distribution encountered across the network by exchanging minimum local summaries among the participating sites

    Privacy Preservation and Analytical Utility of E-Learning Data Mashups in the Web of Data

    Get PDF
    Virtual learning environments contain valuable data about students that can be correlated and analyzed to optimize learning. Modern learning environments based on data mashups that collect and integrate data from multiple sources are relevant for learning analytics systems because they provide insights into students' learning. However, data sets involved in mashups may contain personal information of sensitive nature that raises legitimate privacy concerns. Average privacy preservation methods are based on preemptive approaches that limit the published data in a mashup based on access control and authentication schemes. Such limitations may reduce the analytical utility of the data exposed to gain students' learning insights. In order to reconcile utility and privacy preservation of published data, this research proposes a new data mashup protocol capable of merging and k-anonymizing data sets in cloud-based learning environments without jeopardizing the analytical utility of the information. The implementation of the protocol is based on linked data so that data sets involved in the mashups are semantically described, thereby enabling their combination with relevant educational data sources. The k-anonymized data sets returned by the protocol still retain essential information for supporting general data exploration and statistical analysis tasks. The analytical and empirical evaluation shows that the proposed protocol prevents individuals' sensitive information from re-identifying.The Spanish National Research Agency (AEI) funded this research through the project CREPES (ref. PID2020-115844RB-I00) with ERDF funds
    • …
    corecore