
    PPaaS: Privacy Preservation as a Service

    Personally identifiable information (PII) can find its way into cyberspace through various channels, and many potential sources can leak such information. Data sharing (e.g., cross-agency data sharing) for machine learning and analytics is an important component of data science. However, due to privacy concerns, strong privacy guarantees should be enforced on data before sharing. Many privacy-preserving approaches have been developed for data sharing; however, identifying the best approach for the privacy preservation of a given dataset is still a challenge. Different parameters can influence the efficacy of the process, such as the characteristics of the input dataset, the strength of the privacy-preservation approach, and the expected level of utility of the resulting dataset (in the corresponding data mining application, such as classification). This paper presents a framework named Privacy Preservation as a Service (PPaaS) to reduce this complexity. The proposed method employs selective privacy preservation via data perturbation and considers the different dynamics that can influence the quality of the privacy preservation of a dataset. PPaaS includes pools of data perturbation methods, and for each application and input dataset, it selects the most suitable data perturbation approach after rigorous evaluation. PPaaS enhances the usability of the privacy-preserving methods within its pool; it is a generic platform that can be used to sanitize big data in a granular, application-specific manner by employing a suitable combination of diverse privacy-preserving algorithms, providing a proper balance between privacy and utility.
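    A minimal sketch of how such an evaluate-and-select loop over a pool of perturbation methods might look. The pool entries (additive noise, random rotation), the utility score (cross-validated classification accuracy), the privacy proxy (mean record displacement), and the function `select_perturbation` with its threshold are all illustrative assumptions, not the authors' PPaaS implementation:

    ```python
    # Hypothetical PPaaS-style selection loop: apply each candidate
    # perturbation from the pool, score utility and a privacy proxy,
    # and keep the best candidate that clears the privacy threshold.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def additive_noise(X, rng, scale=0.3):
        # Gaussian noise scaled per attribute.
        return X + rng.normal(0.0, scale * X.std(axis=0), X.shape)

    def random_rotation(X, rng):
        # Random orthogonal (rotation) matrix via QR decomposition.
        q, _ = np.linalg.qr(rng.normal(size=(X.shape[1], X.shape[1])))
        return X @ q

    def select_perturbation(X, y, pool, seed=0):
        """Return the pool entry with the best utility among candidates
        meeting a minimum privacy level (both metrics are illustrative)."""
        rng = np.random.default_rng(seed)
        best_name, best_score = None, -np.inf
        for name, perturb in pool.items():
            Xp = perturb(X, rng)
            utility = cross_val_score(
                LogisticRegression(max_iter=1000), Xp, y, cv=5).mean()
            privacy = np.linalg.norm(Xp - X, axis=1).mean()  # displacement proxy
            if privacy >= 0.5 and utility > best_score:      # threshold is arbitrary
                best_name, best_score = name, utility
        return best_name, best_score

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 4))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        pool = {"additive_noise": additive_noise, "random_rotation": random_rotation}
        print(select_perturbation(X, y, pool))
    ```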

    Privacy Preserving Distributed Machine Learning with Federated Learning

    Edge computing and distributed machine learning have advanced to a level where they can transform how an organization operates. Distributed devices such as those in the Internet of Things (IoT) often produce large amounts of data, eventually resulting in big data that can be vital in uncovering hidden patterns and other insights in fields such as healthcare, banking, and policing. Data from areas such as healthcare and banking can contain potentially sensitive information that can become public if it is not appropriately sanitized. Federated learning (FedML) is a recently developed distributed machine learning (DML) approach that tries to preserve privacy by bringing the training of an ML model to the data owners' end. However, the literature shows that attack methods such as membership inference can exploit the vulnerabilities of ML models, as well as of the coordinating servers, to retrieve private data. Hence, FedML needs additional measures to guarantee data privacy. Furthermore, big data often requires more resources than are available in a standard computer. This paper addresses these issues by proposing a distributed perturbation algorithm named DISTPAB for the privacy preservation of horizontally partitioned data. DISTPAB alleviates computational bottlenecks by distributing the task of privacy preservation, exploiting the asymmetry of resources in a distributed environment, which can contain resource-constrained devices as well as high-performance computers. Experiments show that DISTPAB provides high accuracy, efficiency, scalability, and attack resistance. Further experiments on privacy-preserving FedML show that DISTPAB is an excellent solution for stopping privacy leaks in DML while preserving high data utility.
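    An illustrative sketch, not the published DISTPAB algorithm, of what distributing the perturbation task over horizontally partitioned data could look like: each simulated device perturbs its own partition locally with a shared random rotation derived from a common seed (assumed here to be agreed via the coordinator), so only sanitized records are aggregated. The functions `shared_rotation` and `perturb_partition` are hypothetical names:

    ```python
    # Each device holds a horizontal partition (same attributes, disjoint
    # records) and sanitizes it locally before anything leaves the device.
    import numpy as np

    def shared_rotation(dim, seed):
        # Orthogonal matrix reproducible from a shared seed.
        rng = np.random.default_rng(seed)
        q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
        return q

    def perturb_partition(X_local, rotation, noise_scale=0.1, seed=None):
        # Shared geometric transform plus device-local noise.
        rng = np.random.default_rng(seed)
        return X_local @ rotation + rng.normal(0.0, noise_scale, X_local.shape)

    # --- simulation of three devices with unequal resources/data volumes ---
    rng = np.random.default_rng(1)
    partitions = [rng.normal(size=(n, 4)) for n in (50, 80, 30)]  # horizontal split
    Q = shared_rotation(dim=4, seed=42)      # agreed once, e.g. via the coordinator
    sanitized = [perturb_partition(p, Q, seed=i) for i, p in enumerate(partitions)]
    X_shared = np.vstack(sanitized)          # only perturbed data is aggregated
    print(X_shared.shape)
    ```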

    Efficient privacy preservation of big data for accurate data mining

    Computing technologies pervade physical spaces and human lives, and produce a vast amount of data that is available for analysis. However, there is a growing concern that potentially sensitive data may become public if the collected data are not appropriately sanitized before being released for investigation. Although more than a few privacy-preserving methods are available, they are not efficient or scalable, or they have problems with data utility and/or privacy. This paper addresses these issues by proposing PABIDOT, an efficient and scalable nonreversible perturbation algorithm for the privacy preservation of big data via optimal geometric transformations. PABIDOT was tested for efficiency, scalability, attack resistance, and accuracy using nine datasets and five classification algorithms. Experiments show that PABIDOT excels in execution speed, scalability, attack resistance, and accuracy in large-scale privacy-preserving data classification when compared with two other related privacy-preserving algorithms.
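    A hedged sketch of a geometric-transformation perturbation in the spirit described by the abstract (random rotation, reflection, and translation plus noise); the paper's search for the optimal transformation and its privacy metric are omitted, and `geometric_perturb` is a hypothetical name, not the authors' code:

    ```python
    # Composite random geometric transformation of a numeric dataset.
    # Without the secret parameters (seed), an attacker cannot undo it.
    import numpy as np

    def geometric_perturb(X, seed=0, noise_scale=0.05):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random rotation
        reflect = np.diag(rng.choice([-1.0, 1.0], d))  # random axis reflections
        translate = rng.normal(0.0, 1.0, d)            # random translation
        noise = rng.normal(0.0, noise_scale, X.shape)  # small additive noise
        return (X @ q @ reflect) + translate + noise

    X = np.random.default_rng(7).normal(size=(100, 3))
    print(geometric_perturb(X).shape)
    ```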