PPaaS: Privacy Preservation as a Service
Personally identifiable information (PII) can find its way into cyberspace
through various channels, and many potential sources can leak such information.
Data sharing (e.g. cross-agency data sharing) for machine learning and
analytics is an important component of data science. However, because of
privacy concerns, strong privacy guarantees should be enforced on data before
it is shared. Various approaches have been developed for privacy-preserving
data sharing; however, identifying the best approach for preserving the
privacy of a given dataset remains a challenge. Different parameters can
influence the efficacy of the process, such as the characteristics of the
input dataset, the strength of the privacy-preservation approach, and the
expected level of utility of the resulting dataset (on the corresponding data
mining application, such as classification). This paper presents a framework
named Privacy Preservation as a Service (PPaaS) to reduce this complexity.
The proposed method employs selective
privacy preservation via data perturbation and examines the different dynamics
that can influence the quality of the privacy preservation of a dataset. PPaaS
includes pools of data perturbation methods, and for each application and input
dataset, PPaaS selects the most suitable data perturbation approach after
rigorous evaluation. It enhances the usability of the privacy-preserving methods
within its pool; it is a generic platform that can be used to sanitize big data
in a granular, application-specific manner by employing a suitable combination
of diverse privacy-preserving algorithms to provide a proper balance between
privacy and utility.
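The abstract describes PPaaS selecting, from a pool of perturbation methods, the one that best balances privacy and utility for a given dataset and application. The Python sketch below illustrates what such a selection loop could look like. Everything in it is an assumption for illustration, not the PPaaS implementation: the pool holds only hypothetical additive Gaussian noise perturbers (`additive_noise`), privacy is approximated by mean absolute distortion (`privacy_score`), utility by the resubstitution accuracy of a nearest-centroid classifier (`utility_score`), and the rule in `select_perturbation` (highest privacy subject to a minimum utility) is one possible trade-off among many.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence

Row = List[float]
Perturber = Callable[[Sequence[Row], random.Random], List[Row]]


def additive_noise(scale: float) -> Perturber:
    """Hypothetical pool member: add zero-mean Gaussian noise with std `scale`."""
    def perturb(data: Sequence[Row], rng: random.Random) -> List[Row]:
        return [[x + rng.gauss(0.0, scale) for x in row] for row in data]
    return perturb


def privacy_score(original: Sequence[Row], perturbed: Sequence[Row]) -> float:
    """Privacy proxy (assumption): mean absolute distortion per value."""
    total = sum(abs(a - b) for r, p in zip(original, perturbed) for a, b in zip(r, p))
    return total / sum(len(r) for r in original)


def utility_score(data: Sequence[Row], labels: Sequence[str]) -> float:
    """Utility proxy (assumption): resubstitution accuracy of a nearest-centroid classifier."""
    centroids: Dict[str, Row] = {}
    for label in set(labels):
        rows = [r for r, l in zip(data, labels) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for row, label in zip(data, labels):
        predicted = min(centroids, key=lambda c: math.dist(row, centroids[c]))
        correct += predicted == label
    return correct / len(data)


def select_perturbation(data: Sequence[Row], labels: Sequence[str],
                        pool: Dict[str, Perturber], min_utility: float,
                        seed: int = 0) -> Optional[str]:
    """Hypothetical selection rule: among pool members whose perturbed output keeps
    utility >= min_utility, return the one with the highest privacy score."""
    best_name: Optional[str] = None
    best_privacy = -math.inf
    for name, perturb in pool.items():
        perturbed = perturb(data, random.Random(seed))
        if utility_score(perturbed, labels) < min_utility:
            continue  # this method destroys too much utility for the application
        score = privacy_score(data, perturbed)
        if score > best_privacy:
            best_name, best_privacy = name, score
    return best_name


if __name__ == "__main__":
    rng = random.Random(42)
    # Two well-separated classes of 2-D points.
    data = ([[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
            + [[rng.gauss(20, 1), rng.gauss(20, 1)] for _ in range(20)])
    labels = ["a"] * 20 + ["b"] * 20
    pool = {f"noise_{s}": additive_noise(s) for s in (0.1, 1.0, 50.0)}
    print(select_perturbation(data, labels, pool, min_utility=0.95))
```

Framing the choice as "maximize privacy subject to a utility floor" keeps the trade-off explicit; other perturbers, metrics, or classifiers can be swapped in by changing the pool and the two scoring functions.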
Privacy Preserving Distributed Machine Learning with Federated Learning
Edge computing and distributed machine learning have advanced to a level that
can revolutionize organizations. Distributed devices such as Internet of Things
(IoT) devices often produce large amounts of data, eventually resulting in big
data that can be vital for uncovering hidden patterns and other insights in
numerous fields such as healthcare, banking, and policing. Data from areas such
as healthcare and banking can contain sensitive information that may become
public if it is not appropriately sanitized.
Federated learning (FedML) is a recently developed distributed machine learning
(DML) approach that tries to preserve privacy by bringing the training of an ML
model to the data owners. However, the literature describes attack methods,
such as membership inference, that exploit vulnerabilities of ML models as well
as of the coordinating servers to retrieve private data. Hence, FedML needs
additional measures to guarantee data privacy. Furthermore, big data often
requires more resources than are available on a standard computer. This paper
addresses these issues by proposing a distributed perturbation algorithm named
DISTPAB for privacy preservation of horizontally partitioned data. DISTPAB
alleviates computational bottlenecks by distributing the task of privacy
preservation, exploiting the asymmetry of resources in a distributed
environment, which can contain resource-constrained devices as well as
high-performance computers. Experiments show that DISTPAB provides high
accuracy, efficiency, scalability, and attack resistance. Further experiments
on privacy-preserving FedML show that DISTPAB is an excellent solution for
stopping privacy leaks in DML while preserving high data utility.
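The abstract does not detail DISTPAB's steps, so the Python sketch below only illustrates the general idea of splitting the perturbation of horizontally partitioned data (each owner holds different rows with the same attributes) between a coordinator and the data owners. It assumes a generic geometric perturbation (z-normalization with global statistics, a shared random rotation and translation, and additive Gaussian noise); this is a swapped-in technique, not DISTPAB itself, and all function names are hypothetical. Owners do only cheap per-row work and compute small summaries, while the coordinator merges the summaries and generates the shared transformation.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = List[float]
Summary = Tuple[int, Row, Row]


def local_summary(rows: Sequence[Row]) -> Summary:
    """Data-owner step: row count plus per-column sums and sums of squares."""
    cols = list(zip(*rows))
    return len(rows), [sum(c) for c in cols], [sum(x * x for x in c) for c in cols]


def global_stats(summaries: Sequence[Summary]) -> Tuple[Row, Row]:
    """Coordinator step: merge local summaries into global column means and stds."""
    n = sum(s[0] for s in summaries)
    d = len(summaries[0][1])
    mean = [sum(s[1][j] for s in summaries) / n for j in range(d)]
    var = [max(sum(s[2][j] for s in summaries) / n - mean[j] ** 2, 0.0) for j in range(d)]
    std = [math.sqrt(v) if v > 0 else 1.0 for v in var]  # guard constant columns
    return mean, std


def random_orthogonal(d: int, rng: random.Random) -> List[Row]:
    """Coordinator step: random orthonormal rows via Gram-Schmidt on Gaussian vectors."""
    basis: List[Row] = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:  # skip (practically impossible) degenerate draws
            basis.append([x / norm for x in v])
    return basis


def perturb_partition(rows: Sequence[Row], mean: Row, std: Row, rotation: List[Row],
                      translation: Row, noise_std: float, rng: random.Random) -> List[Row]:
    """Data-owner step: normalize with global stats, rotate, translate, add noise."""
    out = []
    for row in rows:
        z = [(x - m) / s for x, m, s in zip(row, mean, std)]
        out.append([sum(r * v for r, v in zip(r_row, z)) + t + rng.gauss(0.0, noise_std)
                    for r_row, t in zip(rotation, translation)])
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    # Two data owners holding different rows of the same 3 attributes.
    partitions = [[[rng.uniform(0, 10) for _ in range(3)] for _ in range(5)] for _ in range(2)]
    mean, std = global_stats([local_summary(p) for p in partitions])
    rotation = random_orthogonal(3, rng)
    translation = [rng.uniform(-1, 1) for _ in range(3)]
    released = [perturb_partition(p, mean, std, rotation, translation, 0.1, random.Random(i))
                for i, p in enumerate(partitions)]
    print(released[0][0])
```

Because every owner applies the same rotation and translation, the released partitions stay mutually consistent and can be used together downstream, while only counts, sums, and sums of squares leave each owner before perturbation.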
Efficient privacy preservation of big data for accurate data mining
Computing technologies pervade physical spaces and human lives, and produce a
vast amount of data that is available for analysis. However, there is a growing
concern that potentially sensitive data may become public if the collected data
are not appropriately sanitized before being released for investigation.
Although several privacy-preserving methods are available, they are often
inefficient or unscalable, or have problems with data utility and/or privacy.
This paper addresses these issues by proposing an efficient and scalable
nonreversible perturbation algorithm, PABIDOT, for privacy preservation of big
data via optimal geometric transformations. PABIDOT was tested for efficiency,
scalability, attack resistance, and accuracy using nine datasets and five
classification algorithms. Experiments show that PABIDOT excels in execution
speed, scalability, attack resistance and accuracy in large-scale
privacy-preserving data classification when compared with two other related
privacy-preserving algorithms.
Comment: Information Science
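PABIDOT is described as perturbing data through optimal geometric transformations. The Python sketch below illustrates only the general notion of choosing a transformation by optimizing a privacy measure; it is not PABIDOT. It assumes normalized numeric attributes stored as columns, rotates attribute pairs (a pairwise rotation scheme swapped in for illustration), picks each pair's angle by grid search to maximize a hypothetical privacy proxy (the smaller standard deviation of the per-attribute differences between original and perturbed values), and then adds Gaussian noise. All function names are hypothetical.

```python
import math
import random
from statistics import pstdev
from typing import List, Sequence, Tuple

Column = List[float]


def rotate_pair(xs: Sequence[float], ys: Sequence[float], theta: float) -> Tuple[Column, Column]:
    """Rotate the 2-D points (xs[k], ys[k]) by angle theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x - s * y for x, y in zip(xs, ys)], [s * x + c * y for x, y in zip(xs, ys)]


def pair_privacy(xs: Sequence[float], ys: Sequence[float],
                 rx: Sequence[float], ry: Sequence[float]) -> float:
    """Privacy proxy (assumption): the smaller std of the per-attribute differences."""
    return min(pstdev([a - b for a, b in zip(xs, rx)]),
               pstdev([a - b for a, b in zip(ys, ry)]))


def best_angle(xs: Sequence[float], ys: Sequence[float], steps: int = 360) -> Tuple[float, float]:
    """Grid-search theta in [0, 2*pi) for the rotation with the highest privacy proxy."""
    best_theta, best_score = 0.0, -1.0
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        rx, ry = rotate_pair(xs, ys, theta)
        score = pair_privacy(xs, ys, rx, ry)
        if score > best_score:
            best_theta, best_score = theta, score
    return best_theta, best_score


def perturb(columns: Sequence[Column], noise_std: float = 0.1, seed: int = 0) -> List[Column]:
    """Rotate attribute pairs (0,1), (2,3), ... at their best angle, then add Gaussian
    noise; with an odd number of attributes, the last column only receives noise."""
    rng = random.Random(seed)
    out = [list(col) for col in columns]
    for i in range(0, len(columns) - 1, 2):
        theta, _ = best_angle(columns[i], columns[i + 1])
        out[i], out[i + 1] = rotate_pair(columns[i], columns[i + 1], theta)
    return [[v + rng.gauss(0.0, noise_std) for v in col] for col in out]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0, 1) for _ in range(100)] for _ in range(4)]  # 4 normalized attributes
    released = perturb(data, noise_std=0.1, seed=2)
    print(len(released), len(released[0]))  # prints: 4 100
```

For uncorrelated attributes this proxy peaks at an angle of π, which merely negates the pair and is trivially reversible; a practical scheme needs further transformations and noise on top of the angle search, so the sketch should be read as one component rather than a complete method.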