8,122 research outputs found
Privacy Preserving Distributed Data Mining
Privacy preserving distributed data mining aims to design secure protocols which allow multiple parties to conduct collaborative data mining while protecting the data privacy. My research focuses on the design and implementation of privacy preserving two-party protocols based on homomorphic encryption. I present new results in this area, including new secure protocols for basic operations and two fundamental privacy preserving data mining protocols.
I propose a number of secure protocols for basic operations in the additive secret-sharing scheme based on homomorphic encryption. I derive a basic relationship between a secret number and its shares, with which we develop efficient secure comparison and secure division with public divisor protocols. I also design a secure inverse square root protocol based on Newton\u27s iterative method and hence propose a solution for the secure square root problem. In addition, we propose a secure exponential protocol based on Taylor series expansions. All these protocols are implemented using secure multiplication and can be used to develop privacy preserving distributed data mining protocols.
In particular, I develop efficient privacy preserving protocols for two fundamental data mining tasks: multiple linear regression and EM clustering. Both protocols work for arbitrarily partitioned datasets. The two-party privacy preserving linear regression protocol is provably secure in the semi-honest model, and the EM clustering protocol discloses only the number of iterations. I provide a proof-of-concept implementation of these protocols in C++, based on the Paillier cryptosystem
Differentially Private Mixture of Generative Neural Networks
Generative models are used in a wide range of applications building on large
amounts of contextually rich information. Due to possible privacy violations of
the individuals whose data is used to train these models, however, publishing
or sharing generative models is not always viable. In this paper, we present a
novel technique for privately releasing generative models and entire
high-dimensional datasets produced by these models. We model the generator
distribution of the training data with a mixture of generative neural
networks. These are trained together and collectively learn the generator
distribution of a dataset. Data is divided into clusters, using a novel
differentially private kernel -means, then each cluster is given to separate
generative neural networks, such as Restricted Boltzmann Machines or
Variational Autoencoders, which are trained only on their own cluster using
differentially private gradient descent. We evaluate our approach using the
MNIST dataset, as well as call detail records and transit datasets, showing
that it produces realistic synthetic samples, which can also be used to
accurately compute arbitrary number of counting queries.Comment: A shorter version of this paper appeared at the 17th IEEE
International Conference on Data Mining (ICDM 2017). This is the full
version, published in IEEE Transactions on Knowledge and Data Engineering
(TKDE
Privacy-enhancing Aggregation of Internet of Things Data via Sensors Grouping
Big data collection practices using Internet of Things (IoT) pervasive
technologies are often privacy-intrusive and result in surveillance, profiling,
and discriminatory actions over citizens that in turn undermine the
participation of citizens to the development of sustainable smart cities.
Nevertheless, real-time data analytics and aggregate information from IoT
devices open up tremendous opportunities for managing smart city
infrastructures. The privacy-enhancing aggregation of distributed sensor data,
such as residential energy consumption or traffic information, is the research
focus of this paper. Citizens have the option to choose their privacy level by
reducing the quality of the shared data at a cost of a lower accuracy in data
analytics services. A baseline scenario is considered in which IoT sensor data
are shared directly with an untrustworthy central aggregator. A grouping
mechanism is introduced that improves privacy by sharing data aggregated first
at a group level compared as opposed to sharing data directly to the central
aggregator. Group-level aggregation obfuscates sensor data of individuals, in a
similar fashion as differential privacy and homomorphic encryption schemes,
thus inference of privacy-sensitive information from single sensors becomes
computationally harder compared to the baseline scenario. The proposed system
is evaluated using real-world data from two smart city pilot projects. Privacy
under grouping increases, while preserving the accuracy of the baseline
scenario. Intra-group influences of privacy by one group member on the other
ones are measured and fairness on privacy is found to be maximized between
group members with similar privacy choices. Several grouping strategies are
compared. Grouping by proximity of privacy choices provides the highest privacy
gains. The implications of the strategy on the design of incentives mechanisms
are discussed
Towards trajectory anonymization: a generalization-based approach
Trajectory datasets are becoming popular due to the massive usage of GPS and locationbased services. In this paper, we address privacy issues regarding the identification of individuals in static trajectory datasets. We first adopt the notion of k-anonymity to trajectories and propose a novel generalization-based approach for anonymization of trajectories. We further show that releasing
anonymized trajectories may still have some privacy leaks. Therefore we propose a randomization based reconstruction algorithm for releasing anonymized trajectory data and also present how the underlying techniques can be adapted to other anonymity standards. The experimental results on real and synthetic trajectory datasets show the effectiveness of the proposed techniques
- …