Anonymization of Sensitive Quasi-Identifiers for l-diversity and t-closeness
Many studies on privacy-preserving data mining assume that quasi-identifiers (QIDs) can be separated from sensitive attributes. For instance, they assume that address, job, and age are QIDs but not sensitive attributes, and that a disease name is a sensitive attribute but not a QID. In practice, however, all of these attributes can act as both sensitive attributes and QIDs. In this paper, we refer to such attributes as sensitive QIDs, and we propose novel privacy models, namely (l1, ..., lq)-diversity and (t1, ..., tq)-closeness, together with a method that can handle sensitive QIDs. Our method is composed of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, conducted by data holders, is simple but effective, whereas the reconstruction algorithm, conducted by data analyzers, can be tailored to each analyzer's objective. The proposed method was experimentally evaluated on real data sets.
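The (l1, ..., lq)-diversity model above extends standard l-diversity to sensitive QIDs; the abstract does not give its algorithm. As a point of reference, a minimal sketch of the classic distinct l-diversity check (the baseline the paper generalizes, not the paper's own method) could look like:

```python
from collections import defaultdict

def satisfies_l_diversity(records, qid_cols, sensitive_col, l):
    """Distinct l-diversity: every QID equivalence class must contain
    at least l distinct values of the sensitive attribute."""
    classes = defaultdict(set)
    for rec in records:
        key = tuple(rec[c] for c in qid_cols)  # equivalence class key
        classes[key].add(rec[sensitive_col])
    return all(len(vals) >= l for vals in classes.values())

# Hypothetical generalized table for illustration only.
table = [
    {"age": "30-39", "zip": "123**", "disease": "flu"},
    {"age": "30-39", "zip": "123**", "disease": "asthma"},
    {"age": "40-49", "zip": "456**", "disease": "flu"},
    {"age": "40-49", "zip": "456**", "disease": "flu"},
]
print(satisfies_l_diversity(table, ["age", "zip"], "disease", 2))  # → False
```

The second equivalence class contains only one distinct disease, so the table fails 2-diversity even though it is 2-anonymous.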
Privacy-Preserving Trajectory Data Publishing via Differential Privacy
Over the past decade, the collection of data by individuals, businesses and government agencies has increased tremendously. Due to the widespread of mobile computing and the advances in location-acquisition techniques, an immense amount of data concerning the mobility of moving objects have been generated. The movement data of an object (e.g. individual) might include specific information about the locations it visited, the time those locations were visited, or both. While it is beneficial to share data for the purpose of mining and analysis, data sharing might risk the privacy of the individuals involved in the data. Privacy-Preserving Data Publishing (PPDP) provides techniques that utilize several privacy models for the purpose of publishing useful information while preserving data privacy.
The objective of this thesis is to answer the following question: How can a data owner publish trajectory data while simultaneously safeguarding the privacy of the data and maintaining its usefulness? We propose an algorithm for anonymizing and publishing trajectory data that ensures the output is differentially private while maintaining high utility and scalability. Our solution comprises a twofold approach. First, we generalize trajectories by generalizing and then partitioning the timestamps at each location in a differentially private manner. Next, we add noise to the real count of the generalized trajectories according to the given privacy budget to enforce differential privacy. As a result, our approach achieves overall epsilon-differential privacy on the output trajectory data. We perform an experimental evaluation on real-life data and demonstrate that our proposed approach can effectively answer count and range queries, as well as support mining of frequent sequential patterns. We also show that our algorithm is efficient w.r.t. privacy budget and number of partitions, and scalable with increasing data size.
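The noise-addition step described above is the standard Laplace mechanism for counts. A minimal sketch (a generic illustration of that mechanism, not the thesis's implementation) looks like:

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with epsilon-differential privacy.
    A count query has sensitivity 1: adding or removing one
    trajectory changes the count by at most 1."""
    return true_count + laplace_noise(sensitivity / epsilon)

# Smaller epsilon (tighter privacy budget) means more noise.
noisy = dp_count(1000, epsilon=0.5)  # noisy count near 1000
```

Splitting the overall budget across the generalization and counting steps, as the thesis does, follows from the sequential composition property of differential privacy.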
Anonymizing Periodical Releases of SRS Data by Fusing Differential Privacy
Spontaneous reporting systems (SRS) have been developed to collect adverse event records that contain personal demographics and sensitive information such as drug indications and adverse reactions. Releasing SRS data may disclose the privacy of the data providers. Unlike other microdata, very few anonymization methods have been proposed to protect individual privacy while publishing SRS data. MS(k, θ*)-bounding is the first privacy model for SRS data that considers multiple individual records, multi-valued sensitive attributes, and rare events. PPMS(k, θ*)-bounding was then proposed to counter cross-release attacks caused by follow-up cases in the periodical SRS release scenario. A recent trend in microdata anonymization combines the traditional syntactic model with differential privacy, fusing the advantages of both models to yield better privacy protection. This paper proposes the PPMS-DP(k, θ*, ε) framework, an enhancement of PPMS(k, θ*)-bounding that embraces differential privacy to improve the privacy protection of periodically released SRS data. We propose two anonymization algorithms conforming to the PPMS-DP(k, θ*, ε) framework, PPMS-DPnum and PPMS-DPall. Experimental results on the FAERS datasets show that both PPMS-DPnum and PPMS-DPall provide significantly better privacy protection than PPMS(k, θ*)-bounding without sacrificing data utility or incurring additional data distortion.
Comment: 10 pages, 11 figures
Marginal Release Under Local Differential Privacy
Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform empirical evaluation to confirm these bounds, and evaluate them for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.
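Under local differential privacy, each subject perturbs their own record before it leaves their device. The paper's Fourier-based approach is more involved; a minimal sketch of the simplest local-DP primitive, binary randomized response with unbiased frequency estimation (a baseline, not the paper's algorithm), looks like:

```python
import math
import random

def rr_perturb(bit, epsilon):
    """Randomized response: report the true bit with probability
    e^eps / (e^eps + 1), otherwise flip it. Satisfies eps-local DP."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if random.random() < p else 1 - bit

def rr_estimate(reports, epsilon):
    """Debias the observed frequency of 1s: since
    E[observed] = (1 - p) + f * (2p - 1), solve for f."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Each user perturbs locally; the aggregator only ever sees noisy bits.
reports = [rr_perturb(b, 1.0) for b in [1, 0, 1, 1]]
```

Estimating a k-way marginal this way requires one such frequency estimate per cell, which is exactly the accuracy cost that Fourier-domain release is designed to reduce.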