Reframing in Frequent Pattern Mining

Author: Ahmed Chowdhury Farhan
Flach Peter A
Kull Meelis
Lachiche Nicolas
Samiullah Md.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2016

Crossref

Explore Bristol Research

Top-k frequent itemsets via differentially private FP-trees

Author: Agrawal R.
Hardt M.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Privacy-preserving Frequent Itemset Mining for Sparse and Dense Data

Author: Alisa Pankova
Peeter Laud
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 05/07/2015

Frequent itemset mining is a task that can in turn be used for other purposes such as association rule mining. One problem is that the data may be sensitive, and its owner may refuse to provide it in plaintext for analysis. Many privacy-preserving solutions for frequent itemset mining exist, but enhancing privacy inevitably reduces efficiency. Leaking some less sensitive information, such as data density, might improve efficiency. In this paper, we devise an approach that works better for sparse matrices and compare it to related work that uses similar security requirements on a similar secure multiparty computation platform.

CiteSeerX

Cryptology ePrint Archive
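
For orientation, the frequent itemset mining task named in the Pankova and Laud abstract above can be sketched in plaintext as follows. This is only a brute-force illustration of the task itself, not the paper's privacy-preserving protocol; the function name `frequent_itemsets`, the `max_size` cap, and the toy transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count every itemset of up to max_size items and keep those that
    occur in at least min_support transactions (plaintext, brute force)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicate items in a row
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy data: each inner list is one transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]

frequent = frequent_itemsets(transactions, min_support=3)
for itemset, support in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

Each single item here appears in three transactions while every pair appears in only two, so a support threshold of 3 keeps only the single items. A secure multiparty version would have to compute the same counts without revealing the transactions.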

PrivSyn: Differentially Private Data Synthesis

Author: Backes Michael
Chen Jiming
He Shibo
Honorio Jean
Li Ninghui
Wang Tianhao
Zhang Yang
Zhang Zhikun
Publication date: 30/12/2020

In differential privacy (DP), a challenging problem is to generate synthetic datasets that efficiently capture the useful information in the private data. The synthetic dataset enables any task to be done without privacy concern and modification to existing algorithms. In this paper, we present PrivSyn, the first automatic synthetic data generation method that can handle general tabular datasets (with 100 attributes and domain size >2^{500}). PrivSyn is composed of a new method to automatically and privately identify correlations in the data, and a novel method to generate sample data from a dense graphic model. We extensively evaluate different methods on multiple datasets to demonstrate the performance of our method.

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Differentially private data publishing for data analysis

Author: Su Dong
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2016

In the information age, vast amounts of sensitive personal information are collected by companies, institutions and governments. A key technological challenge is how to design mechanisms for effectively extracting knowledge from data while preserving the privacy of the individuals involved. In this dissertation, we address this challenge from the perspective of differentially private data publishing. Firstly, we propose PrivPfC, a differentially private method for releasing data for classification. The key idea underlying PrivPfC is to privately select, in a single step, a grid that partitions the data domain into a number of cells. This selection is done using the exponential mechanism with a novel quality function, which maximizes the expected number of records correctly classified by a histogram classifier. PrivPfC supports both binary and multiclass classification. Secondly, we study the problem of differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of the DPLloyd algorithm, which is a differentially private version of the Lloyd algorithm, and propose a non-interactive approach, EUGkM, which publishes a differentially private synopsis for k-means clustering. We also propose a hybrid approach that combines the advantages of the improved version of DPLloyd and EUGkM. Finally, we investigate the sparse vector technique (SVT), which is a fundamental technique for satisfying differential privacy in answering a sequence of queries. We propose a new version of SVT that provides better utility by introducing an effective technique to improve the performance of SVT in the interactive setting. We also show that in the non-interactive setting (but not the interactive setting), usage of SVT can be replaced by the exponential mechanism.

Purdue E-Pubs
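
The exponential mechanism that the dissertation abstract above relies on for private selection can be sketched generically as below. This is a minimal textbook-style sketch, not PrivPfC or its quality function; the names `exponential_mechanism`, `quality`, and `sensitivity`, and the toy grid scores, are assumptions introduced for illustration.

```python
import math
import random


def exponential_mechanism(candidates, quality, epsilon, sensitivity=1.0, rng=None):
    """Pick one candidate with probability proportional to
    exp(epsilon * quality(c) / (2 * sensitivity))."""
    rng = rng or random.Random()
    scores = [quality(c) for c in candidates]
    best = max(scores)
    # Subtracting the best score leaves the distribution unchanged
    # and keeps math.exp from overflowing.
    weights = [math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical quality scores for three candidate grids.
grid_scores = {"coarse": 0.0, "medium": 10.0, "fine": 5.0}

chosen = exponential_mechanism(
    list(grid_scores),
    grid_scores.get,
    epsilon=1.0,
    rng=random.Random(0),
)
print(chosen)
```

A smaller `epsilon` flattens the selection probabilities (more privacy, less utility), while a larger one concentrates them on the highest-scoring candidate. The `sensitivity` argument is assumed to bound how much one record can change any quality score.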