
    Re-Identification Attacks – A Systematic Literature Review

    The publication of increasing amounts of anonymised open source data has resulted in a worrying rise in the number of successful re-identification attacks. This has a number of privacy and security implications on both an individual and a corporate level. This paper uses a Systematic Literature Review to investigate the depth and extent of this problem as reported in peer-reviewed literature. Using a detailed protocol, seven research portals were explored and 10,873 database entries were searched, from which a subset of 220 papers was selected for further review. Of these, 55 papers were judged to be within scope and were included in the final review. The main findings are that 72.7% of all successful re-identification attacks have taken place since 2009, that most attacks use multiple datasets, and that the majority have been carried out on global datasets such as social networking data by US-based researchers. Furthermore, the number of datasets used in an attack can itself act as an identifying attribute. Because privacy breaches have security, policy and legal implications (e.g. data protection, Safe Harbor), the work highlights the need for new and improved anonymisation techniques or, indeed, a fresh approach to open source publishing.

    Mining Frequent Graph Patterns with Differential Privacy

    Discovering frequent graph patterns in a graph database offers valuable information in a variety of applications. However, if the graph dataset contains sensitive data of individuals, such as mobile phone-call graphs and web-click graphs, releasing discovered frequent patterns may present a threat to the privacy of individuals. Differential privacy has recently emerged as the de facto standard for private data analysis due to its provable privacy guarantee. In this paper we propose the first differentially private algorithm for mining frequent graph patterns. We first show that previous techniques for differentially private discovery of frequent itemsets cannot be applied to mining frequent graph patterns, due to the inherent complexity of handling structural information in graphs. We then address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling based algorithm. Unlike previous work on frequent itemset mining, our techniques do not rely on the output of a non-private mining algorithm. Instead, we observe that both frequent graph pattern mining and the guarantee of differential privacy can be unified into an MCMC sampling framework. In addition, we establish the privacy and utility guarantees of our algorithm and propose an efficient neighboring-pattern counting technique. Experimental results show that the proposed algorithm is able to output frequent patterns with good precision.
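    To illustrate the general idea of sampling patterns privately rather than post-processing a non-private miner, the sketch below runs a Metropolis-Hastings chain whose stationary distribution matches the exponential mechanism, p(g) ∝ exp(ε·support(g)/(2·Δ)). It is a minimal illustration of the MCMC-plus-differential-privacy idea, not the paper's algorithm; the callbacks `support` and `propose_neighbor`, and the symmetric-proposal assumption, are hypothetical stand-ins.

```python
import math
import random

def mh_sample_pattern(initial, support, propose_neighbor, epsilon, sensitivity, steps=10_000):
    """Metropolis-Hastings sampler targeting the exponential-mechanism
    distribution p(g) proportional to exp(epsilon * support(g) / (2 * sensitivity)).
    Assumes a symmetric proposal, so the acceptance ratio reduces to the
    ratio of target densities."""
    current = initial
    current_score = support(current)           # frequency of the pattern in the private database
    for _ in range(steps):
        candidate = propose_neighbor(current)  # e.g. add or remove one edge
        candidate_score = support(candidate)
        log_alpha = epsilon * (candidate_score - current_score) / (2 * sensitivity)
        if log_alpha >= 0 or random.random() < math.exp(log_alpha):
            current, current_score = candidate, candidate_score
    return current                             # one (approximately) exponential-mechanism sample
```

    In practice a scheme like this also needs something like the paper's efficient neighboring-pattern counting, since evaluating `support` on graph patterns is itself expensive.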

    GLOVE: towards privacy-preserving publishing of record-level-truthful mobile phone trajectories

    Datasets of mobile phone trajectories collected by network operators offer an unprecedented opportunity to discover new knowledge from the activity of large populations of millions. However, publishing such trajectories also raises significant privacy concerns, as they contain personal data in the form of individual movement patterns. Privacy risks induce network operators to enforce restrictive confidentiality agreements on the rare occasions when they grant access to collected trajectories, whereas a less involved circulation of these data would fuel research and enable reproducibility in many disciplines. In this work, we contribute a building block toward the design of privacy-preserving datasets of mobile phone trajectories that are truthful at the record level. We present GLOVE, an algorithm that implements k-anonymity, hence solving the crucial unicity problem that affects this type of data while ensuring that the anonymized trajectories correspond to real-life users. GLOVE builds on original insights about the root causes behind the undesirable unicity of mobile phone trajectories, and leverages generalization and suppression to remove them. Proof-of-concept validations with large-scale real-world datasets demonstrate that the approach adopted by GLOVE preserves a substantial level of accuracy in the data, higher than that granted by previous methodologies. (This work was supported by the Atracción de Talento Investigador program of the Comunidad de Madrid under Grant No. 2019-T1/TIC-16037 NetSense.)
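    As a rough illustration of how generalization and suppression can remove trajectory unicity while keeping records truthful, the sketch below coarsens time and space uniformly until every generalized trajectory is shared by at least k users, and suppresses trajectories that remain unique. This is a simplified uniform scheme over assumed (timestamp_minutes, cell_x, cell_y) samples, not GLOVE's adaptive, per-trajectory generalization.

```python
from collections import Counter

def generalize(point, level):
    # Coarsen a (timestamp_minutes, cell_x, cell_y) sample: each level
    # doubles the 15-minute base time window and the spatial cell size.
    t, x, y = point
    f = 2 ** level
    return (t // (15 * f), x // f, y // f)

def k_anonymize(trajectories, k, max_level=6):
    """Release generalized trajectories only when each one is shared by
    at least k users; suppress those that remain unique at the coarsest
    level. Every released record still describes a real user's movements."""
    for level in range(max_level + 1):
        released = [tuple(generalize(p, level) for p in traj) for traj in trajectories]
        counts = Counter(released)
        if all(counts[r] >= k for r in released):
            return released, level
    # Suppression fallback: drop trajectories that are still unique.
    released = [tuple(generalize(p, max_level) for p in traj) for traj in trajectories]
    counts = Counter(released)
    return [r for r in released if counts[r] >= k], max_level
```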

    CASTLEGUARD: anonymised data streams with guaranteed differential privacy

    Data streams are commonly used by data controllers to outsource the processing of real-time data to third-party data processors. Data protection legislation and best practice in data management support the view that data controllers are responsible for providing a guarantee of privacy for user data contained within published data streams. Continuously Anonymising STreaming data via adaptive cLustEring (CASTLE) is an established method for anonymising data streams with a guarantee of k-anonymity. However, k-anonymity has been shown to be a weak privacy guarantee that has vulnerabilities in practical applications. In this paper we propose Continuously Anonymising STreaming data via adaptive cLustEring with GUARanteed Differential privacy (CASTLEGUARD), a data stream anonymisation algorithm that provides a reliable guarantee of k-anonymity, l-diversity and differential privacy to data subjects. We analyse CASTLEGUARD to show that, through safe k-anonymisation and β-sampling, the proposed approach satisfies differentially private k-anonymity. Further, we demonstrate the efficacy of the approach in the context of machine learning, presenting experimental analysis to show that it can be used to protect the individual privacy of users whilst maintaining the utility of a data stream.
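    A minimal sketch of the two ingredients the abstract names, assuming tuples are dicts of numeric quasi-identifiers: β-sampling (each arriving tuple is admitted independently with probability β) followed by a CASTLE-style release of k-anonymous clusters. It is illustrative only and omits CASTLEGUARD's l-diversity checks, delay constraints and perturbation.

```python
import random

def castleguard_like(stream, k, beta):
    """Admit each tuple with probability beta (β-sampling), buffer the
    admitted tuples, and once k of them accumulate, release each one with
    its numeric quasi-identifiers generalised to the buffer's [min, max]
    ranges, so every released record is k-anonymous."""
    buffer = []
    for tup in stream:
        if random.random() >= beta:   # β-sampling: drop with probability 1 - beta
            continue
        buffer.append(tup)
        if len(buffer) >= k:
            generalised = {
                key: (min(t[key] for t in buffer), max(t[key] for t in buffer))
                for key in buffer[0]
            }
            for _ in buffer:
                yield dict(generalised)
            buffer = []
```

    The randomised admission step is what turns deterministic k-anonymisation into a randomised mechanism, which is the hook for the differential privacy analysis.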

    Distribution-Agnostic Database De-Anonymization Under Synchronization Errors

    There has recently been increased scientific interest in the de-anonymization of users in anonymized databases containing user-level microdata via multifarious matching strategies utilizing publicly available correlated data. Existing literature has either emphasized practical aspects, where the underlying data distribution is not required but there are limited or no theoretical guarantees, or theoretical aspects under the assumption of complete availability of the underlying distributions. In this work, we take a step towards reconciling these two lines of work by providing theoretical guarantees for the de-anonymization of random correlated databases without prior knowledge of the data distribution. Motivated by time-indexed microdata, we consider database de-anonymization under both synchronization errors (column repetitions) and obfuscation (noise). By modifying the previously used replica detection algorithm to accommodate the unknown underlying distribution, proposing a new seeded deletion detection algorithm, and employing statistical and information-theoretic tools, we derive sufficient conditions on the database growth rate for successful matching. Our findings demonstrate that a seed size double-logarithmic in the row size suffices for successful deletion detection. More importantly, we show that the derived sufficient conditions are the same as in the distribution-aware setting, negating any asymptotic loss of performance due to unknown underlying distributions.
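    As a loose illustration of distribution-agnostic replica detection, the heuristic below flags a column as a repetition of its predecessor when their row-wise agreement rate is unusually high, using no knowledge of the underlying symbol distribution. This is an assumed toy heuristic with a hypothetical `threshold` parameter, not the paper's modified replica detection or seeded deletion detection algorithms.

```python
def detect_replicas(db, threshold=0.8):
    """Flag column j as a repetition of column j-1 when their row-wise
    agreement rate exceeds `threshold`. `db` is a list of rows, each a
    list of symbols; obfuscation noise lowers agreement, so the threshold
    trades off missed repetitions against false alarms."""
    n_rows = len(db)
    replicas = []
    for j in range(1, len(db[0])):
        agreement = sum(row[j] == row[j - 1] for row in db) / n_rows
        if agreement > threshold:
            replicas.append(j)
    return replicas
```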

    Fast Differentially Private Matrix Factorization

    Differentially private collaborative filtering is a challenging task, both in terms of accuracy and speed. We present a simple algorithm that is provably differentially private while offering good performance, using a novel connection of differential privacy to Bayesian posterior sampling via Stochastic Gradient Langevin Dynamics. Due to its simplicity, the algorithm lends itself to efficient implementation. By careful systems design, and by exploiting the power-law behavior of the data to maximize CPU cache bandwidth, we are able to generate 1024-dimensional models at a rate of 8.5 million recommendations per second on a single PC.
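    The connection the paper exploits can be sketched as one Stochastic Gradient Langevin Dynamics step on a matrix factorization objective: the injected Gaussian noise makes the iterates approximate posterior samples, which is what underpins the differential privacy argument. This is a minimal, unoptimized sketch under assumed hyperparameters (`lr`, `lam`) and batch format, not the paper's cache-optimized implementation.

```python
import numpy as np

def sgld_mf_step(U, V, batch, n_total, lr=1e-4, lam=0.1, rng=None):
    """One SGLD step for matrix factorization: a half-step of minibatch
    gradient descent on the negative log-posterior plus Gaussian noise of
    variance lr. U and V are user/item latent-factor matrices; batch holds
    (user, item, rating) triples."""
    rng = rng or np.random.default_rng()
    grad_U = lam * U                 # gradient of the Gaussian prior (ridge term)
    grad_V = lam * V
    scale = n_total / len(batch)     # rescale the minibatch gradient to the full dataset
    for u, i, r in batch:
        err = U[u] @ V[i] - r
        grad_U[u] += scale * err * V[i]
        grad_V[i] += scale * err * U[u]
    U = U - 0.5 * lr * grad_U + rng.normal(0.0, np.sqrt(lr), U.shape)
    V = V - 0.5 * lr * grad_V + rng.normal(0.0, np.sqrt(lr), V.shape)
    return U, V
```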