1,603 research outputs found

    Differentially Private Data Analytics

    Get PDF
    With the emergence of smart devices and data-driven applications, personal data are being generated, gathered and used at a dramatic scale by modern systems for data analytics in a wide range of customised service applications. Despite the advantages of data analytics, potential risks arise that allow adversaries to infer individuals' private information by using auxiliary information. It is therefore crucial to develop new methods and techniques for privacy-preserving data analytics that ensure acceptable trade-offs between privacy and utility. Over the last decade, differential privacy (DP) has been viewed as a promising notion of privacy because it mathematically bounds the trade-off between privacy and utility against adversaries' strong inference attacks (where the adversary may know n-1 out of n items in the input). Building on the latest results in differentially private data analytics, this thesis concentrates on four sub-topics: differentially private data aggregation with security guarantees, privacy budget guarantees in distributed systems, differentially private single-path publishing and differentially private k-means clustering.
    For differentially private data aggregation with security guarantees, we propose a two-layer data aggregation approach against semi-honest but colluding data aggregators, where DP noise is injected at random. We address the problem of collusion between data curators and data aggregators. The key idea of our approach is to inject DP noise randomly to prevent privacy disclosure under collusion while maintaining a high degree of utility, and to split and share data pieces to guarantee security. Experimental evaluations over synthetic datasets confirm our mathematical analysis and show that our approach achieves enhanced aggregation utility.
    For privacy budget guarantees in distributed systems, we study the parallel composition of privacy budgets in differentially private data aggregation scenarios and propose a new lower bound on this composition. We address two problems of the state of the art: poor utility when global sensitivity is used for both data curators and data aggregators, and unknown privacy guarantees when conditions are placed on the local sensitivities between them. The key is a property of the summation function: the local sensitivity of summation over a dataset equals the maximum summation over any of its sub-datasets. Experimental results over a real-life dataset support our theoretical results on the proposed lower bound of the unconditional parallel composition of privacy budgets.
    For differentially private single-path publishing, we propose a graph-based single-path publishing approach with DP guarantees. We address two problems in existing work: information loss regarding the exact path for genuine users, and the absence of privacy guarantees for edges when adversaries have information about all vertices but one edge is missing. The main idea is to add fake vertices and edges to the real path under DP, and then hide the connections of the perturbed path in the topology of the published graph, so that only trusted path users with full knowledge of the map can recover the real path. Experimental evaluations on synthetic datasets corroborate the theoretical properties of our approach.
    For differentially private k-means clustering, we propose a convergent differentially private k-means algorithm that addresses the non-convergence problem of existing work. The key idea is that, at each iteration, we sample the centroid for the next iteration from a specially defined area in each cluster with a selected orientation, guaranteeing both convergence and a convergence rate. The orientation is determined either by past centroid movements or by past plus future centroid movements. Both mathematical and experimental evaluations show that, because of this convergence, our approach achieves better clustering quality than the state of the art in DP k-means clustering under the same DP guarantees.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
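    Both the aggregation and budget-composition results above build on noisy summation. As a minimal illustrative sketch (our own, not the thesis's randomized two-layer protocol), the standard Laplace mechanism for an epsilon-DP sum looks like this, assuming each curator's value is clipped to a known bound:

```python
import numpy as np

def dp_sum(values, epsilon, bound):
    """Release a differentially private sum via the Laplace mechanism.

    Clipping each value to [0, bound] caps the global sensitivity of
    the sum at `bound`, so Laplace noise with scale bound / epsilon
    yields epsilon-DP. (The thesis's local-sensitivity result, that the
    local sensitivity of summation equals the maximum summation over
    any sub-dataset, can tighten this scale in specific settings.)
    """
    clipped = np.clip(values, 0.0, bound)
    noise = np.random.laplace(loc=0.0, scale=bound / epsilon)
    return clipped.sum() + noise

# Toy usage: each curator holds one value in [0, 10].
curator_values = [3.2, 7.9, 5.5, 1.0]
print(dp_sum(curator_values, epsilon=1.0, bound=10.0))
```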

    Private Graphon Estimation for Sparse Graphs

    Get PDF
    We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information about individual members. Given a sparse input graph $G$, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges. If $G$ is an instance of a network obtained from a generative nonparametric model defined in terms of a graphon $W$, our model guarantees consistency, in the sense that as the number of vertices tends to infinity, the output of our algorithm converges to $W$ in an appropriate version of the $L_2$ norm. In particular, this means we can estimate the sizes of all multi-way cuts in $G$. Our results hold as long as $W$ is bounded, the average degree of $G$ grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate. We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results.
    Comment: 36 pages
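    For readers unfamiliar with the privacy notion used here: node differential privacy regards two graphs as neighbors when one arises from the other by deleting a single vertex together with all its incident edges. A small sketch of that neighboring relation (our illustration, using networkx):

```python
import networkx as nx

def node_neighbors(G):
    """Yield all node-neighbors of G: graphs obtained by removing one
    vertex and all its incident edges. A node-differentially private
    algorithm must have near-identical output distributions on G and
    on every graph this generator yields."""
    for v in list(G.nodes):
        H = G.copy()
        H.remove_node(v)  # networkx also drops v's incident edges
        yield v, H

G = nx.erdos_renyi_graph(6, 0.5, seed=0)
for v, H in node_neighbors(G):
    print(f"removed {v}: {H.number_of_nodes()} nodes, {H.number_of_edges()} edges")
```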

    Revealing Network Structure, Confidentially: Improved Rates for Node-Private Graphon Estimation

    Full text link
    Motivated by growing concerns over ensuring privacy on social networks, we develop new algorithms and impossibility results for fitting complex statistical models to network data subject to rigorous privacy guarantees. We consider so-called node-differentially private algorithms, which compute information about a graph or network while provably revealing almost no information about the presence or absence of a particular node in the graph. We provide new algorithms for node-differentially private estimation for a popular and expressive family of network models: stochastic block models and their generalization, graphons. Our algorithms improve on prior work, reducing their error quadratically and matching, in many regimes, the optimal nonprivate algorithm. We also show that for the simplest random graph models ($G(n,p)$ and $G(n,m)$), node-private algorithms can be qualitatively more accurate than for more complex models, converging at a rate of $\frac{1}{\epsilon^2 n^3}$ instead of $\frac{1}{\epsilon^2 n^2}$. This result uses a new extension lemma for differentially private algorithms that we hope will be broadly useful.
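    The baseline rate can be seen from the naive node-private estimator of edge density in $G(n,p)$: removing one vertex changes the edge count by at most $n-1$, so the density $m/\binom{n}{2}$ has node sensitivity $2/n$, and Laplace noise at that scale contributes squared error on the order of $\frac{1}{\epsilon^2 n^2}$. A sketch of this baseline (our illustration, not the paper's improved algorithm):

```python
import math
import numpy as np

def naive_node_private_density(n, m, epsilon):
    """Node-private edge-density estimate for an n-vertex, m-edge graph.

    Removing one vertex changes m by at most n - 1, so the density
    m / C(n, 2) has node sensitivity (n - 1) / C(n, 2) = 2 / n.
    Laplace noise at scale (2/n)/epsilon then contributes squared
    error on the order of 1 / (epsilon^2 n^2), the baseline rate the
    paper improves to 1 / (epsilon^2 n^3) for G(n,p) and G(n,m).
    """
    pairs = math.comb(n, 2)
    sensitivity = (n - 1) / pairs  # equals 2 / n
    return m / pairs + np.random.laplace(0.0, sensitivity / epsilon)

print(naive_node_private_density(n=1000, m=249_500, epsilon=1.0))
```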

    Privacy-Preserving Distributed Processing Over Networks

    Get PDF

    Development of Privacy-Preserving Machine Learning Technology for Protecting Sensitive Information

    Get PDF
    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, August 2022. Advisor: Jaewook Lee.
    Recent success of artificial intelligence systems has been driven by various factors, such as the development of new algorithms and the explosive increase in the amount of available data. Machine learning models and data therefore carry real value, and in real-world scenarios individuals or corporations can benefit by providing data for training a machine learning model, or by providing the trained model itself. However, it has been revealed that sharing data or models can lead to invasions of personal privacy by leaking sensitive personal information. This dissertation focuses on developing privacy-preserving machine learning methods that protect such sensitive information, using two actively studied privacy-preserving technologies: homomorphic encryption and differential privacy.
    Homomorphic encryption can protect the privacy of data and models because machine learning algorithms can be applied directly to encrypted data, but its operations require far more computation time than conventional ones, so designing efficient algorithms is essential. We take two approaches to efficient computation. The first is to reduce the amount of computation in the training phase. Applying homomorphic encryption from the training phase onward also protects the privacy of the training data, widening the scope of protection compared with encrypting only at inference time, but at a correspondingly higher computational cost. We present an efficient training algorithm that encrypts only the most important information. Specifically, we develop a ridge regression algorithm that greatly reduces the amount of computation when one or two sensitive variables are encrypted. Furthermore, we extend the method to classification problems by developing a new logistic regression algorithm that maximally avoids searching over hyper-parameters unsuitable for machine learning with homomorphic encryption. The second approach is to apply homomorphic encryption only when the trained model is used for inference, which prevents direct exposure of the test data and the model information; here we propose a homomorphic-encryption-friendly inference method for support-vector clustering.
    Although homomorphic encryption can prevent various threats to data and model information, it cannot defend against secondary attacks through inference APIs: it has been reported that an adversary can extract information about the model and the training data using only his or her own input and the corresponding output of the model. For instance, the adversary can determine whether specific data were included in the training data. Differential privacy is a mathematical notion that guarantees defense against such attacks by bounding the impact of any specific data sample on the trained model. It has the advantage of expressing the degree of privacy quantitatively, but satisfying it requires adding a corresponding amount of randomness to the algorithm, which reduces the utility of the model. We therefore propose a novel method, based on Morse theory, that improves the utility of differentially private clustering algorithms while maintaining their privacy.
    The privacy-preserving machine learning methods developed in this dissertation protect privacy at different levels and thus complement each other. We expect that they can be combined into an integrated system and applied to the many domains where machine learning must protect sensitive personal information.
    Contents: Chapter 1, Introduction (Motivation, Aims and Organization of the Dissertation); Chapter 2, Preliminaries (Homomorphic Encryption; Differential Privacy); Chapter 3, Efficient Homomorphic Encryption Framework for Ridge Regression (Problem Statement; Framework; Proposed Method: Regression with one Encrypted Sensitive Variable, Regression with two Encrypted Sensitive Variables, Adversarial Perturbation Against Attribute Inference Attack, Algorithm for Ridge Regression, Algorithm for Adversarial Perturbation; Experiments; Chapter Summary); Chapter 4, Parameter-free Homomorphic-encryption-friendly Logistic Regression (Problem Statement; Proposed Method: Motivation, Framework; Theoretical Results; Experiments; Chapter Summary); Chapter 5, Homomorphic-encryption-friendly Evaluation for Support Vector Clustering (Problem Statement; Background: CKKS scheme, SVC; Proposed Method; Experiments; Chapter Summary); Chapter 6, Differentially Private Mixture of Gaussians Clustering with Morse Theory (Problem Statement; Background: Mixture of Gaussians, Morse Theory, Dynamical System Perspective; Proposed Method: Differentially private clustering, Transition equilibrium vectors and the weighted graph, Hierarchical merging of sub-clusters; Theoretical Results; Experiments; Chapter Summary); Chapter 7, Conclusion (Conclusion; Future Direction); Bibliography; Abstract in Korean.
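    To make the second approach concrete: with an additively homomorphic scheme, a server can evaluate a linear score w·x + b on encrypted features without ever decrypting them. A minimal sketch, assuming the third-party python-paillier (phe) package; this is our toy example of encrypted inference, not the dissertation's CKKS-based method:

```python
from phe import paillier

# Client: encrypt the test record so the server never sees it in the clear.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)
x = [1.5, -0.3, 2.0]
enc_x = [public_key.encrypt(v) for v in x]

# Server: evaluate a plaintext linear model on the encrypted features.
# Paillier supports ciphertext + ciphertext and ciphertext * plaintext
# scalar, which is exactly what a linear score w.x + b needs.
w, b = [0.4, 1.2, -0.7], 0.1
enc_score = sum(wi * xi for wi, xi in zip(w, enc_x)) + b

# Client: decrypt the returned score locally.
print(private_key.decrypt(enc_score))  # 0.4*1.5 + 1.2*(-0.3) - 0.7*2.0 + 0.1 = -1.06
```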

    Differentially private data publishing via cross-moment microaggregation

    Get PDF
    Differential privacy is one of the most prominent privacy notions in the field of anonymization. However, its strong privacy guarantees very often come at the expense of significantly degrading the utility of the protected data. To cope with this, numerous mechanisms have been studied that reduce the sensitivity of the data and hence the noise required to satisfy this notion. In this paper, we present a generalization of classical microaggregation, where the aggregated records are replaced by the group mean and additional statistical measures, with the purpose of evaluating it as a sensitivity-reduction mechanism. We propose an anonymization methodology for numerical microdata in which the target of protection is a data set microaggregated in this generalized way, and the disclosure-risk limitation is guaranteed through differential privacy via record-level perturbation. Specifically, we describe three anonymization algorithms where microaggregation can be applied to either entire records or groups of attributes independently. Our theoretical analysis computes the sensitivities of the first two central cross moments; we apply fundamental results from matrix perturbation theory to derive sensitivity bounds on the eigenvalues and eigenvectors of the covariance and coskewness matrices. Our extensive experimental evaluation shows that data utility can be enhanced significantly for medium to large sizes of the microaggregation groups. For this range of group sizes, we find experimental evidence that our approach can provide not only higher utility but also higher privacy than traditional microaggregation.
    The authors are thankful to A. Azzalini for his clarifications on the sampling of multivariate skew-normal distributions. Partial support to this work has been received from the European Commission (projects H2020-644024 "CLARUS" and H2020-700540 "CANVAS"), the Government of Catalonia (ICREA Academia Prize to J. Domingo-Ferrer), and the Spanish Government (projects TIN2014-57364-C2-1-R "Smart-Glacis" and TIN2016-80250-R "Sec-MCloud"). J. Parra-Arnau is the recipient of a Juan de la Cierva postdoctoral fellowship, FJCI-2014-19703, from the Spanish Ministry of Economy and Competitiveness. The authors are with the UNESCO Chair in Data Privacy, but the views in this paper are their own and are not necessarily shared by UNESCO.
    Postprint (author's final draft)
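    The sensitivity-reduction idea is easy to see in the simplest mean-only case: replacing each group of k records by its group mean divides the influence of any single record by k, so the Laplace noise needed for epsilon-differential privacy shrinks by the same factor. A simplified univariate sketch (our illustration; it ignores the effect of a changed record on the group assignment, which a rigorous treatment must handle, and the paper further generalizes the released statistics to higher cross moments):

```python
import numpy as np

def microaggregate_dp(x, k, epsilon, lo, hi):
    """Mean-based microaggregation followed by record-level Laplace noise.

    Records are clipped to [lo, hi], sorted, and grouped k at a time;
    each record is replaced by its (noisy) group mean. Holding the
    grouping fixed, changing one record moves its group mean by at
    most (hi - lo) / k, so the per-record sensitivity drops by a
    factor of k compared with releasing raw records.
    """
    x = np.sort(np.clip(np.asarray(x, dtype=float), lo, hi))
    out = []
    for start in range(0, len(x), k):
        group = x[start:start + k]
        scale = (hi - lo) / (len(group) * epsilon)
        noisy_mean = group.mean() + np.random.laplace(0.0, scale)
        out.extend([noisy_mean] * len(group))
    return np.array(out)

print(microaggregate_dp([1, 2, 3, 10, 11, 12], k=3, epsilon=1.0, lo=0, hi=15))
```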

    Efficient algorithm for the k-means problem with Must-Link and Cannot-Link constraints

    Get PDF
    Constrained clustering, such as k-means with instance-level Must-Link (ML) and Cannot-Link (CL) auxiliary information as the constraints, has been extensively studied recently due to its broad applications in data science and AI. Despite some heuristic approaches, no previous algorithm provides a non-trivial approximation ratio for the constrained k-means problem. To address this issue, we propose an algorithm with a provable approximation ratio of O(log k) when only ML constraints are considered. We also empirically evaluate the performance of our algorithm on real-world datasets with artificial ML and disjoint CL constraints. The experimental results show that our algorithm outperforms the existing greedy-based heuristic methods in clustering accuracy.
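    For context, a common preprocessing step for ML-only constraints (our own sketch, not necessarily the paper's O(log k) algorithm) is to take the transitive closure of the ML pairs and collapse each resulting component into one weighted super-point, after which any weighted k-means solver can be run:

```python
import networkx as nx
import numpy as np

def collapse_must_links(points, ml_pairs):
    """Collapse Must-Link components into weighted super-points.

    Under the transitive closure of the ML pairs, every connected
    component must land in a single cluster, so it can be replaced by
    its centroid weighted by its size; weighted k-means on the result
    then respects all ML constraints by construction.
    """
    g = nx.Graph()
    g.add_nodes_from(range(len(points)))
    g.add_edges_from(ml_pairs)
    centers, weights = [], []
    for comp in nx.connected_components(g):
        idx = list(comp)
        centers.append(points[idx].mean(axis=0))
        weights.append(len(idx))
    return np.array(centers), np.array(weights)

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.2]])
print(collapse_must_links(pts, ml_pairs=[(0, 1), (2, 3)]))
```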