22,408 research outputs found

    Security and Privacy in Information Retrieval Combined with Distributed Computing and Caching

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2020. Jungwoo Lee (이정우).
    Distributed systems are essential for storing and processing large amounts of data. While such systems raise the efficiency of data storage and computation, they also increase the risks to data security and privacy. This dissertation considers the security and privacy of data in distributed systems for data storage and data computation, and in particular proposes coding schemes that guarantee security and privacy in these systems. First, it proposes cache-aided PIR, in which the user has stored a certain amount of data in a cache in advance. The proposed scheme is based on the optimal scheme for the conventional PIR problem. The cached data is used as side information, which reduces the amount of download compared with the case without a cache. Second, it considers the master's privacy in a coded distributed computing system. In this system, the workers and the master each hold their own data, and the workers' data forms a library. The master must compute a function of its own data and a specific item in the library. Here, the master's privacy means that the workers do not learn which item in the library the master wants. Such a system is called private coded computation, and the proposed scheme is called private polynomial codes. The proposed scheme introduces an asynchronous scheme that was not considered in conventional polynomial codes. As a result, it achieves a faster computation time than the modified optimal RPIR scheme.
    Finally, it considers the master's privacy and data security simultaneously in a coded distributed computing system. Data security means protecting the master's own data from the workers. Such a system is called private secure coded computation, and the proposed scheme is called private secure polynomial codes. By modifying the private polynomial codes, private secure polynomial codes and private polynomial codes are compared in terms of computation time and communication load. The result shows that, for the same communication load, private secure polynomial codes achieve a faster computation time.
    As the dominant data format shifts from text to video, the memory needed to store data grows exponentially, as does the computation needed to process it. To alleviate these storage and computation burdens, distributed systems are being actively studied. Meanwhile, since low latency is one of the main objectives of 5G communications, recent techniques such as edge computing and federated learning have become important. Because decentralization is a fundamental characteristic of these techniques, distributed systems, which include decentralized systems, have become important as well. In this dissertation, I consider distributed systems for storage and computation. For data storage, large-scale data centers collectively store a library of files, each of which is very large. When a user needs a specific file, it can be downloaded from the distributed data centers. In this system, minimizing the amount of download is a significant concern. The user's privacy in this system means that the user should conceal the index of its desired file from the databases. This problem is referred to as the private information retrieval (PIR) problem.
    The goal of the PIR problem is to minimize the amount of download from the databases while ensuring the user's privacy. Meanwhile, for a large computation, the user can divide the whole computation into sub-computations and distribute them to external workers that constitute a distributed system. Three cases can arise. First, the user may own all of the data to be computed on and send both its data and the instructions for the computation to the workers. Second, the workers may collectively own all of the data, and the user sends only instructions for data selection and computation. Third, the user and the workers each have their own data, and the user sends its data together with instructions for data selection and computation to the workers. Since some of the workers can be slow for various reasons, the user may use a coding technique, e.g., an erasure code, to avoid the delay caused by the slow workers. This kind of technique is referred to as coded computation. In these systems, speeding up the computation is a significant concern. In this dissertation, I focus on the third system. In this system, privacy is similar to that in distributed systems for storage. Security, on the other hand, means that the user should conceal the content of its own data from the workers so that the workers obtain no information about it. I consider the user's privacy in distributed systems for storage, and both privacy and security in distributed systems for computation. In distributed systems for storage, the user has no data of its own, so security of the user's data does not arise. In particular, I propose achievable schemes that ensure privacy and security in these systems.
    To begin with, as a new variation of the PIR problem, I consider a user's cache that holds pre-stored data from the databases' library. I refer to this problem as the cache-aided PIR problem. Introducing the user's cache into the PIR problem significantly reduces the amount of download from the databases. The achievable scheme is based on the optimal scheme for the conventional PIR problem. In the achievable scheme, the pre-stored cache is exploited as side information, which reduces the amount of download compared with the PIR problem without a cache. Secondly, I consider the master's privacy in coded computation. In the system model, the workers have their own data, as does the master. The workers' data constitutes a library of several files. The master should compute a function of its own data and a specific file in the library. The master's privacy means that the workers should not know which file in the library the master desires. I refer to this problem as private coded computation and propose an achievable scheme, namely private polynomial codes. The private polynomial codes are based on the polynomial codes proposed for the conventional coded computation system. In the achievable scheme, the workers are grouped for privacy and an asynchronous scheme is used, which was not considered in the conventional polynomial codes. Owing to the asynchronous scheme, the proposed scheme achieves a faster computation time than the modified optimal RPIR scheme. Lastly, I consider data security in coded computation together with the master's privacy. The system model is similar to that of private coded computation. Data security means that the master should protect its own data from the workers. I refer to this problem as private secure coded computation and propose an achievable scheme, namely private secure polynomial codes.
    The private secure polynomial codes are based on the polynomial codes proposed for the conventional coded computation system. By modifying the private polynomial codes, the private secure polynomial codes and the private polynomial codes are compared in terms of computation time and communication load. As a result, the private secure polynomial codes achieve a faster computation time for the same communication load.
    Contents:
    1. Introduction
        1.1 Related work
            1.1.1 Private information retrieval
            1.1.2 Coded computation
        1.2 Contributions and Organization
    2. Cache-aided Private Information Retrieval
        2.1 Introduction
        2.2 System model
        2.3 Main results
        2.4 Achievable scheme
            2.4.1 Cacheless phase
            2.4.2 Cache-assisted phase
            2.4.3 Cache-aided PIR
        2.5 Tightness of achievable scheme
        2.6 Conclusions and follow-up works
    3. Private Coded Computation
        3.1 Introduction
        3.2 System model
        3.3 Main results
        3.4 Private polynomial codes
            3.4.1 First example
            3.4.2 Second example
            3.4.3 General description
            3.4.4 Privacy proof
            3.4.5 Performance analysis
            3.4.6 Special cases
        3.5 Simulation results
            3.5.1 Computation time
            3.5.2 Communication load
        3.6 Conclusion
    4. Private Secure Coded Computation
        4.1 Introduction
        4.2 Main results
        4.3 Private secure polynomial codes
            4.3.1 Illustrative example
            4.3.2 General description
            4.3.3 Performance analysis
            4.3.4 Privacy and security proof
        4.4 Simulation results
            4.4.1 Computation time
            4.4.2 Communication load
        4.5 Conclusion
    5. Conclusion
        5.1 Summary
        5.2 Future directions
    Abstract in Korean
    Acknowledgement
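    The abstract above mentions that an erasure code can hide the delay of slow workers (coded computation). A minimal sketch of that general idea follows, assuming a toy (3, 2) code for a matrix-vector product. It is not the dissertation's private polynomial codes; every function name and value is hypothetical, and the matrix is assumed to have an even number of rows.

```python
# Illustrative sketch only: a (3, 2) erasure-coded matrix-vector product.
# Names and values are hypothetical, not taken from the dissertation.

def matvec(rows, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

def encode(a):
    """Split A into two row blocks A1, A2 and add one parity block A1 + A2."""
    half = len(a) // 2
    a1, a2 = a[:half], a[half:]
    parity = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return [a1, a2, parity]

def decode(results):
    """Recover A @ x from the outputs of any two of the three workers.

    `results` maps a worker index (0, 1 or 2) to that worker's output.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:  # workers 0 and 2 responded
        y1 = results[0]
        y2 = [p - u for p, u in zip(results[2], y1)]
    else:  # workers 1 and 2 responded
        y2 = results[1]
        y1 = [p - u for p, u in zip(results[2], y2)]
    return y1 + y2

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
tasks = encode(A)
# Pretend worker 0 is a straggler and never responds.
partial = {i: matvec(tasks[i], x) for i in (1, 2)}
recovered = decode(partial)
```

    Any two of the three worker outputs are enough, so the master never has to wait for the slowest worker.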

    Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

    We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to 13.43×, and also achieves a 2.36× to 12.65× speedup over the state-of-the-art straggler mitigation strategies.

    Lagrange Coded Computing: Optimal Design for Resiliency, Security, and Privacy

    We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to 13.43×, and also achieves a 2.36× to 12.65× speedup over the state-of-the-art straggler mitigation strategies.
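    The Lagrange-polynomial encoding described in the abstract above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: scalar data over a prime field, straggler resiliency only (no privacy padding and no Byzantine workers), and hypothetical names, evaluation points and modulus. It is not the authors' implementation. It needs Python 3.8+ for the modular inverse via `pow`.

```python
# Hypothetical sketch of Lagrange encoding/decoding over GF(P).
# Scalar data, no privacy padding and no Byzantine workers, for brevity.
P = 2_147_483_647  # prime modulus (2**31 - 1), chosen only for illustration

def lagrange_eval(xs, ys, z):
    """Evaluate at z the unique polynomial through (xs[i], ys[i]) mod P."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for k, xk in enumerate(xs):
            if k != j:
                num = num * (z - xk) % P
                den = den * (xj - xk) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

def f(v):
    """Example target function: a degree-2 polynomial."""
    return (v * v + 3 * v + 1) % P

data = [5, 11, 42]                 # K = 3 data blocks
K, deg_f = len(data), 2
betas = [1, 2, 3]                  # interpolation points for the data
N = 7                              # number of workers
alphas = list(range(10, 10 + N))   # one evaluation point per worker

# Encoding: worker i stores u(alpha_i), where u(beta_j) = data[j].
shares = [lagrange_eval(betas, data, a) for a in alphas]
# Each worker applies f to its coded share.
worker_out = [f(s) for s in shares]

# f(u(z)) has degree (K - 1) * deg_f, so this many results suffice.
needed = (K - 1) * deg_f + 1
fast = [0, 2, 3, 5, 6]             # workers 1 and 4 are treated as stragglers
xs = [alphas[i] for i in fast]
ys = [worker_out[i] for i in fast]
# Interpolate f(u(z)) and read off f(data[j]) at each beta_j.
decoded = [lagrange_eval(xs, ys, b) for b in betas]
```

    With these assumed parameters, 5 of the 7 worker results are enough to recover f on all three data blocks, so two stragglers can be ignored.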

    On the Asymptotic Capacity of $X$-Secure $T$-Private Information Retrieval with Graph Based Replicated Storage

    The problem of private information retrieval with graph-based replicated storage was recently introduced by Raviv, Tamo and Yaakobi. Its capacity remains open in almost all cases. In this work the asymptotic (large number of messages) capacity of this problem is studied along with its generalizations to include arbitrary $T$-privacy and $X$-security constraints, where the privacy of the user must be protected against any set of up to $T$ colluding servers and the security of the stored data must be protected against any set of up to $X$ colluding servers. A general achievable scheme for arbitrary storage patterns is presented that achieves the rate $(\rho_{\min}-X-T)/N$, where $N$ is the total number of servers, and each message is replicated at least $\rho_{\min}$ times. Notably, the scheme makes use of a special structure inspired by dual Generalized Reed Solomon (GRS) codes. A general converse is also presented. The two bounds are shown to match for many settings, including symmetric storage patterns. Finally, the asymptotic capacity is fully characterized for the case without security constraints $(X=0)$ for arbitrary storage patterns provided that each message is replicated no more than $T+2$ times. As an example of this result, consider PIR with arbitrary graph based storage ($T=1, X=0$) where every message is replicated at exactly $3$ servers. For this $3$-replicated storage setting, the asymptotic capacity is equal to $2/\nu_2(G)$ where $\nu_2(G)$ is the maximum size of a $2$-matching in a storage graph $G[V,E]$. In this undirected graph, the vertices $V$ correspond to the set of servers, and there is an edge $uv\in E$ between vertices $u,v$ only if a subset of messages is replicated at both servers $u$ and $v$.
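    As a quick check of the achievable-rate expression in the abstract above, the 3-replicated example ($\rho_{\min}=3$, $T=1$, $X=0$) can be substituted directly. The server count $N=6$ below is an assumed value, not one taken from the paper, and the result is the rate of the general scheme rather than the capacity $2/\nu_2(G)$.

```latex
% Worked instance; N = 6 is a hypothetical value chosen only for illustration.
\[
R = \frac{\rho_{\min} - X - T}{N}
  = \frac{3 - 0 - 1}{N}
  = \frac{2}{N},
\qquad \text{e.g. } N = 6 \;\Rightarrow\; R = \frac{1}{3}.
\]
```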