14 research outputs found

    Privacy preserving recommender systems

    Get PDF
    Recommender systems help users find suitable and interesting products and content among the huge amount of information available on the internet. Various types of recommender systems provide recommendation services to users, for example Collaborative Filtering (CF) based recommendations, Content Based (CB) recommendations, and context-aware recommendations. Although these recommender systems are very useful for solving the information overload problem by filtering interesting information, they raise serious privacy issues. To generate personalized recommendations, recommendation service providers need to acquire information about users' attributes, preferences, experiences, and demands, all of which touch on users' confidential information. Usually, the more information available to the service provider, the more accurate the generated recommendations. However, service providers cannot always be trusted with personal information, since they may threaten users' privacy by leaking it to other parties or by providing false recommendations. Therefore, user information must be protected before it is shared with any third-party service provider. Several techniques have been proposed to overcome the privacy issues of recommender systems; they can be categorized into decentralization, randomization, and secure-computation based approaches. In decentralization based approaches, the central service provider is removed and the main control of the recommendation service is given to the participating users. The main issue with this kind of approach is that generating recommendations depends on other users being online: if a user goes offline, her information cannot be used by the system. Randomization based techniques add noise to user data to prevent others from learning the true values; the main issue is that the added noise degrades recommendation accuracy. Secure computation, by contrast, preserves user information while providing accurate recommendations. In this thesis we preserve user privacy by encrypting user information, specifically ratings and other related information, using homomorphic encryption based techniques, and we provide recommendations based on the encrypted data. The main advantage of homomorphic encryption is that it is semantically secure: it is computationally hard to distinguish the true information from a given ciphertext. Using homomorphic encryption tools and techniques, we build different privacy preserving protocols for different types of recommendation approaches by analyzing their privacy requirements and challenges. More specifically, we focus on key recommendation techniques and divide them into centralized and partitioned-dataset based recommendation techniques. Among the available techniques, we find that existing and popular approaches such as user based, item based, and context-aware recommendation can be grouped under the centralized approach.
In partitioned-dataset based recommendation, user information is partitioned across different organizations, which can collaborate by gathering sufficient information to provide accurate recommendations without revealing their own confidential information. After categorizing the recommendation techniques, we analyze their problems and requirements in terms of privacy preservation. Then, for each type of recommendation approach, we develop privacy preserving protocols that generate recommendations while taking the approach's specific privacy requirements and challenges into consideration. We also investigate the problems and limitations of existing privacy preserving recommenders and find that current solutions suffer from heavy computation and communication overhead as well as privacy weaknesses. In the thesis we identify the related problems and solve them using our proposed privacy preserving protocols. At a high level, our proposed recommendation protocols work as follows. Users encrypt their ratings using homomorphic encryption and send them to the service provider. We assume the service provider is semi-honest (honest but curious): it follows the protocol but at the same time tries to infer new information from the available data. The service provider can perform homomorphic operations, so it carries out certain computations over the encrypted data without learning any true information and returns the results to the query users who ask for recommendations. The system models of our privacy preserving protocols differ from each other because the underlying recommendation techniques have different privacy requirements. The proposed protocols are tested on various real-world datasets; reflecting the application areas of the different recommendation approaches, these datasets are varied, covering movie ratings, social networks, check-in information for different locations, and quality of service of web services. For each proposed protocol we also present a privacy analysis and describe how the system can perform the computations without leaking users' private information. The experimental and privacy analyses of our proposed protocols for the different types of recommendation techniques show that they are both private and practical.
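
As a concrete illustration of this flow, the sketch below uses the python-paillier library (an additively homomorphic scheme standing in for whichever homomorphic schemes the thesis protocols actually use) to show a semi-honest server summing encrypted ratings for one item. The key ownership, the ratings, and the single-item setting are illustrative assumptions, not the thesis's full protocols.

# Minimal sketch of rating aggregation with additive homomorphic encryption,
# using the python-paillier library (`pip install phe`). The server only ever
# sees ciphertexts; the query user decrypts the aggregate.
from phe import paillier

# The query user owns the key pair; only she can decrypt (an assumed setup).
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each user encrypts his or her rating for the item before sending it.
ratings = [4, 5, 3, 4, 2]  # plaintext ratings, never sent in this form
encrypted_ratings = [public_key.encrypt(r) for r in ratings]

# --- server side: operates on ciphertexts only ---
encrypted_sum = encrypted_ratings[0]
for c in encrypted_ratings[1:]:
    encrypted_sum = encrypted_sum + c  # homomorphic addition of ciphertexts

# --- query user: decrypts the aggregate and derives the average rating ---
avg = private_key.decrypt(encrypted_sum) / len(encrypted_ratings)
print(avg)  # 3.6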

    Privacy-Preserving intrusion detection over network data

    Get PDF
    Effective protection against cyber-attacks requires constant monitoring and analysis of system data, such as log files and network packets, in an IT infrastructure, and this data may contain sensitive information. To this end, security operation centers (SOCs) are established to detect, analyze, and respond to cyber-security incidents. Security officers at a SOC are not necessarily trusted with the content of sensitive and private information, especially when SOC services are outsourced, as maintaining in-house cyber-security expertise and capability is expensive. Therefore, an end-to-end security solution is needed for the system data. A SOC typically builds detection models, either for known attack types or for anomalies, and applies them to the collected data to detect cyber-security incidents. The models are usually constructed from historical data containing records of attacks and of the normal functioning of the monitored IT infrastructure, e.g., using machine learning techniques. The SOC is also motivated to keep its models confidential, for three reasons: i) to capitalize on the models, which are its proprietary expertise; ii) to protect its detection strategies against adversarial machine learning, in which intelligent and adaptive adversaries carefully manipulate their attack strategy to avoid detection; and iii) the models might have been trained on sensitive information, so revealing them could violate certain laws and regulations. Therefore, detection models are also private. In this dissertation, we propose a scenario in which the privacy of both system data and detection models is protected and information leakage is either prevented altogether or quantifiably decreased. Our main approach is to provide end-to-end encryption for system data and detection models using lattice-based cryptography, which allows homomorphic operations over the encrypted data. Assuming the detection models have previously been obtained from training data by the SOC, we apply the models to system data homomorphically, with the model itself encrypted. We use three different machine learning algorithms to extract intrusion models from historical training data. Using different datasets (two recent ones, and one outdated but widely used in the intrusion detection literature), the performance of each algorithm is evaluated via the following metrics: i) the time it takes to extract the rules, ii) the time it takes to apply the rules to data homomorphically, iii) the accuracy of the rules in detecting intrusions, and iv) the number of rules. Our experiments demonstrate that the proposed privacy-preserving intrusion detection system (IDS) is feasible in terms of execution times and reliable in terms of accuracy.
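
The sketch below illustrates the encrypted-model idea on a toy linear detection rule. It is an assumption-laden simplification: it uses the additively homomorphic python-paillier library rather than the dissertation's lattice-based scheme, and it keeps the features in plaintext at the evaluator while only the model is encrypted, whereas the dissertation protects both.

# Illustrative sketch (not the dissertation's lattice-based construction):
# SOC encrypts the weights of a linear detection rule; the evaluator scores
# plaintext traffic features against the encrypted model, so the model never
# leaves SOC in the clear. Uses python-paillier (`pip install phe`).
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()  # held by SOC

# SOC side: encrypt the weights and bias of a (hypothetical) trained rule.
weights = [0.8, -1.2, 0.5]
bias = -0.3
enc_w = [public_key.encrypt(w) for w in weights]
enc_b = public_key.encrypt(bias)

# Evaluator side: compute Enc(w.x + b) using only homomorphic additions and
# plaintext-by-ciphertext multiplications, which Paillier supports.
x = [1.0, 0.4, 2.5]  # plaintext features of one network record
enc_score = enc_b
for wi, xi in zip(enc_w, x):
    enc_score = enc_score + wi * xi

# SOC side: decrypt the score and apply the decision threshold.
alert = private_key.decrypt(enc_score) > 0.0
print("intrusion" if alert else "normal")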

    Robust Representation Learning for Privacy-Preserving Machine Learning: A Multi-Objective Autoencoder Approach

    Full text link
    Several domains increasingly rely on machine learning in their applications. The resulting heavy dependence on data has led to the emergence of various laws and regulations around data ethics and privacy, and to growing awareness of the need for privacy-preserving machine learning (ppML). Current ppML techniques rely on methods that are either purely based on cryptography, such as homomorphic encryption, or that introduce noise into the input, such as differential privacy. The main criticism of these techniques is that they are either too slow or trade off a model's performance for improved confidentiality. To address this performance reduction, we aim to leverage robust representation learning as a way of encoding our data while optimizing the privacy-utility trade-off. Our method centers on training autoencoders in a multi-objective manner and then concatenating the latent and learned features from the encoding part as the encoded form of our data. Such deep-learning-powered encoding can then safely be sent to a third party for intensive training and hyperparameter tuning. With our proposed framework, we can share our data and use third-party tools without the threat of revealing its original form. We empirically validate our results in unimodal and multimodal settings, the latter following a vertical splitting system, and show improved performance over the state-of-the-art.
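
A minimal PyTorch sketch of this encoding idea follows. The reconstruction loss is the utility objective; the second objective here (shrinking latent codes toward zero) is a hypothetical stand-in for the paper's privacy objective, and the layer sizes and loss weighting are arbitrary assumptions.

# Sketch: multi-objective autoencoder whose encoded form concatenates the
# latent bottleneck with an intermediate layer of learned features.
import torch
import torch.nn as nn

class MultiObjectiveAE(nn.Module):
    def __init__(self, d_in=32, d_hidden=16, d_latent=8):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.enc2 = nn.Linear(d_hidden, d_latent)
        self.dec = nn.Sequential(nn.Linear(d_latent, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_in))

    def forward(self, x):
        h = self.enc1(x)  # learned intermediate features
        z = self.enc2(h)  # latent bottleneck
        return self.dec(z), h, z

    def encode(self, x):
        _, h, z = self.forward(x)
        return torch.cat([z, h], dim=1)  # encoded form shared with 3rd party

model = MultiObjectiveAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 32)  # toy batch standing in for real records

for _ in range(100):
    recon, h, z = model(x)
    loss_rec = nn.functional.mse_loss(recon, x)  # utility objective
    loss_priv = z.pow(2).mean()                  # stand-in privacy objective
    loss = loss_rec + 0.1 * loss_priv            # weighted multi-objective sum
    opt.zero_grad(); loss.backward(); opt.step()

encoded = model.encode(x)  # this representation, not x, is what gets shared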

    Secure and Efficient Comparisons between Untrusted Parties

    Get PDF
    A vast number of online services are based on users contributing their personal information. Examples are manifold, including social networks, electronic commerce, sharing websites, lodging platforms, and genealogy. In all cases, user privacy depends on collective trust in all involved intermediaries, such as service providers, operators, administrators, or even help-desk staff. A single adversarial party in the whole chain of trust voids user privacy, and the number of intermediaries is ever growing. Thus, user privacy must be preserved at all times and stages, independent of the intrinsic goals of any involved party. Furthermore, next to these new services, traditional offline analytic systems are being replaced by online services run in large data centers. Centralized processing of electronic medical records, genomic data, and other health-related information is anticipated due to advances in medical research, better analytic results based on large amounts of medical information, and lowered costs. In these scenarios, privacy is of utmost concern due to the large amount of personal information contained within the centralized data. We focus on the challenge of privacy-preserving processing of genomic data, specifically comparing genomic sequences. The problem that arises is how to efficiently compare the private sequences of two parties while preserving the confidentiality of the compared data. It follows that the privacy of the data owner must be preserved, meaning that as little information as possible must be leaked to any party participating in the comparison. Leakage can happen at several points during a comparison: the secured inputs given to the comparing party might leak some information about the original input, or the output might leak information about the inputs. In the latter case, the results of several comparisons can be combined to infer information about the confidential input of the party under observation. Genomic sequences serve as a use case, but the proposed solutions are more general and apply to the generic field of privacy-preserving comparison of sequences. The solution should be efficient, such that performing a comparison yields runtimes linear in the length of the input sequences, producing acceptable costs for a typical use case. To tackle the problem of efficient, privacy-preserving sequence comparison, we propose a framework consisting of three main parts: a) The basic protocol is an efficient sequence comparison algorithm that transforms a sequence into a set representation, allowing distance measures over input sequences to be approximated by distance measures over sets. The sets are then represented by an efficient data structure, the Bloom filter, which allows certain set operations to be evaluated without storing the actual elements of the possibly large set. This representation yields low distortion when comparing similar sequences. Operations on the set representation are carried out using efficient, partially homomorphic cryptographic systems to keep the inputs confidential. The output can be adjusted to return either the actual approximated distance or the result of an in-range check on the approximated distance. b) Building on this efficient basic protocol, we introduce the first mechanism to reduce the success of inference attacks by detecting and rejecting similar queries in a privacy-preserving way. This is achieved by generating generalized commitments for inputs.
This generalization treats inputs as messages received over a noisy channel, to which error correction from coding theory is applied; similar inputs are then defined as inputs whose generalizations have a Hamming distance below a predefined threshold. We present a zero-knowledge proof protocol to assess whether the generalized input is indeed a generalization of the actual input. Furthermore, we generalize a very efficient inference attack on privacy-preserving sequence comparison protocols and use it to evaluate our inference-control mechanism. c) The third part of the framework lightens the computational load on the client taking part in the comparison protocol by providing a compression mechanism for partially homomorphic cryptographic schemes. It reduces the transmission and storage overhead induced by semantically secure homomorphic encryption schemes, as well as encryption latency. The compression is achieved by constructing an asymmetric stream cipher such that the generated ciphertext can be converted into a ciphertext of an associated homomorphic encryption scheme without revealing any information about the plaintext. This is the first compression scheme available for partially homomorphic encryption schemes; compression schemes for fully homomorphic encryption are several orders of magnitude slower at converting the transmission ciphertext into the homomorphically encrypted ciphertext, whereas our scheme achieves optimal conversion performance. It further allows keystreams to be generated offline and thus supports offloading to trusted devices, improving transmission, storage, and power efficiency. We give security proofs for all relevant parts of the proposed protocols and algorithms. A performance evaluation of the core components demonstrates the practicability of our proposed solutions, including a theoretical analysis and practical experiments showing the accuracy and efficiency of the approximations and probabilistic algorithms. Several variations and configurations for detecting similar inputs are studied in an in-depth discussion of the inference-control mechanism. A human mitochondrial genome database is used for the practical evaluation to compare genomic sequences and detect similar inputs as described by the use case. In summary, we show that it is indeed possible to construct an efficient and privacy-preserving comparison of (genomic) sequences while controlling the amount of information that leaves the comparison. To the best of our knowledge, we also contribute the first efficient privacy-preserving inference detection and control mechanism, as well as the first ciphertext compression system for partially homomorphic cryptographic systems.
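
The set transformation and distance approximation of part a) can be sketched in plaintext as follows, assuming 3-gram shingling and a Dice-coefficient estimate over Bloom filter bits; the actual protocol additionally evaluates these set operations under partially homomorphic encryption, which this sketch omits.

# Sketch: sequence -> n-gram set -> Bloom filter, then an approximate Dice
# similarity computed from the filters' bit vectors alone.
import hashlib

M, K = 1024, 4  # filter size in bits, number of hash functions (assumed)

def ngrams(seq, n=3):
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}

def bloom(items):
    bits = [0] * M
    for item in items:
        for k in range(K):  # derive K hash functions by salting SHA-256
            h = hashlib.sha256(f"{k}:{item}".encode()).digest()
            bits[int.from_bytes(h[:4], "big") % M] = 1
    return bits

def approx_dice(a, b):
    inter = sum(x & y for x, y in zip(a, b))  # set bits in the intersection
    return 2 * inter / (sum(a) + sum(b))

fa = bloom(ngrams("ACGTACGTTGCA"))
fb = bloom(ngrams("ACGTACGATGCA"))
print(approx_dice(fa, fb))  # close to the true Dice similarity of the 3-gram sets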

    ๋ฏผ๊ฐํ•œ ์ •๋ณด๋ฅผ ๋ณดํ˜ธํ•  ์ˆ˜ ์žˆ๋Š” ํ”„๋ผ์ด๋ฒ„์‹œ ๋ณด์กด ๊ธฐ๊ณ„ํ•™์Šต ๊ธฐ์ˆ  ๊ฐœ๋ฐœ

    Get PDF
    Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, 2022. 8. Jaewook Lee. The recent success of artificial intelligence systems has been driven by several factors, notably the development of new algorithms and the explosive increase in the amount of refined data available. Machine learning models and data therefore have real value, and in real-world scenarios individuals or corporations can benefit by providing data for training a machine learning model, or by providing the trained model itself. However, it has been revealed that sharing data or models can lead to invasions of personal privacy by leaking sensitive personal information. This dissertation focuses on developing privacy-preserving machine learning methods that can protect such sensitive information, using two actively studied privacy-preserving techniques: homomorphic encryption and differential privacy. Homomorphic encryption can protect the privacy of data and models because machine learning algorithms can be applied directly to encrypted data, but its operations require much more computation time than conventional ones, so constructing efficient algorithms is essential. For efficient computation, we take two approaches. The first is to reduce the amount of computation in the training phase: applying homomorphic encryption from training onward also protects the privacy of the training data, widening the scope of protection compared to encrypting only at inference, but at a correspondingly higher computational cost. We present an efficient training algorithm that encrypts only a few of the most important pieces of information. Specifically, we develop a ridge regression algorithm that greatly reduces the amount of computation when one or two sensitive variables are encrypted. Furthermore, we extend the method to classification problems by developing a new logistic regression algorithm that maximally eliminates the hyper-parameter search that is ill-suited to machine learning under homomorphic encryption. The second approach is to apply homomorphic encryption only when the trained model is used for inference, which prevents direct exposure of the test data and the model information; here we propose a homomorphic-encryption-friendly inference method for support-based clustering. Although homomorphic encryption can protect data and model information against various threats, it cannot defend against secondary attacks mounted through inference APIs: it has been shown that an adversary can extract information about the model and its training data using only his or her own inputs and the corresponding outputs. For instance, the adversary can determine whether specific data was included in the training data. Differential privacy is a mathematical framework that guarantees defense against such attacks by bounding the impact of any specific data sample on the trained model. Differential privacy has the advantage of quantitatively expressing the degree of privacy, so any desired privacy level can be met, but satisfying it requires adding a corresponding amount of randomness to the algorithm, which reduces the utility of the model.
Therefore, we propose a novel method based on Morse theory that improves the utility of differentially private clustering algorithms while maintaining their privacy guarantees. The privacy-preserving machine learning methods proposed in this dissertation protect privacy at different levels and thus complement each other in preventing different kinds of attacks. We expect that they can be combined into an integrated system and applied to various domains where machine learning involves sensitive personal information.
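
To make the differential-privacy side concrete, here is a minimal sketch of a Laplace-mechanism release of one cluster mean, a core step in differentially private clustering. The clipping bound, the replacement-neighbor sensitivity argument, and the omission of the dissertation's Morse-theory post-processing are all simplifying assumptions.

# Sketch: epsilon-DP release of a cluster mean via the Laplace mechanism.
import numpy as np

rng = np.random.default_rng(0)

def dp_cluster_mean(points, epsilon, bound=1.0):
    # Clip each point to L1 norm <= bound, so replacing one record shifts
    # the mean by at most 2 * bound / n in L1 distance; Laplace noise with
    # scale sensitivity / epsilon then gives epsilon-DP.
    pts = np.asarray(points, dtype=float)
    norms = np.maximum(np.abs(pts).sum(axis=1), 1e-12)
    pts = pts * np.minimum(1.0, bound / norms)[:, None]  # L1 clipping
    sensitivity = 2.0 * bound / len(pts)
    noise = rng.laplace(scale=sensitivity / epsilon, size=pts.shape[1])
    return pts.mean(axis=0) + noise

cluster = rng.normal(loc=[0.3, -0.2], scale=0.05, size=(500, 2))
print(dp_cluster_mean(cluster, epsilon=1.0))  # noisy, privacy-preserving centre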

    Building and evaluating privacy-preserving data processing systems

    Get PDF
    Large-scale data processing raises a number of important challenges, including guaranteeing that collected or published data is not misused, preventing the disclosure of sensitive information, and deploying privacy protection frameworks that support usable and scalable services. In this dissertation, we study and build systems geared toward privacy-friendly data processing, enabling computational scenarios and applications in which potentially sensitive data can be used to extract useful knowledge, and which would be impossible without such strong privacy guarantees. For instance, we show how to privately and efficiently aggregate data from many sources and large streams, and how to use the aggregates to extract useful statistics and train simple machine learning models. We also present a novel technique for privately releasing generative machine learning models and entire high-dimensional datasets produced by these models. Finally, we demonstrate that the data used by participants in training generative and collaborative learning models may be vulnerable to inference attacks, and we discuss possible mitigation strategies.
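
As one example of the private-aggregation building block, the sketch below uses additive secret sharing over a prime field between two non-colluding servers; this is a standard technique chosen for illustration, not necessarily the exact construction used in the dissertation.

# Sketch: each user splits a private value into additive shares; each server
# sums the shares it receives and reveals only its total, so the overall sum
# is recovered while individual inputs stay hidden from both servers.
import secrets

P = 2**61 - 1  # prime modulus; all values live in Z_P

def share(value, n_servers):
    # Split `value` into n additive shares that sum to it mod P.
    shares = [secrets.randbelow(P) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

inputs = [12, 7, 30]      # three users' private inputs
server_totals = [0, 0]    # two non-colluding aggregation servers
for x in inputs:
    for i, s in enumerate(share(x, 2)):
        server_totals[i] = (server_totals[i] + s) % P

print(sum(server_totals) % P)  # 49: the sum, with no single input revealed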

    Revealing the landscape of privacy-enhancing technologies in the context of data markets for the IoT: A systematic literature review

    Get PDF
    IoT data markets in public and private institutions have become increasingly relevant in recent years because of their potential to improve data availability and unlock new business models. However, exchanging data in markets bears considerable challenges related to disclosing sensitive information. Despite considerable research on different aspects of privacy-enhancing data markets for the IoT, none of the solutions proposed so far seems to have found practical adoption. This study therefore aims to organize the state-of-the-art solutions, analyze and scope the technologies suggested in this context, and structure the remaining challenges to determine where future research is required. To accomplish this goal, we conducted a systematic literature review on privacy enhancement in data markets for the IoT, covering 50 publications dated up to July 2020 and updated with 24 publications dated up to May 2022. Our results indicate that most research in this area has emerged only recently and that no IoT data market architecture has established itself as canonical. Existing solutions frequently lack the required combination of anonymization and secure computation technologies. Furthermore, there is no consensus on the appropriate use of blockchain technology for IoT data markets, and existing libraries and generic data market architectures are rarely leveraged or reused. We also identified significant remaining challenges, such as the copy problem and the recursive enforcement problem, which, although partial solutions have been suggested, are often not sufficiently addressed in proposed designs. We conclude that privacy-enhancing technologies need further improvement to positively impact data markets, so that, ultimately, the value of data is preserved through data scarcity while users' privacy and business-critical information are protected.

    Technologies and Applications for Big Data Value

    Get PDF
    This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications in the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview by positioning the following chapters in terms of their contributions to the technology frameworks that are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is arranged in two parts. The first part, "Technologies and Methods", contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part, "Processes and Applications", details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the nucleus of the European data community, bringing together businesses and leading researchers to harness the value of data for the benefit of society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in fields including big data, data science, data engineering, and machine learning and AI; and second, practitioners and industry experts engaged in data-driven systems and software design and deployment projects who are interested in employing these advanced methods to address real-world problems.

    Analyzing and Applying Cryptographic Mechanisms to Protect Privacy in Applications

    Get PDF
    Privacy-Enhancing Technologies (PETs) emerged as a technology-based response to the increased collection and storage of data and the associated threats to individuals' privacy in modern applications. They rely on a variety of cryptographic mechanisms that allow some computation to be performed without directly obtaining knowledge of the plaintext information. However, many challenges have so far prevented effective real-world usage in many existing applications. For one, some mechanisms leak information or have been proposed outside of the security models established within the cryptographic community, leaving open how effectively they protect privacy in various applications. Additionally, a major challenge keeping PETs largely academic is their practicality, in both efficiency and usability: cryptographic mechanisms introduce a lot of overhead, which is mostly prohibitive, and due to a lack of high-level tools they are very hard for outsiders to integrate. In this thesis, we move towards making PETs more effective and practical at protecting privacy in numerous applications. We take a two-sided approach: first analyzing the effective security of candidate mechanisms (cryptanalysis), and then building constructions and tools (cryptographic engineering) for practical use in emerging machine learning applications crucial to modern use cases. In the process, we adopt an interdisciplinary perspective, analyzing mechanisms and collaboratively building privacy-preserving architectures based on requirements from experts in the application domains. Cryptanalysis. While mechanisms like Homomorphic Encryption (HE) and Secure Multi-Party Computation (SMPC) provably leak no additional information, Encrypted Search Algorithms (ESAs) and Randomization-only Two-Party Computation (RoTPC) possess additional properties that require cryptanalysis to determine the effective privacy protection. ESAs allow search over encrypted data, an important functionality in many applications. Most efficient ESAs have some form of well-defined information leakage, which is cryptanalyzed via a breadth of so-called leakage attacks proposed in the literature. However, it is difficult to assess their practical effectiveness, given that previous evaluations were closed-source, used restricted data, and made assumptions about (among other things) the query distribution, because real-world query data is very hard to find. For these reasons, we re-implement known leakage attacks in an open-source framework and perform a systematic empirical re-evaluation using a variety of new data sources that, for the first time, contain real-world query data. We obtain many more complete and novel results, with attacks working much better or much worse than expected based on previous evaluations. RoTPC mechanisms require cryptanalysis because they do not rely on established techniques and security models, instead obfuscating messages using only randomizations. A prominent example is the privacy-preserving scalar product protocol of Lu et al. (IEEE TPDS'13). We show that this protocol is formally insecure and that this translates into practical insecurity by presenting attacks that even allow testing for certain inputs, making the case for more scrutiny of RoTPC protocols used as PETs. This part of the thesis is based on the following two publications: [KKM+22] S. KAMARA, A. KATI, T. MOATAZ, T. SCHNEIDER, A. TREIBER, M. YONLI.
"SoK: Cryptanalysis of Encrypted Search with LEAKER - A framework for LEakage AttacK Evaluation on Real-world data". In: 7th IEEE European Symposium on Security and Privacy (EuroS&P'22). Full version: https://ia.cr/2021/1035. Code: https://encrypto.de/code/LEAKER. IEEE, 2022, pp. 90-108. Appendix A. [ST20] T. SCHNEIDER, A. TREIBER. "A Comment on Privacy-Preserving Scalar Product Protocols as proposed in 'SPOC'". In: IEEE Transactions on Parallel and Distributed Systems (TPDS) 31.3 (2020). Full version: https://arxiv.org/abs/1906.04862. Code: https://encrypto.de/code/SPOCattack, pp. 543-546. CORE Rank A*. Appendix B. Cryptographic Engineering. Given the above cryptanalysis results, we investigate using the leakage-free and provably secure cryptographic mechanisms of HE and SMPC to protect privacy in machine learning applications. As much of the cryptographic community has focused on PETs for neural network applications, we focus on two other important applications and models: speaker recognition and sum-product networks. We show the efficiency of our solutions in possible real-world scenarios and provide tools usable by non-domain experts. In speaker recognition, a user's voice data is matched against reference data stored at the service provider. Using HE and SMPC, we build the first privacy-preserving speaker recognition system that includes the state-of-the-art technique of cohort score normalization, using cohort pruning via SMPC. We then build a privacy-preserving speaker recognition system relying solely on SMPC, which we show outperforms previous HE-based solutions by a factor of up to 4000x. We show that both our solutions comply with specific standards for biometric information protection and thus are effective and practical PETs for speaker recognition. Sum-Product Networks (SPNs) are noteworthy probabilistic graphical models that, like neural networks, also need efficient methods for privacy-preserving inference as a PET. We present CryptoSPN, which uses SMPC for privacy-preserving inference of SPNs and, due to a combination of machine learning and cryptographic techniques and contrary to most works on neural networks, even hides the network structure. Our implementation is integrated into the prominent SPN framework SPFlow and evaluates medium-sized SPNs within seconds. This part of the thesis is based on the following three publications: [NPT+19] A. NAUTSCH, J. PATINO, A. TREIBER, T. STAFYLAKIS, P. MIZERA, M. TODISCO, T. SCHNEIDER, N. EVANS. "Privacy-Preserving Speaker Recognition with Cohort Score Normalisation". In: 20th Conference of the International Speech Communication Association (INTERSPEECH'19). Online: https://arxiv.org/abs/1907.03454. International Speech Communication Association (ISCA), 2019, pp. 2868-2872. CORE Rank A. Appendix C. [TNK+19] A. TREIBER, A. NAUTSCH, J. KOLBERG, T. SCHNEIDER, C. BUSCH. "Privacy-Preserving PLDA Speaker Verification using Outsourced Secure Computation". In: Speech Communication 114 (2019). Online: https://encrypto.de/papers/TNKSB19.pdf. Code: https://encrypto.de/code/PrivateASV, pp. 60-71. CORE Rank B. Appendix D. [TMW+20] A. TREIBER, A. MOLINA, C. WEINERT, T. SCHNEIDER, K. KERSTING. "CryptoSPN: Privacy-preserving Sum-Product Network Inference". In: 24th European Conference on Artificial Intelligence (ECAI'20). Full version: https://arxiv.org/abs/2002.00801. Code: https://encrypto.de/code/CryptoSPN. IOS Press, 2020, pp. 1946-1953. CORE Rank A. Appendix E.
Overall, this thesis contributes a broader security analysis of cryptographic mechanisms as well as new systems and tools that effectively protect privacy in various sought-after applications.