250 research outputs found
Privacy-Preserving Gaussian Process Regression -- A Modular Approach to the Application of Homomorphic Encryption
Much of machine learning relies on the use of large amounts of data to train
models to make predictions. When this data comes from multiple sources, for
example when evaluation of data against a machine learning model is offered as
a service, there can be privacy issues and legal concerns over the sharing of
data. Fully homomorphic encryption (FHE) allows data to be computed on whilst
encrypted, which can provide a solution to the problem of data privacy.
However, FHE is both slow and restrictive, so existing algorithms must be
manipulated to make them work efficiently under the FHE paradigm. Some commonly
used machine learning algorithms, such as Gaussian process regression, are
poorly suited to FHE and cannot be manipulated to work both efficiently and
accurately. In this paper, we show that a modular approach, which applies FHE
to only the sensitive steps of a workflow that need protection, allows one
party to make predictions on their data using a Gaussian process regression
model built from another party's data, without either party gaining access to
the other's data, in a way which is both accurate and efficient. This
construction is, to our knowledge, the first example of an effectively encrypted Gaussian process regression.
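The modular idea above rests on the homomorphic property that arithmetic can be carried out on ciphertexts, so only the steps that touch sensitive values need encryption. As a minimal illustrative sketch (not the paper's FHE construction), the following toy Paillier cryptosystem, with deliberately insecure small primes, shows the two operations enough for a linear step of a workflow: adding two encrypted values and multiplying an encrypted value by a plaintext scalar.

```python
import math, random

# Toy Paillier cryptosystem (additively homomorphic).
# The primes below are far too small for real security; illustration only.
def keygen(p, q):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)              # valid because we fix g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    L = (pow(c, lam, n * n) - 1) // n  # L(x) = (x - 1) / n
    return (L * mu) % n

n, sk = keygen(1_000_003, 1_000_033)
c1, c2 = encrypt(n, 42), encrypt(n, 58)
c_sum = (c1 * c2) % (n * n)            # E(42) * E(58) decrypts to 42 + 58
c_scaled = pow(c1, 3, n * n)           # E(42)^3 decrypts to 3 * 42
assert decrypt(n, sk, c_sum) == 100
assert decrypt(n, sk, c_scaled) == 126
```

Note that Paillier supports only additions and plaintext scalings; the fully homomorphic schemes the abstract refers to additionally allow ciphertext-by-ciphertext multiplication, at much greater cost.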
A Cryptographic Ensemble for Secure Third Party Data Analysis: Collaborative Data Clustering Without Data Owner Participation
This paper introduces the twin concepts of Cryptographic Ensembles and Global Encrypted Distance Matrices (GEDMs), designed to provide a solution to outsourced secure collaborative data clustering. The cryptographic ensemble comprises Homomorphic Encryption (HE), to preserve raw data privacy while supporting data analytics, and Multi-User Order Preserving Encryption (MUOPE), to preserve the privacy of the GEDM. Clustering can therefore be conducted over encrypted datasets without requiring decryption or the involvement of data owners once encryption has taken place, with no loss of accuracy. The GEDM concept is applicable to large-scale collaborative data mining applications that feature horizontal data partitioning. In the paper, DBSCAN clustering is adopted for illustrative and evaluation purposes. The results demonstrate that the proposed solution is both efficient and accurate while maintaining data privacy.
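An order-preserving-encrypted distance matrix suffices here because DBSCAN consults pairwise distances only through comparisons against the eps threshold. The sketch below (a plain-Python toy, not the paper's MUOPE construction) runs a minimal DBSCAN on a precomputed distance matrix, then on a monotonically transformed copy standing in for order-preserving encryption, and obtains identical cluster labels:

```python
import math

def dbscan(D, eps, min_pts):
    """Minimal DBSCAN over a precomputed distance matrix D."""
    n = len(D)
    labels = [None] * n              # None = unvisited, -1 = noise
    cid = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if D[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1           # provisional noise (may become border)
            continue
        cid += 1
        labels[i] = cid
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid      # reached noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbrs = [k for k in range(n) if D[j][k] <= eps]
            if len(nbrs) >= min_pts:  # j is a core point: expand the cluster
                seeds.extend(nbrs)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
D = [[math.dist(a, b) for b in pts] for a in pts]

plain = dbscan(D, eps=2.0, min_pts=2)
# A monotone transform f(x) = x^2 stands in for order-preserving encryption:
# comparing f(d) against f(eps) yields exactly the same neighborhoods.
masked = dbscan([[d * d for d in row] for row in D], eps=4.0, min_pts=2)
assert plain == masked
```

Because only the order of distances matters, the clustering party never needs the true distances, which is the property the GEDM exploits.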
Development of Privacy-Preserving Machine Learning Techniques for Protecting Sensitive Information
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, August 2022. Advisor: Jaewook Lee.
Recent development of artificial intelligence systems has been driven by various factors, such as the development of new algorithms and the explosive increase in the amount of available data. In real-world scenarios, individuals or corporations benefit by providing data for training a machine learning model, or by providing the trained model itself. However, it has been revealed that sharing data or models can lead to invasions of personal privacy by leaking sensitive personal information.
In this dissertation, we focus on developing privacy-preserving machine learning methods that can protect sensitive information. Homomorphic encryption can protect the privacy of both the data and the models, because machine learning algorithms can be applied directly to encrypted data; however, it requires far more computation time than conventional plaintext operations. For efficient computation, we take two approaches. The first is to reduce the amount of computation in the training phase. We present an efficient training algorithm that encrypts only a small amount of important information. Specifically, we develop a ridge regression algorithm that greatly reduces the amount of computation when one or two sensitive variables are encrypted. Furthermore, we extend the method to classification problems by developing a new logistic regression algorithm that largely eliminates the hyper-parameter search, a procedure ill-suited to machine learning with homomorphic encryption.
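One way to see why encrypting only a sensitive column can be cheap: in the ridge normal equations, only the Gram-matrix blocks that touch the sensitive column involve encrypted arithmetic, while everything else stays in plaintext. The NumPy sketch below is hypothetical (no actual encryption; the names P, s, and the starred comments merely mark where HE would be applied, and are not the dissertation's notation). It checks that the blockwise assembly reproduces the usual closed-form ridge solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, d = 50, 4
P = rng.normal(size=(n_obs, d))   # public columns (plaintext)
s = rng.normal(size=(n_obs, 1))   # sensitive column (would be encrypted)
y = rng.normal(size=n_obs)
lam = 0.1

# Standard closed-form ridge solution on the full design matrix.
X = np.hstack([P, s])
beta_full = np.linalg.solve(X.T @ X + lam * np.eye(d + 1), X.T @ y)

# Blockwise assembly: only the starred blocks touch s and need HE.
A = np.block([[P.T @ P, P.T @ s],      # P^T s: encrypted (*)
              [s.T @ P, s.T @ s]])     # s^T P, s^T s: encrypted (*)
b = np.concatenate([P.T @ y, s.T @ y])  # s^T y: encrypted (*)
beta_block = np.linalg.solve(A + lam * np.eye(d + 1), b)

assert np.allclose(beta_full, beta_block)
```

With d public columns and one sensitive column, only O(d) of the O(d^2) Gram entries involve ciphertexts, which is the source of the savings the abstract describes.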
The second approach is to apply homomorphic encryption only when the trained model is used for inference, which prevents direct exposure of the test data and the model information. We propose a homomorphic-encryption-friendly algorithm for inference with support vector clustering.
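One reason inference needs an "HE-friendly" reformulation is that schemes like CKKS evaluate only additions and multiplications, so non-polynomial pieces of a model, such as an exponential inside an RBF kernel, must be replaced by low-degree polynomial surrogates. The sketch below illustrates the general trick (it is not the dissertation's algorithm): fit a degree-6 Chebyshev polynomial to exp(-t) on a bounded input range and check the worst-case approximation error.

```python
import numpy as np

# Approximate exp(-t) on [0, 4] by a degree-6 Chebyshev polynomial.
# Under HE, the polynomial can be evaluated with adds/multiplies alone.
t = np.linspace(0.0, 4.0, 400)
target = np.exp(-t)
coef = np.polynomial.chebyshev.chebfit(t, target, deg=6)
approx = np.polynomial.chebyshev.chebval(t, coef)

err = np.max(np.abs(approx - target))
assert err < 1e-2   # small enough for many kernel evaluations on this range
```

The degree controls a trade-off: higher degree means better accuracy but more ciphertext multiplications, and hence more of the HE scheme's multiplicative depth budget.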
Though homomorphic encryption can prevent various threats to the data and the model information, it cannot defend against secondary attacks mounted through inference APIs. It has been reported that an adversary can extract information about the training data using only their own inputs and the corresponding outputs of the model. For instance, the adversary can determine whether a specific record was included in the training data. Differential privacy is a mathematical framework that guarantees defense against such attacks by bounding the impact of any single data sample on the trained model. Differential privacy has the advantage of expressing the degree of privacy quantitatively, but it reduces the utility of the model by adding randomness to the algorithm. Therefore, we propose a novel method that uses Morse theory to improve the utility of differentially private clustering algorithms while maintaining their privacy guarantees.
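The trade-off described above, quantifiable privacy bought with added randomness, is concrete in the standard Laplace mechanism: a query of sensitivity Δ answered with Laplace noise of scale Δ/ε is ε-differentially private, so a smaller ε (stronger privacy) forces larger noise and lower utility. The sketch below shows the textbook mechanism, not this dissertation's clustering method; the sampler is written as an explicit inverse CDF so its deterministic parts can be checked.

```python
import math, random

def laplace_scale(sensitivity, eps):
    """Noise scale b = sensitivity / eps for eps-differential privacy."""
    return sensitivity / eps

def laplace_inv_cdf(b, u):
    """Map a uniform u in (0, 1) to a Laplace(0, b) sample."""
    v = u - 0.5
    return -b * math.copysign(1.0, v) * math.log(1.0 - 2.0 * abs(v))

def noisy_count(true_count, eps, rng=random):
    # A counting query has sensitivity 1: adding or removing one
    # person's record changes the count by at most 1.
    b = laplace_scale(1.0, eps)
    return true_count + laplace_inv_cdf(b, rng.random())

# The median uniform draw yields zero noise, and halving eps
# (more privacy) doubles the noise scale (less utility).
assert laplace_inv_cdf(laplace_scale(1.0, 0.5), 0.5) == 0.0
assert laplace_scale(1.0, 0.25) == 2 * laplace_scale(1.0, 0.5)
```

Membership inference is blunted because the noisy answer is almost equally likely whether or not any particular record is present, with the likelihood ratio bounded by exp(ε).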
The privacy-preserving machine learning methods proposed in this dissertation protect privacy at different levels and can complement each other to prevent different levels of attacks. We expect that our methods can be combined into an integrated system and applied to various domains where machine learning involves sensitive personal information.
Chapter 1 Introduction
1.1 Motivation of the Dissertation
1.2 Aims of the Dissertation
1.3 Organization of the Dissertation
Chapter 2 Preliminaries
2.1 Homomorphic Encryption
2.2 Differential Privacy
Chapter 3 Efficient Homomorphic Encryption Framework for Ridge Regression
3.1 Problem Statement
3.2 Framework
3.3 Proposed Method
3.3.1 Regression with One Encrypted Sensitive Variable
3.3.2 Regression with Two Encrypted Sensitive Variables
3.3.3 Adversarial Perturbation Against Attribute Inference Attack
3.3.4 Algorithm for Ridge Regression
3.3.5 Algorithm for Adversarial Perturbation
3.4 Experiments
3.4.1 Experimental Setting
3.4.2 Experimental Results
3.5 Chapter Summary
Chapter 4 Parameter-free Homomorphic-encryption-friendly Logistic Regression
4.1 Problem Statement
4.2 Proposed Method
4.2.1 Motivation
4.2.2 Framework
4.3 Theoretical Results
4.4 Experiments
4.4.1 Experimental Setting
4.4.2 Experimental Results
4.5 Chapter Summary
Chapter 5 Homomorphic-encryption-friendly Evaluation for Support Vector Clustering
5.1 Problem Statement
5.2 Background
5.2.1 CKKS Scheme
5.2.2 SVC
5.3 Proposed Method
5.4 Experiments
5.4.1 Experimental Setting
5.4.2 Experimental Results
5.5 Chapter Summary
Chapter 6 Differentially Private Mixture of Gaussians Clustering with Morse Theory
6.1 Problem Statement
6.2 Background
6.2.1 Mixture of Gaussians
6.2.2 Morse Theory
6.2.3 Dynamical System Perspective
6.3 Proposed Method
6.3.1 Differentially Private Clustering
6.3.2 Transition Equilibrium Vectors and the Weighted Graph
6.3.3 Hierarchical Merging of Sub-clusters
6.4 Theoretical Results
6.5 Experiments
6.5.1 Experimental Setting
6.5.2 Experimental Results
6.6 Chapter Summary
Chapter 7 Conclusion
7.1 Conclusion
7.2 Future Direction
Bibliography
Abstract (in Korean)
A Hybrid Multi-user Cloud Access Control based Block Chain Framework for Privacy Preserving Distributed Databases
Most traditional medical applications are insecure, and it is difficult for them to verify data integrity with variable hash sizes. Traditional medical data security systems are insecure because they depend on static parameters. In addition, distributed cloud storage systems struggle with integrity computation and data security due to unstructured data and limited computational memory. As data volumes and dimensionality grow in public and private cloud servers, it is difficult to provide machine-learning-based privacy preservation in the cloud computing environment. Blockchain technology plays a vital role for large cloud databases, but most conventional blockchain frameworks rely on existing integrity and confidentiality models, which in turn depend on data size and file format. To overcome these problems, a hybrid integrity- and security-based blockchain framework, incorporating a novel integrity verification and encryption scheme, is designed and implemented for large distributed databases in the cloud environment. In this framework, a novel decision tree classifier is used along with a non-linear mathematical hash algorithm, and advanced attribute-based encryption models are used to improve the privacy of multiple users on large cloud datasets. Experimental results show that the proposed privacy-preserving blockchain approach is more efficient than traditional blockchain-based privacy-preserving systems on large distributed databases.
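The integrity-verification side of such a framework ultimately rests on hash chaining: each block stores the hash of its predecessor, so altering any record invalidates every later link. Below is a minimal, generic sketch of that mechanism using standard SHA-256 (not the paper's specific non-linear hash, classifier, or attribute-based encryption components; the record strings are made up for illustration):

```python
import hashlib, json

def block_hash(block):
    # Hash a canonical JSON serialization of the block's contents.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, record):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "record": record, "prev_hash": prev})

def verify(chain):
    """True iff every block's prev_hash matches its predecessor's hash."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
for rec in ["patient A: scan", "patient B: labs", "patient C: notes"]:
    append_block(chain, rec)

assert verify(chain)
chain[1]["record"] = "patient B: tampered"   # any edit breaks the chain
assert not verify(chain)
```

Because each link covers the whole predecessor block, verifying the final hash is enough to detect tampering anywhere earlier in the chain.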