Search CORE

250 research outputs found

Data Clustering using Homomorphic Encryption and Secure Chain Distance Matrices.

Author: Almutairi Nawal
Coenen Frans
Dures Keith
Publication venue: 'Scitepress'
Publication date: 01/01/2018
Field of study

University of Liverpool Repository

Crossref

Privacy Preserving Third Party Data Mining Using Cryptography

Author: Almutairi Nawal
Publication venue
Publication date: 29/01/2020
Field of study

University of Liverpool Repository

Privacy-Preserving Gaussian Process Regression -- A Modular Approach to the Application of Homomorphic Encryption

Author: Fenner Peter
Pyzer-Knapp Edward O.
Publication venue
Publication date: 28/01/2020
Field of study

Much of machine learning relies on the use of large amounts of data to train models to make predictions. When this data comes from multiple sources, for example when evaluation of data against a machine learning model is offered as a service, there can be privacy issues and legal concerns over the sharing of data. Fully homomorphic encryption (FHE) allows data to be computed on whilst encrypted, which can provide a solution to the problem of data privacy. However, FHE is both slow and restrictive, so existing algorithms must be manipulated to make them work efficiently under the FHE paradigm. Some commonly used machine learning algorithms, such as Gaussian process regression, are poorly suited to FHE and cannot be manipulated to work both efficiently and accurately. In this paper, we show that a modular approach, which applies FHE to only the sensitive steps of a workflow that need protection, allows one party to make predictions on their data using a Gaussian process regression model built from another party's data, without either party gaining access to the other's data, in a way which is both accurate and efficient. This construction is, to our knowledge, the first example of an effectively encrypted Gaussian process

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Cryptographic Ensemble for Secure Third Party Data Analysis: Collaborative Data Clustering Without Data Owner Participation

Author: Almutairi ST
Coenen FP
Dures K
Publication venue: 'Elsevier BV'
Publication date: 30/08/2019
Field of study

This paper introduces the twin concepts Cryptographic Ensembles and Global Encrypted Distance Matrices (GEDMs), designed to provide a solution to outsourced secure collaborative data clustering. The cryptographic ensemble comprises: Homomorphic Encryption (HE) to preserve raw data privacy, while supporting data analytics; and Multi-User Order Preserving Encryption (MUOPE) to preserve the privacy of the GEDM. Clustering can therefore be conducted over encrypted datasets without requiring decryption or the involvement of data owners once encryption has taken place, all with no loss of accuracy. The GEDM concept is applicable to large scale collaborative data mining applications that feature horizontal data partitioning. In the paper DBSCAN clustering is adopted for illustrative and evaluation purposes. The results demonstrate that the proposed solution is both efficient and accurate while maintaining data privacy

University of Liverpool Repository

Secure Third Party Data Clustering Using Φ Data: Multi-User Order Preserving Encryption and Super Secure Chain Distance Matrices

Author: Almutairi Nawal
Coenen FP
Dures Keith
Publication venue: Springer Nature
Publication date: 16/11/2018
Field of study

University of Liverpool Repository

민감한 정보를 보호할 수 있는 프라이버시 보존 기계학습 기술 개발

Author: 변준영
Publication venue: 서울대학교 대학원
Publication date: 01/08/2022
Field of study

학위논문(박사) -- 서울대학교대학원 : 공과대학 산업공학과, 2022. 8. 이재욱.최근 인공지능의 성공에는 여러 가지 요인이 있으나, 새로운 알고리즘의 개발과 정제된 데이터 양의 기하급수적인 증가로 인한 영향이 크다. 따라서 기계학습 모델과 데이터는 실재적 가치를 가지게 되며, 현실 세계에서 개인 또는 기업은 학습된 모델 또는 학습에 사용할 데이터를 제공함으로써 이익을 얻을 수 있다. 그러나, 데이터 또는 모델의 공유는 개인의 민감 정보를 유출함으로써 프라이버시의 침해로 이어질 수 있다는 사실이 밝혀지고 있다. 본 논문의 목표는 민감 정보를 보호할 수 있는 프라이버시 보존 기계학습 방법론을 개발하는 것이다. 이를 위해 최근 활발히 연구되고 있는 두 가지 프라이버시 보존 기술, 즉 동형 암호와 차분 프라이버시를 사용한다. 먼저, 동형 암호는 암호화된 데이터에 대해 기계학습 알고리즘을 적용 가능하게 함으로써 데이터의 프라이버시를 보호할 수 있다. 그러나 동형 암호를 활용한 연산은 기존의 연산에 비해 매우 큰 연산 시간을 요구하므로 효율적인 알고리즘을 구성하는 것이 중요하다. 효율적인 연산을 위해 우리는 두 가지 접근법을 사용한다. 첫 번째는 학습 단계에서의 연산량을 줄이는 것이다. 학습 단계에서부터 동형 암호를 적용하면 학습 데이터의 프라이버시를 함께 보호할 수 있으므로 추론 단계에서만 동형 암호를 적용하는 것에 비해 프라이버시의 범위가 넓어지지만, 그만큼 연산량이 늘어난다. 본 논문에서는 일부 가장 중요한 정보만을 암호화함으로써 학습 단계를 효율적으로 하는 방법론을 제안한다. 구체적으로, 일부 민감 변수가 암호화되어 있을 때 연산량을 매우 줄일 수 있는 릿지 회귀 알고리즘을 개발한다. 또한 개발된 알고리즘을 확장시켜 동형 암호 친화적이지 않은 파라미터 탐색 과정을 최대한 제거할 수 있는 새로운 로지스틱 회귀 알고리즘을 함께 제안한다. 효율적인 연산을 위한 두 번째 접근법은 동형 암호를 기계학습의 추론 단계에서만 사용하는 것이다. 이를 통해 시험 데이터의 직접적인 노출을 막을 수 있다. 본 논문에서는 서포트 벡터 군집화 모델에 대한 동형 암호 친화적 추론 방법을 제안한다. 동형 암호는 여러 가지 위협에 대해서 데이터와 모델 정보를 보호할 수 있으나, 학습된 모델을 통해 새로운 데이터에 대한 추론 서비스를 제공할 때 추론 결과로부터 모델과 학습 데이터를 보호하지 못한다. 연구를 통해 공격자가 자신이 가진 데이터와 그 데이터에 대한 추론 결과만을 이용하여 이용하여 모델과 학습 데이터에 대한 정보를 추출할 수 있음이 밝혀지고 있다. 예를 들어, 공격자는 특정 데이터가 학습 데이터에 포함되어 있는지 아닌지를 추론할 수 있다. 차분 프라이버시는 학습된 모델에 대한 특정 데이터 샘플의 영향을 줄임으로써 이러한 공격에 대한 방어를 보장하는 프라이버시 기술이다. 차분 프라이버시는 프라이버시의 수준을 정량적으로 표현함으로써 원하는 만큼의 프라이버시를 충족시킬 수 있지만, 프라이버시를 충족시키기 위해서는 알고리즘에 그만큼의 무작위성을 더해야 하므로 모델의 성능을 떨어뜨린다. 따라서, 본문에서는 모스 이론을 이용하여 차분 프라이버시 군집화 방법론의 프라이버시를 유지하면서도 그 성능을 끌어올리는 새로운 방법론을 제안한다. 본 논문에서 개발하는 프라이버시 보존 기계학습 방법론은 각기 다른 수준에서 프라이버시를 보호하며, 따라서 상호 보완적이다. 제안된 방법론들은 하나의 통합 시스템을 구축하여 기계학습이 개인의 민감 정보롤 보호해야 하는 여러 분야에서 더욱 널리 사용될 수 있도록 하는 기대 효과를 가진다.Recent development of artificial intelligence systems has been driven by various factors such as the development of new algorithms and the the explosive increase in the amount of available data. In the real-world scenarios, individuals or corporations benefit by providing data for training a machine learning model or the trained model. However, it has been revealed that sharing of data or the model can lead to invasion of personal privacy by leaking personal sensitive information. In this dissertation, we focus on developing privacy-preserving machine learning methods which can protect sensitive information. Homomorphic encryption can protect the privacy of data and the models because machine learning algorithms can be applied to encrypted data, but requires much larger computation time than conventional operations. For efficient computation, we take two approaches. The first is to reduce the amount of computation in the training phase. We present an efficient training algorithm by encrypting only few important information. In specific, we develop a ridge regression algorithm that greatly reduces the amount of computation when one or two sensitive variables are encrypted. Furthermore, we extend the method to apply it to classification problems by developing a new logistic regression algorithm that can maximally exclude searching of hyper-parameters that are not suitable for machine learning with homomorphic encryption. Another approach is to apply homomorphic encryption only when the trained model is used for inference, which prevents direct exposure of the test data and the model information. We propose a homomorphic-encryption-friendly algorithm for inference of support based clustering. Though homomorphic encryption can prevent various threats to data and the model information, it cannot defend against secondary attacks through inference APIs. It has been reported that an adversary can extract information about the training data only with his or her input and the corresponding output of the model. For instance, the adversary can determine whether specific data is included in the training data or not. Differential privacy is a mathematical concept which guarantees defense against those attacks by reducing the impact of specific data samples on the trained model. Differential privacy has the advantage of being able to quantitatively express the degree of privacy, but it reduces the utility of the model by adding randomness to the algorithm. Therefore, we propose a novel method which can improve the utility while maintaining the privacy of differentially private clustering algorithms by utilizing Morse theory. The privacy-preserving machine learning methods proposed in this paper can complement each other to prevent different levels of attacks. We expect that our methods can construct an integrated system and be applied to various domains where machine learning involves sensitive personal information.Chapter 1 Introduction 1 1.1 Motivation of the Dissertation 1 1.2 Aims of the Dissertation 7 1.3 Organization of the Dissertation 10 Chapter 2 Preliminaries 11 2.1 Homomorphic Encryption 11 2.2 Differential Privacy 14 Chapter 3 Efficient Homomorphic Encryption Framework for Ridge Regression 18 3.1 Problem Statement 18 3.2 Framework 22 3.3 Proposed Method 25 3.3.1 Regression with one Encrypted Sensitive Variable 25 3.3.2 Regression with two Encrypted Sensitive Variables 30 3.3.3 Adversarial Perturbation Against Attribute Inference Attack 35 3.3.4 Algorithm for Ridge Regression 36 3.3.5 Algorithm for Adversarial Perturbation 37 3.4 Experiments 40 3.4.1 Experimental Setting 40 3.4.2 Experimental Results 42 3.5 Chapter Summary 47 Chapter 4 Parameter-free Homomorphic-encryption-friendly Logistic Regression 53 4.1 Problem Statement 53 4.2 Proposed Method 56 4.2.1 Motivation 56 4.2.2 Framework 58 4.3 Theoretical Results 63 4.4 Experiments 68 4.4.1 Experimental Setting 68 4.4.2 Experimental Results 70 4.5 Chapter Summary 75 Chapter 5 Homomorphic-encryption-friendly Evaluation for Support Vector Clustering 76 5.1 Problem Statement 76 5.2 Background 78 5.2.1 CKKS scheme 78 5.2.2 SVC 80 5.3 Proposed Method 82 5.4 Experiments 86 5.4.1 Experimental Setting 86 5.4.2 Experimental Results 87 5.5 Chapter Summary 89 Chapter 6 Differentially Private Mixture of Gaussians Clustering with Morse Theory 95 6.1 Problem Statement 95 6.2 Background 98 6.2.1 Mixture of Gaussians 98 6.2.2 Morse Theory 99 6.2.3 Dynamical System Perspective 101 6.3 Proposed Method 104 6.3.1 Differentially private clustering 105 6.3.2 Transition equilibrium vectors and the weighted graph 108 6.3.3 Hierarchical merging of sub-clusters 111 6.4 Theoretical Results 112 6.5 Experiments 117 6.5.1 Experimental Setting 117 6.5.2 Experimental Results 119 6.6 Chapter Summary 122 Chapter 7 Conclusion 124 7.1 Conclusion 124 7.2 Future Direction 126 Bibliography 128 국문초록 154박

SNU Open Repository and Archive

A Hybrid Multi-user Cloud Access Control based Block Chain Framework for Privacy Preserving Distributed Databases

Author: Bharati K.F.
Krishna Ch. Nanda
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/08/2023
Field of study

Most of the traditional medical applications are insecure and difficult to compute the data integrity with variable hash size. Traditional medical data security systems are insecure and it depend on static parameters for data security. Also, distributed based cloud storage systems are independent of integrity computational and data security due to unstructured data and computational memory. As the size of the data and its dimensions are increasing in the public and private cloud servers, it is difficult to provide the machine learning based privacy preserving in cloud computing environment. Block-chain technology plays a vital role for large cloud databases. Most of the conventional block-chain frameworks are based on the existing integrity and confidentiality models. Also, these models are based on the data size and file format. In this model, a novel integrity verification and encryption framework is designed and implemented in cloud environment.  In order to overcome these problems in the cloud computing environment, a hybrid integrity and security-based block-chain framework is designed and implemented on the large distributed databases. In this framework,a novel decision tree classifier is used along with non-linear mathematical hash algorithm and advanced attribute-based encryption models are used to improve the privacy of multiple users on the large cloud datasets. Experimental results proved that the proposed advanced privacy preserving based block-chain technology has better efficiency than the traditional block-chain based privacy preserving systems on large distributed databases

International Journal on Recent and Innovation Trends in Computing and Communication