    Logistic regression model training based on the approximate homomorphic encryption

    Background: Security concerns have been raised since big data became a prominent tool in data analysis. For instance, many machine learning algorithms aim to generate prediction models using training data that contain sensitive information about individuals. The cryptography community is considering secure computation as a solution for privacy protection. In particular, practical requirements have triggered research on the efficiency of cryptographic primitives. Methods: This paper presents a method to train a logistic regression model without information leakage. We apply the homomorphic encryption scheme of Cheon et al. (ASIACRYPT 2017) for efficient arithmetic over real numbers, and devise a new encoding method to reduce the storage of the encrypted database. In addition, we adapt Nesterov's accelerated gradient method to reduce the number of iterations as well as the computational cost while maintaining the quality of the output classifier. Results: Our method achieves state-of-the-art performance for a homomorphic encryption system in a real-world application. The submission based on this work was selected as the best solution of Track 3 at the iDASH privacy and security competition 2017. For example, it took about six minutes to obtain a logistic regression model from a dataset of 1579 samples, each of which has 18 features and a binary outcome variable. Conclusions: We present a practical solution for outsourcing analysis tools such as logistic regression analysis while preserving data confidentiality.
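The abstract above mentions adapting Nesterov's accelerated gradient method to logistic regression. The following is a minimal plaintext sketch of that optimizer only. It does not use homomorphic encryption, and the fixed momentum of 0.9, the learning rate, the toy dataset, and all function names are assumptions made for illustration rather than details from the paper. An encrypted implementation would also need to replace the exact sigmoid with a polynomial approximation, which is omitted here.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss_and_grad(w, X, y):
    # Mean cross-entropy loss and its gradient for weights w.
    n = len(X)
    grad = [0.0] * len(w)
    loss = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
        loss -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
        for j, xj in enumerate(xi):
            grad[j] += (p - yi) * xj
    return loss / n, [g / n for g in grad]

def train_nesterov(X, y, lr=0.5, momentum=0.9, iters=50):
    # w is the current iterate, v the look-ahead point where the
    # gradient is evaluated (the defining feature of Nesterov's method).
    w = [0.0] * len(X[0])
    v = list(w)
    for _ in range(iters):
        _, grad = loss_and_grad(v, X, y)
        w_new = [vj - lr * gj for vj, gj in zip(v, grad)]
        v = [wn + momentum * (wn - wo) for wn, wo in zip(w_new, w)]
        w = w_new
    return w

# Toy data (assumed): a bias feature plus one real feature, label 1 when x > 0.
X = [[1.0, x] for x in (-3.0, -2.0, -1.0, 1.0, 2.0, 3.0)]
y = [0, 0, 0, 1, 1, 1]
w = train_nesterov(X, y)
final_loss, _ = loss_and_grad(w, X, y)
```

Evaluating the gradient at the look-ahead point `v` instead of at `w` is what lets the method use fewer iterations than plain gradient descent, which matters when every iteration consumes expensive homomorphic operations.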

    Encrypted statistical machine learning: new privacy preserving methods

    We present two new statistical machine learning methods designed to learn on fully homomorphic encrypted (FHE) data. The introduction of FHE schemes following Gentry (2009) opens up the prospect of privacy preserving statistical machine learning analysis and modelling of encrypted data without compromising security constraints. We propose tailored algorithms for applying extremely random forests, involving a new cryptographic stochastic fraction estimator, and naïve Bayes, involving a semi-parametric model for the class decision boundary, and show how they can be used to learn and predict from encrypted data. We demonstrate that these techniques perform competitively on a variety of classification data sets and provide detailed information about the computational practicalities of these and other FHE methods. Comment: 39 page

    A Study on Bootstrapping Techniques for Homomorphic Encryption

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Department of Mathematical Sciences, 2019. 2. Jung Hee Cheon.

    Since Gentry first designed fully homomorphic encryption in 2009, a variety of techniques and schemes have been designed to optimize and accelerate it. However, because of the inefficiency of bootstrapping, which is essential for allowing an unlimited number of homomorphic operations, homomorphic encryption has often been judged unsuitable for real applications. This thesis presents several techniques for accelerating bootstrapping and applies them to a real application. We study bootstrapping for the representative homomorphic encryption schemes. First, we study bootstrapping techniques applicable to SEAL and HElib, the homomorphic encryption libraries built by Microsoft Research and IBM. The core step of this bootstrapping is evaluating the decryption function in the encrypted state. We present a new method for extracting the lowest bit in the encrypted state, and thereby reduce both the computation consumed by bootstrapping and the degree of the polynomial involved. Second, we improve bootstrapping for HEAAN, a relatively recent homomorphic encryption scheme for approximate arithmetic. Bootstrapping for this scheme was first proposed in 2018 using an approximation by trigonometric functions, but for ciphertexts that hold many data values the pre-processing and post-processing steps accounted for most of the computation. By expressing these steps as recursive functions over several stages, we reduce the computation to logarithmic in the data size. In addition, we improve bootstrapping for integer-based homomorphic encryption schemes, which are used less often than the other schemes, and again reduce the computation logarithmically. Finally, to show that bootstrapping is useful and usable, we apply it to machine learning, a field that requires real data security. We performed regression analysis on 400,000 financial records using encrypted data. As a result, in about 16 hours we obtained a meaningful model with over 80% accuracy and an AUROC of about 0.8.

    After Gentry's blueprint for homomorphic encryption (HE) schemes, various efficient schemes have been suggested. An unlimited number of operations on encrypted data requires the bootstrapping process. There are only a few works on the bootstrapping procedure because of its complexity and inefficiency. In this paper, we propose various methods and techniques for improved bootstrapping algorithms, and we apply them to logistic regression on large-scale encrypted data. The bootstrapping process depends on the underlying homomorphic encryption scheme. We improve the bootstrapping algorithm for various schemes: BGV, BFV, HEAAN, and integer-based schemes. First, we improve bootstrapping for the BGV (HElib) and FV (SEAL) schemes, which are implemented by IBM and Microsoft Research, respectively. The key process for bootstrapping in those two schemes is extracting the lower digits of the plaintext in the encrypted state. We suggest a new polynomial that removes the lowest digit of its input, and we apply it to bootstrapping together with the previous method. As a result, both the complexity and the consumed depth are reduced. Second, bootstrapping for multiple data needs a homomorphic linear transformation. The complexity of this part is O(n) for n slots, and this part becomes a bottleneck when n is large. We use the structure of the linear transformation used in bootstrapping, and we decompose the matrix corresponding to the transformation. By applying a recursive strategy, we reduce the complexity to O(log n). Furthermore, we suggest a new bootstrapping method for integer-based HE schemes, which are based on the approximate greatest common divisor problem. By using digit extraction instead of the previous bit-wise approach, the complexity of the bootstrapping algorithm is reduced from O(poly(lambda)) to O(log^2(lambda)). Our implementation of this process takes about 6 seconds, down from about 3 minutes. To show that bootstrapping can be used in practical applications, we implement logistic regression on large-scale encrypted data. Our target data has 400,000 samples, and each sample has 200 features. Because of the size of the data, direct application of a homomorphic encryption scheme is almost impossible. Therefore, we choose the encryption method so as to maximize the effect of multi-threading and SIMD operations in the HE scheme. As a result, our homomorphic logistic regression takes about 16 hours for the target data. The output model has 0.8 AUROC with about 80% accuracy. Another experiment on the MNIST dataset shows the correctness of our implementation and method.

    Abstract
    1 Introduction: 1.1 Homomorphic Encryption; 1.2 Machine Learning on Encrypted Data; 1.3 List of Papers
    2 Background: 2.1 Notation; 2.2 Homomorphic Encryption; 2.3 Ring Learning with Errors; 2.4 Approximate GCD
    3 Lower Digit Removal and Improved Bootstrapping: 3.1 Basis of BGV and BFV scheme; 3.2 Improved Digit Extraction Algorithm; 3.3 Bootstrapping for BGV and BFV Scheme; 3.3.1 Our modifications; 3.4 Slim Bootstrapping Algorithm; 3.5 Implementation Result
    4 Faster Homomorphic DFT and Improved Bootstrapping: 4.1 Basis of HEAAN scheme; 4.2 Homomorphic DFT; 4.2.1 Previous Approach; 4.2.2 Our method; 4.2.3 Hybrid method; 4.2.4 Implementation Result; 4.3 Improved Bootstrapping for HEAAN; 4.3.1 Linear Transformation in Bootstrapping; 4.3.2 Improved CoeffToSlot and SlotToCoeff; 4.3.3 Implementation Result
    5 Faster Bootstrapping for FHE over the integers: 5.1 Basis of FHE over the integers; 5.2 Decryption Function via Digit Extraction; 5.2.1 Squashed Decryption Function; 5.2.2 Digit extraction Technique; 5.2.3 Homomorphic Digit Extraction in FHE over the integers; 5.3 Bootstrapping for FHE over the integers; 5.3.1 CLT scheme with M Z_t; 5.3.2 Homomorphic Operations with M Z_t^a; 5.3.3 Homomorphic Digit Extraction for CLT scheme; 5.3.4 Our Method on the CLT scheme; 5.3.5 Analysis of Proposed Bootstrapping Method; 5.4 Implementation Result
    6 Logistic Regression on Large Encrypted Data: 6.1 Basis of Logistic Regression; 6.2 Logistic Regression on Encrypted Data; 6.2.1 HE-friendly Logistic Regression Algorithm; 6.2.2 HE-Optimized Logistic Regression Algorithm; 6.2.3 Further Optimization; 6.3 Evaluation; 6.3.1 Logistic Regression on Encrypted Financial Dataset; 6.3.2 Logistic Regression on Encrypted MNIST Dataset; 6.3.3 Discussion
    7 Conclusions
    Abstract (in Korean)
    Docto
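The thesis abstract above describes decomposing the matrix of a linear transformation and applying a recursive strategy to bring its cost from O(n) down to O(log n). As a plaintext analogy only, and not the thesis's homomorphic algorithm, the sketch below shows the same structural idea on an ordinary DFT: the dense n-by-n transform factors into log2(n) sparse butterfly stages. The function names and the choice of the radix-2 Cooley-Tukey factorization are assumptions made for illustration.

```python
import cmath

def naive_dft(x):
    # Dense matrix-vector product: every output touches every input.
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def bit_reverse_permute(x):
    # Reorder inputs so the in-place butterfly stages yield natural order.
    n = len(x)
    bits = n.bit_length() - 1
    out = [0] * n
    for i in range(n):
        r = int(format(i, "b").zfill(bits)[::-1], 2) if bits else 0
        out[r] = x[i]
    return out

def staged_dft(x):
    # Radix-2 decimation-in-time DFT; len(x) must be a power of two.
    # Returns the transform and the number of sparse stages applied.
    n = len(x)
    a = bit_reverse_permute(list(x))
    size, stages = 2, 0
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0 + 0.0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t          # butterfly: each entry
                a[start + k + half] = u - t   # combines only two inputs
                w *= w_step
        size *= 2
        stages += 1
    return a, stages

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
result, stages = staged_dft(signal)
reference = naive_dft(signal)
```

Each stage corresponds to a sparse matrix with only a few nonzero diagonals, which is why a factorization of this kind is attractive when every nonzero diagonal of a homomorphic linear transformation costs a ciphertext rotation.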

    Homomorphic Encryption for Machine Learning in Medicine and Bioinformatics

    Machine learning techniques are an excellent tool for the medical community for analyzing large amounts of medical and genomic data. On the other hand, ethical concerns and privacy regulations prevent the free sharing of this data. Encryption methods such as fully homomorphic encryption (FHE) provide a way to evaluate computations over encrypted data. Using FHE, machine learning models such as deep learning, decision trees, and naive Bayes have been implemented for private prediction using medical data. FHE has also been shown to enable secure genomic algorithms, such as paternity testing, and secure application of genome-wide association studies. This survey provides an overview of fully homomorphic encryption and its applications in medicine and bioinformatics. The high-level concepts behind FHE and its history are introduced. Details on current open-source implementations are provided, as is the state of FHE for privacy-preserving techniques in machine learning and bioinformatics and future growth opportunities for FHE.

    Privacy-Preserving CNN Training with Transfer Learning

    Privacy-preserving neural network inference has been well studied, while homomorphic CNN training remains an open and challenging task. In this paper, we present a practical solution for privacy-preserving CNN training based solely on the Homomorphic Encryption (HE) technique. To the best of our knowledge, this is the first successful attempt at this task. Several techniques combine to make it possible: (1) with transfer learning, privacy-preserving CNN training can be reduced to homomorphic neural network training, or even multiclass logistic regression (MLR) training; (2) a faster gradient variant called Quadratic Gradient, an enhanced gradient method for MLR with state-of-the-art convergence speed, is applied in this work to achieve high performance; (3) we transform the problem of approximating the Softmax function in the encrypted domain into the well-studied approximation of the Sigmoid function, and develop a new type of loss function to complement this change; and (4) we use a simple but flexible matrix-encoding method named Volley Revolver to manage the data flow in the ciphertexts, which is the key factor in completing the whole homomorphic CNN training. The complete, runnable C++ code to implement our work can be found at: https://github.com/petitioner/HE.CNNtraining. We select REGNET_X_400MF as our pre-trained model for transfer learning. We use the first 128 MNIST training images as training data and the whole MNIST testing dataset as the testing data. The client only needs to upload 6 ciphertexts to the cloud, and it takes ~21 mins to perform 2 iterations on a cloud with 64 vCPUs, resulting in a precision of 21.49%. Comment: In this work, we initiated to implement privacy-preserving CNN training based on mere HE techniques by presenting a faster HE-friendly algorith
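Points (1) and (3) of the abstract above say that, with transfer learning, CNN training reduces to training a multiclass logistic regression head, and that Softmax is replaced by the Sigmoid function. The plaintext sketch below illustrates only that reduction. It assumes the features were already produced by a frozen pre-trained network, uses a standard per-class sigmoid with cross-entropy gradients and plain gradient descent, and is not the paper's Quadratic Gradient method, loss function, or Volley Revolver encoding. All names, the toy features, and the hyperparameters are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_frozen_head(features, labels, num_classes, lr=1.0, iters=200):
    # Train one sigmoid unit per class on fixed features (one-vs-rest).
    # Only the weight matrix W is learned; the feature extractor is frozen.
    dim = len(features[0])
    n = len(features)
    W = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(iters):
        grad = [[0.0] * dim for _ in range(num_classes)]
        for x, label in zip(features, labels):
            for c in range(num_classes):
                target = 1.0 if c == label else 0.0
                score = sum(wj * xj for wj, xj in zip(W[c], x))
                err = sigmoid(score) - target
                for j in range(dim):
                    grad[c][j] += err * x[j] / n
        for c in range(num_classes):
            for j in range(dim):
                W[c][j] -= lr * grad[c][j]
    return W

def predict(W, x):
    # Pick the class whose unit produces the highest score.
    scores = [sum(wj * xj for wj, xj in zip(row, x)) for row in W]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical "frozen" features for six images across three classes.
features = [
    [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
    [0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9],
]
labels = [0, 1, 2, 0, 1, 2]
W = train_frozen_head(features, labels, num_classes=3)
predictions = [predict(W, x) for x in features]
```

Because each class uses an independent sigmoid, no normalization across classes is needed, which avoids the division that makes Softmax awkward to approximate with the additions and multiplications available under HE.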