
    Confidential Boosting with Random Linear Classifiers for Outsourced User-generated Data

    User-generated data is crucial to predictive modeling in many applications. With a web/mobile/wearable interface, a data owner can continuously record data generated by distributed users and build various predictive models from the data to improve their operations, services, and revenue. Due to the large size and evolving nature of users' data, data owners may rely on public cloud service providers (Cloud) for storage and computation scalability. Exposing sensitive user-generated data and advanced analytic models to Cloud raises privacy concerns. We present a confidential learning framework, SecureBoost, for data owners who want to learn predictive models from aggregated user-generated data but offload the storage and computational burden to Cloud without having to worry about protecting the sensitive data. SecureBoost allows users to submit encrypted or randomly masked data directly to a designated Cloud. Our framework uses random linear classifiers (RLCs) as the base classifiers in the boosting framework, which dramatically simplifies the design of the proposed confidential boosting protocols while preserving model quality. A Cryptographic Service Provider (CSP) assists the Cloud's processing, reducing the complexity of the protocol constructions. We present two constructions of SecureBoost, HE+GC and SecSh+GC, which combine homomorphic encryption, garbled circuits, and random masking to achieve both security and efficiency. For a boosted model, Cloud learns only the RLCs and the CSP learns only the weights of the RLCs. Finally, the data owner collects the two parts to obtain the complete model. We conduct extensive experiments to understand the quality of RLC-based boosting and the cost distribution of the constructions. Our results show that SecureBoost can efficiently learn high-quality boosting models from protected user-generated data.
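
To make the "random linear classifiers as base learners" idea concrete, here is a minimal plaintext sketch. It assumes an AdaBoost-style loop in which each round draws a pool of random hyperplanes and keeps the one with the lowest weighted error. The pool size, Gaussian sampling, and all function names are hypothetical illustrations, not SecureBoost's specification. The confidential part (HE or secret sharing, garbled circuits, and the Cloud/CSP split of RLCs and weights) is not modeled here.

```python
import math
import random

def random_linear_classifier(dim, rng):
    """Hypothetical RLC: a hyperplane with Gaussian random weights and bias."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.gauss(0.0, 1.0)
    return w, b

def rlc_predict(clf, x):
    w, b = clf
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def boost_with_rlcs(X, y, rounds=20, pool_size=50, seed=0):
    """AdaBoost where each weak learner is the best of `pool_size` random
    hyperplanes (plaintext only; labels y are in {-1, +1})."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    d = [1.0 / n] * n                # sample weights, kept normalized to sum 1
    ensemble = []                    # list of (alpha, sign, clf)
    for _ in range(rounds):
        best = None
        for _ in range(pool_size):
            clf = random_linear_classifier(dim, rng)
            preds = [rlc_predict(clf, x) for x in X]
            err = sum(di for di, p, yi in zip(d, preds, y) if p != yi)
            sign = 1
            if err > 0.5:            # a bad hyperplane becomes good once flipped
                err, sign = 1.0 - err, -1
            if best is None or err < best[0]:
                best = (err, sign, clf, preds)
        err, sign, clf, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, sign, clf))
        # Misclassified samples gain weight, correctly classified ones lose it.
        d = [di * math.exp(-alpha * yi * sign * p) for di, yi, p in zip(d, y, preds)]
        total = sum(d)
        d = [di / total for di in d]
    return ensemble

def ensemble_predict(ensemble, x):
    s = sum(alpha * sign * rlc_predict(clf, x) for alpha, sign, clf in ensemble)
    return 1 if s >= 0 else -1

# Toy usage on made-up data: two well-separated Gaussian clusters.
data_rng = random.Random(1)
X, y = [], []
for _ in range(50):
    X.append([data_rng.gauss(2.0, 0.5), data_rng.gauss(2.0, 0.5)]); y.append(1)
    X.append([data_rng.gauss(-2.0, 0.5), data_rng.gauss(-2.0, 0.5)]); y.append(-1)
model = boost_with_rlcs(X, y, rounds=10)
acc = sum(ensemble_predict(model, x) == yi for x, yi in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")
```

Because the base learners are random rather than fitted, learning reduces to selecting hyperplanes and computing their weights. That is the kind of simplification the abstract credits for easing the confidential protocol design.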

    Federated Learning with Quantum Secure Aggregation

    This article presents a novel Quantum Secure Aggregation (QSA) scheme designed to provide highly secure and efficient aggregation of local model parameters for federated learning. The scheme protects private model parameters from disclosure to semi-honest attackers by using quantum bits (qubits) to represent model parameters. The proposed security mechanism ensures that any attempt to eavesdrop on private model parameters can be immediately detected and stopped. The scheme is also efficient, with low computational complexity for transmitting and aggregating model parameters through entangled qubits. The benefits of the proposed QSA scheme are showcased in a horizontal federated learning setting that considers both centralized and decentralized architectures. It was empirically demonstrated that the proposed QSA can be readily applied to aggregate different types of local models, including logistic regression (LR), convolutional neural networks (CNN), and quantum neural networks (QNN), indicating the versatility of the QSA scheme. The performance of the global models improves to various extents over the local models obtained by individual participants, while no private model parameters are disclosed to semi-honest adversaries.

    λ™ν˜•μ•”ν˜Έ μž¬λΆ€νŒ… 기법에 κ΄€ν•œ 연ꡬ

    Ph.D. thesis, Seoul National University Graduate School, College of Natural Sciences, Department of Mathematical Sciences, February 2019. Advisor: Jung Hee Cheon. Since Gentry designed the first fully homomorphic encryption scheme in 2009, many techniques and schemes have been developed to optimize and accelerate it. However, homomorphic encryption has often been judged unsuitable for real applications because of the inefficiency of bootstrapping, which is essential for performing an unlimited number of homomorphic operations. In this thesis, we present several techniques for accelerating bootstrapping and apply them to a real application. We study bootstrapping for representative HE schemes. First, we study bootstrapping for SEAL and HElib, the HE libraries developed by Microsoft Research and IBM. The core step of this bootstrapping is evaluating the decryption function in encrypted form. By proposing a new method for extracting the lowest digit in encrypted form, we reduce both the computation consumed by bootstrapping and the degree of the polynomial that must be evaluated. Second, we improve bootstrapping for HEAAN, a relatively recent HE scheme for approximate arithmetic. Bootstrapping for HEAAN was first proposed in 2018 using an approximation by trigonometric functions, but for ciphertexts that hold many data values, the pre- and post-processing steps accounted for most of the computation. By expressing these steps as multi-level recursive functions, we reduce their cost to logarithmic in the data size. In addition, we improve bootstrapping for integer-based HE schemes, which are less widely used than the others, and likewise achieve a logarithmic reduction in computation.
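
The abstract names digit extraction as the key step of BGV/BFV bootstrapping. The plaintext sketch below shows what that means, assuming a plaintext modulus of 2^e. It uses the classic squaring-based lifting for p = 2, not the thesis's new lower-degree digit-removal polynomial, which the abstract does not spell out. The underlying fact: if z ≡ b (mod 2^j) with b ∈ {0, 1}, then z² ≡ b (mod 2^(j+1)). Hence z^(2^(e-1)) mod 2^e equals the lowest bit of z. Subtracting that bit and dividing by 2 exposes the next digit. Only additions, multiplications, and exact division by 2 are used, which is why such procedures can be evaluated on ciphertexts. Here everything is ordinary integer arithmetic, and the function names are hypothetical.

```python
def lowest_bit_lifted(z, e):
    """Lowest bit of z as a residue mod 2**e, computed with squarings only."""
    mod = 2 ** e
    y = z % mod
    for _ in range(e - 1):
        # If y == b (mod 2^j) with b in {0, 1}, then y*y == b (mod 2^(j+1)).
        y = (y * y) % mod
    return y

def extract_digits(z, e):
    """All e binary digits of z mod 2**e (least significant first)."""
    digits = []
    cur, k = z % (2 ** e), e
    while k > 0:
        b = lowest_bit_lifted(cur, k)     # exact lowest bit, as an element of Z_{2^k}
        digits.append(b)
        # cur - b is even; exact division by 2 moves to modulus 2^(k-1).
        cur = ((cur - b) % (2 ** k)) // 2
        k -= 1
    return digits

print(extract_digits(13, 4))   # → [1, 0, 1, 1]
```

Extracting one digit this way costs a polynomial of degree 2^(e-1). Reducing that degree and the depth it consumes is precisely the cost the thesis targets.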
Finally, to demonstrate the practicality of bootstrapping, we apply it to machine learning, a field that requires real data security. We perform regression analysis on 400,000 encrypted financial records and obtain a meaningful model with over 80% accuracy and an AUROC of about 0.8 in about 16 hours. Since Gentry's blueprint for a homomorphic encryption (HE) scheme, various efficient schemes have been proposed. Bootstrapping is necessary to perform an unlimited number of operations on encrypted data. There are only a few works on bootstrapping because of its complexity and inefficiency. In this thesis, we propose various methods and techniques for improving bootstrapping algorithms, and we apply them to logistic regression on large-scale encrypted data. The bootstrapping process depends on the underlying homomorphic encryption scheme. We improve bootstrapping for various schemes, such as BGV, BFV, HEAAN, and integer-based schemes. First, we improve bootstrapping for the BGV (HElib) and FV (SEAL) schemes, which are implemented by IBM and Microsoft Research, respectively. The key step of bootstrapping in these two schemes is extracting the lower digits of the plaintext in encrypted form. We propose a new polynomial that removes the lowest digit of its input, and we apply it to bootstrapping together with the previous method. As a result, both the complexity and the consumed depth are reduced. Second, bootstrapping for multiple data values requires a homomorphic linear transformation. The complexity of this part is O(n) for n slots, and it becomes a bottleneck when n is large. We exploit the structure of the linear transformation used in bootstrapping and decompose the corresponding matrix. By applying a recursive strategy, we reduce the complexity to O(log n).
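
The O(n) to O(log n) reduction rests on splitting one dense linear transform into a product of sparse ones. As a plaintext illustration only, the sketch below uses the textbook radix-2 Cooley-Tukey factorization of the DFT. This is an assumption: the thesis's CoeToSlot/SlotToCoe transforms are DFT-like but use HEAAN's own slot ordering, which is not reproduced here. Each stage matrix has at most 3 nonzero diagonals, while the dense DFT matrix has all n. In the usual diagonal method for slot-packed HE, a matrix-vector product costs roughly one rotation per nonzero diagonal. All function names are hypothetical.

```python
import cmath

def bit_reverse(j, bits):
    r = 0
    for _ in range(bits):
        r = (r << 1) | (j & 1)
        j >>= 1
    return r

def butterfly_stage(n, m):
    """Dense n x n matrix I_{n/m} (x) [[I, W], [I, -W]], W = diag(exp(-2*pi*i*k/m))."""
    A = [[0j] * n for _ in range(n)]
    half = m // 2
    for start in range(0, n, m):
        for k in range(half):
            w = cmath.exp(-2j * cmath.pi * k / m)
            top, bot = start + k, start + k + half
            A[top][top], A[top][bot] = 1.0, w
            A[bot][top], A[bot][bot] = 1.0, -w
    return A

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def nonzero_diagonals(A, tol=1e-12):
    """Offsets (j - i) mod n of the generalized diagonals that contain a nonzero."""
    n = len(A)
    return {(j - i) % n for i in range(n) for j in range(n) if abs(A[i][j]) > tol}

def dft_via_stages(x):
    """DFT as a bit-reversal permutation followed by log2(n) sparse stages."""
    n = len(x)
    bits = n.bit_length() - 1
    y = [x[bit_reverse(j, bits)] for j in range(n)]
    m = 2
    while m <= n:
        y = matvec(butterfly_stage(n, m), y)
        m *= 2
    return y

def naive_dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]

for m in (2, 4, 8, 16):
    # stages 2, 4, 8 have 3 nonzero diagonals; the last stage (m = n) has 2
    print(m, len(nonzero_diagonals(butterfly_stage(16, m))))
```

The dense transform needs n diagonals, whereas the factored form uses log2(n) stages of at most 3 diagonals each. This mirrors the asymptotic gain the abstract reports.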
Furthermore, we propose a new bootstrapping method for integer-based HE schemes, which are based on the approximate greatest common divisor problem. By using digit extraction instead of the previous bit-wise approach, we reduce the complexity of bootstrapping from O(poly(λ)) to O(log^2 λ). Our implementation reduces the running time of this process from about 3 minutes to 6 seconds. To show that bootstrapping can be used in practical applications, we implement logistic regression on large-scale encrypted data. Our target data has 400,000 samples, each with 200 features. Because of the size of the data, directly applying an HE scheme is almost impossible. Therefore, we choose an encryption method that maximizes the benefit of multi-threading and SIMD operations in the HE scheme. As a result, our homomorphic logistic regression takes about 16 hours on the target data. The output model has an AUROC of 0.8 with about 80% accuracy. An additional experiment on the MNIST dataset confirms the correctness of our implementation and method.
Contents:
Abstract
1 Introduction (1.1 Homomorphic Encryption; 1.2 Machine Learning on Encrypted Data; 1.3 List of Papers)
2 Background (2.1 Notation; 2.2 Homomorphic Encryption; 2.3 Ring Learning with Errors; 2.4 Approximate GCD)
3 Lower Digit Removal and Improved Bootstrapping (3.1 Basis of BGV and BFV scheme; 3.2 Improved Digit Extraction Algorithm; 3.3 Bootstrapping for BGV and BFV Scheme; 3.3.1 Our modifications; 3.4 Slim Bootstrapping Algorithm; 3.5 Implementation Result)
4 Faster Homomorphic DFT and Improved Bootstrapping (4.1 Basis of HEAAN scheme; 4.2 Homomorphic DFT; 4.2.1 Previous Approach; 4.2.2 Our method; 4.2.3 Hybrid method; 4.2.4 Implementation Result; 4.3 Improved Bootstrapping for HEAAN; 4.3.1 Linear Transformation in Bootstrapping; 4.3.2 Improved CoeToSlot and SlotToCoe; 4.3.3 Implementation Result)
5 Faster Bootstrapping for FHE over the integers (5.1 Basis of FHE over the integers; 5.2 Decryption Function via Digit Extraction; 5.2.1 Squashed Decryption Function; 5.2.2 Digit extraction Technique; 5.2.3 Homomorphic Digit Extraction in FHE over the integers; 5.3 Bootstrapping for FHE over the integers; 5.3.1 CLT scheme with M = Z_t; 5.3.2 Homomorphic Operations with M = Z_t^a; 5.3.3 Homomorphic Digit Extraction for CLT scheme; 5.3.4 Our Method on the CLT scheme; 5.3.5 Analysis of Proposed Bootstrapping Method; 5.4 Implementation Result)
6 Logistic Regression on Large Encrypted Data (6.1 Basis of Logistic Regression; 6.2 Logistic Regression on Encrypted Data; 6.2.1 HE-friendly Logistic Regression Algorithm; 6.2.2 HE-Optimized Logistic Regression Algorithm; 6.2.3 Further Optimization; 6.3 Evaluation; 6.3.1 Logistic Regression on Encrypted Financial Dataset; 6.3.2 Logistic Regression on Encrypted MNIST Dataset; 6.3.3 Discussion)
7 Conclusions
Abstract (in Korean)
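
HE schemes evaluate only additions and multiplications, so logistic regression on encrypted data usually replaces the sigmoid with a low-degree polynomial. The abstract does not give the thesis's "HE-friendly" or "HE-optimized" algorithms. The plaintext sketch below therefore assumes a common substitute: an odd cubic least-squares fit σ(x) ≈ 0.5 + a·x + b·x³ on [-8, 8], plugged into plain batch gradient descent. There is no encryption, no SIMD packing, and no bootstrapping here. The data, interval, learning rate, and function names are all hypothetical.

```python
import math
import random

def fit_odd_cubic_sigmoid(lo=-8.0, hi=8.0, n=161):
    """Least-squares fit of sigmoid(x) - 0.5 by a*x + b*x**3 on a uniform grid."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    f = [1 / (1 + math.exp(-x)) - 0.5 for x in xs]
    s2 = sum(x ** 2 for x in xs); s4 = sum(x ** 4 for x in xs); s6 = sum(x ** 6 for x in xs)
    t1 = sum(x * fx for x, fx in zip(xs, f))
    t3 = sum(x ** 3 * fx for x, fx in zip(xs, f))
    # Normal equations: a*s2 + b*s4 = t1, a*s4 + b*s6 = t3 (solved by Cramer's rule).
    det = s2 * s6 - s4 * s4
    return (t1 * s6 - s4 * t3) / det, (s2 * t3 - s4 * t1) / det

def train_he_friendly_lr(X, y, a, b, iters=30, lr=0.5):
    """Batch gradient descent with the cubic in place of the sigmoid, so every
    step uses only additions and multiplications."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 0.5 + a * z + b * z ** 3          # polynomial stand-in for sigmoid(z)
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Made-up data: two clusters, labels in {0, 1}, last feature is a constant bias term.
data_rng = random.Random(0)
X, y = [], []
for _ in range(50):
    X.append([data_rng.gauss(1.0, 0.3), data_rng.gauss(1.0, 0.3), 1.0]); y.append(1)
    X.append([data_rng.gauss(-1.0, 0.3), data_rng.gauss(-1.0, 0.3), 1.0]); y.append(0)
a, b = fit_odd_cubic_sigmoid()
w = train_he_friendly_lr(X, y, a, b)
acc = sum((sum(wj * xj for wj, xj in zip(w, xi)) >= 0) == (yi == 1)
          for xi, yi in zip(X, y)) / len(X)
print(f"sigmoid(x) ~ 0.5 + {a:.5f} x + {b:.6f} x^3, training accuracy {acc:.2f}")
```

The cubic is only trustworthy inside the fitting interval. Beyond about |z| ≈ 9.7 it crosses 0.5 again, which is one reason HE pipelines normalize inputs and cap the number of iterations.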

    Privacy-preserving machine learning for healthcare: open challenges and future perspectives

    Machine Learning (ML) has recently shown tremendous success in modeling various healthcare prediction tasks, ranging from disease diagnosis and prognosis to patient treatment. Due to the sensitive nature of medical data, privacy must be considered along the entire ML pipeline, from model training to inference. In this paper, we review recent literature on Privacy-Preserving Machine Learning (PPML) for healthcare. We primarily focus on privacy-preserving training and inference-as-a-service, and we comprehensively review existing trends, identify challenges, and discuss opportunities for future research. The aim of this review is to guide the development of private and efficient ML models in healthcare, with the prospect of translating research efforts into real-world settings. Comment: ICLR 2023 Workshop on Trustworthy Machine Learning for Healthcare (TML4H)

    Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector

    Machine learning (ML) is today commonly employed in the Financial Services Sector (FSS) to create models that predict a variety of conditions, ranging from financial transaction fraud to investment outcomes and targeted marketing campaigns. The most common ML technique used for this modeling is supervised learning with regression algorithms, which usually involves large amounts of data that need to be shared and prepared before the actual learning phase. Compliance with privacy laws and confidentiality regulations requires that most, if not all, of the data be kept in a secure environment, usually in-house, and not outsourced to cloud or multi-tenant shared environments. This paper presents the results of a research collaboration between IBM Research and Banco Bradesco SA to investigate approaches to homomorphically securing a typical ML pipeline commonly employed in the FSS industry. We investigated and deconstructed a typical ML pipeline used by Banco Bradesco and applied Homomorphic Encryption (HE) to two important ML tasks: the variable selection phase of the model generation task, and the prediction task. Variable selection, which usually precedes the training phase, is very important when working with data sets for which no prior knowledge of the covariate set exists. Our work provides a way to define an initial covariate set for the training phase while preserving the privacy and confidentiality of the input data sets. Quality metrics on real financial data comprising quantitative, qualitative, and categorical features demonstrated that our HE-based pipeline can yield results comparable to state-of-the-art variable selection techniques. The performance results demonstrated that HE technology has reached the inflection point where it can be useful for batch processing in a financial business setting.
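
The abstract does not say which HE scheme the pipeline uses. To show the general shape of encrypted prediction, the sketch below assumes textbook Paillier, an additively homomorphic scheme, with deliberately tiny and insecure primes. The client encrypts fixed-point features, and the server combines them with plaintext integer weights without ever seeing the features. The scaling factor, example values, class name, and helper are hypothetical, and this is not the IBM/Bradesco implementation.

```python
import math
import random

class ToyPaillier:
    """Textbook Paillier with g = n + 1. Tiny primes: illustration only, NOT secure."""
    def __init__(self, p=999983, q=1000003, seed=0):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)   # valid inverse because g = n + 1
        self.rng = random.Random(seed)

    def encrypt(self, m):
        while True:
            r = self.rng.randrange(1, self.n)
            if math.gcd(r, self.n) == 1:
                break
        # (1 + n)^m == 1 + m*n (mod n^2), so g^m needs no exponentiation.
        return ((1 + (m % self.n) * self.n) * pow(r, self.n, self.n2)) % self.n2

    def decrypt(self, c):
        u = pow(c, self.lam, self.n2)
        m = ((u - 1) // self.n) * self.mu % self.n
        return m - self.n if m > self.n // 2 else m   # centered lift for negatives

    def add(self, c1, c2):            # Enc(m1) * Enc(m2) = Enc(m1 + m2)
        return (c1 * c2) % self.n2

    def mul_plain(self, c, k):        # Enc(m)^k = Enc(k * m)
        return pow(c, k % self.n, self.n2)

def encrypted_linear_score(he, enc_features, weights, bias):
    """Server side: Enc(bias + sum_j w_j * x_j) from encrypted x_j, plaintext w_j."""
    acc = he.encrypt(bias)
    for c, w in zip(enc_features, weights):
        acc = he.add(acc, he.mul_plain(c, w))
    return acc

SCALE = 100                               # fixed-point scale (assumption)
he = ToyPaillier()
features = [0.35, -1.20, 2.05]            # client's private features (made up)
weights = [0.8, 0.5, -0.25]               # server's plaintext model (made up)
bias = 0.1
enc_x = [he.encrypt(round(v * SCALE)) for v in features]
enc_score = encrypted_linear_score(he, enc_x, [round(w * SCALE) for w in weights],
                                   round(bias * SCALE * SCALE))
score_int = he.decrypt(enc_score)         # only the key holder can do this
score = score_int / SCALE ** 2
print(score)   # -0.7325
```

Additive HE suffices for a linear score. A nonlinear link function or model training would need a scheme that also multiplies ciphertexts.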

    FederBoost: Private Federated Learning for GBDT

    An emerging trend in machine learning and artificial intelligence is federated learning (FL), which allows multiple participants to contribute various training data to train a better model. It promises to keep the training data local for each participant, leading to low communication complexity and high privacy. However, two problems in FL remain unsolved: (1) it cannot handle vertically partitioned data, and (2) it cannot support decision trees. Existing FL solutions for vertically partitioned data or decision trees require heavy cryptographic operations. In this paper, we propose a framework named FederBoost for private federated learning of gradient boosting decision trees (GBDT). It supports running GBDT over both horizontally and vertically partitioned data. The key observation behind FederBoost is that the whole training process of GBDT relies on the order of the data rather than on its values. Consequently, vertical FederBoost does not require any cryptographic operation, and horizontal FederBoost requires only lightweight secure aggregation. We fully implement FederBoost and evaluate its utility and efficiency through extensive experiments on three public datasets. Our experimental results show that both vertical and horizontal FederBoost achieve the same level of AUC as centralized training, where all data are collected on a central server. Both variants finish training within half an hour, even over a WAN. Comment: 15 pages, 8 figures
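
The key observation, that GBDT training depends on the order of the data and not its values, can be checked directly on split finding. The plaintext sketch below assumes an XGBoost-style second-order gain with L2 term `lam`, dropping the constant 1/2 and the γ penalty. It shows that raw feature values, their dense ranks, and a strictly increasing transform all select the same left child. Data and function names are hypothetical, and FederBoost's actual protocol is not shown.

```python
import random

def best_split(values, grads, hess, lam=1.0):
    """Exhaustive split search on one feature; returns (gain, left-child indices)."""
    order = sorted(range(len(values)), key=lambda i: values[i])   # stable sort
    G, H = sum(grads), sum(hess)
    gl = hl = 0.0
    best_gain, best_left = float("-inf"), None
    for pos in range(len(order) - 1):
        i = order[pos]
        gl += grads[i]; hl += hess[i]
        if values[order[pos]] == values[order[pos + 1]]:
            continue                  # cannot split between equal values
        gr, hr = G - gl, H - hl
        gain = gl ** 2 / (hl + lam) + gr ** 2 / (hr + lam) - G ** 2 / (H + lam)
        if gain > best_gain:
            best_gain, best_left = gain, frozenset(order[:pos + 1])
    return best_gain, best_left

def dense_ranks(values):
    """Replace each value by its rank among the distinct values (ties share a rank)."""
    index = {v: r for r, v in enumerate(sorted(set(values)))}
    return [index[v] for v in values]

rng = random.Random(42)
values = [round(rng.uniform(-5, 5), 1) for _ in range(40)]          # made-up feature
labels = [1 if v + rng.gauss(0, 1) > 0 else 0 for v in values]
grads = [0.5 - t for t in labels]      # log-loss gradient p - y at p = 0.5
hess = [0.25] * len(values)            # p * (1 - p) at p = 0.5
gain_raw, left_raw = best_split(values, grads, hess)
gain_rank, left_rank = best_split(dense_ranks(values), grads, hess)
gain_cube, left_cube = best_split([v ** 3 for v in values], grads, hess)
print(gain_raw == gain_rank == gain_cube, left_raw == left_rank == left_cube)  # True True
```

Because only the sorted order enters the gain computation, a party holding a feature column could in principle reveal ordering or bucket information instead of raw values.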