4 research outputs found
Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector
Machine learning (ML) is today commonly employed in the Financial Services Sector (FSS) to create various models to predict a variety of conditions ranging from financial transactions fraud to outcomes of investments and also targeted marketing campaigns. The common ML technique used for the modeling is supervised learning using regression algorithms and usually involves large amounts of data that needs to be shared and prepared before the actual learning phase. Compliance with privacy laws and confidentiality regulations requires that most, if not all, of the data must be kept in a secure environment, usually in-house, and not outsourced to cloud or multi-tenant shared environments. This paper presents the results of a research collaboration between IBM Research and Banco Bradesco SA to investigate approaches to homomorphically secure a typical ML pipeline commonly employed in the FSS industry.
We investigated and de-constructed a typical ML pipeline used by Banco Bradesco and applied Homomorphic Encryption (HE) to two of the important ML tasks, namely the variable selection phase of the model generation task and the prediction task. Variable selection, which usually precedes the training phase, is very important when working with data sets for which no prior knowledge of the covariate set exists. Our work provides a way to define an initial covariate set for the training phase while preserving the privacy and confidentiality of the input data sets.
Quality metrics, using real financial data comprising quantitative, qualitative and categorical features, demonstrated that our HE-based pipeline can yield results comparable to state-of-the-art variable selection techniques, and the performance results demonstrated that HE technology has reached the inflection point where it can be useful in batch processing in a financial business setting.
On the IND-CCA1 Security of FHE Schemes
Fully homomorphic encryption (FHE) is a powerful tool in cryptography that allows one to perform arbitrary computations on encrypted material without having to decrypt it first. There are numerous FHE schemes, all of which are expanded from somewhat homomorphic encryption (SHE) schemes, and some of which are considered viable in practice. However, while these FHE schemes are semantically (IND-CPA) secure, the question of their IND-CCA1 security is much less studied, and we therefore provide an overview of the IND-CCA1 security of all acknowledged FHE schemes in this paper. To give this overview, we grouped the SHE schemes into broad categories based on their similarities and underlying hardness problems. For each category, we show that the SHE schemes are susceptible to either known adaptive key recovery attacks, a natural extension of known attacks, or our proposed attacks. Finally, we discuss the known techniques to achieve IND-CCA1-secure FHE and SHE schemes. We concluded that none of the proposed schemes were IND-CCA1-secure and that the known general constructions all had their shortcomings.
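To make the idea of a key recovery attack with a decryption oracle concrete, the following is a minimal sketch on a toy LWE-style bit encryption scheme. It is not any scheme or attack from the paper: the parameters `Q` and `N`, the binary secret, and the function names are all assumptions chosen for illustration, and the toy scheme has no real security. The attacker submits crafted ciphertexts to the oracle and reads one secret-key bit from each answer.

```python
import random

Q = 1 << 10   # toy modulus (assumption, far too small for real use)
N = 8         # toy secret length (assumption)

def keygen(rng):
    # Binary secret vector s in {0,1}^N.
    return [rng.randrange(2) for _ in range(N)]

def encrypt(s, bit, rng):
    # Ciphertext (a, b) with b = <a, s> + e + bit * Q/2 (mod Q), small noise e.
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randrange(-4, 5)
    b = (sum(ai * si for ai, si in zip(a, s)) + e + bit * (Q // 2)) % Q
    return a, b

def decrypt(s, ct):
    # Round b - <a, s> (mod Q) to the nearer of 0 and Q/2.
    a, b = ct
    v = (b - sum(ai * si for ai, si in zip(a, s))) % Q
    return 1 if Q // 4 <= v < 3 * Q // 4 else 0

def recover_key(oracle):
    # Query (a, b) = ((Q/2) * e_i, 0): the oracle computes
    # -(Q/2) * s_i mod Q, which rounds to exactly s_i.
    key = []
    for i in range(N):
        a = [0] * N
        a[i] = Q // 2
        key.append(oracle((a, 0)))
    return key

if __name__ == "__main__":
    rng = random.Random(0)
    s = keygen(rng)
    guessed = recover_key(lambda ct: decrypt(s, ct))
    print("key recovered:", guessed == s)
```

The sketch uses `N` oracle queries, one per key bit, all made before any challenge ciphertext is seen, which is the access pattern the IND-CCA1 game permits.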
FlexFHE: A System for Homomorphically Encrypting DNA and Operating on Encrypted Data Securely in Untrusted Environments
DNA data contains sensitive health information and personally identifiable data. Currently, even if DNA data is stored in encrypted databases, it must be decrypted for health professionals and researchers to analyze, which means that DNA data exists in plaintext on unsecured, untrusted servers and machines during analysis. This thesis describes a complete system for homomorphically encrypting DNA data in a trusted context and then running analytic operations on the encrypted DNA data in an untrusted context, thus allowing healthcare professionals and researchers to run both high volume analytics on many individuals’ sequenced DNA and run complex analytics on a single individual’s sequenced DNA without ever handling plaintext data.
Symmetric encryption is used as a mechanism for controlling which queries are made on the data. The threat model addressed by this system allows an authorized party to run only authorized queries on a genome, while restricting any additional access.
The system implemented achieves substring search, substring search with wildcards representing mutations, and percent match between two nucleotide sequences by converting genomic data into one-hot binary matrices and encrypting each bit individually using OpenFHE’s LWE Encryption implemented using the CGGI scheme. While runtime for each operation is O(nm), each operation is maximally parallelized using OpenMP, thus allowing for accelerated performance on machines with multiple CPUs without the need for batching.
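The one-hot matching idea can be sketched in plaintext as follows. This is an illustration only, not the thesis's implementation: it uses ordinary Python integers rather than encrypted bits, the wildcard symbol `N` and all function names are assumptions, and the percent-match definition (share of equal positions in two equal-length sequences) is one plausible reading. The comparisons use only bitwise AND and OR, the kind of Boolean gates a bit-level scheme such as CGGI evaluates on ciphertexts, and the nested loops show where the O(nm) cost comes from.

```python
NUCLEOTIDES = "ACGT"
WILDCARD = "N"  # hypothetical wildcard symbol for the pattern

def one_hot(seq):
    # One row of four bits per nucleotide; a wildcard row is all ones,
    # so it matches any nucleotide.
    rows = []
    for ch in seq:
        if ch == WILDCARD:
            rows.append([1, 1, 1, 1])
        else:
            rows.append([1 if ch == n else 0 for n in NUCLEOTIDES])
    return rows

def position_matches(g_row, p_row):
    # OR over the bitwise AND of the two rows: 1 if they share a set bit.
    hit = 0
    for g, p in zip(g_row, p_row):
        hit |= g & p
    return hit

def substring_search(genome, pattern):
    # Returns one bit per start offset: 1 where the pattern matches.
    g, p = one_hot(genome), one_hot(pattern)
    n, m = len(g), len(p)
    results = []
    for start in range(n - m + 1):      # n - m + 1 offsets
        ok = 1
        for j in range(m):              # m comparisons per offset
            ok &= position_matches(g[start + j], p[j])
        results.append(ok)
    return results

def percent_match(seq_a, seq_b):
    # Percentage of positions at which two equal-length sequences agree.
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    a, b = one_hot(seq_a), one_hot(seq_b)
    hits = sum(position_matches(x, y) for x, y in zip(a, b))
    return 100.0 * hits / len(a)

if __name__ == "__main__":
    print(substring_search("ACGTAC", "GT"))
    print(substring_search("ACGTAC", "AN"))
    print(percent_match("ACGT", "ACGA"))
```

Each start offset is independent of the others, which is why the outer loop is a natural candidate for the kind of parallelization the abstract describes.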