94 research outputs found
Using Fully Homomorphic Encryption for Statistical Analysis of Categorical, Ordinal and Numerical Data
In recent years, there has been a growing trend towards outsourcing of computational tasks with the development of cloud services. The Gentry’s pioneering work of fully homomorphic encryption (FHE) and successive works have opened a new vista for secure and practical cloud computing. In this paper, we consider performing statistical analysis on encrypted data. To improve the efficiency of the computations, we take advantage of the batched computation based on the Chinese-Remainder-Theorem. We propose two building blocks that work with FHE: a novel batch greater-than primitive, and matrix primitive for encrypted matrices. With these building blocks, we construct secure procedures and protocols for different types of statistics including the histogram (count), contingency table (with cell suppression) for categorical data; k-percentile for ordinal data; and principal component analysis and linear regression for numerical data. To demonstrate the effectiveness of our methods, we ran experiments in five real datasets. For instance, we can compute a contingency table with more than 50 cells from 4000 of data in just 5 minutes, and we can train a linear regression model with more than 40k of data and dimension as high as 6 within 15 minutes. We show that the FHE is not as slow as commonly believed and it becomes feasible to perform a broad range of statistical analysis on thousands of encrypted data
Confidential Boosting with Random Linear Classifiers for Outsourced User-generated Data
User-generated data is crucial to predictive modeling in many applications.
With a web/mobile/wearable interface, a data owner can continuously record data
generated by distributed users and build various predictive models from the
data to improve their operations, services, and revenue. Due to the large size
and evolving nature of users data, data owners may rely on public cloud service
providers (Cloud) for storage and computation scalability. Exposing sensitive
user-generated data and advanced analytic models to Cloud raises privacy
concerns. We present a confidential learning framework, SecureBoost, for data
owners that want to learn predictive models from aggregated user-generated data
but offload the storage and computational burden to Cloud without having to
worry about protecting the sensitive data. SecureBoost allows users to submit
encrypted or randomly masked data to designated Cloud directly. Our framework
utilizes random linear classifiers (RLCs) as the base classifiers in the
boosting framework to dramatically simplify the design of the proposed
confidential boosting protocols, yet still preserve the model quality. A
Cryptographic Service Provider (CSP) is used to assist the Cloud's processing,
reducing the complexity of the protocol constructions. We present two
constructions of SecureBoost: HE+GC and SecSh+GC, using combinations of
homomorphic encryption, garbled circuits, and random masking to achieve both
security and efficiency. For a boosted model, Cloud learns only the RLCs and
the CSP learns only the weights of the RLCs. Finally, the data owner collects
the two parts to get the complete model. We conduct extensive experiments to
understand the quality of the RLC-based boosting and the cost distribution of
the constructions. Our results show that SecureBoost can efficiently learn
high-quality boosting models from protected user-generated data
- …