Privacy-preserving data analytics in cloud computing

Abstract

The evolution of digital content and rapid expansion of data sources has raised the need for streamlined monitoring, collection, storage and analysis of massive, heterogeneous data to extract useful knowledge and support decision-making mechanisms. In this context, cloud computing o↵ers extensive, cost-e↵ective and on demand computing resources that improve the quality of services for users and also help service providers (enterprises, governments and individuals). Service providers can avoid the expense of acquiring and maintaining IT resources while migrating data and remotely managing processes including aggregation, monitoring and analysis in cloud servers. However, privacy and security concerns of cloud computing services, especially in storing sensitive data (e.g. personal, healthcare and financial) are major challenges to the adoption of these services. To overcome such barriers, several privacy-preserving techniques have been developed to protect outsourced data in the cloud. Cryptography is a well-known mechanism that can ensure data confidentiality in the cloud. Traditional cryptography techniques have the ability to protect the data through encryption in cloud servers and data owners can retrieve and decrypt data for their processing purposes. However, in this case, cloud users can use the cloud resources for data storage but they cannot take full advantage of cloud-based processing services. This raises the need to develop advanced cryptosystems that can protect data privacy, both while in storage and in processing in the cloud. Homomorphic Encryption (HE) has gained attention recently because it can preserve the privacy of data while it is stored and processed in the cloud servers and data owners can retrieve and decrypt their processed data to their own secure side. Therefore, HE o↵ers an end-to-end security mechanism that is a preferable feature in cloud-based applications. In this thesis, we developed innovative privacy-preserving cloud-based models based on HE cryptosystems. This allowed us to build secure and advanced analytic models in various fields. We began by designing and implementing a secure analytic cloud-based model based on a lightweight HE cryptosystem. We used a private resident cloud entity, called ”privacy manager”, as an intermediate communication server between data owners and public cloud servers. The privacy manager handles analytical tasks that cannot be accomplished by the lightweight HE cryptosystem. This model is convenient for several application domains that require real-time responses. Data owners delegate their processing tasks to the privacy manager, which then helps to automate analysis tasks without the need to interact with data owners. We then developed a comprehensive, secure analytical model based on a Fully Homomorphic Encryption (FHE), that has more computational capability than the lightweight HE. Although FHE can automate analysis tasks and avoid the use of the privacy manager entity, it also leads to massive computational overhead. To overcome this issue, we took the advantage of the massive cloud resources by designing a MapReduce model that massively parallelises HE analytical tasks. Our parallelisation approach significantly speeds up the performance of analysis computations based on FHE. We then considered distributed analytic models where the data is generated from distributed heterogeneous sources such as healthcare and industrial sensors that are attached to people or installed in a distributed-based manner. We developed a secure distributed analytic model by re-designing several analytic algorithms (centroid-based and distribution-based clustering) to adapt them into a secure distributed-based models based on FHE. Our distributed analytic model was developed not only for distributed-based applications, but also it eliminates FHE overhead obstacle by achieving high efficiency in FHE computations. Furthermore, the distributed approach is scalable across three factors: analysis accuracy, execution time and the amount of resources used. This scalability feature enables users to consider the requirements of their analysis tasks based on these factors (e.g. users may have limited resources or time constrains to accomplish their analysis tasks). Finally, we designed and implemented two privacy-preserving real-time cloud-based applications to demonstrate the capabilities of HE cryptosystems, in terms of both efficiency and computational capabilities for applications that require timely and reliable delivery of services. First, we developed a secure cloud-based billing model for a sensor-enabled smart grid infrastructure by using lightweight HE. This model handled billing analysis tasks for individual users in a secure manner without the need to interact with any trusted parties. Second, we built a real-time secure health surveillance model for smarter health communities in the cloud. We developed a secure change detection model based on an exponential smoothing technique to predict future changes in health vital signs based on FHE. Moreover, we built an innovative technique to parallelise FHE computations which significantly reduces computational overhead

    Similar works