52 research outputs found

    ESCAPED: Efficient Secure and Private Dot Product Framework for Kernel-based Machine Learning Algorithms with Applications in Healthcare

    Full text link
    To train sophisticated machine learning models one usually needs many training samples. Especially in healthcare settings these samples can be very expensive, meaning that one institution alone usually does not have enough on its own. Merging privacy-sensitive data from different sources is usually restricted by data security and data protection measures. This can lead to approaches that reduce data quality by putting noise onto the variables (e.g., in ϵ\epsilon-differential privacy) or omitting certain values (e.g., for kk-anonymity). Other measures based on cryptographic methods can lead to very time-consuming computations, which is especially problematic for larger multi-omics data. We address this problem by introducing ESCAPED, which stands for Efficient SeCure And PrivatE Dot product framework, enabling the computation of the dot product of vectors from multiple sources on a third-party, which later trains kernel-based machine learning algorithms, while neither sacrificing privacy nor adding noise. We evaluated our framework on drug resistance prediction for HIV-infected people and multi-omics dimensionality reduction and clustering problems in precision medicine. In terms of execution time, our framework significantly outperforms the best-fitting existing approaches without sacrificing the performance of the algorithm. Even though we only show the benefit for kernel-based algorithms, our framework can open up new research opportunities for further machine learning models that require the dot product of vectors from multiple sources.Comment: AAAI 2021, Preprint version of the full paper with supplementary materia

    PILOT : Practical Privacy-Preserving Indoor Localization Using OuTsourcing

    Get PDF
    In the last decade, we observed a constantly growing number of Location-Based Services (LBSs) used in indoor environments, such as for targeted advertising in shopping malls or finding nearby friends. Although privacy-preserving LBSs were addressed in the literature, there was a lack of attention to the problem of enhancing privacy of indoor localization, i.e., the process of obtaining the users' locations indoors and, thus, a prerequisite for any indoor LBS. In this work we present PILOT, the first practically efficient solution for Privacy-Preserving Indoor Localization (PPIL) that was obtained by a synergy of the research areas indoor localization and applied cryptography. We design, implement, and evaluate protocols for Wi-Fi fingerprint-based PPIL that rely on 4 different distance metrics. To save energy and network bandwidth for the mobile end devices in PPIL, we securely outsource the computations to two non-colluding semi-honest parties. Our solution mixes different secure two-party computation protocols and we design size-and depth-optimized circuits for PPIL. We construct efficient circuit building blocks that are of independent interest: Single Instruction Multiple Data (SIMD) capable oblivious access to an array with low circuit depth and selection of the k-Nearest Neighbors with small circuit size. Additionally, we reduce Received Signal Strength (RSS) values from 8 bits to 4 bits without any significant accuracy reduction. Our most efficient PPIL protocol is 553x faster than that of Li et al. (INFOCOM'14) and 500× faster than that of Ziegeldorf et al. (WiSec'14). Our implementation on commodity hardware has practical run-times of less than 1 second even for the most accurate distance metrics that we consider, and it can process more than half a million PPIL queries per day.Peer reviewe

    Can Our Health Data Stay Private? A Review and Future Directions for IS Research on Privacy-Preserving AI in Healthcare

    Get PDF
    The generation of data has become one of the main drivers of modern healthcare. Like other industries, we see that the total amount of healthcare data is growing and in diversity. Thus, Artificial Intelligence (AI) is being used increasingly as a tool to turn this body of healthcare data into real value. But with AI and big data comes big risk, especially in terms of data privacy. Privacy-preserving AI techniques are gaining in popularity to prevent patient privacy compromises while utilizing the potentials offered by AI. However, there is no clear understanding of the current research space of applying such privacy-preserving techniques in healthcare. This paper aims to provide an understanding of these techniques and investigates the emerging research field of privacy-preserving AI and its use in healthcare by reviewing the current multidisciplinary research to synthesize knowledge and derive future research directions in this regard

    LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy

    Get PDF
    High-dimensional crowdsourced data collected from numerous users produces rich knowledge about our society. However, it also brings unprecedented privacy threats to the participants. Local differential privacy (LDP), a variant of differential privacy, is recently proposed as a state-of-the-art privacy notion. Unfortunately, achieving LDP on high-dimensional crowdsourced data publication raises great challenges in terms of both computational efficiency and data utility. To this end, based on Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms with LDP. Then, we develop a Local differentially private high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, correlations among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus speeding up the distribution learning process and achieving high data utility. Extensive experiments on realworld datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed. Moreover, LoPub can keep, on average, 80% and 60% accuracy over the released datasets in terms of SVM and random forest classification, respectively

    Scalable and Secure Logistic Regression via Homomorphic Encryption

    Get PDF
    Logistic regression is a powerful machine learning tool to classify data. When dealing with sensitive data such as private or medical information, cares are necessary. In this paper, we propose a secure system for protecting both the training and predicting data in logistic regression via homomorphic encryption. Perhaps surprisingly, despite the non-polynomial tasks of training and predicting in logistic regression, we show that only additively homomorphic encryption is needed to build our system. Indeed, we instantiate our system with Paillier, LWE-based, and ring-LWE-based encryption schemes, highlighting the merits and demerits of each instance. Our system is very scalable in both the dataset size and dimension, tolerating big size for example of hundreds of millions (10810^8s) records. Besides examining the costs of computation and communication, we carefully test our system over real datasets to demonstrate its accuracies and other related measures such as F-score and AUC

    A Framework for Uncertain Cloud Data Security and Recovery Based on Hybrid Multi-User Medical Decision Learning Patterns

    Get PDF
    Machine learning has been supporting real-time cloud based medical computing systems. However, most of the computing servers are independent of data security and recovery scheme in multiple virtual machines due to high computing cost and time. Also, this cloud based medical applications require static security parameters for cloud data security. Cloud based medical applications require multiple servers to store medical records or machine learning patterns for decision making. Due to high Uncertain computational memory and time, these cloud systems require an efficient data security framework to provide strong data access control among the multiple users. In this work, a hybrid cloud data security framework is developed to improve the data security on the large machine learning patterns in real-time cloud computing environment. This work is implemented in two phases’ i.e. data replication phase and multi-user data access security phase. Initially, machine decision patterns are replicated among the multiple servers for Uncertain data recovering phase. In the multi-access cloud data security framework, a hybrid multi-access key based data encryption and decryption model is implemented on the large machine learning medical patterns for data recovery and security process. Experimental results proved that the present two-phase data recovering, and security framework has better computational efficiency than the conventional approaches on large medical decision patterns

    Cloud-based homomorphic encryption for privacy-preserving machine learning in clinical decision support

    Get PDF
    While privacy and security concerns dominate public cloud services, Homomorphic Encryption (HE) is seen as an emerging solution that ensures secure processing of sensitive data via untrusted networks in the public cloud or by third-party cloud vendors. It relies on the fact that some encryption algorithms display the property of homomorphism, which allows them to manipulate data meaningfully while still in encrypted form; although there are major stumbling blocks to overcome before the technology is considered mature for production cloud environments. Such a framework would find particular relevance in Clinical Decision Support (CDS) applications deployed in the public cloud. CDS applications have an important computational and analytical role over confidential healthcare information with the aim of supporting decision-making in clinical practice. Machine Learning (ML) is employed in CDS applications that typically learn and can personalise actions based on individual behaviour. A relatively simple-to-implement, common and consistent framework is sought that can overcome most limitations of Fully Homomorphic Encryption (FHE) in order to offer an expanded and flexible set of HE capabilities. In the absence of a significant breakthrough in FHE efficiency and practical use, it would appear that a solution relying on client interactions is the best known entity for meeting the requirements of private CDS-based computation, so long as security is not significantly compromised. A hybrid solution is introduced, that intersperses limited two-party interactions amongst the main homomorphic computations, allowing exchange of both numerical and logical cryptographic contexts in addition to resolving other major FHE limitations. Interactions involve the use of client-based ciphertext decryptions blinded by data obfuscation techniques, to maintain privacy. This thesis explores the middle ground whereby HE schemes can provide improved and efficient arbitrary computational functionality over a significantly reduced two-party network interaction model involving data obfuscation techniques. This compromise allows for the powerful capabilities of HE to be leveraged, providing a more uniform, flexible and general approach to privacy-preserving system integration, which is suitable for cloud deployment. The proposed platform is uniquely designed to make HE more practical for mainstream clinical application use, equipped with a rich set of capabilities and potentially very complex depth of HE operations. Such a solution would be suitable for the long-term privacy preserving-processing requirements of a cloud-based CDS system, which would typically require complex combinatorial logic, workflow and ML capabilities
    corecore