10 research outputs found

    Design and development of learning model for compression and processing of deoxyribonucleic acid genome sequence

    Get PDF
    Owing to the substantial volume of human genome sequence data files (from 30-200 GB exposed) Genomic data compression has received considerable traction and storage costs are one of the major problems faced by genomics laboratories. This involves a modern technology of data compression that reduces not only the storage but also the reliability of the operation. There were few attempts to solve this problem independently of both hardware and software. A systematic analysis of associations between genes provides techniques for the recognition of operative connections among genes and their respective yields, as well as understandings into essential biological events that are most important for knowing health and disease phenotypes. This research proposes a reliable and efficient deep learning system for learning embedded projections to combine gene interactions and gene expression in prediction comparison of deep embeddings to strong baselines. In this paper we preform data processing operations and predict gene function, along with gene ontology reconstruction and predict the gene interaction. The three major steps of genomic data compression are extraction of data, storage of data, and retrieval of the data. Hence, we propose a deep learning based on computational optimization techniques which will be efficient in all the three stages of data compression

    Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective

    Full text link
    Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy expertsComment: To appear in the Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2019, Issue

    On Secure Cloud Computing for Genomic Data: From Storage to Analysis

    Get PDF
    Although privacy is generally considered to be the right of an individual or group to control information about themselves, such a right has become challenging to protect in the digital era, this is exemplified by the case of cloud-based genomic computing. Despite the rapid progress in understanding, producing, and using genomic information, the practice of genomic data protection remains a fairly underdeveloped area. One of the indisputable reasons is that most nonexpert individuals do not realize the sensitive nature of their genomic data, unless it has been used against them. Many commercial organizations take advantage of their customers by taking control of personal genomic information, if customers want to benefit from services such as genetic analysis; even worse, these organizations often do not enforce proper protection, which could result in embarrassing data breaches. In this thesis, we investigate the potential threats of cloud- based genomic computing systems and propose various countermeasures by taking into account the functionality requirement. We begin with the most basic system where only symmetric encryption is needed for the cloud storage of genomic data, and we propose a new solution that protects the data against brute-force attacks that threaten the security of password-based encryption in direct-to-consumer companies. The solution employs honey encryption, where plaintext messages need to be transformed to a different space with uniform distribution on elements. We present a novel distribution-transformation encoder. We provide formal security proof of our solution. We analyze the scenario where efficient searching on encrypted data is necessary. We propose a system that provides fast retrieval on encrypted compressed data and that enables individuals to authorize access to fine-grained regions during data retrieval. Our solution addresses three critical dimensions in platforms that use large genomic data: encryption, compression, and efficient data retrieval. Compared with a previous de facto standard solution for storing aligned genomic data, our solution uses 18% less storage. To enable complicated data analysis, we focus on a proposal for secure quality-control of genomic data by using secure multi-party computation based on garbled circuits. Our proposal is for aggregated genomic data sharing, where researchers want to collaborate to perform large-scale genome-wide association studies in order to identify significant genetic variants for certain diseases. Data quality control is the very first stage of such a collaboration and remains a driving factor for further steps. We investigate the feasibility of advanced cryptographic techniques in the data protection of this phase. We demonstrate that for certain protocols, our solution is efficient and scalable. With the advent of precision medicine based on genomic data, the future of big data has become clearly inseparable from cloud-based genomic computing. It is important to continuously re-evaluate the standards of cloud-based genomic computing as novel technologies are developed, security threats arise, and more complex genomic analyses become possible. This is not only a battle against cyber criminals, but also against rigid and ignorant practices. Integrative solutions that carefully consider the use and misuse of personal genomic data are essential for ensuring secure, effective storage and maximizing utility in treating and preventing disease

    Application du chiffrement homomorphe à la protection de la vie privée

    Get PDF
    Le chiffrement homomorphe est une primitive cryptographique permettant de réaliser des opérations arithmétiques sur des données chiffrées. Grâce à celui-ci il est possible de confier des calculs à un agent externe, sans que les données traitées ou les résultats obtenus ne soient accessibles à cet agent. Ainsi il est possible de réaliser des protocoles améliorant la protection de la vie privée dans de nombreux domaines. La découverte de schémas de chiffrement homomorphe relativement efficaces, en général fondés sur les réseaux euclidiens, permet d'envisager des applications intéressantes à un coût calculatoire acceptable. Cette thèse s'attelle dans un premier temps à étudier les principales implémentations rendues publiques de ces schémas, notamment en comparant leur performance. L'étude des librairies FV-NFLlib, SEAL et HElib permet de choisir celle donnant des résultats optimaux en fonction de la situation. Un domaine d'application primordial de la protection de la vie privée est celui des données génomiques. Il est crucial de protéger ces données puisqu'elles sont liées aux individus et soulèvent des problèmes de vie privée qui sont difficiles à contourner. Nous proposons un protocole permettant d'externaliser les données génomiques chiffrées vers un serveur tout en ayant la possibilité de tester la présence d'un élément (une mutation) et ce de manière privée. Le protocole est basé sur un retrait d'information privée et est efficace d'un point de vue consommation mémoire et performance temporelle. Les techniques utilisées permettent une protection de la vie privée optimale. D'autres opérations sont importantes sur les données génomiques, notamment l'étude de la corrélation entre certaines données. Nous montrons la possibilité de réaliser une solution permettant de calculer un ensemble maximal de manière privée entre un élément et une base de données. La profondeur multiplicative du protocole est trop importante pour utiliser les schémas classiques, c'est pourquoi nous avons exploité TFHE. Pour finir, nous avons mis en place un algorithme permettant de détecter une fraude lors d'un paiement en ligne en utilisant le calcul homomorphe. Cet algorithme permet de prédire si un paiement est potentiellement frauduleux sans apprendre les informations de ce paiement

    Analyzing the Privacy and Societal Challenges Stemming from the Rise of Personal Genomic Testing

    Get PDF
    Progress in genomics is enabling researchers to better understand the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. At the same time, the rapid cost drop of genome sequencing has enabled the emergence of a booming market for direct-to-consumer (DTC) genetic testing. Nowadays, companies like 23andMe and AncestryDNA provide affordable health, genealogy, and ancestry reports, and have already tested tens of millions of customers. How- ever, while this technology has the potential to transform society by improving people’s lives, it also harbors dangers as it prompts important privacy and societal concerns. In this thesis, we shed light on these issues using a mixed-methods approach. We start by conducting a technical investigation of the limitations on privacy-enhancing technologies used for testing, storing, and sharing genomic data. We rely on a structured methodology to contextualize and provide a critical analysis of the current state-of-the-art and we identify and discuss ten open problems faced by the community. We then focus on the societal aspects of DTC genetic testing by conducting two large-scale analyses of the genetic testing discourse focusing on both mainstream and fringe social networks, specifically, Twitter, Reddit, and 4chan. Our analyses show that DTC genetic testing is a popular topic of discussion on all platforms. However, these discussions often include highly toxic language expressed through hateful and racist comments and openly antisemitic rhetoric, often conveyed through memes. Overall, our findings highlight that the rise in popularity of this new technology is accompanied by several societal implications that are unlikely to be addressed by only one research field and rather require a multi-disciplinary approach

    An Analysis of Next Generation Sequencing in Mendelian Disorders: Diagnostic Potential and Clinical Utility

    Full text link
    Mendelian disorders are rare, heritable conditions that cause medical, financial, psychological, and social burdens. Diagnosis is key for improving patient care but is challenging due to frequent genetic and phenotypic heterogeneity. The genome-wide next generation sequencing (NGS) technologies of whole exome sequencing (WES) and whole genome sequencing (WGS) have revolutionised Mendelian disorder diagnosis in little more than a decade. Yet, the rapid adoption and complexity of genomic technologies have resulted in a gap between implementation and understanding how to utilise their full potential. The main objective of this research was to establish the benefits of NGS to Mendelian disorder diagnosis for clinical translation. To achieve this, the diagnostic potential and utility of WES and WGS were assessed in a cohort of individuals with suspected, but undiagnosed Mendelian disorders. Both WES and WGS were able to dramatically improve diagnostic rates over traditional testing, with diagnostic yields of first-line WES of 52% and WGS, 61%. WES reanalysis with improved pipelines and scientific knowledge demonstrated gains in diagnostic yield at 12 months, and again 2 years later. A WES diagnosis of PLOD3-related disease, an ultra-rare Mendelian disorder, was extended with phenotype delineation, tissue expression, and protein modelling. The resulting disease description and proposed classification is expected to positively impact future patient diagnosis. To assess the economic implications of genomic testing pathways, comparisons of WES to traditional testing and to WGS were made. The early application of WES in intellectual disability saved AU782foreachadditionaldiagnosiscomparedtothetraditionalmodel.WhileWGSwasshowntohaveagreaterdiagnosticyieldthanWES,whenusedasafirstlinetesteachincrementaladditionalWGSdiagnosiscostAUD782 for each additional diagnosis compared to the traditional model. While WGS was shown to have a greater diagnostic yield than WES, when used as a first-line test each incremental additional WGS diagnosis cost AUD29,000. Thus, from an economic perspective, first-line WES is preferred for routine testing, with WGS reserved for situations where a diagnosis would have a high chance of clinical intervention. Notwithstanding this, WGS additional diagnostic gains should not be underestimated given the increased potential for future diagnosis and the gap in understanding the potential downstream costs benefits of a diagnosis, which may dwarf the initial test cost. In conclusion, NGS technologies provide significant gains over traditional methods and should be adopted early in Mendelian disorder diagnosis to positively impact patient care

    Privacy-Enhancing Technologies for Medical and Genomic Data: From Theory to Practice

    Get PDF
    The impressive technological advances in genomic analysis and the significant drop in the cost of genome sequencing are paving the way to a variety of revolutionary applications in modern healthcare. In particular, the increasing understanding of the human genome, and of its relation to diseases, health and to responses to treatments brings promise of improvements in better preventive and personalized medicine. Unfortunately, the impact on privacy and security is unprecedented. The genome is our ultimate identifier and, if leaked, it can unveil sensitive and personal information such as our genetic diseases, our propensity to develop certain conditions (e.g., cancer or Alzheimer's) or the health issues of our family. Even though legislation, such as the EU General Data Protection Regulation (GDPR) or the US Health Insurance Portability and Accountability Act (HIPAA), aims at mitigating abuses based on genomic and medical data, it is clear that this information also needs to be protected by technical means. In this thesis, we investigate the problem of developing new and practical privacy-enhancing technologies (PETs) for the protection of medical and genomic data. Our goal is to accelerate the adoption of PETs in the medical field in order to address the privacy and security concerns that prevent personalized medicine from reaching its full potential. We focus on two main areas of personalized medicine: clinical care and medical research. For clinical care, we first propose a system for securely storing and selectively retrieving raw genomic data that is indispensable for in-depth diagnoses and treatments of complex genetic diseases such as cancer. Then, we focus on genetic variants and devise a new model based on additively-homomorphic encryption for privacy-preserving genetic testing in clinics. Our model, implemented in the context of HIV treatment, is the first to be tested and evaluated by practitioners in a real operational setting. For medical research, we first propose a method that combines somewhat-homomorphic encryption with differential privacy to enable secure feasibility studies on genetic data stored at an untrusted central repository. Second, we address the problem of sharing genomic and medical data when the data is distributed across multiple mistrustful institutions. We begin by analyzing the risks that threaten patientsâ privacy in systems for the discovery of genetic variants, and we propose practical mitigations to the re-identification risk. Then, for clinical sites to be able to share the data without worrying about the risk of data breaches, we develop a new system based on collective homomorphic encryption: it achieves trust decentralization and enables researchers to securely find eligible patients for clinical studies. Finally, we design a new framework, complementary to the previous ones, for quantifying the risk of unintended disclosure caused by potential inference attacks that are jointly combined by a malicious adversary, when exact genomic data is shared. In summary, in this thesis we demonstrate that PETs, still believed unpractical and immature, can be made practical and can become real enablers for overcoming the privacy and security concerns blocking the advancement of personalized medicine. Addressing privacy issues in healthcare remains a great challenge that will increasingly require long-term collaboration among geneticists, healthcare providers, ethicists, lawmakers, and computer scientists

    Towards Practical Privacy-Preserving Protocols

    Get PDF
    Protecting users' privacy in digital systems becomes more complex and challenging over time, as the amount of stored and exchanged data grows steadily and systems become increasingly involved and connected. Two techniques that try to approach this issue are Secure Multi-Party Computation (MPC) and Private Information Retrieval (PIR), which aim to enable practical computation while simultaneously keeping sensitive data private. In this thesis we present results showing how real-world applications can be executed in a privacy-preserving way. This is not only desired by users of such applications, but since 2018 also based on a strong legal foundation with the General Data Protection Regulation (GDPR) in the European Union, that forces companies to protect the privacy of user data by design. This thesis' contributions are split into three parts and can be summarized as follows: MPC Tools Generic MPC requires in-depth background knowledge about a complex research field. To approach this, we provide tools that are efficient and usable at the same time, and serve as a foundation for follow-up work as they allow cryptographers, researchers and developers to implement, test and deploy MPC applications. We provide an implementation framework that abstracts from the underlying protocols, optimized building blocks generated from hardware synthesis tools, and allow the direct processing of Hardware Definition Languages (HDLs). Finally, we present an automated compiler for efficient hybrid protocols from ANSI C. MPC Applications MPC was for a long time deemed too expensive to be used in practice. We show several use cases of real-world applications that can operate in a privacy-preserving, yet practical way when engineered properly and built on top of suitable MPC protocols. Use cases presented in this thesis are from the domain of route computation using BGP on the Internet or at Internet Exchange Points (IXPs). In both cases our protocols protect sensitive business information that is used to determine routing decisions. Another use case focuses on genomics, which is particularly critical as the human genome is connected to everyone during their entire lifespan and cannot be altered. Our system enables federated genomic databases, where several institutions can privately outsource their genome data and where research institutes can query this data in a privacy-preserving manner. PIR and Applications Privately retrieving data from a database is a crucial requirement for user privacy and metadata protection, and is enabled amongst others by a technique called Private Information Retrieval (PIR). We present improvements and a generalization of a well-known multi-server PIR scheme of Chor et al., and an implementation and evaluation thereof. We also design and implement an efficient anonymous messaging system built on top of PIR. Furthermore we provide a scalable solution for private contact discovery that utilizes ideas from efficient two-server PIR built from Distributed Point Functions (DPFs) in combination with Private Set Intersection (PSI)

    Efficient and secure outsourcing of genomic data storage

    Get PDF
    From iDASH Privacy and Security Workshop 2016International audienceCloud computing is becoming the preferred solution for efficiently dealing with the increasing amount of genomic data. Yet, outsourcing storage and processing of sensitive data, such as genomic data, comes with important concerns related to privacy and security. This calls for new sophisticated techniques that ensure data protection from untrusted cloud providers and still enables researchers to obtain useful information. We present a novel privacy-preserving algorithm for fully outsourcing the storage of large genomic data files to a public cloud and enable researchers to efficiently search for variants of interest. To preserve data and query confidentiality from possible leakage, our solution exploits optimal encoding for genomic variants and combines it with homomorphic encryption and private information retrieval. The proposed algorithm is implemented in C++ and evaluated on real data as part of the 2016 iDash genome privacy-protection challenge. Results show that our solution outperforms the state-of-the-art and enables researchers to search over millions of encrypted variants in a few seconds. As opposed to prior beliefs that sophisticated privacy-enhancing technologies (PETs) are unpractical for real operational settings, our solution demonstrates that, in the case of genomic data, PETs can represent very efficient enablers
    corecore