2,304 research outputs found

    Privacy in the Genomic Era

    Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and we contextualize these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.

    Predicting Disease Progression Using Deep Recurrent Neural Networks and Longitudinal Electronic Health Record Data

    Electronic health records (EHRs) are widely adopted throughout healthcare systems and collect and store longitudinal data that can be used to describe patient phenotypes. From the underlying data structures used in the EHR, discrete data can be extracted and analyzed to improve patient care and outcomes via tasks such as risk stratification and prospective disease management. However, temporality is innately present in EHR data, and traditional classification models are limited in this context by the cross-sectional nature of their training and prediction processes. Finding temporal patterns in EHR data is especially important because they encode temporal concepts such as event trends, episodes, cycles, and abnormalities. Previously, there have been attempts to use temporal neural network models to predict clinical intervention time and mortality in the intensive care unit (ICU), and recurrent neural network (RNN) models to predict multiple types of medical conditions as well as medication use. However, such work has been limited in scope and in generalizability beyond its immediate use cases. To extend the relevant knowledge base, this study demonstrates a predictive modeling pipeline that can extract and integrate clinical information from the EHR, construct a feature set, and apply a deep recurrent neural network (DRNN) to model complex time-stamped longitudinal data for monitoring and managing the progression of a disease condition. It uses longitudinal data from a pediatric patient cohort diagnosed with Neurofibromatosis Type 1 (NF1), which is one of the most common neurogenetic disorders and occurs in 1 of every 3,000 births, without predilection for race, sex, or ethnicity.
    The prediction pipeline differs from other efforts to date that have sought to model NF1 progression in that it involves the analysis of multi-dimensional phenotypes wherein the DRNN is able to model complex non-linear relationships between event points in the longitudinal data both temporally and . Such an approach is critical when seeking to transition from traditional evidence-based care models to precision medicine paradigms. Furthermore, our predictive modeling pipeline can be generalized and applied to manage the progression and stratify the risks of other similarly complex diseases, as it can predict multiple sets of sub-phenotypical features from training on longitudinal event sequences.

    Functional Analysis of Genomic Variation and Impact on Molecular and Higher Order Phenotypes

    Reverse genetics methods, particularly the production of gene knockouts and knockins, have revolutionized the understanding of gene function. High-throughput sequencing now makes it practical to exploit reverse genetics to simultaneously study functions of thousands of normal sequence variants and spontaneous mutations that segregate in intercross and backcross progeny generated by mating completely sequenced parental lines. To evaluate this new reverse genetic method we resequenced the genome of one of the oldest inbred strains of mice, DBA/2J, the father of the large family of BXD recombinant inbred strains. We analyzed ~100X whole-genome sequence data for the DBA/2J strain, relative to C57BL/6J, the reference strain for all mouse genomics and the mother of the BXD family. We generated the most detailed picture of molecular variation between the two mouse strains to date and identified 5.4 million sequence polymorphisms, including 4.46 million single nucleotide polymorphisms (SNPs), 0.94 million insertions/deletions (indels), and 20,000 structural variants. We systematically scanned massive databases of molecular phenotypes and ~4,000 classical phenotypes to detect linked functional consequences of sequence variants. In the majority of cases we successfully recovered known genotype-to-phenotype associations, and in several cases we linked sequence variants to novel phenotypes (Ahr, Fh1, Entpd2, and Col6a5). However, our most striking and consistent finding is that apparently deleterious homozygous SNPs, indels, and structural variants have undetectable or very modest additive effects on phenotypes.

    Estimating marginal healthcare costs using genetic variants as instrumental variables: Mendelian Randomization in economic evaluation

    Accurate measurement of the marginal healthcare costs associated with different diseases and health conditions is important, especially for increasingly prevalent conditions such as obesity. However, existing observational study designs cannot identify the causal impact of disease on healthcare costs. This paper explores the possibilities for causal inference offered by Mendelian randomization, a form of instrumental variable analysis that uses genetic variation as a proxy for modifiable risk exposures, to estimate the effect of health conditions on cost. Well-conducted genome-wide association studies provide robust evidence of the associations of genetic variants with health conditions or disease risk factors. The subsequent causal effects of these health conditions on cost can be estimated using genetic variants as instruments for the health conditions. This is because the approximately random allocation of genotypes at conception means that many genetic variants are orthogonal to observable and unobservable confounders. Datasets with linked genotypic and resource use information obtained from electronic medical records or from routinely collected administrative data are now becoming available and will facilitate this form of analysis. We describe some of the methodological issues that arise in this type of analysis, which we illustrate by considering how Mendelian randomization could be used to estimate the causal impact of obesity, a complex trait, on healthcare costs. We describe some of the data sources that could be used for this type of analysis. We conclude by considering the challenges and opportunities offered by Mendelian randomization for economic evaluation.
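
    The instrumental-variable idea in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the single-variant Wald ratio used here is one common Mendelian randomization estimator, and every name and number below (`variant`, `confounder`, `bmi`, `cost`, the coefficients) is a hypothetical toy value, constructed so that the variant is exactly uncorrelated with the confounder and the true effect of BMI on cost is 50.

```python
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def wald_ratio(instrument, exposure, outcome):
    """Single-instrument IV estimate: cov(Z, Y) / cov(Z, X)."""
    return covariance(instrument, outcome) / covariance(instrument, exposure)


def naive_slope(exposure, outcome):
    """Ordinary regression slope of outcome on exposure (ignores confounding)."""
    return covariance(exposure, outcome) / covariance(exposure, exposure)


# Hypothetical toy data (assumption, not from the paper).
variant = [0, 0, 1, 1, 2, 2]        # risk-allele count, the instrument Z
confounder = [-1, 1, -1, 1, -1, 1]  # unobserved factor U, uncorrelated with Z

# Exposure X (BMI) depends on the variant and on the confounder.
bmi = [25 + 2 * z + 3 * u for z, u in zip(variant, confounder)]

# Outcome Y (cost) depends on BMI (true effect = 50) and on the confounder.
cost = [100 + 50 * x + 200 * u for x, u in zip(bmi, confounder)]

iv_estimate = wald_ratio(variant, bmi, cost)
naive_estimate = naive_slope(bmi, cost)
```

    In this toy data the Wald ratio recovers the built-in effect of 50 per BMI unit, whereas the naive slope is roughly 101 because the confounder raises both BMI and cost. Real analyses would also need to check instrument strength and the other instrumental-variable assumptions, which this sketch takes as given.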

    The Application of Genetic Risk Scores in Rheumatic Diseases: A Perspective

    © 2023 by the authors. Modest effect sizes have limited the clinical applicability of genetic associations with rheumatic diseases. Genetic risk scores (GRSs) have emerged as a promising solution to translate genetics into useful tools. In this review, we provide an overview of the recent literature on GRSs in rheumatic diseases. We describe six categories for which GRSs are used: (a) disease (outcome) prediction, (b) genetic commonalities between diseases, (c) disease differentiation, (d) interplay between genetics and environmental factors, (e) heritability and transferability, and (f) detecting causal relationships between traits. In our review of the literature, we identified current lacunas and opportunities for future work. First, the shortage of non-European genetic data restricts the application of many GRSs to European populations. Next, many GRSs are tested in settings enriched for cases that limit the transferability to real life. If intended for clinical application, GRSs are ideally tested in the relevant setting. Finally, there is much to elucidate regarding the co-occurrence of clinical traits to identify shared causal paths and elucidate relationships between the diseases. GRSs are useful instruments for this. Overall, the ever-continuing research on GRSs gives a hopeful outlook into the future of GRSs and indicates significant progress in their potential applications.