
    Environ Int

    The field of environmental epidemiology has been using "-omics" technologies, including the exposome, metabolome, and methylome, to understand the potential effects and biological pathways of a number of environmental pollutants. However, the majority of studies have focused on a single disease or phenotype, and have not systematically considered patterns of multimorbidity and whether environmental pollutants have pleiotropic effects. These questions could be addressed by examining the relation between environmental exposures and the phenome: the patterns and profiles of human health that individuals experience from birth to death. By conducting Phenome-Wide Association Studies (PheWAS), we can generate new hypotheses about new or poorly understood exposures, identify novel associations for established toxicants, and better understand biological pathways affected by environmental pollutants. In this article, we provide a conceptual framework for conducting PheWAS in environmental epidemiology and summarize some of the advantages and challenges of using PheWAS to study environmental pollutant exposures. Ultimately, by adding PheWAS to our "-omics" toolbox, we could substantially improve our understanding of the potential health effects of environmental pollutants.

    2019. 2020-09-01. Grants: UH3 OD023313 (OD/NIH HHS/United States); R01 ES027408 (ES/NIEHS NIH HHS/United States); UH3 OD023313 (CD/ODCDC CDC HHS/United States); R01 ES025214 (ES/NIEHS NIH HHS/United States); R01 ES024381 (ES/NIEHS NIH HHS/United States). PMID: 31200158. PMCID: PMC6682449.
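The PheWAS design described above reduces to a loop: test one exposure against every phenotype in the phenome, then correct for the number of tests. The sketch below is a minimal illustration under stated assumptions, not the framework proposed in the article. It assumes a binary exposure and binary case/non-case phenotype codes per person, uses a Pearson chi-square test on each 2×2 table, and applies a Bonferroni threshold. Real PheWAS usually fit regression models with covariate adjustment, and all names and data here are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def chi2_2x2_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value of a Pearson chi-square test (1 df, no continuity
    correction) on the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: nothing to test
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def phewas(exposed: List[bool], phenome: Dict[str, List[bool]],
           alpha: float = 0.05) -> Tuple[List[Tuple[str, float, bool]], float]:
    """Test one binary exposure against every binary phenotype in `phenome`.

    Returns (results sorted by p-value, Bonferroni threshold); each result is
    (phenotype_code, p_value, passes_threshold)."""
    threshold = alpha / len(phenome)  # Bonferroni over all phenotypes tested
    results = []
    for code, is_case in phenome.items():
        pairs = list(zip(exposed, is_case))
        a = sum(1 for e, y in pairs if e and y)          # exposed cases
        b = sum(1 for e, y in pairs if e and not y)      # exposed non-cases
        c = sum(1 for e, y in pairs if not e and y)      # unexposed cases
        d = sum(1 for e, y in pairs if not e and not y)  # unexposed non-cases
        p = chi2_2x2_pvalue(a, b, c, d)
        results.append((code, p, p < threshold))
    results.sort(key=lambda r: r[1])
    return results, threshold


# Toy data: 100 people, the first 50 exposed; phenotype codes are made up.
exposed = [True] * 50 + [False] * 50
phenome = {
    "PHENO_A": [True] * 40 + [False] * 10 + [True] * 10 + [False] * 40,
    "PHENO_B": [i % 2 == 0 for i in range(100)],
}
results, threshold = phewas(exposed, phenome)
for code, p, hit in results:
    print(f"{code}: p = {p:.2e}, phenome-wide significant: {hit}")
```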

    SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research

    OBJECTIVE: Unlocking the data contained within both structured and unstructured components of electronic health records (EHRs) has the potential to provide a step change in the data available for secondary research use, generation of actionable medical insights, hospital management, and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. METHODS: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualized mentions of a wide range of biomedical concepts within EHRs. Natural language processing annotations are further assembled at the patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data are served via ontology-based search and analytics interfaces. RESULTS: SemEHR has been deployed at a number of UK hospitals, including the Clinical Record Interactive Search, an anonymized replica of the EHR of the UK South London and Maudsley National Health Service Foundation Trust, one of Europe's largest providers of mental health services. In 2 Clinical Record Interactive Search-based studies, SemEHR achieved F-measures of 93% (hepatitis C) and 99% (HIV) in identifying true positive patients. At King's College Hospital in London, as part of the CogStack program (github.com/cogstack), SemEHR is being used to recruit patients into the UK Department of Health 100 000 Genomes Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast at searching phenotypes; the time needed to check recruitment criteria was reduced from days to minutes. When validated on open intensive care EHR data (Medical Information Mart for Intensive Care III), the vital signs extracted by SemEHR reached around 97% accuracy. 
CONCLUSION: Results from the multiple case studies demonstrate SemEHR's efficiency: in some cases, weeks or months of work can be done within hours or minutes. SemEHR provides a more comprehensive view of patients, bringing in more, and sometimes unexpected, insights compared with study-oriented bespoke IE systems. SemEHR is open source and available at https://github.com/CogStack/SemEHR
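The patient-level assembly and the F-measure evaluation described in the SemEHR abstract can be illustrated with a small sketch. It is not SemEHR's code or data model: the `Mention` record, the placeholder concept labels, and the rule "a patient is positive if any non-negated mention exists" are assumptions chosen for illustration, and the gold labels are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set


@dataclass(frozen=True)
class Mention:
    """One contextualized concept mention from an NLP step (hypothetical schema)."""
    patient_id: str
    doc_date: date
    concept: str           # placeholder label, not a real ontology code
    negated: bool = False  # contextual flag, e.g. "no evidence of ..."


def build_timelines(mentions: List[Mention]) -> Dict[str, List[Mention]]:
    """Group document-level mentions by patient and order them by date."""
    timelines: Dict[str, List[Mention]] = defaultdict(list)
    for m in mentions:
        timelines[m.patient_id].append(m)
    for events in timelines.values():
        events.sort(key=lambda m: m.doc_date)
    return dict(timelines)


def patients_with(concept: str, timelines: Dict[str, List[Mention]]) -> Set[str]:
    """Patients with at least one non-negated mention of `concept`."""
    return {pid for pid, events in timelines.items()
            if any(m.concept == concept and not m.negated for m in events)}


def f_measure(predicted: Set[str], gold: Set[str]) -> float:
    """F1 score: harmonic mean of precision and recall over patient IDs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


mentions = [
    Mention("p1", date(2017, 3, 1), "hepatitis_c"),
    Mention("p1", date(2016, 5, 9), "hiv", negated=True),
    Mention("p2", date(2018, 1, 2), "hepatitis_c", negated=True),
    Mention("p3", date(2015, 7, 4), "hepatitis_c"),
]
timelines = build_timelines(mentions)
predicted = patients_with("hepatitis_c", timelines)  # {"p1", "p3"}
gold = {"p1", "p4"}                                   # made-up reference labels
print(sorted(predicted), f_measure(predicted, gold))  # ['p1', 'p3'] 0.5
```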

    Autosomal recessive LRP1-related syndrome featuring cardiopulmonary dysfunction, bone dysmorphology, and corneal clouding.

    We provide the first study of two siblings with a novel autosomal recessive LRP1-related syndrome identified by rapid genome sequencing and overlapping multiple genetic models. The patients presented with respiratory distress, congenital heart defects, hypotonia, dysmorphology, and unique findings, including corneal clouding and ascites. Both siblings had compound heterozygous damaging variants, c.11420G>C (p.Cys3807Ser) and c.12407T>G (p.Val4136Gly)…

    An exploratory phenome wide association study linking asthma and liver disease genetic variants to electronic health records from the Estonian Biobank

    The Estonian Biobank, governed by the Institute of Genomics at the University of Tartu (Biobank), has stored genetic material/DNA and continuously collected data since 2002 on a total of 52,274 individuals, representing ~5% of the Estonian adult population, and this number is increasing. To explore the utility of data available in the Biobank, we conducted a phenome-wide association study (PheWAS) in two areas of interest to healthcare researchers: asthma and liver disease. We used 11 asthma and 13 liver disease-associated single nucleotide polymorphisms (SNPs), identified from published genome-wide association studies, to test our ability to detect established associations. We confirmed 2 asthma and 5 liver disease-associated variants at nominal significance, with effects directionally consistent with published results. We found 2 associations in the direction opposite to previously published results (rs4374383:AA increases risk of NASH/NAFLD; rs11597086 increases ALT level). Three SNP-diagnosis pairs passed the phenome-wide significance threshold: rs9273349 and E06 (thyroiditis, p = 5.50 × 10⁻⁸); rs9273349 and E10 (type 1 diabetes, p = 2.60 × 10⁻⁷); and rs2281135 and K76 (non-alcoholic liver diseases, including NAFLD, p = 4.10 × 10⁻⁷). We have validated our approach and confirmed the quality of the data for these conditions. Importantly, we demonstrate that the extensive genetic and medical information in the Estonian Biobank can be successfully used for scientific research.
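The two checks in this abstract, replication at nominal significance with a consistent direction of effect and a phenome-wide significance threshold, can be sketched as follows. The abstract does not state how its phenome-wide threshold was derived; the Bonferroni correction over SNP-diagnosis pairs used here is an assumption, as are the 1,000 diagnosis codes and the example SNP names. The direction check also assumes both studies report effects for the same allele.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """Association observed in the new cohort for one SNP-trait pair."""
    snp: str
    beta: float  # effect estimate for the effect allele; the sign gives the direction
    p: float     # p-value in the new cohort


def classify(published_direction: int, obs: Observed, nominal: float = 0.05) -> str:
    """Label a published association as confirmed, opposite, or not replicated.

    published_direction is +1 if the published effect allele raises risk or
    level, -1 if it lowers it. Both studies are assumed to code the same allele."""
    if obs.p >= nominal or obs.beta == 0:
        return "not replicated"
    same_direction = (obs.beta > 0) == (published_direction > 0)
    return "confirmed" if same_direction else "opposite direction"


def phenome_wide_threshold(n_snps: int, n_diagnoses: int, alpha: float = 0.05) -> float:
    """Bonferroni threshold across every SNP-diagnosis pair tested."""
    return alpha / (n_snps * n_diagnoses)


# Made-up results for three hypothetical SNPs.
print(classify(+1, Observed("rs_example_1", 0.20, 0.010)))   # confirmed
print(classify(+1, Observed("rs_example_2", -0.15, 0.030)))  # opposite direction
print(classify(-1, Observed("rs_example_3", -0.05, 0.400)))  # not replicated
# 24 SNPs (11 asthma + 13 liver disease) times a hypothetical 1,000 diagnosis codes.
threshold = phenome_wide_threshold(24, 1000)
print(threshold)
```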

    Inter-rater agreement for the annotation of neurologic signs and symptoms in electronic health records

    The extraction of patient signs and symptoms recorded as free text in electronic health records is critical for precision medicine. Once extracted, signs and symptoms can be made computable by mapping them to signs and symptoms in an ontology. Extracting signs and symptoms from free text is tedious and time-consuming, and prior studies have suggested that inter-rater agreement for clinical concept extraction is low. We examined inter-rater agreement for annotating neurologic concepts in clinical notes from electronic health records. After training on the annotation process, the annotation tool, and the supporting neuro-ontology, three raters annotated 15 clinical notes in three rounds. Inter-rater agreement among the three annotators was high for both text span and category label. A machine annotator based on a convolutional neural network also agreed closely with the human annotators, although less closely than the human annotators agreed with each other. We conclude that high levels of agreement between human annotators are possible with appropriate training and annotation tools. Furthermore, more training examples, combined with improvements in neural networks and natural language processing, should make machine annotators capable of high-throughput automated clinical concept extraction with high levels of agreement with human annotators.
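The abstract does not name the agreement statistic it used. One standard choice for categorical labels from three or more raters is Fleiss' kappa, sketched below under the assumption that the raters' text spans have already been aligned so that each span receives exactly one category label from every rater. The category names and ratings are made up.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that are each labeled by the same number of raters.

    ratings[i] holds the category labels that all raters gave item i."""
    if not ratings:
        raise ValueError("no items to score")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    per_item_agreement = 0.0
    for labels in ratings:
        counts = Counter(labels)
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this item.
        per_item_agreement += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    p_observed = per_item_agreement / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    p_expected = sum((t / total) ** 2 for t in category_totals.values())
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Three raters labeling four aligned spans with made-up categories.
ratings = [
    ["sign", "sign", "sign"],
    ["symptom", "symptom", "symptom"],
    ["sign", "symptom", "sign"],
    ["symptom", "symptom", "symptom"],
]
print(round(fleiss_kappa(ratings), 3))  # 0.657
```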

    Computational and Statistical Approaches for Large-Scale Genome-Wide Association Studies

    Over the past decade, genome-wide association studies (GWAS) have proven successful at shedding light on the genetic variation underlying the risk of complex human diseases, and these findings can be translated into novel preventive and therapeutic strategies. My research aims to identify novel disease-associated genetic variants through large-scale GWAS and to develop computational and statistical pipelines and methods that improve the power and accuracy of GWAS. Bicuspid aortic valve (BAV) is a congenital heart defect characterized by fusion of two of the three normal leaflets of the aortic valve. As the most common cardiovascular malformation in humans, BAV is moderately heritable and is an important risk factor for valvulopathy and aortopathy, but its genetic origins remain elusive. In Chapter 2, we present the first large-scale GWAS to identify novel genetic variants associated with BAV. We report association with a non-coding variant 151 kb from the gene encoding the cardiac-specific transcription factor GATA4, and near-significance for p.Ser377Gly in GATA4. We used multiple bioinformatics approaches to demonstrate that GATA4 is a plausible biological candidate. In the subsequent functional follow-up, GATA4 was disrupted by CRISPR-Cas9 in induced pluripotent stem cells from healthy donors. The disruption of GATA4 significantly impaired the transition from endothelial cells into mesenchymal cells, a critical step in heart valve development. Genotype imputation is widely used in GWAS to perform in silico genotyping, leading to higher power to identify novel genetic signals. When multiple reference panels cannot be combined because of consent restrictions, it is unclear how to combine the imputation results to optimize the power of genetic association tests. 
In Chapter 3, we compared the accuracy of 9,265 Norwegian genomes imputed from three reference panels: 1000 Genomes Phase 3 (1000G), the Haplotype Reference Consortium (HRC), and a reference panel containing 2,201 Norwegian participants from the HUNT study with low-pass genome sequencing. We observed that the overall imputation accuracy from the population-specific panel was substantially higher than that from 1000G and comparable with that from HRC, despite HRC being 15-fold larger. We also evaluated different strategies for using multiple sets of imputed genotypes to increase the power of association studies. We propose that testing association for all variants imputed from any panel yields higher power to detect association than the alternative strategy of testing only the version of each genetic variant with the highest imputation quality metric. In phenome-wide GWAS of large biobanks, most binary traits have substantially fewer cases than controls. Both widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly in the analysis of phenotypes with unbalanced case-control ratios, producing inflated type I error rates. In Chapter 4, we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation (SPA) to calibrate the distribution of score test statistics. This method, SAIGE, provides accurate p-values even when case-control ratios are extremely unbalanced. It uses state-of-the-art optimization strategies to reduce the computational time and memory cost of the generalized mixed model. Its computational cost scales linearly with sample size, so it can be applied to GWAS of thousands of phenotypes in large biobanks. 
Through the analysis of UK Biobank data from 408,961 white British samples of European ancestry for 1,403 dichotomous phenotypes, we show that SAIGE can efficiently analyze large-sample data while controlling for unbalanced case-control ratios and sample relatedness.

PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144097/1/zhowei_1.pd

    Functional Analysis of Genomic Variation and Impact on Molecular and Higher Order Phenotypes

    Reverse genetics methods, particularly the production of gene knockouts and knockins, have revolutionized the understanding of gene function. High-throughput sequencing now makes it practical to exploit reverse genetics to simultaneously study the functions of thousands of normal sequence variants and spontaneous mutations that segregate in intercross and backcross progeny generated by mating completely sequenced parental lines. To evaluate this new reverse genetic method, we resequenced the genome of one of the oldest inbred strains of mice, DBA/2J, the father of the large family of BXD recombinant inbred strains. We analyzed ~100× whole-genome sequence data for the DBA/2J strain relative to C57BL/6J, the reference strain for all mouse genomics and the mother of the BXD family. We generated the most detailed picture to date of molecular variation between the two mouse strains and identified 5.4 million sequence polymorphisms, including 4.46 million single nucleotide polymorphisms (SNPs), 0.94 million insertions/deletions (indels), and 20,000 structural variants. We systematically scanned massive databases of molecular phenotypes and ~4,000 classical phenotypes to detect linked functional consequences of sequence variants. In the majority of cases we successfully recovered known genotype-to-phenotype associations, and in several cases we linked sequence variants to novel phenotypes (Ahr, Fh1, Entpd2, and Col6a5). However, our most striking and consistent finding is that apparently deleterious homozygous SNPs, indels, and structural variants have undetectable or very modest additive effects on phenotypes.
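The scan of many phenotypes for linked consequences of a single variant can be sketched for recombinant inbred strains, where every strain is essentially homozygous for either the C57BL/6J (B) or the DBA/2J (D) allele. This is a minimal illustration, not the authors' pipeline: it assumes strain-mean phenotype values, uses the common convention that the additive effect is half the difference between the D and B group means, and omits any significance test. Strain and trait names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def additive_effects(genotypes: Dict[str, str],
                     phenotypes: Dict[str, Dict[str, float]]) -> List[Tuple[str, float, int, int]]:
    """Scan many phenotypes for one variant across recombinant inbred strains.

    genotypes:  strain -> "B" (C57BL/6J allele) or "D" (DBA/2J allele) at the variant.
    phenotypes: phenotype name -> {strain: strain mean value}.
    Returns (phenotype, additive effect, n_B, n_D) sorted by |effect|, where the
    additive effect is half the difference between the D and B group means."""
    results = []
    for name, values in phenotypes.items():
        b = [v for s, v in values.items() if genotypes.get(s) == "B"]
        d = [v for s, v in values.items() if genotypes.get(s) == "D"]
        if not b or not d:
            continue  # cannot contrast alleles without strains of each genotype
        results.append((name, (mean(d) - mean(b)) / 2.0, len(b), len(d)))
    results.sort(key=lambda r: abs(r[1]), reverse=True)
    return results


# Made-up genotypes and strain means for four hypothetical strains.
genotypes = {"strain_1": "B", "strain_2": "B", "strain_3": "D", "strain_4": "D"}
phenotypes = {
    "trait_x": {"strain_1": 10, "strain_2": 12, "strain_3": 14, "strain_4": 16},
    "trait_y": {"strain_1": 5.0, "strain_2": 5.5, "strain_3": 5.5, "strain_4": 5.0},
}
for row in additive_effects(genotypes, phenotypes):
    print(row)  # ('trait_x', 2.0, 2, 2) then ('trait_y', 0.0, 2, 2)
```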