
    Multimodal LLMs for health grounded in individual-specific data

    Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields, including health. To solve personalized health tasks effectively, LLMs need to ingest the diverse data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM's token embedding space, and encodes simple modalities, such as tabular data, by serializing them into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities, compared with 0.49 when using only tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate downstream uses of this model, such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.
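
    The tabular path described above (serializing simple features into text) can be sketched as follows. This is an illustrative sketch only: the field names, the rounding rule, and the prompt template are hypothetical assumptions, not HeLM's actual prompt format, and the learned encoder that maps spirograms into the token embedding space is not shown.

```python
# Hypothetical sketch of text serialization for tabular features.
# The field names and prompt template are illustrative assumptions,
# not HeLM's actual prompt format.


def serialize_record(record: dict[str, object]) -> str:
    """Render {feature: value} pairs as 'feature: value' items joined by '; '."""
    parts = []
    for name, value in record.items():
        if value is None:
            continue  # drop missing features instead of printing "None"
        if isinstance(value, float):
            value = f"{value:.1f}"  # round floats to keep the prompt compact
        parts.append(f"{name}: {value}")
    return "; ".join(parts) + "."


def build_prompt(record: dict[str, object], trait: str) -> str:
    """Prefix the serialized features to a yes/no question about a trait."""
    return (
        f"Features: {serialize_record(record)}\n"
        f"Question: Does this individual have {trait}? Answer yes or no."
    )


if __name__ == "__main__":
    person = {"age": 54, "sex": "female", "BMI": 27.31, "smoking": "never", "FEV1": None}
    print(build_prompt(person, "asthma"))
```

    Running the script prints a two-line prompt whose first line is `Features: age: 54; sex: female; BMI: 27.3; smoking: never.` Skipping missing values, rather than writing them out, is one design choice among several; a model could instead be told explicitly that a measurement is absent.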

    R/qtl2: Software for Mapping Quantitative Trait Loci with High-Dimensional Data and Multiparent Populations.

    R/qtl2 is an interactive software environment for mapping quantitative trait loci (QTL) in experimental populations. The R/qtl2 software expands the scope of the widely used R/qtl software package to include multiparent populations derived from more than two founder strains, such as the Collaborative Cross and Diversity Outbred mice, heterogeneous stocks, and MAGIC plant populations. R/qtl2 is designed to handle modern high-density genotyping data and high-dimensional molecular phenotypes, including gene expression and proteomics. R/qtl2 can perform genome scans using a linear mixed model to account for population structure, and it also includes features to impute SNPs based on founder strain genomes and to carry out association mapping. The R/qtl2 software provides all of the basic features needed for QTL mapping, including graphical displays and summary reports, and it can be extended through the creation of add-on packages. R/qtl2, which is free, open-source software written in the R and C++ programming languages, comes with a test framework.
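
    The genome scan mentioned above reports a LOD score at each position. A minimal sketch of that statistic, assuming a single biallelic marker coded as allele dosage (0, 1, 2) and ordinary least squares, is shown below. R/qtl2 itself works from genotype probabilities and can add a linear mixed model; this simplified regression is a swapped-in illustration, not the package's implementation, and `genome_scan` is a hypothetical helper.

```python
import math


def lod_score(phenotype: list[float], genotype: list[float]) -> float:
    """LOD score for a single marker under a simple linear regression.

    LOD = (n / 2) * log10(RSS0 / RSS1), where RSS0 is the residual sum of
    squares of an intercept-only model and RSS1 that of a model that adds
    the marker's allele dosage (0, 1, 2) as an additive covariate.
    """
    n = len(phenotype)
    if n != len(genotype) or n < 3:
        raise ValueError("need matched phenotype/genotype vectors, n >= 3")
    y_bar = sum(phenotype) / n
    g_bar = sum(genotype) / n
    rss0 = sum((y - y_bar) ** 2 for y in phenotype)
    sxx = sum((g - g_bar) ** 2 for g in genotype)
    if rss0 == 0 or sxx == 0:
        return 0.0  # constant phenotype or monomorphic marker: no evidence
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(genotype, phenotype))
    slope = sxy / sxx
    intercept = y_bar - slope * g_bar
    rss1 = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotype, phenotype))
    if rss1 == 0:
        return math.inf  # marker explains the phenotype perfectly
    return (n / 2) * math.log10(rss0 / rss1)


def genome_scan(phenotype: list[float], markers: dict[str, list[float]]) -> dict[str, float]:
    """Hypothetical helper: LOD score at every marker (name -> dosages)."""
    return {name: lod_score(phenotype, dos) for name, dos in markers.items()}


if __name__ == "__main__":
    y = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
    scan = genome_scan(y, {"m1": [0, 0, 1, 1, 2, 2], "m2": [0, 1, 2, 0, 1, 2]})
    print(scan)
```

    In this toy example the phenotype tracks marker `m1` closely, so `m1` receives a much larger LOD than `m2`.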

    Identification and Functional Validation of the Novel Antimalarial Resistance Locus PF10_0355 in Plasmodium falciparum

    The Plasmodium falciparum parasite's ability to adapt to environmental pressures, such as the human immune system and antimalarial drugs, makes malaria an enduring burden to public health. Understanding the genetic basis of these adaptations is critical to intervening successfully against malaria. To that end, we created a high-density genotyping array that assays over 17,000 single nucleotide polymorphisms (~1 SNP/kb), and applied it to 57 culture-adapted parasites from three continents. We characterized genome-wide genetic diversity within and between populations and identified numerous loci with signals of natural selection, suggesting their role in recent adaptation. In addition, we performed a genome-wide association study (GWAS), searching for loci correlated with resistance to thirteen antimalarials; we detected both known and novel resistance loci, including a new halofantrine resistance locus, PF10_0355. Through functional testing we demonstrated that PF10_0355 overexpression decreases sensitivity to halofantrine, mefloquine, and lumefantrine, but not to structurally unrelated antimalarials, and that increased gene copy number mediates resistance. Our GWAS and follow-on functional validation demonstrate the potential of genome-wide studies to elucidate functionally important loci in the malaria parasite genome.
    Funders: Bill & Melinda Gates Foundation; Ellison Medical Foundation; Exxon Mobil Foundation; Fogarty International Center; National Institute of Allergy and Infectious Diseases (U.S.); Burroughs Wellcome Fund; David & Lucile Packard Foundation; National Science Foundation (U.S.). Graduate Research Fellowship Program.

    Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts

    High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in the GO biological process category and 90% of the gene sets in the GO molecular function and cellular component categories were functionally cohesive (LPv < 0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature.
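
    The abstract states only that LSI similarities between genes feed a Fisher's exact test. One plausible way to set up that test is sketched below, under explicit assumptions: gene vectors are taken as already computed (the truncated SVD of the term-by-document matrix is not shown), a pair counts as "similar" when its cosine similarity reaches a hypothetical threshold of 0.8, and all pairs not inside the gene set serve as the background. `cohesion_pvalue` is a hypothetical name, and this is not necessarily how GCAT builds its 2×2 table.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two LSI gene vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fisher_exact_greater(a, b, c, d):
    """One-sided (enrichment) Fisher's exact p-value for [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins
    whose top-left cell is >= a.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    numerator = sum(
        math.comb(col1, x) * math.comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numerator / math.comb(n, row1)  # exact integer ratio, then float


def cohesion_pvalue(gene_set, vectors, threshold=0.8):
    """Hypothetical LPv-style test: are within-set pairs enriched for high similarity?

    vectors maps gene -> LSI vector; every pair not inside gene_set is background.
    """
    a = b = c = d = 0
    for g1, g2 in combinations(sorted(vectors), 2):
        similar = cosine(vectors[g1], vectors[g2]) >= threshold
        if g1 in gene_set and g2 in gene_set:
            if similar:
                a += 1
            else:
                b += 1
        elif similar:
            c += 1
        else:
            d += 1
    return fisher_exact_greater(a, b, c, d)
```

    With toy two-dimensional vectors in which genes A, B, and C point in nearly the same direction, `cohesion_pvalue({"A", "B", "C"}, vectors)` returns a small p-value, so that set would be called cohesive at the abstract's LPv < 0.05 cut-off.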

    Gastroesophageal reflux GWAS identifies risk loci that also associate with subsequent severe esophageal diseases

    Funder: The Swedish Esophageal Cancer Study was funded by grants (R01 CA57947-03) from the National Cancer Institute, the California Tobacco Related Research Program (3RT-0122 and 10RT-0251), and the Marit Peterson Fund for Melanoma Research. CIDR is supported by contract HHSN268200782096C.
    Abstract: Gastroesophageal reflux disease (GERD) is caused by gastric acid entering the esophagus. GERD has high prevalence and is the major risk factor for Barrett’s esophagus (BE) and esophageal adenocarcinoma (EA). We conduct a large GERD GWAS meta-analysis (80,265 cases, 305,011 controls), identifying 25 independent genome-wide significant loci for GERD. Several of the implicated genes are existing or putative drug targets. Locus discovery is greatest with a broad GERD definition (including cases defined by self-report or medication data). Further, 91% of the GERD risk-increasing alleles also increase BE and/or EA risk, greatly expanding gene discovery for these traits. Our results map genes for GERD and related traits and uncover potential new drug targets for these conditions.
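
    The abstract does not say how the per-cohort results were combined. A common choice for GWAS meta-analysis is the fixed-effect inverse-variance-weighted estimate, sketched below for a single SNP as an assumption rather than as this study's method. The 5 × 10⁻⁸ cut-off is the conventional genome-wide significance threshold; the abstract itself does not state its threshold, and selecting independent loci (clumping) is not shown.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def ivw_meta(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP.

    Each study contributes weight 1 / SE^2; returns (beta, se, two-sided p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p


def is_genome_wide_significant(p: float) -> bool:
    """Compare a meta-analysis p-value against the conventional threshold."""
    return p < GENOME_WIDE_ALPHA
```

    Each cohort contributes a log-odds ratio and its standard error for the same effect allele; cohorts with smaller standard errors (typically larger samples) dominate the pooled estimate.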

    Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals

    Publisher Copyright: © 2022, The Author(s).
    We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12–16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI’s magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.
    Peer reviewed
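
    In its simplest form, a polygenic index is a weighted sum of effect-allele counts, with weights taken from GWAS effect estimates, which is then standardized in the target sample. The sketch below shows only that basic form. The abstract does not describe how this PGI's weights were derived (for example, whether linkage disequilibrium was modeled), and the SNP identifiers used in the usage example are hypothetical placeholders.

```python
from statistics import mean, pstdev


def polygenic_index(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages over SNPs present in both maps.

    dosages: SNP id -> count of the effect allele (0, 1 or 2, or an imputed
    value in between); weights: SNP id -> per-allele GWAS effect estimate.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def standardize(scores: list[float]) -> list[float]:
    """Convert raw scores to z-scores so effects are per standard deviation."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


if __name__ == "__main__":
    weights = {"rs1": 0.5, "rs2": -0.2}  # hypothetical SNP ids and weights
    people = [{"rs1": 2, "rs2": 1}, {"rs1": 0, "rs2": 2}, {"rs1": 1, "rs2": 0}]
    print(standardize([polygenic_index(p, weights) for p in people]))
```

    Estimating direct effects, as the abstract describes, would additionally regress the phenotype on the individual's PGI together with the parental PGIs. That step is not shown here.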

    Migraine, inflammatory bowel disease and celiac disease: A Mendelian randomization study

    Objective: To assess whether migraine may be genetically and/or causally associated with inflammatory bowel disease (IBD) or celiac disease.
    Background: Migraine has been linked to IBD and celiac disease in observational studies, but whether this link may be explained by a shared genetic basis or could be causal has not been established. The presence of a causal association could be clinically relevant, as treating one of these medical conditions might mitigate the symptoms of a causally linked condition.
    Methods: Linkage disequilibrium score regression and two-sample bidirectional Mendelian randomization analyses were performed using summary statistics from cohort-based genome-wide association studies of migraine (59,674 cases; 316,078 controls), IBD (25,042 cases; 34,915 controls) and celiac disease (11,812 or 4,533 cases; 11,837 or 10,750 controls). Migraine with and without aura were analyzed separately, as were the two IBD subtypes Crohn's disease and ulcerative colitis. Positive control analyses and conventional Mendelian randomization sensitivity analyses were performed.
    Results: Migraine was not genetically correlated with IBD or celiac disease. No evidence was observed for IBD (odds ratio [OR] 1.00, 95% confidence interval [CI] 0.99–1.02, p = 0.703) or celiac disease (OR 1.00, 95% CI 0.99–1.02, p = 0.912) causing migraine or migraine causing either IBD (OR 1.08, 95% CI 0.96–1.22, p = 0.181) or celiac disease (OR 1.08, 95% CI 0.79–1.48, p = 0.614) when all participants with migraine were analyzed jointly. There was some indication of a causal association between celiac disease and migraine with aura (OR 1.04, 95% CI 1.00–1.08, p = 0.045), between celiac disease and migraine without aura (OR 0.95, 95% CI 0.92–0.99, p = 0.006), as well as between migraine without aura and ulcerative colitis (OR 1.15, 95% CI 1.02–1.29, p = 0.025). However, the results were not significant after multiple testing correction.
    Conclusions: We found no evidence of a shared genetic basis or of a causal association between migraine and either IBD or celiac disease, although we obtained some indications of causal associations with migraine subtypes.
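
    The odds ratios above come from two-sample Mendelian randomization on summary statistics. The abstract does not name its estimators, so the sketch below assumes the widely used first-order inverse-variance-weighted (IVW) estimator, which combines per-SNP Wald ratios (SNP-outcome effect divided by SNP-exposure effect). It then converts a log-odds estimate into an OR with a 95% CI and a two-sided p-value. Sensitivity analyses and multiple-testing correction are not shown, and the function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a 95% interval


def ivw_mr(bx: list[float], by: list[float], se_y: list[float]) -> tuple[float, float]:
    """First-order inverse-variance-weighted two-sample MR estimate.

    bx: SNP-exposure effects; by, se_y: SNP-outcome effects and SEs.
    Returns (causal estimate, standard error) on the outcome's scale.
    """
    if not bx or not (len(bx) == len(by) == len(se_y)):
        raise ValueError("need matched, non-empty SNP summary statistics")
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_y))
    return num / den, 1.0 / math.sqrt(den)


def odds_ratio_with_ci(beta: float, se: float) -> tuple[float, float, float, float]:
    """Exponentiate a log-odds estimate into (OR, CI lower, CI upper, two-sided p)."""
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se), p
```

    With a single instrument, the IVW estimate reduces to that SNP's Wald ratio, and its standard error reduces to the outcome standard error divided by the absolute exposure effect. An OR whose interval spans 1.00, as in most of the results above, corresponds to a two-sided p-value above 0.05.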