Multimodal LLMs for health grounded in individual-specific data
Foundation large language models (LLMs) have shown an impressive ability to
solve tasks across a wide range of fields including health. To effectively
solve personalized health tasks, LLMs need the ability to ingest a diversity of
data modalities that are relevant to an individual's health status. In this
paper, we take a step towards creating multimodal LLMs for health that are
grounded in individual-specific data by developing a framework (HeLM: Health
Large Language Model for Multimodal Understanding) that enables LLMs to use
high-dimensional clinical modalities to estimate underlying disease risk. HeLM
encodes complex data modalities by learning an encoder that maps them into the
LLM's token embedding space, and encodes simple modalities such as tabular data
by serializing them into text. Using data from the UK Biobank, we show that
HeLM can effectively use demographic and clinical features in addition to
high-dimensional time-series data to estimate disease risk. For example, HeLM
achieves an AUROC of 0.75 for asthma prediction when combining tabular and
spirogram data modalities compared with 0.49 when only using tabular data.
Overall, we find that HeLM outperforms or performs at parity with classical
machine learning approaches across a selection of eight binary traits.
Furthermore, we investigate downstream uses of this model, such as its
generalizability to out-of-distribution traits and its ability to power
conversations about individual health and wellness.
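The two encoding routes described above (text serialization for tabular data, a learned encoder for high-dimensional signals) can be illustrated with a minimal sketch. The "name: value" template in `serialize_tabular` and the single linear layer in `project_to_embedding` are hypothetical stand-ins assumed for illustration. The abstract does not specify HeLM's serialization format or encoder architecture, and a real encoder's weights would be learned rather than fixed by hand.

```python
from typing import Mapping, Sequence


def serialize_tabular(features: Mapping[str, object]) -> str:
    """Render a flat record as 'name: value' text (hypothetical template)."""
    parts = []
    for name, value in features.items():
        # Round floats so the serialized prompt stays short and stable.
        text = f"{value:.1f}" if isinstance(value, float) else str(value)
        parts.append(f"{name}: {text}")
    return "; ".join(parts) + "."


def project_to_embedding(
    signal: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> list[float]:
    """Linear stand-in for a learned encoder (assumption, not HeLM's design).

    Maps a feature vector (e.g. a summarized spirogram) to one vector in the
    LLM's token embedding space; `weights` has one row per embedding dimension.
    """
    return [
        sum(w * x for w, x in zip(row, signal)) + b
        for row, b in zip(weights, bias)
    ]


if __name__ == "__main__":
    print(serialize_tabular({"age": 54, "sex": "female", "BMI": 27.34}))
    print(project_to_embedding([1.0, 2.0],
                               [[1.0, 0.0], [0.5, 0.5], [0.0, -1.0]],
                               [0.0, 1.0, 0.0]))
```

In such a setup, the serialized text is tokenized as usual, while the projected vector would be spliced into the input sequence as if it were an additional token embedding.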
Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice
Identifying environmentally specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in identifying such gene-by-environment interactions because they make it uniquely possible to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally dependent effects. However, applying traditional methodologies directly to aggregate separate studies raises several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Second, applying a multivariate model increases the degrees of freedom and lowers statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without the traditional univariate or multivariate approaches to discovering gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing 4,965 distinct animals in aggregate. We identify 26 significant loci involved in high-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions.
An additional advantage of our meta-analysis approach is that the combined study has significantly higher power and improved resolution than any single study, which explains the large number of loci discovered in the combined study.
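The idea that gene-by-environment interactions appear as between-study heterogeneity can be sketched with a generic DerSimonian-Laird random-effects calculation at a single locus. This is a textbook random-effects model assumed for illustration, not necessarily the test statistic the authors use. Each study contributes an effect size and standard error, and a large Cochran's Q or between-study variance τ² signals effects that differ across environments.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RandomEffectsResult:
    beta_fixed: float   # inverse-variance (fixed-effect) pooled estimate
    q: float            # Cochran's Q heterogeneity statistic
    tau2: float         # DerSimonian-Laird between-study variance
    i2: float           # share of variation attributed to heterogeneity
    beta_random: float  # random-effects pooled estimate
    se_random: float    # standard error of beta_random


def random_effects(betas: Sequence[float], ses: Sequence[float]) -> RandomEffectsResult:
    """Pool per-study effects at one locus (generic DerSimonian-Laird sketch)."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching betas/ses from at least two studies")
    w = [1.0 / s**2 for s in ses]  # inverse-variance weights
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    # Q measures how far study effects scatter around the fixed-effect mean.
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # Random-effects weights add tau2 to every study's sampling variance.
    w_star = [1.0 / (s**2 + tau2) for s in ses]
    beta_random = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    se_random = math.sqrt(1.0 / sum(w_star))
    return RandomEffectsResult(beta_fixed, q, tau2, i2, beta_random, se_random)
```

For three hypothetical studies reporting effects of 0.1, 0.5 and 0.9 (standard error 0.1 each), Q is 32 and I² is about 0.94, so most of the variation would be attributed to heterogeneity rather than sampling noise.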
R/qtl2: Software for Mapping Quantitative Trait Loci with High-Dimensional Data and Multiparent Populations.
R/qtl2 is an interactive software environment for mapping quantitative trait loci (QTL) in experimental populations. The R/qtl2 software expands the scope of the widely used R/qtl software package to include multiparent populations derived from more than two founder strains, such as the Collaborative Cross and Diversity Outbred mice, heterogeneous stocks, and MAGIC plant populations. R/qtl2 is designed to handle modern high-density genotyping data and high-dimensional molecular phenotypes, including gene expression and proteomics. R/qtl2 can perform genome scans using a linear mixed model to account for population structure, and also includes features to impute SNPs based on founder strain genomes and to carry out association mapping. The R/qtl2 software provides all of the basic features needed for QTL mapping, including graphical displays and summary reports, and it can be extended through the creation of add-on packages. R/qtl2, which is free and open source software written in the R and C++ programming languages, comes with a test framework.
Functional Cohesion of Gene Sets Determined by Latent Semantic Indexing of PubMed Abstracts
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of the gene sets in the GO biological process category and 90% of the gene sets in the GO molecular function and cellular component categories were functionally cohesive (LPv < 0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature.
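The literature-cohesion test can be sketched under clearly stated assumptions. Genes are represented by vectors in a reduced LSI space, a gene pair counts as "similar" when its cosine similarity reaches a hypothetical `threshold`, and a one-sided Fisher's exact test (the hypergeometric upper tail) asks whether pairs inside the gene set are enriched for similar pairs relative to all pairs. The abstract does not say how the 2×2 table is built, so `cohesion_pvalue` is illustrative only and is not the GCAT implementation.

```python
import math
from itertools import combinations
from typing import Mapping, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two gene vectors in the reduced LSI space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hypergeom_upper_tail(total: int, successes: int, draws: int, observed: int) -> float:
    """One-sided Fisher's exact test: P(X >= observed), X ~ Hypergeometric."""
    hi = min(successes, draws)
    numer = sum(
        math.comb(successes, x) * math.comb(total - successes, draws - x)
        for x in range(observed, hi + 1)
    )
    return numer / math.comb(total, draws)


def cohesion_pvalue(
    vectors: Mapping[str, Sequence[float]],
    gene_set: set[str],
    threshold: float = 0.5,  # hypothetical similarity cut-off
) -> float:
    """Hypothetical LPv-style score: are within-set gene pairs enriched
    for high LSI similarity compared with all gene pairs?"""
    all_pairs = list(combinations(sorted(vectors), 2))
    similar = {p for p in all_pairs if cosine(vectors[p[0]], vectors[p[1]]) >= threshold}
    in_set = [p for p in all_pairs if p[0] in gene_set and p[1] in gene_set]
    observed = sum(1 for p in in_set if p in similar)
    return hypergeom_upper_tail(len(all_pairs), len(similar), len(in_set), observed)
```

A small p-value means that literature-derived similarity is concentrated inside the gene set more than chance would predict, which is the intuition behind calling a set "functionally cohesive".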
Identification and Functional Validation of the Novel Antimalarial Resistance Locus PF10_0355 in Plasmodium falciparum
The Plasmodium falciparum parasite's ability to adapt to environmental pressures, such as the human immune system and antimalarial drugs, makes malaria an enduring burden to public health. Understanding the genetic basis of these adaptations is critical to intervening successfully against malaria. To that end, we created a high-density genotyping array that assays over 17,000 single nucleotide polymorphisms (~1 SNP/kb), and applied it to 57 culture-adapted parasites from three continents. We characterized genome-wide genetic diversity within and between populations and identified numerous loci with signals of natural selection, suggesting their role in recent adaptation. In addition, we performed a genome-wide association study (GWAS), searching for loci correlated with resistance to thirteen antimalarials; we detected both known and novel resistance loci, including a new halofantrine resistance locus, PF10_0355. Through functional testing we demonstrated that PF10_0355 overexpression decreases sensitivity to halofantrine, mefloquine, and lumefantrine, but not to structurally unrelated antimalarials, and that increased gene copy number mediates resistance. Our GWAS and follow-on functional validation demonstrate the potential of genome-wide studies to elucidate functionally important loci in the malaria parasite genome.
Migraine, inflammatory bowel disease and celiac disease: A Mendelian randomization study
Objective: To assess whether migraine may be genetically and/or causally associated with inflammatory bowel disease (IBD) or celiac disease.
Background: Migraine has been linked to IBD and celiac disease in observational studies, but whether this link may be explained by a shared genetic basis or could be causal has not been established. The presence of a causal association could be clinically relevant, as treating one of these medical conditions might mitigate the symptoms of a causally linked condition.
Methods: Linkage disequilibrium score regression and two-sample bidirectional Mendelian randomization analyses were performed using summary statistics from cohort-based genome-wide association studies of migraine (59,674 cases; 316,078 controls), IBD (25,042 cases; 34,915 controls) and celiac disease (11,812 or 4,533 cases; 11,837 or 10,750 controls). Migraine with and without aura were analyzed separately, as were the two IBD subtypes Crohn's disease and ulcerative colitis. Positive control analyses and conventional Mendelian randomization sensitivity analyses were performed.
Results: Migraine was not genetically correlated with IBD or celiac disease. No evidence was observed for IBD (odds ratio [OR] 1.00, 95% confidence interval [CI] 0.99–1.02, p = 0.703) or celiac disease (OR 1.00, 95% CI 0.99–1.02, p = 0.912) causing migraine or migraine causing either IBD (OR 1.08, 95% CI 0.96–1.22, p = 0.181) or celiac disease (OR 1.08, 95% CI 0.79–1.48, p = 0.614) when all participants with migraine were analyzed jointly. There was some indication of a causal association between celiac disease and migraine with aura (OR 1.04, 95% CI 1.00–1.08, p = 0.045), between celiac disease and migraine without aura (OR 0.95, 95% CI 0.92–0.99, p = 0.006), as well as between migraine without aura and ulcerative colitis (OR 1.15, 95% CI 1.02–1.29, p = 0.025). However, the results were not significant after multiple testing correction.
Conclusions: We found no evidence of a shared genetic basis or of a causal association between migraine and either IBD or celiac disease, although we obtained some indications of causal associations with migraine subtypes.
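The abstract does not name its causal estimator, but the inverse-variance weighted (IVW) method is the conventional primary analysis in two-sample Mendelian randomization, and the sketch below assumes it. Each instrument's SNP-outcome effect is scaled by its SNP-exposure effect and pooled using weights from the outcome standard errors. Exponentiating a log-odds estimate yields odds ratios and confidence intervals of the kind reported above. Sensitivity analyses (for example MR-Egger or weighted median) are not shown.

```python
import math
from typing import Sequence


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Inputs are per-SNP summary statistics from two separate GWAS; each SNP's
    ratio estimate is beta_outcome / beta_exposure.
    """
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a log-odds estimate into an OR with an approximate 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Running the same calculation with exposure and outcome swapped gives the reverse direction, which is what "bidirectional" refers to in the Methods.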
Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals
Publisher Copyright: © 2022, The Author(s).
We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12–16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI’s magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.
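At its core, a polygenic index is a weighted sum of an individual's effect-allele counts. The sketch below assumes per-SNP weights are already available and omits the linkage-disequilibrium adjustment, covariate control and within-family design a real PGI analysis would use. `variance_explained` is a plain squared correlation, a rough analogue of the "variance explained" figure rather than the authors' exact procedure.

```python
import math
from typing import Sequence


def polygenic_index(weights: Sequence[float], dosages: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2) across SNPs."""
    if len(weights) != len(dosages):
        raise ValueError("weights and dosages must align SNP by SNP")
    return sum(w * d for w, d in zip(weights, dosages))


def variance_explained(scores: Sequence[float], phenotype: Sequence[float]) -> float:
    """R^2 of a simple linear regression of phenotype on score (squared Pearson r)."""
    n = len(scores)
    mx, my = sum(scores) / n, sum(phenotype) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, phenotype))
    sxx = sum((x - mx) ** 2 for x in scores)
    syy = sum((y - my) ** 2 for y in phenotype)
    return sxy * sxy / (sxx * syy)
```

The "direct effects" comparison in the abstract would, in this framing, regress the phenotype on an individual's PGI while also including the parents' PGIs as covariates.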
Gastroesophageal reflux GWAS identifies risk loci that also associate with subsequent severe esophageal diseases
Abstract: Gastroesophageal reflux disease (GERD) is caused by gastric acid entering the esophagus. GERD has high prevalence and is the major risk factor for Barrett’s esophagus (BE) and esophageal adenocarcinoma (EA). We conduct a large GERD GWAS meta-analysis (80,265 cases, 305,011 controls), identifying 25 independent genome-wide significant loci for GERD. Several of the implicated genes are existing or putative drug targets. Locus discovery is greatest with a broad GERD definition (including cases defined by self-report or medication data). Further, 91% of the GERD risk-increasing alleles also increase BE and/or EA risk, greatly expanding gene discovery for these traits. Our results map genes for GERD and related traits and uncover potential new drug targets for these conditions.
Genome-wide association and epidemiological analyses reveal common genetic origins between uterine leiomyomata and endometriosis
Abstract: Uterine leiomyomata (UL) are the most common neoplasms of the female reproductive tract and the primary cause of hysterectomy, leading to considerable morbidity and a high economic burden. Here we conduct a GWAS meta-analysis in 35,474 cases and 267,505 female controls of European ancestry, identifying eight novel genome-wide significant (P < 5 × 10⁻⁸) loci, in addition to confirming 21 previously reported loci, including multiple independent signals at 10 loci. Phenotypic stratification of UL by heavy menstrual bleeding in 3,409 cases and 199,171 female controls reveals genome-wide significant associations at three of the 29 UL loci: 5p15.33 (TERT), 5q35.2 (FGFR4) and 11q22.3 (ATM). Four loci identified in the meta-analysis are also associated with endometriosis risk; an epidemiological meta-analysis across 402,868 women suggests at least a doubling of risk for UL diagnosis among those with a history of endometriosis. These findings increase our understanding of the genetic contribution and biology underlying UL development, and suggest overlapping genetic origins with endometriosis.