Harnessing large language models (LLMs) for candidate gene prioritization and selection.
BACKGROUND: Feature selection is a critical step for translating advances afforded by systems-scale molecular profiling into actionable clinical insights. While data-driven methods are commonly utilized for selecting candidate genes, knowledge-driven methods must contend with the challenge of efficiently sifting through extensive volumes of biomedical information. This work aimed to assess the utility of large language models (LLMs) for knowledge-driven gene prioritization and selection.
METHODS: In this proof of concept, we focused on 11 blood transcriptional modules associated with an erythroid cell signature. We evaluated four leading LLMs across multiple tasks. Next, we established a workflow leveraging LLMs. The steps consisted of: (1) Selecting one of the 11 modules; (2) Identifying functional convergences among constituent genes using the LLMs; (3) Scoring candidate genes across six criteria capturing the gene's biological and clinical relevance; (4) Prioritizing candidate genes and summarizing justifications; (5) Fact-checking justifications and identifying supporting references; (6) Selecting a top candidate gene based on validated scoring justifications; and (7) Factoring in transcriptome profiling data to finalize the selection of the top candidate gene.
RESULTS: Of the four LLMs evaluated, OpenAI's GPT-4 and Anthropic's Claude demonstrated the best performance and were chosen for the implementation of the candidate gene prioritization and selection workflow. This workflow was run in parallel for each of the 11 erythroid cell modules by participants in a data mining workshop. Module M9.2 served as an illustrative use case. The 30 candidate genes forming this module were assessed, and the top five scoring genes were identified as BCL2L1, ALAS2, SLC4A1, CA1, and FECH. Researchers carefully fact-checked the summarized scoring justifications, after which the LLMs were prompted to select a top candidate based on this information. GPT-4 initially chose BCL2L1, while Claude selected ALAS2. When transcriptional profiling data from three reference datasets were provided for additional context, GPT-4 revised its initial choice to ALAS2, whereas Claude reaffirmed its original selection for this module.
CONCLUSIONS: Taken together, our findings highlight the ability of LLMs to prioritize candidate genes with minimal human intervention. This suggests the potential of this technology to boost productivity, especially for tasks that require leveraging extensive biomedical knowledge.
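Steps (3) and (4) of the workflow described in the METHODS above, scoring each candidate gene across six criteria and then prioritizing, can be illustrated with a minimal sketch. The abstract does not name the six criteria, the scoring scale, or how scores were aggregated, so the simple sum used here, the gene labels, and the scores are all hypothetical placeholders rather than the authors' method or data.

```python
from typing import Dict, List, Tuple

N_CRITERIA = 6  # the abstract states six criteria; their names are not given


def rank_candidates(
    scores: Dict[str, List[float]], top_n: int = 5
) -> List[Tuple[str, float]]:
    """Rank genes by total score across all criteria, highest first.

    Summing the criteria is an assumption made for illustration only.
    """
    totals = []
    for gene, per_criterion in scores.items():
        if len(per_criterion) != N_CRITERIA:
            raise ValueError(f"{gene}: expected {N_CRITERIA} scores")
        totals.append((gene, sum(per_criterion)))
    # Sort by descending total, then by gene name so ties are deterministic.
    totals.sort(key=lambda item: (-item[1], item[0]))
    return totals[:top_n]


# Hypothetical scores for placeholder genes (not data from the study).
example_scores = {
    "GENE_A": [8, 7, 9, 6, 5, 7],
    "GENE_B": [9, 9, 8, 7, 8, 9],
    "GENE_C": [3, 4, 2, 5, 1, 2],
}

ranked = rank_candidates(example_scores, top_n=2)
print(ranked)
```

In the workflow itself, the ranked list would only be a starting point: the abstract describes human fact-checking of the justifications and the addition of transcriptome profiling data before a final candidate is chosen.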
A data browsing application for accessing gene and module-level blood transcriptome profiles of healthy pregnant women from high- and low-resource settings
Transcriptome profiling data, generated via RNA sequencing, are commonly deposited in public repositories. However, these data may not be easily accessible or usable by many researchers. To enhance data reuse, we present well-annotated, partially analyzed data via a user-friendly web application. This project involved transcriptome profiling of blood samples from 15 healthy pregnant women in a low-resource setting, taken at 6 consecutive time points beginning from the first trimester. Additional blood transcriptome profiles were retrieved from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) public repository, representing a cohort of healthy pregnant women from a high-resource setting. We analyzed these datasets using the fixed BloodGen3 module repertoire. We deployed a web application, accessible at https://thejacksonlaboratory.shinyapps.io/BloodGen3_Pregnancy/, which displays the module-level analysis results from both original and public pregnancy blood transcriptome datasets. Users can create custom fingerprint grid and heatmap representations via various navigation options, useful for reports and manuscript preparation. The web application serves as a standalone resource for exploring blood transcript abundance changes during pregnancy. Alternatively, users can integrate it with similar applications developed for earlier publications to analyze transcript abundance changes of a given BloodGen3 signature across a range of disease cohorts. Database URL: https://thejacksonlaboratory.shinyapps.io/BloodGen3_Pregnancy
The molecular landscape of sepsis severity in infants: enhanced coagulation, innate immunity, and T cell repression
Introduction: Sepsis remains a major cause of mortality and morbidity in infants. In recent years, several gene marker strategies for the early identification of sepsis have been proposed, but only a few have been independently validated for adult cohorts, and their applicability to infant sepsis remains unclear. Biomarkers to assess disease severity and risks of shock also represent an important unmet need. Methods: To elucidate characteristics driving sepsis in infants, we assembled a multi-transcriptomic dataset from public microarray datasets originating from five independent studies pertaining to bacterial sepsis in infants < 6 months of age (total n=335). We utilized a ComBat co-normalization strategy to enable comparative evaluation across multiple studies while preserving the relationship between cases and controls. Results: We found good concordance with only two of the seven published adult sepsis gene signatures (accuracy > 80%), highlighting the narrow utility of adult-derived signatures for infant diagnosis. Pseudotime analysis of individual subjects’ gene expression profiles showed a continuum of molecular changes forming tight clusters concurrent with disease progression between healthy controls and septic shock cases. In-depth gene expression analyses between bacteremia, septic shock, and healthy controls characterized lymphocyte activity, hemostatic processes, and heightened innate immunity during the molecular transition toward a state of shock. Discussion: Our analysis revealed the presence of multiple significant transcriptomic perturbations that occur during the progression to septic shock in infants that are characterized by late-stage induction of clotting factors, in parallel with a heightened innate immune response and a suppression of adaptive cell functionality.
Organizing gene literature retrieval, profiling, and visualization training workshops for early career researchers
Developing the skills needed to effectively search and extract information from biomedical literature is essential for early-career researchers. It is, for instance, on this basis that the novelty of experimental results, and therefore publishing opportunities, can be evaluated. Given the unprecedented volume of publications in the field of biomedical research, new systematic approaches need to be devised and adopted for the retrieval and curation of literature relevant to a specific theme. Here we describe a hands-on training curriculum aimed at retrieval, profiling, and visualization of literature associated with a given topic. This curriculum was implemented in a workshop in January 2021. We provide supporting material and step-by-step implementation guidelines with the ISG15 gene literature serving as an illustrative use case. Through participation in such a workshop, trainees can learn: 1) to build and troubleshoot PubMed queries in order to retrieve the literature associated with a gene of interest; 2) to identify key concepts relevant to given themes (such as cell types, diseases, and biological processes); 3) to measure the prevalence of these concepts in the gene literature; 4) to extract key information from relevant articles; and 5) to develop a background section or summary on the basis of this information. Finally, trainees can learn to consolidate the structured information captured through this process for presentation via an interactive web application.
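Learning objectives 1) and 3) above, building a PubMed query for a gene and measuring how often given concepts appear in the retrieved literature, can be sketched as follows. This is an illustration under stated assumptions and not the workshop's actual material: the helper names, the alias list, the keyword lists, and the article titles are all hypothetical, and prevalence is approximated by simple substring matching on titles. The only PubMed-specific element used is the `[tiab]` (Title/Abstract) field tag.

```python
from typing import Dict, Iterable, List


def build_pubmed_query(terms: Iterable[str], field: str = "tiab") -> str:
    """Join a gene symbol and its aliases with OR, each limited to one field."""
    return " OR ".join(f'"{term}"[{field}]' for term in terms)


def concept_prevalence(
    titles: List[str], concepts: Dict[str, List[str]]
) -> Dict[str, float]:
    """Return, per concept, the fraction of titles containing any keyword."""
    prevalence = {}
    for concept, keywords in concepts.items():
        hits = sum(
            1
            for title in titles
            if any(keyword.lower() in title.lower() for keyword in keywords)
        )
        prevalence[concept] = hits / len(titles) if titles else 0.0
    return prevalence


# Illustrative alias list; a real query would be refined by troubleshooting.
query = build_pubmed_query(["ISG15", "G1P2"])
print(query)

# Made-up titles standing in for a retrieved literature set.
titles = [
    "ISG15 in antiviral immunity",
    "ISG15 and interferon signaling in macrophages",
    "Role of G1P2 in tumor cells",
    "ISG15 expression in cancer",
]
concepts = {
    "cell types": ["macrophage", "neutrophil"],
    "diseases": ["tumor", "cancer"],
}
prevalence = concept_prevalence(titles, concepts)
print(prevalence)
```

Substring matching is deliberately crude; it will miss synonyms and can produce false positives, which is one reason the curriculum pairs retrieval with manual extraction of key information.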
A modular framework for the development of targeted Covid-19 blood transcript profiling panels
Covid-19 morbidity and mortality are associated with a dysregulated immune response. Tools are needed to enhance existing immune profiling capabilities in affected patients. Here we aimed to develop an approach to support the design of targeted blood transcriptome panels for profiling the immune response to SARS-CoV-2 infection. We designed a pool of candidates based on a pre-existing and well-characterized repertoire of blood transcriptional modules. Available Covid-19 blood transcriptome data was also used to guide this process. Further selection steps relied on expert curation. Additionally, we developed several custom web applications to support the evaluation of candidates. As a proof of principle, we designed three targeted blood transcript panels, each with a different translational connotation: immunological relevance, therapeutic development relevance and SARS biology relevance. Altogether the work presented here may contribute to the future expansion of immune profiling capabilities via targeted profiling of blood transcript abundance in Covid-19 patients.
Immunomodulatory effects of vitamin D supplementation in a deficient population
In addition to its canonical functions, vitamin D has been proposed to be an important mediator of the immune system. Despite ample sunshine, vitamin D deficiency is prevalent (>80%) in the Middle East, resulting in a high rate of supplementation. However, the underlying molecular mechanisms of the specific regimen prescribed and the potential factors affecting an individual’s response to vitamin D supplementation are not well characterized. Our objective is to describe the changes in the blood transcriptome and explore the potential mechanisms associated with vitamin D3 supplementation in one hundred vitamin D-deficient women who were given a weekly oral dose (50,000 IU) of vitamin D3 for three months. A high-throughput targeted PCR, composed of 264 genes representing the important blood transcriptomic fingerprints of health and disease states, was performed on pre and post-supplementation blood samples to profile the molecular response to vitamin D3. We identified 54 differentially expressed genes that were strongly modulated by vitamin D3 supplementation. Network analyses showed significant changes in the immune-related pathways such as TLR4/CD14 and IFN receptors, and catabolic processes related to NF-kB, which were subsequently confirmed by gene ontology enrichment analyses. We proposed a model for vitamin D3 response based on the expression changes of molecules involved in the receptor-mediated intra-cellular signaling pathways and the ensuing predicted effects on cytokine production. Overall, vitamin D3 has a strong effect on the immune system, G-coupled protein receptor signaling, and the ubiquitin system. 
We highlighted the major molecular changes and biological processes induced by vitamin D3, which will help to further investigate the effectiveness of vitamin D3 supplementation among individuals in the Middle East as well as other regions. Funding: This work was supported by a National Capacity Building Program grant from Qatar University (ID# QUCP-CHS-17\18-1).
Application of a gene modular approach for clinical phenotype genotype association and sepsis prediction using machine learning in meningococcal sepsis
Sepsis is a major global health concern causing high morbidity and mortality rates. Our study utilized a Meningococcal Septic Shock (MSS) temporal dataset to investigate the correlation between gene expression (GE) changes and clinical features. The research used Weighted Gene Co-expression Network Analysis (WGCNA) to establish links between gene expression and clinical parameters in infants admitted to the Pediatric Critical Care Unit with MSS. Additionally, various machine learning (ML) algorithms, including Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors (KNN), Decision Tree, Random Forest, and Artificial Neural Network (ANN) were implemented to predict sepsis survival. The findings revealed a transition in gene function pathways from nuclear to cytoplasmic to extracellular, corresponding with Pediatric Logistic Organ Dysfunction score (PELOD) readings at 0, 24, and 48 h. ANN was the most accurate of the six ML models applied for survival prediction. This study successfully correlated PELOD with transcriptomic data, mapping enriched GE modules in acute sepsis. By integrating network analysis methods to identify key gene modules and using machine learning for sepsis prognosis, this study offers valuable insights for precision-based treatment strategies in future research. The observed temporal-spatial pattern of cellular recovery in sepsis could prove useful in guiding clinical management and therapeutic interventions
Development of a fixed module repertoire for the analysis and interpretation of blood transcriptome data.
As the capacity for generating large-scale molecular profiling data continues to grow, the ability to extract meaningful biological knowledge from it remains a limitation. Here, we describe the development of a new fixed repertoire of transcriptional modules, BloodGen3, that is designed to serve as a stable reusable framework for the analysis and interpretation of blood transcriptome data. The construction of this repertoire is based on co-clustering patterns observed across sixteen immunological and physiological states encompassing 985 blood transcriptome profiles. Interpretation is supported by customized resources, including module-level analysis workflows, fingerprint grid plot visualizations, interactive web applications and an extensive annotation framework comprising functional profiling reports and reference transcriptional profiles. Taken together, this well-characterized and well-supported transcriptional module repertoire can be employed for the interpretation and benchmarking of blood transcriptome profiles within and across patient cohorts. Blood transcriptome fingerprints for the 16 reference cohorts can be accessed interactively via: https://drinchai.shinyapps.io/BloodGen3Module/
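The module-level analysis mentioned above can be illustrated with a minimal sketch. The abstract does not spell out how a module's response is computed, so the summary used here, the percentage of a module's constituent genes called increased or decreased relative to a reference group, is an assumption, as are the function name, the module label, the gene labels, and the per-gene calls.

```python
from typing import Dict, List


def module_response(module_genes: List[str], calls: Dict[str, int]) -> Dict[str, float]:
    """Summarize one module as the percentage of its genes called up or down.

    `calls` maps a gene to +1 (increased), -1 (decreased) or 0 (unchanged);
    genes missing from `calls` are treated as unchanged.
    """
    if not module_genes:
        raise ValueError("module has no genes")
    n_genes = len(module_genes)
    n_up = sum(1 for gene in module_genes if calls.get(gene, 0) > 0)
    n_down = sum(1 for gene in module_genes if calls.get(gene, 0) < 0)
    return {
        "pct_up": 100.0 * n_up / n_genes,
        "pct_down": 100.0 * n_down / n_genes,
    }


# Hypothetical module membership and per-gene calls (not data from the study).
modules = {"M_example": ["g1", "g2", "g3", "g4"]}
calls = {"g1": 1, "g2": 1, "g3": -1, "g4": 0}

fingerprint = {name: module_response(genes, calls) for name, genes in modules.items()}
print(fingerprint)
```

A grid of such per-module percentages, one cell per module, is the kind of summary that a fingerprint grid plot could display, with the sign and magnitude of the response mapped to color.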
Abundance of ACVR1B transcript is elevated during septic conditions: Perspectives obtained from a hands-on reductionist investigation
Sepsis is a complex heterogeneous condition, and the current lack of effective risk and outcome predictors hinders the improvement of its management. Using a reductionist approach leveraging publicly available transcriptomic data, we describe a knowledge gap for the role of ACVR1B (activin A receptor type 1B) in sepsis. ACVR1B, a member of the transforming growth factor-beta (TGF-beta) superfamily, was selected based on the following: 1) induction upon in vitro exposure of neutrophils from healthy subjects to the serum of septic patients (GSE49755), and 2) absence or minimal overlap between ACVR1B, sepsis, inflammation, or neutrophil in published literature. Moreover, ACVR1B expression is upregulated in septic melioidosis, a widespread cause of fatal sepsis in the tropics. Key biological concepts extracted from a series of PubMed queries established indirect links between ACVR1B and “cancer”, “TGF-beta superfamily”, “cell proliferation”, “inhibitors of activin”, and “apoptosis”. We confirmed our observations by measuring ACVR1B transcript abundance in buffy coat samples obtained from healthy individuals (n = 3) exposed to septic plasma (n = 26 melioidosis sepsis cases) ex vivo. Based on our re-investigation of publicly available transcriptomic data and newly generated ex vivo data, we provide perspective on the role of ACVR1B during sepsis. Additional experiments for addressing this knowledge gap are discussed.
A Transcriptomic Appreciation of Childhood Meningococcal and Polymicrobial Sepsis from a Pro-Inflammatory and Trajectorial Perspective, a Role for Vascular Endothelial Growth Factor A and B Modulation?
This study investigated the temporal dynamics of childhood sepsis by analyzing gene expression changes associated with proinflammatory processes. Five datasets, including four meningococcal sepsis shock (MSS) datasets (two temporal and two longitudinal) and one polymicrobial sepsis dataset, were selected to track temporal changes in gene expression. Hierarchical clustering revealed three temporal phases: early, intermediate, and late, providing a framework for understanding sepsis progression. Principal component analysis supported the identification of gene expression trajectories. Differential gene analysis highlighted consistent upregulation of vascular endothelial growth factor A (VEGF-A) and nuclear factor κB1 (NFKB1), genes involved in inflammation, across the sepsis datasets. NFKB1 gene expression also showed temporal changes in the MSS datasets. In the postmortem dataset comparing MSS cases to controls, VEGF-A was upregulated and VEGF-B downregulated. Renal tissue exhibited higher VEGF-A expression compared with other tissues. Similar VEGF-A upregulation and VEGF-B downregulation patterns were observed in the cross-sectional MSS datasets and the polymicrobial sepsis dataset. Hexagonal plots confirmed VEGF-R (VEGF receptor)–VEGF-R2 signaling pathway enrichment in the MSS cross-sectional studies. The polymicrobial sepsis dataset also showed enrichment of the VEGF pathway in septic shock day 3 and sepsis day 3 samples compared with controls. These findings provide unique insights into the dynamic nature of sepsis from a transcriptomic perspective and suggest potential implications for biomarker development. Future research should focus on larger-scale temporal transcriptomic studies with appropriate control groups and validate the identified gene combination as a potential biomarker panel for sepsis