20 research outputs found

    SAIGE-GENE plus improves the efficiency and accuracy of set-based rare variant association tests

    Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) [...]. SAIGE-GENE+ performs set-based rare variant association tests with improved type 1 error control and computational efficiency by collapsing ultra-rare variants and conducting multiple tests corresponding to different minor allele frequency cutoffs and annotations.
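
    The collapsing step mentioned in this abstract can be illustrated with a minimal sketch. It is not the SAIGE-GENE+ implementation: the function name `collapse_ultra_rare`, the `mac_cutoff` parameter, the max-dosage rule, and the toy genotypes are all assumptions made for illustration. The sketch merges every variant whose minor allele count (MAC) is at or below a cutoff into one pseudo-variant.

```python
from typing import List


def collapse_ultra_rare(genotypes: List[List[int]], mac_cutoff: int) -> List[List[int]]:
    """Collapse ultra-rare variants into a single pseudo-variant.

    genotypes is variant-major: one list of 0/1/2 dosages per variant, one
    entry per sample. Dosages are assumed to count the minor allele, so the
    sum of a row is that variant's minor allele count (MAC).
    mac_cutoff is a hypothetical threshold, not a value taken from the paper.
    """
    # Variants above the cutoff are kept unchanged.
    common = [v for v in genotypes if sum(v) > mac_cutoff]
    # Variants at or below the cutoff are collapsed; monomorphic ones (MAC 0) are dropped.
    ultra_rare = [v for v in genotypes if 0 < sum(v) <= mac_cutoff]
    if ultra_rare:
        # A sample's collapsed dosage is its maximum dosage over the ultra-rare variants.
        collapsed = [max(per_sample) for per_sample in zip(*ultra_rare)]
        common.append(collapsed)
    return common


# Toy data: 4 variants x 6 samples (illustrative only).
toy = [
    [0, 1, 1, 0, 2, 1],  # MAC 5, kept as is
    [1, 0, 0, 0, 0, 0],  # MAC 1, collapsed
    [0, 0, 0, 1, 0, 0],  # MAC 1, collapsed
    [0, 0, 0, 0, 0, 0],  # MAC 0, dropped
]
result = collapse_ultra_rare(toy, mac_cutoff=2)
```

    Repeating the set-based test over several MAF cutoffs, as the abstract describes, would amount to filtering the variant list by each cutoff before calling a function like this one.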

    S-CAP extends pathogenicity prediction to genetic variants that affect RNA splicing

    Exome analysis of patients with a likely monogenic disease does not identify a causal variant in over half of cases. Splice-disrupting mutations make up the second largest class of known disease-causing mutations. Each individual (singleton) exome harbors over 500 rare variants of unknown significance (VUS) in the splicing region. The existing relevant pathogenicity prediction tools tackle all non-coding variants as one amorphic class and/or are not calibrated for the high sensitivity required for clinical use. Here we calibrate seven such tools and devise a novel tool called Splicing Clinically Applicable Pathogenicity prediction (S-CAP) that is over twice as powerful as all previous tools, removing 41% of patient VUS at 95% sensitivity. We show that S-CAP does this by using its own features and not via meta-prediction over previous tools, and that splicing pathogenicity prediction is distinct from predicting molecular splicing changes. S-CAP is an important step on the path to deriving non-coding causal diagnoses.
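
    The phrase "removing 41% of patient VUS at 95% sensitivity" describes an operating point on a score cutoff. The sketch below shows one common way to find such a point; it is an assumption about the general technique, not S-CAP's code. The function names and the convention that higher scores mean "more likely pathogenic" are hypothetical.

```python
import math
from typing import List


def threshold_at_sensitivity(pathogenic_scores: List[float], sensitivity: float) -> float:
    """Highest cutoff that still calls at least `sensitivity` of the known
    pathogenic variants pathogenic (a variant is called when score >= cutoff)."""
    ranked = sorted(pathogenic_scores, reverse=True)
    # Number of pathogenic variants that must remain at or above the cutoff.
    needed = math.ceil(sensitivity * len(ranked))
    return ranked[needed - 1]


def fraction_removed(vus_scores: List[float], cutoff: float) -> float:
    """Share of variants of unknown significance scoring below the cutoff,
    i.e. the share that would be filtered out as likely benign."""
    return sum(score < cutoff for score in vus_scores) / len(vus_scores)
```

    In use, the cutoff would be fixed on a labelled set of pathogenic variants (for example with `sensitivity=0.95`) and then applied to a patient's VUS scores with `fraction_removed`.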

    AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature

    The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient’s disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient’s given set of phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database–based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children’s Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu.
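
    The "ranked at the very top" and "top 11" figures in this abstract are top-k metrics over per-patient gene rankings. The sketch below shows how such a metric could be computed. The function names and the toy scores are hypothetical, and nothing here reflects how AMELIE itself scores genes.

```python
from typing import Dict, List


def rank_of_causal(scores: Dict[str, float], causal_gene: str) -> int:
    """1-based rank of the causal gene when candidates are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(causal_gene) + 1


def top_k_rate(causal_ranks: List[int], k: int) -> float:
    """Fraction of patients whose causal gene is ranked within the top k candidates."""
    return sum(rank <= k for rank in causal_ranks) / len(causal_ranks)


# Toy example: one patient with three candidate genes (illustrative scores).
patient_scores = {"GENE_A": 0.2, "GENE_B": 0.9, "GENE_C": 0.5}
rank = rank_of_causal(patient_scores, "GENE_C")
```

    Collecting one such rank per diagnosed patient and calling `top_k_rate` with `k=1` or `k=11` would give figures of the same kind as those quoted above.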

    AVADA: toward automated pathogenic variant evidence retrieval directly from the full-text literature

    Purpose: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval by an automatic approach. Methods: Automatic VAriant evidence DAtabase (AVADA) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates. Results: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in the Human Gene Mutation Database (HGMD), a 4.4-fold improvement over the current best open source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation, on 245 diagnosed patients, AVADA provides valuable evidence for an additional 18 diagnostic variants, on top of ClinVar’s 21, versus only 2 using the best current automated approach. Conclusion: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. Far from perfect, but much faster than PubMed/Google Scholar search, careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis.

    Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis

    Genome-wide association studies (GWAS) have revealed risk alleles for ulcerative colitis (UC). To understand their cell type specificities and pathways of action, we generate an atlas of 366,650 cells from the colon mucosa of 18 UC patients and 12 healthy individuals, revealing 51 epithelial, stromal, and immune cell subsets, including BEST4(+) enterocytes, microfold-like cells, and IL13RA2(+)IL11(+) inflammatory fibroblasts, which we associate with resistance to anti-TNF treatment. Inflammatory fibroblasts, inflammatory monocytes, microfold-like cells, and T cells that co-express CD8 and IL-17 expand with disease, forming intercellular interaction hubs. Many UC risk genes are cell type specific and coregulated within relatively few gene modules, suggesting convergence onto limited sets of cell types and pathways. Using this observation, we nominate and infer functions for specific risk genes across GWAS loci. Our work provides a framework for interrogating complex human diseases and mapping risk variants to cell types and pathways.

    Single-cell meta-analysis of SARS-CoV-2 entry genes across tissues and demographics

    Angiotensin-converting enzyme 2 (ACE2) and accessory proteases (TMPRSS2 and CTSL) are needed for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cellular entry, and their expression may shed light on viral tropism and impact across the body. We assessed the cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues. ACE2, TMPRSS2 and CTSL are coexpressed in specific subsets of respiratory epithelial cells in the nasal passages, airways and alveoli, and in cells from other organs associated with coronavirus disease 2019 (COVID-19) transmission or pathology. We performed a meta-analysis of 31 lung single-cell RNA-sequencing studies with 1,320,896 cells from 377 nasal, airway and lung parenchyma samples from 228 individuals. This revealed cell-type-specific associations of age, sex and smoking with expression levels of ACE2, TMPRSS2 and CTSL. Expression of entry factors increased with age and in males, including in airway secretory cells and alveolar type 2 cells. Expression programs shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues included genes that may mediate viral entry, key immune functions and epithelial-macrophage cross-talk, such as genes involved in the interleukin-6, interleukin-1, tumor necrosis factor and complement pathways. Cell-type-specific expression patterns may contribute to the pathogenesis of COVID-19, and our work highlights putative molecular pathways for therapeutic intervention.

    DataSheet_1_Treatment-associated remodeling of the pancreatic cancer endothelium at single-cell resolution.pdf

    Pancreatic ductal adenocarcinoma (PDAC) is one of the most treatment refractory and lethal malignancies. The diversity of endothelial cell (EC) lineages in the tumor microenvironment (TME) impacts the efficacy of antineoplastic therapies, which in turn remodel EC states and distributions. Here, we present a single-cell resolution framework of diverse EC lineages in the PDAC TME in the context of neoadjuvant chemotherapy, radiotherapy, and losartan. We analyzed a custom single-nucleus RNA-seq dataset derived from 37 primary PDAC specimens (18 untreated, 14 neoadjuvant FOLFIRINOX + chemoradiotherapy, 5 neoadjuvant FOLFIRINOX + chemoradiotherapy + losartan). A single-nucleus transcriptome analysis of 15,185 EC profiles revealed two state programs (ribosomal, cycling), four lineage programs (capillary, arterial, venous, lymphatic), and one program that did not overlap significantly with prior signatures but was enriched in pathways involved in vasculogenesis, stem-like state, response to wounding and hypoxia, and endothelial-to-mesenchymal transition (reactive EndMT). A bulk transcriptome analysis of two independent cohorts (n = 269 patients) revealed that the lymphatic and reactive EndMT lineage programs were significantly associated with poor clinical outcomes. While losartan and proton therapy were associated with reduced lymphatic ECs, these therapies also correlated with an increase in reactive EndMT. Thus, the development and inclusion of EndMT-inhibiting drugs (e.g., nintedanib) to a neoadjuvant chemoradiotherapy regimen featuring losartan and/or proton therapy may be most effective in depleting both lymphatic and reactive EndMT populations and potentially improving patient outcomes.

    DataSheet_2_Treatment-associated remodeling of the pancreatic cancer endothelium at single-cell resolution.pdf

    Pancreatic ductal adenocarcinoma (PDAC) is one of the most treatment refractory and lethal malignancies. The diversity of endothelial cell (EC) lineages in the tumor microenvironment (TME) impacts the efficacy of antineoplastic therapies, which in turn remodel EC states and distributions. Here, we present a single-cell resolution framework of diverse EC lineages in the PDAC TME in the context of neoadjuvant chemotherapy, radiotherapy, and losartan. We analyzed a custom single-nucleus RNA-seq dataset derived from 37 primary PDAC specimens (18 untreated, 14 neoadjuvant FOLFIRINOX + chemoradiotherapy, 5 neoadjuvant FOLFIRINOX + chemoradiotherapy + losartan). A single-nucleus transcriptome analysis of 15,185 EC profiles revealed two state programs (ribosomal, cycling), four lineage programs (capillary, arterial, venous, lymphatic), and one program that did not overlap significantly with prior signatures but was enriched in pathways involved in vasculogenesis, stem-like state, response to wounding and hypoxia, and endothelial-to-mesenchymal transition (reactive EndMT). A bulk transcriptome analysis of two independent cohorts (n = 269 patients) revealed that the lymphatic and reactive EndMT lineage programs were significantly associated with poor clinical outcomes. While losartan and proton therapy were associated with reduced lymphatic ECs, these therapies also correlated with an increase in reactive EndMT. Thus, the development and inclusion of EndMT-inhibiting drugs (e.g., nintedanib) to a neoadjuvant chemoradiotherapy regimen featuring losartan and/or proton therapy may be most effective in depleting both lymphatic and reactive EndMT populations and potentially improving patient outcomes.

    Complex macroscale structures formed by the shock processing of amino acids and nucleobases -- implications to the origins of life

    The building blocks of life, amino acids and nucleobases, are believed to have been synthesized in the extreme conditions that prevail in space starting from simple molecules containing hydrogen, carbon, oxygen and nitrogen. However, the fate and role of amino acids and nucleobases when they are subjected to similar processes largely remains unexplored. Here we report, for the first time, that shock processed amino acids and nucleobases tend to form complex macroscale structures. Such structures are formed on timescales of about 2 ms. This discovery suggests that the building blocks of life could have polymerized not just on Earth but on other planetary bodies. Our study also provides further experimental evidence for the 'threads' observed in meteorites being due to assemblages of (bio)molecules arising from impact induced shocks. By Vijay Thiruvenkatam et al.

    Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

    Disease-associated single-nucleotide polymorphisms (SNPs) generally do not implicate target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis. Here, we developed a heritability-based framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk. Our optimal combined S2G strategy (cS2G) included seven constituent S2G strategies and achieved a precision of 0.75 and a recall of 0.33, more than doubling the recall of any individual strategy. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 5,095 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. We further applied cS2G to provide an empirical assessment of disease omnigenicity; we determined that the top 1% of genes explained roughly half of the SNP heritability linked to all genes and that gene-level architectures vary with variant allele frequency.
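
    The omnigenicity statement ("the top 1% of genes explained roughly half of the SNP heritability linked to all genes") is a concentration statistic. The sketch below shows the arithmetic behind such a figure, assuming one already has a per-gene heritability value. The function name `top_share` and the toy numbers are hypothetical, and estimating the per-gene values themselves is outside this sketch.

```python
from typing import List


def top_share(per_gene_h2: List[float], fraction: float) -> float:
    """Share of total heritability carried by the top `fraction` of genes,
    with genes ranked by their own heritability contribution."""
    ranked = sorted(per_gene_h2, reverse=True)
    # At least one gene is always included, even for very small fractions.
    top_n = max(1, round(fraction * len(ranked)))
    return sum(ranked[:top_n]) / sum(ranked)


# Toy example: 100 genes, one of which carries as much as the other 99 combined.
toy_h2 = [99.0] + [1.0] * 99
share = top_share(toy_h2, fraction=0.01)
```

    With these toy values the top 1% of genes (a single gene) accounts for half of the total, which mirrors the shape of the claim without reproducing the study's data.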