182 research outputs found

Optimal Amnesic Probabilistic Automata or How to Learn and Classify Proteins in Linear Time and Space

Author: Alberto Apostolico
Apostolico A.
Forchhammet S.
Gill Bejerano
Publication venue: 'Mary Ann Liebert Inc'

X-CAP improves pathogenicity prediction of stopgain variants

Author: Bejerano Gill
Cooper David N.
Rastogi Ruchir
Stenson Peter D.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/07/2022

Abstract: Stopgain substitutions are the third-largest class of monogenic human disease mutations and often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, exhibit poor performance at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel training methodology and unique feature set to improve the AUROC by 18% and decrease the false-positive rate 4-fold on large variant databases. In patient exomes, X-CAP prioritizes causal stopgains better than existing methods do, further illustrating its clinical utility. X-CAP is available at https://github.com/bejerano-lab/X-CAP

Online Research @ Cardiff

PubMed Central

Morphogenesis is transcriptionally coupled to neurogenesis during olfactory placode development

Author: Aguillon Raphaël
Batut Julie
Bejerano Gill
Blader Patrick
Dufourcq Pascale
Guturu Harendra
Lecaudey Virginie
Link Sandra
Madelaine Romain
Publication venue: HAL CCSD

Evolutionary biology for the 21st century

Author: Arnold Stevan J.
Bejerano Gill
Brodie E. D.
Hibbett David
Hoekstra Hopi E.
Losos Jonathan B.
Mindell David P.
Monteiro Antónia
Moritz Craig
Orr H. Allen
Petrov Dmitri A.
Renner Susanne S.
Ricklefs Robert E.
Soltis Pamela S.
Turner Thomas L.
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2013

New theoretical and conceptual frameworks are required for evolutionary biology to capitalize on the wealth of data now becoming available from the study of genomes, phenotypes, and organisms, including humans, in their natural environments. Molecular and Cellular Biology; Organismic and Evolutionary Biolog

Harvard University - DASH

Directory of Open Access Journals

Open Access LMU

PubMed Central

University of Missouri, St. Louis

The Australian National University

FigShare

Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology

Author: Aguirre Matthew
Bejerano Gill
Chang Chris
DeBoever Christopher
Hastie Trevor
Horn Heiko
Ingelsson Erik
Justesen Johanne M.
Lage Kasper
Li Jiehan
Narasimhan Balasubramanian
Park Chong Y
Rivas Manuel A
Tanigawa Yosuke
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Copenhagen University Research Information System

AVADA improves automated genetic variant database construction directly from full-text literature

Author: Bejerano Gill
Bernstein Jonathan
Birgmeier Johannes
Cooper David
Deisseroth Cole
Haeussler Maximilian
Jagadeesh Karthik
Stenson Peter
Tierno Andrew
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 04/11/2018

Purpose: The primary literature on human genetic diseases includes descriptions of pathogenic variants that are essential for clinical diagnosis. Variant databases such as ClinVar and HGMD collect pathogenic variants by manual curation. We aimed to automatically construct a freely accessible database of pathogenic variants directly from full-text articles about genetic disease. Methods: AVADA (Automatically curated VAriant DAtabase) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic variants and genes in full text of primary literature and converts them to genomic coordinates for rapid downstream use. Results: AVADA automatically curated almost 60% of pathogenic variants deposited in HGMD, a 4.4-fold improvement over the current state of the art in automated variant extraction. AVADA also contains more than 60,000 pathogenic variants that are in HGMD, but not in ClinVar. In a cohort of 245 diagnosed patients, AVADA correctly annotated 38 previously described diagnostic variants, compared to 43 using HGMD, 20 using ClinVar and only 13 (wholly subsumed by AVADA and ClinVar's) using the best automated abstracts-only based approach. Conclusion: AVADA is the first machine learning tool that automatically curates a variants database directly from full text literature. AVADA is available upon publication at http://bejerano.stanford.edu/AVADA

Online Research @ Cardiff

S-CAP extends pathogenicity prediction to genetic variants that affect RNA splicing

Author: Bejerano Gill
Bernstein Jonathan A.
Cooper David N.
Jagadeesh Karthik A.
Paggi Joseph M.
Stenson Peter D.
Ye James S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2019

Exome analysis of patients with a likely monogenic disease does not identify a causal variant in over half of cases. Splice-disrupting mutations make up the second largest class of known disease-causing mutations. Each individual (singleton) exome harbors over 500 rare variants of unknown significance (VUS) in the splicing region. The existing relevant pathogenicity prediction tools tackle all non-coding variants as one amorphic class and/or are not calibrated for the high sensitivity required for clinical use. Here we calibrate seven such tools and devise a novel tool called Splicing Clinically Applicable Pathogenicity prediction (S-CAP) that is over twice as powerful as all previous tools, removing 41% of patient VUS at 95% sensitivity. We show that S-CAP does this by using its own features and not via meta-prediction over previous tools, and that splicing pathogenicity prediction is distinct from predicting molecular splicing changes. S-CAP is an important step on the path to deriving non-coding causal diagnoses.

Crossref

Online Research @ Cardiff

Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts.

Author: Ashley Euan A
Balliu Brunilda
Battle Alexis
Bejerano Gill
Bernstein Jonathan A
Bonner Devon
Boycott Kym M
Care4Rare Canada Consortium
Davidson Jean M
Davis Joe R
Ferraro Nicole M
Fisk Dianna G
Frésard Laure
Grove Megan E
Hartley Taila
Ingelsson Erik
Joshi Ruchi
Kernohan Kristin D
Kohler Jennefer N
Li Xin
Lind Lars
Liu Boxiang
Marwaha Shruti
Merker Jason D
Montgomery Stephen B
Prybol Cameron J
Reuter Chloe M
Smail Craig
Smith Kevin S
Strober Benjamin J
Teran Nicole A
Undiagnosed Diseases Network
Utiramerur Sowmithri
Wheeler Matthew T
Zappala Zachary
Zastrow Diane B
Publication venue: eScholarship, University of California
Publication date: 01/06/2019

It is estimated that 350 million individuals worldwide suffer from rare diseases, which are predominantly caused by mutation in a single gene. The current molecular diagnostic rate is estimated at 50%, with whole-exome sequencing (WES) among the most successful approaches. For patients in whom WES is uninformative, RNA sequencing (RNA-seq) has shown diagnostic utility in specific tissues and diseases. This includes muscle biopsies from patients with undiagnosed rare muscle disorders, and cultured fibroblasts from patients with mitochondrial disorders. However, for many individuals, biopsies are not performed for clinical care, and tissues are difficult to access. We sought to assess the utility of RNA-seq from blood as a diagnostic tool for rare diseases of different pathophysiologies. We generated whole-blood RNA-seq from 94 individuals with undiagnosed rare diseases spanning 16 diverse disease categories. We developed a robust approach to compare data from these individuals with large sets of RNA-seq data for controls (n = 1,594 unrelated controls and n = 49 family members) and demonstrated the impacts of expression, splicing, gene and variant filtering strategies on disease gene identification. Across our cohort, we observed that RNA-seq yields a 7.5% diagnostic rate, and an additional 16.7% with improved candidate gene resolution.

eScholarship - University of California

AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature

Author: Beggs Alan H.
Bejerano Gill
Bernstein Jonathan A.
Birgmeier Johannes
Cooper David N.
Deisseroth Cole A.
Diekhans Mark E.
Guturu Harendra
Haeussler Maximilian
Jagadeesh Karthik A.
Ratner Alexander J.
Ré Christopher
Steinberg Ethan H.
Stenson Peter D.
Wenger Aaron M.
Publication venue: 'American Association for the Advancement of Science (AAAS)'
Publication date: 20/05/2020

The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient’s disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient’s given set of phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database–based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children’s Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu

Online Research @ Cardiff

PubMed Central

eScholarship - University of California