Sequential data selection for predicting the pathogenic effects of sequence variation

Author: Campbell Colin
Cooper David N.
Gaunt Tom R.
Mort Matthew
Rogers Mark F.
Shihab Hashem A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/12/2015

Crossref

Explore Bristol Research

The Expanding Landscape of Alternative Splicing Variation in Human Populations.

Author: Lin Lan
Pan Zhicheng
Park Eddie
Xing Yi
Zhang Zijun
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Alternative splicing is a tightly regulated biological process by which the number of gene products for any given gene can be greatly expanded. Genomic variants in splicing regulatory sequences can disrupt splicing and cause disease. Recent developments in sequencing technologies and computational biology have allowed researchers to investigate alternative splicing at an unprecedented scale and resolution. Population-scale transcriptome studies have revealed many naturally occurring genetic variants that modulate alternative splicing and consequently influence phenotypic variability and disease susceptibility in human populations. Innovations in experimental and computational tools such as massively parallel reporter assays and deep learning have enabled the rapid screening of genomic variants for their causal impacts on splicing. In this review, we describe technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing. We summarize major findings from population transcriptomic studies of alternative splicing and discuss the implications of these findings for human genetics and medicine.

eScholarship - University of California

Investigating DNA-, RNA-, and protein-based features as a means to discriminate pathogenic synonymous variants

Author: 1000 Genomes Project Consortium
Adzhubei
Alipanahi
Altschul
Bentwich
Bermejo-Das-Neves
Brest
Buratti
Buske
Capra
Carlini
Chamary
Cortes
Cáceres
Davydov
Deaton
DeLong
Douville
Dreyfuss
Duan
Fairbrother
Folkman
Gartner
Grimm
Hajdin
Heffernan
Hershberg
Ho
Hu
Hunt
Hurst
Karolchik
Kimchi-Sarfaty
Kircher
Kirchner
Kudla
Li
Lorenz
Macaya
Markham
Meinshausen
Miao
Miller
Montera
Mort
Mortimer
Neale
Niroula
Parmley
Plotkin
Pollard
Pruitt
Remmert
Rhodes
Rudolph
Samocha
Sauna
Savisaar
Schwarz
Seetin
Shabalina
Sharp
Shihab
Simone
Smith
Stark
Stenson
Stergachis
Supek
Todd
UniProt Consortium
Vihinen
Wan
Wang
Whitney
Wu
Xiong
Yang
Yeo
Zhang
Zhao
Zhou
Zhu
Publication venue: 'Wiley'
Publication date: 01/01/2017

Synonymous single-nucleotide variants (SNVs), although they do not alter the encoded protein sequences, have been implicated in many genetic diseases. Experimental studies indicate that synonymous SNVs can lead to changes in the secondary and tertiary structures of DNA and RNA, thereby affecting translational efficiency, cotranslational protein folding as well as the binding of DNA-/RNA-binding proteins. However, the importance of these various features in disease phenotypes is not clearly understood. Here, we have built a support vector machine (SVM) model (termed DDIG-SN) as a means to discriminate disease-causing synonymous variants. The model was trained and evaluated on nearly 900 disease-causing variants. The method achieves robust performance with the area under the receiver operating characteristic curve of 0.84 and 0.85 for protein-stratified 10-fold cross-validation and independent testing, respectively. We were able to show that the disease-causing effects in the immediate proximity to exon–intron junctions (1–3 bp) are driven by the loss of splicing motif strength, whereas the gain of splicing motif strength is the primary cause in regions further away from the splice site (4–69 bp). The method is available as a part of the DDIG server at http://sparks-lab.org/ddig

Crossref

IUPUIScholarWorks

Online Research @ Cardiff
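
The DDIG-SN abstract above reports AUC under "protein-stratified 10-fold cross-validation". A minimal sketch of that style of evaluation follows. It assumes that "protein-stratified" means all variants from one protein are kept in the same fold, which is a reading of the abstract rather than the authors' documented protocol. The function names `protein_stratified_folds` and `roc_auc` are hypothetical, and no SVM is trained here.

```python
from collections import defaultdict


def protein_stratified_folds(protein_ids, n_folds=10):
    """Return a fold index per variant; variants sharing a protein share a fold.

    Assumption: this grouping is what "protein-stratified" refers to.
    """
    groups = defaultdict(list)
    for idx, pid in enumerate(protein_ids):
        groups[pid].append(idx)

    fold_sizes = [0] * n_folds
    fold_of = [None] * len(protein_ids)
    # Greedy balancing: place the largest proteins first,
    # each into the fold that is currently smallest.
    for pid in sorted(groups, key=lambda p: (-len(groups[p]), p)):
        target = fold_sizes.index(min(fold_sizes))
        for idx in groups[pid]:
            fold_of[idx] = target
        fold_sizes[target] += len(groups[pid])
    return fold_of


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Grouping by protein keeps near-duplicate examples from the same gene product out of both the training and the test split, which would otherwise inflate the measured AUC.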

Envelope Determinants of Equine Lentiviral Vaccine Protection

Lentiviral envelope (Env) antigenic variation and associated immune evasion present major obstacles to vaccine development. The concept that Env is a critical determinant for vaccine efficacy is well accepted; however, defined correlates of protection associated with Env variation have yet to be determined. We reported an attenuated equine infectious anemia virus (EIAV) vaccine study that directly examined the effect of lentiviral Env sequence variation on vaccine efficacy. The study identified a significant, inverse, linear correlation between vaccine efficacy and increasing divergence of the challenge virus Env gp90 protein compared to the vaccine virus gp90. The report demonstrated approximately 100% protection of immunized ponies from disease after challenge by virus with a homologous gp90 (EV0), and roughly 40% protection against challenge by virus (EV13) with a gp90 13% divergent from the vaccine strain. In the current study we examine whether the protection observed when challenging with the EV0 strain could be conferred to animals via chimeric challenge viruses between the EV0 and EV13 strains, allowing for mapping of protection to specific Env sequences. Viruses containing the EV13 proviral backbone and selected domains of the EV0 gp90 were constructed and in vitro and in vivo infectivity examined. Vaccine efficacy studies indicated that homology between the vaccine strain gp90 and the N-terminus of the challenge strain gp90 was capable of inducing immunity that resulted in significantly lower levels of post-challenge virus and significantly delayed the onset of disease. However, a homologous N-terminal region alone inserted in the EV13 backbone could not impart the 100% protection observed with the EV0 strain. Data presented here denote the complicated and potentially contradictory relationship between in vitro virulence and in vivo pathogenicity.
The study highlights the importance of structural conformation for immunogens and emphasizes the need for antibody binding, not neutralizing, assays that correlate with vaccine protection. © 2013 Craigo et al

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Kentucky

D-Scholarship@Pitt

parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants.

Author: Cappelletti Luca
Castrignanò Tiziana
Danis Daniel
Frasca Marco
Grossi Giuliano
Mesiti Marco
Petrini Alessandro
Re Matteo
Robinson Peter N
Schubach Max
Valentini Giorgio
Publication venue: The Mouseion at the JAXlibrary
Publication date: 01/01/2020

BACKGROUND: Several prediction problems in computational biology and genomic medicine are characterized by both big data as well as a high imbalance between examples to be learned, whereby positive examples can represent a tiny minority with respect to negative examples. For instance, deleterious or pathogenic variants are overwhelmed by the sea of neutral variants in the non-coding regions of the genome: thus, the prediction of deleterious variants is a challenging, highly imbalanced classification problem, and classical prediction tools fail to detect the rare pathogenic examples among the huge amount of neutral variants or undergo severe restrictions in managing big genomic data. RESULTS: To overcome these limitations we propose parSMURF, a method that adopts a hyper-ensemble approach and oversampling and undersampling techniques to deal with imbalanced data, and parallel computational techniques to both manage big genomic data and substantially speed up the computation. The synergy between Bayesian optimization techniques and the parallel nature of parSMURF enables efficient and user-friendly automatic tuning of the hyper-parameters of the algorithm, and allows specific learning problems in genomic medicine to be easily fit. Moreover, by using MPI parallel and machine learning ensemble techniques, parSMURF can manage big data by partitioning them across the nodes of a high-performance computing cluster. Results with synthetic data and with single-nucleotide variants associated with Mendelian diseases and with genome-wide association study hits in the non-coding regions of the human genome, involving millions of examples, show that parSMURF achieves state-of-the-art results and an 80-fold speed-up with respect to the sequential version.
CONCLUSIONS: parSMURF is a parallel machine learning tool that can be trained to learn different genomic problems, and its multiple levels of parallelization and high scalability allow us to efficiently fit problems characterized by big and imbalanced genomic data. The C++ OpenMP multi-core version tailored to a single workstation and the C++ MPI/OpenMP hybrid multi-core and multi-node parSMURF version tailored to a High Performance Computing cluster are both available at https://github.com/AnacletoLAB/parSMURF

The Jackson Laboratory: The Mouseion at the JAXlibrary

Unitus DSpace
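
The parSMURF abstract above describes handling class imbalance by combining oversampling, undersampling, and an ensemble over partitions of the data. The sketch below shows only the resampling step, under stated assumptions: negatives are split into disjoint partitions, positives are oversampled by plain random duplication (a stand-in, since the abstract does not name the oversampling technique), and each partition's negatives are undersampled to a fixed ratio. The function `balanced_partitions` and its parameters are hypothetical and are not part of the parSMURF code base.

```python
import random


def balanced_partitions(positives, negatives, n_partitions,
                        over_factor=2, under_ratio=1, seed=0):
    """Build one roughly balanced (positives, negatives) training set per partition.

    Assumptions (not taken from the parSMURF source): random duplication for
    oversampling, random subsampling for undersampling.
    """
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    # Disjoint partitions of the majority class, one per ensemble member.
    parts = [shuffled[i::n_partitions] for i in range(n_partitions)]

    training_sets = []
    for part in parts:
        # Oversample: keep every positive and add random repeats.
        extra = [rng.choice(positives)
                 for _ in range(len(positives) * (over_factor - 1))]
        pos = list(positives) + extra
        # Undersample: keep at most under_ratio negatives per positive.
        n_keep = min(len(part), under_ratio * len(pos))
        neg = rng.sample(part, n_keep)
        training_sets.append((pos, neg))
    return training_sets
```

Each returned pair would then train one base learner, and the ensemble prediction would average over the learners. Because the partitions are independent, they can be processed in parallel, which is the property the abstract attributes to the tool's design.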

Prediction of driver variants in the cancer genome via machine learning methodologies

Author: Campbell I C G
Gaunt Tom R
Rogers Mark F
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/10/2020

Sequencing technologies have led to the identification of many variants in the human genome which could act as disease-drivers. As a consequence, a variety of bioinformatics tools have been proposed for predicting which variants may drive disease, and which may be causatively neutral. After briefly reviewing generic tools, we focus on a subset of these methods specifically geared toward predicting which variants in the human cancer genome may act as enablers of unregulated cell proliferation. We consider the resultant view of the cancer genome indicated by these predictors and discuss ways in which these types of prediction tools may be progressed by further research.

PubMed Central

Explore Bristol Research

Doctor of Philosophy

Author: Crockett David K.
Publication venue: University of Utah
Publication date: 01/08/2011

Rapidly evolving technologies such as chip arrays and next-generation sequencing are uncovering human genetic variants at an unprecedented pace. Unfortunately, this ever growing collection of gene sequence variation has limited clinical utility without clear association to disease outcomes. As electronic medical records begin to incorporate genetic information, gene variant classification and accurate interpretation of gene test results play a critical role in customizing patient therapy. To verify the functional impact of a given gene variant, laboratories rely on confirming evidence such as previous literature reports, patient history and disease segregation in a family. By definition, variants of uncertain significance (VUS) lack this supporting evidence, and in such cases computational tools are often used to evaluate the predicted functional impact of a gene mutation. This study evaluates leveraging high quality genotype-phenotype disease variant data from 20 genes and 3986 variants, to develop gene-specific predictors utilizing a combination of changes in primary amino acid sequence, amino acid properties as descriptors of mutation severity and Naïve Bayes classification. A Primary Sequence Amino Acid Properties (PSAAP) prediction algorithm was then combined with well established predictors in a weighted Consensus sum in the context of gene-specific reference intervals for known phenotypes. PSAAP and Consensus were also used to evaluate known variants of uncertain significance in the RET proto-oncogene as a model gene. The PSAAP algorithm was successfully extended to many genes and diseases. Gene-specific algorithms typically outperform generalized prediction tools. Characteristic mutation properties of a given gene and disease may be lost when diluted into genomewide data sets.
A reliable computational phenotype classification framework with quantitative metrics and disease specific reference ranges allows objective evaluation of novel or uncertain gene variants and augments decision making when confirming clinical information is limited.

The University of Utah: J. Willard Marriott Digital Library

Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations

Author: Livesey Ben
Marsh Joseph
Publication venue: 'EMBO'
Publication date: 01/07/2020

To deal with the huge number of novel protein‐coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, in order to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top‐ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stood out, showing both the strongest correlations with DMS data and having the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based upon their performance in these analyses.

Crossref

Directory of Open Access Journals

Edinburgh Research Explorer
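
The deep mutational scanning abstract above ranks predictors by how strongly their scores correlate with DMS measurements. A minimal sketch of one such comparison follows, assuming Spearman rank correlation as the measure. The abstract says only "correlations", so the choice of Spearman is an assumption, and the helper names `average_ranks` and `spearman` are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

Predictors differ in score direction (some assign high scores to damaging variants, others to tolerated ones), so a comparison across tools would plausibly rank them by the absolute value of the coefficient rather than its sign.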