Prediction of donor splice sites using random forest with a new sequence encoding approach

Publication venue: BioMed Central
Publication date: 22/01/2016

Improved identification of conserved cassette exons using Bayesian networks

Author: Backofen Rolf
Gausmann Ulrike
Hiller Michael
Platzer Matthias
Pudimat Rainer
Sinha Rileen
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract

Background: Alternative splicing is a major contributor to the diversity of eukaryotic transcriptomes and proteomes. Currently, large-scale detection of alternative splicing using expressed sequence tags (ESTs) or microarrays does not capture all alternative splicing events. Moreover, for many species genomic data is being produced at a far greater rate than corresponding transcript data, hence in silico methods of predicting alternative splicing have to be improved.

Results: Here, we show that the use of Bayesian networks (BNs) allows accurate prediction of evolutionarily conserved exon skipping events. At a stringent false positive rate of 0.5%, our BN achieves an improved true positive rate of 61%, compared to a previously reported 50% on the same dataset using support vector machines (SVMs). The improvement comes from incorporating several novel discriminative features, such as intronic splicing regulatory elements. Features related to mRNA secondary structure increase the prediction performance, corroborating previous findings that secondary structures are important for exon recognition. Random labelling tests rule out overfitting. Cross-validation on another dataset confirms the increased performance. When using the same dataset and the same set of features, the BN matches the performance of an SVM in earlier literature. Remarkably, we could show that about half of the exons that are labelled constitutive but receive a high probability of being alternative from the BN are in fact alternative exons according to the latest EST data. Finally, we predict exon skipping without using conservation-based features, and achieve a true positive rate of 29% at a false positive rate of 0.5%.

Conclusion: BNs can be used to achieve accurate identification of alternative exons and provide clues about possible dependencies between relevant features. The near-identical performance of the BN and SVM when using the same features shows that good classification depends more on features than on the choice of classifier. Conservation-based features continue to be the most informative, and hence distinguishing alternative exons from constitutive ones without using conservation-based features remains a challenging problem.
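
The abstract reports true positive rates at a fixed false positive rate of 0.5%. As a rough illustration of how such an operating point can be read off classifier scores, here is a minimal Python sketch. The function name `tpr_at_fpr`, the score arrays, and the "predict positive if score is strictly above the threshold" convention are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def tpr_at_fpr(scores_pos, scores_neg, max_fpr=0.005):
    """Return (threshold, tpr, fpr) at the most permissive threshold
    whose false positive rate does not exceed max_fpr.

    An example is predicted positive when its score is strictly
    greater than the threshold (an assumed convention).
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_neg = np.asarray(scores_neg, dtype=float)
    # How many negatives may be (wrongly) called positive.
    allowed = int(np.floor(max_fpr * scores_neg.size))
    neg_desc = np.sort(scores_neg)[::-1]
    # Using the (allowed + 1)-th highest negative score as threshold
    # leaves at most `allowed` negatives strictly above it.
    threshold = neg_desc[allowed] if allowed < neg_desc.size else -np.inf
    tpr = float(np.mean(scores_pos > threshold))
    fpr = float(np.mean(scores_neg > threshold))
    return threshold, tpr, fpr
```

With scores for labelled alternative (positive) and constitutive (negative) exons, `tpr_at_fpr(pos, neg, 0.005)` would give the kind of figure quoted above; the actual numbers depend entirely on the classifier and dataset.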

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Prediction of donor splice sites using random forest with a new sequence encoding approach

Author: Atmakuri Ramakrishna Rao
Prabina Kumar Meher
Tanmaya Kumar Sahu
Publication venue: Springer Science and Business Media LLC

Crossref

Estimating Dependency Structure as a Hidden Variable

Author: Jordan Michael I.
Meila Marina
Morris Quaid
Publication date: 01/01/1997

This paper introduces a probability model, the mixture of trees, that can account for sparse, dynamically changing dependence relationships. We present a family of efficient algorithms that use EM and the Minimum Spanning Tree algorithm to find the ML and MAP mixture of trees for a variety of priors, including the Dirichlet and the MDL priors. We also show that the single tree classifier acts like an implicit feature selector, thus making the classification performance insensitive to irrelevant attributes. Experimental results demonstrate the excellent performance of the new model both in density estimation and in classification.
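
To make the spanning-tree step concrete, the following Python sketch fits a single tree structure to discrete data in the Chow-Liu style: it scores every pair of variables by empirical mutual information and keeps a maximum-weight spanning tree (equivalent to a minimum spanning tree on negated weights). This covers only an unweighted single tree; the EM loop over mixture components, the priors, and the parameter estimates described in the abstract are not shown. The function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete columns."""
    mi = 0.0
    for a in np.unique(x):
        pa = np.mean(x == a)
        for b in np.unique(y):
            pb = np.mean(y == b)
            pab = np.mean((x == a) & (y == b))
            if pab > 0:
                mi += pab * np.log(pab / (pa * pb))
    return mi

def chow_liu_edges(data):
    """Edges of a maximum-weight spanning tree over pairwise mutual
    information, found with Kruskal's algorithm and union-find."""
    n_vars = data.shape[1]
    weighted = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,  # heaviest edges first
    )
    parent = list(range(n_vars))

    def find(u):
        # Follow parent links to the root, compressing the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge creates no cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

In a mixture setting, one would presumably rerun this step per component inside the M-step using responsibility-weighted counts instead of plain frequencies; that weighting is left out here to keep the sketch short.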

CiteSeerX

DSpace@MIT

Accurate prediction of NAGNAG alternative splicing

Author: Karol Szafranski
Klaus Huse
Matthias Platzer
Michael Hiller
Niels Jahn
Rileen Sinha
Rolf Backofen
Swetlana Nikolajewa
Publication venue: Oxford University Press

Alternative splicing (AS) involving NAGNAG tandem acceptors is an evolutionarily widespread class of AS. Recent predictions of alternative acceptor usage reported better results for acceptors separated by larger distances than for NAGNAGs. To improve the latter, we used Bayesian networks (BNs) together with extensive experimental validation of the predictions. Using carefully constructed training and test datasets, a balanced sensitivity and specificity of ≥92% was achieved. A BN trained on the combined dataset was then used to make predictions, and 81% (38/47) of the experimentally tested predictions were verified. Applying a BN learned on human data to six other genomes, we show that while the performance for the vertebrate genomes matches that achieved on human data, there is a slight drop for Drosophila and worm. Lastly, using the prediction accuracy according to experimental validation, we estimate the number of yet undiscovered alternative NAGNAGs. State-of-the-art classifiers can produce highly accurate predictions of AS at NAGNAGs, indicating that we have identified the major features of the ‘NAGNAG-splicing code’ within the splice site and its immediate neighborhood. Our results suggest that the mechanism behind NAGNAG AS is simple, stochastic, and conserved among vertebrates and beyond.
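
For readers unfamiliar with the motif, a NAGNAG is two adjacent NAG triplets, where N is any nucleotide. The short Python sketch below only locates candidate hexamers in a DNA string; it does not check intron-exon context or predict whether a site is alternatively spliced, which is the task the abstract addresses. The function name `find_nagnag` is made up for this example.

```python
import re

# The lookahead keeps overlapping candidates, e.g. both hexamers
# in "CAGCAGCAG"; a plain pattern would skip the second one.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")

def find_nagnag(seq):
    """Return (0-based start, hexamer) for every NAGNAG in a DNA string."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in NAGNAG.finditer(seq)]
```

For instance, `find_nagnag("ttcagcagGT")` returns `[(2, 'CAGCAG')]`. A real pipeline would restrict the search to annotated acceptor sites before extracting features for a classifier.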

Crossref

PubMed Central

Apples and oranges: avoiding different priors in Bayesian DNA sequence analysis

Author: Ivo Grosse
Jan Grau
Jens Keilwagen
Stefan Posch
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract

Background: One of the challenges of bioinformatics remains the recognition of short signal sequences in genomic DNA such as donor or acceptor splice sites, splicing enhancers or silencers, translation initiation sites, transcription start sites, transcription factor binding sites, nucleosome binding sites, miRNA binding sites, or insulator binding sites. During the last decade, a wealth of algorithms for the recognition of such DNA sequences has been developed and compared with the goals of improving their performance and deepening our understanding of the underlying cellular processes. Most of these algorithms are based on statistical models belonging to the family of Markov random fields such as position weight matrix models, weight array matrix models, Markov models of higher order, or moral Bayesian networks. While many comparative studies have compared different learning principles or different statistical models, the influence of choosing different prior distributions for the model parameters when using different learning principles has been overlooked, possibly leading to questionable conclusions.

Results: With the goal of allowing direct comparisons of different learning principles for models from the family of Markov random fields based on the same a-priori information, we derive a generalization of the commonly used product-Dirichlet prior. We find that the derived prior behaves like a Gaussian prior close to the maximum and like a Laplace prior in the far tails. In two case studies, we illustrate the utility of the derived prior for a direct comparison of different learning principles with different models for the recognition of binding sites of the transcription factor Sp1 and human donor splice sites.

Conclusions: We find that comparisons of different learning principles using the same a-priori information can lead to conclusions different from those of previous studies in which the effect resulting from different priors has been neglected. The derived prior is implemented in the open-source library Jstacs to enable easy application to comparative studies of different learning principles in the field of sequence analysis.
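
As background for the role a prior plays here, the following Python sketch estimates a position weight matrix from aligned sites under an ordinary symmetric Dirichlet prior, using the posterior mean (counts plus pseudocounts, renormalised per position). This is the standard product-Dirichlet setting, not the generalized prior derived in the paper, and it does not use Jstacs; the function names and the toy sites are assumptions for illustration.

```python
import numpy as np

ALPHABET = "ACGT"

def pwm_posterior_mean(sites, pseudocount=1.0):
    """Posterior-mean PWM for equal-length aligned DNA sites under an
    independent symmetric Dirichlet(pseudocount) prior per position."""
    length = len(sites[0])
    counts = np.zeros((length, len(ALPHABET)))
    for site in sites:
        for pos, base in enumerate(site.upper()):
            counts[pos, ALPHABET.index(base)] += 1
    # Dirichlet(alpha) prior plus multinomial counts n gives a
    # Dirichlet(alpha + n) posterior with mean (n + alpha) / (N + 4 * alpha).
    total = len(sites) + len(ALPHABET) * pseudocount
    return (counts + pseudocount) / total

def log_likelihood(pwm, seq):
    """Log-probability of seq under the PWM (positions independent)."""
    return float(sum(np.log(pwm[pos, ALPHABET.index(base)])
                     for pos, base in enumerate(seq.upper())))
```

Changing `pseudocount` changes the estimate even though the data stay the same, which gives a small-scale view of why comparing learning principles under different priors can be misleading.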

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central