Digging into acceptor splice site prediction: an iterative feature selection approach

Author: A.I. Blum, A.K. Jain, C. Mathé, D. Mladenić, E. Alpaydin, G.R. Harik, H. Mühlenbein, I. Guyon, J. Weston, M. Kudo, M. Pertea, P. Larrañaga, R. Kohavi, R.O. Duda, S. Degroeve, T. Joachims, X. Zhang, Y. Saeys
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Feature selection techniques are often used to reduce data dimensionality, increase classification performance, and gain insight into the processes that generated the data. In this paper, we describe an iterative procedure of feature selection and feature construction steps that improves the classification of acceptor splice sites, an important subtask of gene prediction. We show that acceptor prediction can benefit from feature selection, and describe how feature selection techniques can be used to gain new insights into the classification of acceptor sites. This is illustrated by the identification of a new, biologically motivated feature: the AG-scanning feature. The results described in this paper contribute both to the domain of gene prediction and to research in feature selection techniques, describing a new wrapper-based feature weighting method that aids in knowledge discovery when dealing with complex datasets.
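
The abstract does not define the AG-scanning feature. As a rough, hypothetical illustration of the biological idea behind it (the spliceosome often uses the first AG downstream of the branch point, so other AG dinucleotides just upstream of a candidate acceptor are informative), the Python sketch below counts AG dinucleotides in a window upstream of a candidate acceptor AG. The function name, the `window` size and the two output fields are assumptions made for illustration, not the authors' definition.

```python
def ag_scan_features(seq: str, acceptor_pos: int, window: int = 40) -> dict:
    """Hypothetical AG-scanning style feature for one candidate acceptor.

    `acceptor_pos` is the 0-based index of the "A" in the candidate acceptor AG;
    only the `window` bases immediately before it are scanned.
    """
    seq = seq.upper()
    if seq[acceptor_pos:acceptor_pos + 2] != "AG":
        raise ValueError("acceptor_pos must point at an AG dinucleotide")
    start = max(0, acceptor_pos - window)
    upstream = seq[start:acceptor_pos]
    # 0-based offsets (within `upstream`) of every AG in the scanned window
    hits = [i for i in range(len(upstream) - 1) if upstream[i:i + 2] == "AG"]
    # distance from the closest upstream AG to the candidate acceptor, or None
    nearest = acceptor_pos - (start + hits[-1]) if hits else None
    return {"ag_count": len(hits), "nearest_ag_distance": nearest}
```

For example, `ag_scan_features("TTAGCCCTTTCAGG", 11)` finds one upstream AG, 9 bases before the candidate site. In a wrapper-based setting, values like these would simply be appended to the existing feature vector and kept or dropped by the selection procedure.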

Accurate splice site prediction using support vector machines

Author: Gabriele Schweikert, Gunnar Rätsch, Jonas Behr, Petra Philips, Sören Sonnenburg
Publication venue: BioMed Central
Publication date: 01/01/2007

Prediction of donor splice sites using random forest with a new sequence encoding approach

Publication venue: BioMed Central
Publication date: 22/01/2016

Fast splice site detection using information content and feature reduction

Author: AKMA Baten, BCH Chang, C Burge, C Cortes, CE Shannon, D Cai, G Dror, G Ratsch, G Yeo, H Drucker, H Itoh, H Liu, JCaHLS Rajapakse, JSaRD Chuang, L Zhang, M Burset, M Pertea, M Zhang, MB Shapiro, MG Reese, N Cristianini, P Waddell, R Castelo, S Brunak, S Buckingham, S Degroeve, S Salzberg, S Sonnenburg, S Washietl, SA Marashi, SK Halgamuge, SM Hebsgaard, T Golub, T-M Chen, TD Schneider, V Vapnik, XH-F Zhang, Y Saeys, YF Sun
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Accurate identification of splice sites in DNA sequences plays a key role in the prediction of gene structure in eukaryotes. Many computational methods have already been proposed for the detection of splice sites, and some of them show high prediction accuracy. However, most of these methods are limited by long computation times when applied to whole-genome sequence data. Results: In this paper we propose a hybrid algorithm that combines several effective and informative input features with the state-of-the-art support vector machine (SVM). To obtain the input features, we employ an information content method based on Shannon's information theory, Shapiro's scoring scheme, and Markovian probabilities. We also use a feature elimination scheme to remove less informative features from the input data. Conclusion: In this study we propose a new feature-based splice site detection method that shows improved acceptor and donor splice site detection in DNA sequences when its performance is compared with various state-of-the-art and well-known methods.
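
The information content features mentioned above rest on Shannon entropy. A minimal Python sketch, assuming a set of equal-length DNA sequences aligned around the splice site and a maximum of 2 bits per position, computes per-position information content and then keeps only positions above a threshold as a stand-in for feature elimination. The function names and the threshold rule are hypothetical; the paper's own elimination scheme may differ, and Shapiro's scores and the Markovian probabilities are not covered here.

```python
import math
from collections import Counter


def positional_information_content(seqs):
    """Per-position information content (bits) of aligned DNA sequences.

    IC = log2(4) - H, where H is the Shannon entropy of the nucleotide
    distribution at that position; no small-sample correction is applied.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seqs)
    ic = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


def informative_positions(seqs, threshold=0.5):
    """Hypothetical elimination step: keep positions whose IC exceeds `threshold`."""
    ic = positional_information_content(seqs)
    return [pos for pos, value in enumerate(ic) if value > threshold]
```

For instance, the aligned dinucleotides `["GT", "GT", "GA", "GC"]` give 2.0 bits at the fully conserved first position and 0.5 bits at the second, so a threshold of 0.5 keeps only position 0.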

Support Vector Machines and Kernels for Computational Biology

ISSN: 1553-734X, ISSN: 1553-735

Prediction of donor splice sites using random forest with a new sequence encoding approach

Author: A Baten, A Dehzangi, A Liaw, A Zien, Atmakuri Ramakrishna Rao, BJ Blencowe, BJ Lam, C Bergmeir, C Burge, C Cortes, C Weihs, D Hand, D Meyer, G Yeo, H Drucker, J Huang, J Rajapakse, J Zhu, JL Li, L Breiman, M Khalilia, M Pertea, M Stone, MG Reese, MM Yin, MQ Zhang, N Sheth, P Jain, P Pollastro, Prabina Kumar Meher, R Staden, S Haykin, S Sonnenburg, SE Hamby, T Mitchell, Tanmaya Kumar Sahu, TM Chen, WN Venables, X Roca, X Zhao, XF Zhang, Z Dominski
Publication venue: Springer Science and Business Media LLC

Kernel methods in genomics and computational biology

Author: Vert Jean-Philippe
Publication date: 17/10/2005

Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and their strong modularity, which makes them suitable for a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimensions, to process non-vectorial data, and the natural framework they provide for integrating heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.
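
One reason kernel methods handle non-vectorial data such as DNA well is that a kernel can compare two sequences directly. As an illustration chosen here, not taken from the chapter, the Python sketch below implements the k-spectrum string kernel, which scores two sequences by their shared k-mers. Because it is an inner product of k-mer count vectors, it is a valid positive semidefinite kernel that an SVM could use; the function names are assumptions.

```python
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Count every contiguous k-mer (substring of length k) in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of x and y."""
    fx, fy = spectrum(x, k), spectrum(y, k)
    # Counter returns 0 for missing keys, so only shared k-mers contribute
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

With `k=2`, `"ACGT"` compared with itself shares the three 2-mers AC, CG and GT, giving a kernel value of 3. The feature space is never built explicitly as a 4^k-dimensional vector; only the k-mers actually present are counted.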

Linking the Epigenome to the Genome: Correlation of Different Features to DNA Methylation of CpG Islands

Author: A Barski, A Bird, A Henckel, A Jeltsch, A Meissner, A Siepel, AH Ting, Andreas Zell, AP Bird, B Rhead, BE Bernstein, Brock C. Christensen, C Bock, C Previti, C Wrzodek, CC Chang, CD Bustos, Clemens Wrzodek, D Jia, D Takai, D Zilberman, DE Schones, E Schilling, EJ Gardiner, ES Lander, F Antequera, F Eckhardt, F Fang, F Fuks, F Mohn, FA Feltus, Finja Büchel, Florian Mittag, GD Stormo, Georg Hinselmann, H Cedar, H Vikas, JF Costello, JG Cleary, Johannes Eichner, JT Bell, KL Thu, M Burset, M Esteller, M Gardiner-Garden, M Hall, M Oka, P Baldi, P Dehan, P Hajkova, PA Jones, R Das, R Fan, R Lister, RA Rollins, RM Brena, S Aerts, S Fan, S Kim, S Kochanek, SE Celniker, SKT Ooi, W Reik, WJ Kent, Y Wang, Y Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2012

DNA methylation of CpG islands plays a crucial role in the regulation of gene expression. More than half of all human promoters contain CpG islands with a tissue-specific methylation pattern in differentiated cells. However, the process by which DNA methyltransferases determine which regions should be methylated is still not completely understood. Many hypotheses about which genomic features are correlated with the epigenome have not yet been evaluated. Furthermore, many exploratory approaches to measuring DNA methylation are limited to a subset of the genome and thus cannot be employed, e.g., for genome-wide biomarker prediction methods. In this study, we evaluated the correlation of genetic, epigenetic and hypothesis-driven features with DNA methylation of CpG islands. To this end, various binary classifiers were trained and evaluated by cross-validation on a dataset comprising DNA methylation data for 190 CpG islands in HepG2, HEK293, fibroblasts and leukocytes. We achieved an accuracy of up to 91% with an MCC of 0.8 using ten-fold cross-validation with ten repetitions. With these models, we extended the existing dataset to the whole genome and thus predicted the methylation landscape for the given cell types. The method used for these predictions is also validated on another external whole-genome dataset. Our results reveal features correlated with DNA methylation and confirm or disprove various hypotheses about DNA methylation-related features. This study confirms correlations between DNA methylation and histone modifications, DNA structure, DNA sequence, genomic attributes and CpG island properties. Furthermore, the method has been validated on a genome-wide dataset from the ENCODE consortium. The developed software, as well as the predicted datasets and a web service to compare methylation states of CpG islands, are available at http://www.cogsys.cs.uni-tuebingen.de/software/dna-methylation/
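
The evaluation above reports accuracy and the Matthews correlation coefficient (MCC) under ten-fold cross-validation repeated ten times. A minimal Python sketch of both pieces, assuming plain shuffled (not stratified) folds and a hypothetical seed, is shown below; it is not the authors' evaluation code, and the classifier itself is left out.

```python
import math
import random


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal is empty (MCC is undefined there)
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def repeated_kfold(n: int, k: int = 10, repeats: int = 10, seed: int = 0):
    """Yield (train, test) index lists for `repeats` shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test
```

As a worked check on the arithmetic, a hypothetical balanced confusion matrix with 45 TP, 45 TN, 5 FP and 5 FN (90% accuracy) gives MCC = (45·45 − 5·5) / √(50·50·50·50) = 2000 / 2500 = 0.8. With 190 CpG islands and `k=10`, each test fold holds 19 islands, and `repeated_kfold(190)` yields 100 train/test splits in total.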

Feature selection for splice site prediction: A new method using EDA-based feature ranking

Author: Aeyels Dirk, Degroeve Sven, Rouzé Pierre, Saeys Yvan, Van de Peer Yves
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. RESULTS: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. A feature ranking is derived from the distribution estimated by the algorithm, and this ranking is then used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. CONCLUSION: We show that this technique is more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamic view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques and scales better to datasets described by a large number of features.
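
The abstract describes deriving a feature ranking from the distribution an estimation of distribution algorithm (EDA) converges to. A minimal Python sketch of that idea uses the univariate marginal distribution algorithm (UMDA), one of the simplest EDAs; the abstract does not say which EDA variant or classifier-based fitness the authors used, so the function name, parameters and the toy fitness in the usage note are hypothetical.

```python
import random


def umda_feature_ranking(n_features, fitness, pop_size=50, n_select=25,
                         generations=30, seed=0):
    """Rank features by the marginal probabilities a UMDA converges to.

    Each individual is a 0/1 mask over the features; `fitness` scores a mask
    (higher is better). Returns (ranking, probabilities).
    """
    rng = random.Random(seed)
    p = [0.5] * n_features  # initial probability of including each feature
    for _ in range(generations):
        # sample a population of feature masks from the current marginals
        population = [[1 if rng.random() < p[j] else 0 for j in range(n_features)]
                      for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        elite = population[:n_select]  # truncation selection
        # re-estimate each feature's marginal from the selected masks
        p = [sum(mask[j] for mask in elite) / n_select for j in range(n_features)]
    # most probable features first; the tail holds candidates for discarding
    ranking = sorted(range(n_features), key=lambda j: p[j], reverse=True)
    return ranking, p
```

In practice the fitness would be something like the cross-validated accuracy of a classifier trained on the masked features. With a toy fitness that rewards features 0, 1 and 2 and penalizes every other selected feature, those three rise to the top of `ranking`; discarding features from the end of the ranking one at a time then mimics the iterative elimination the abstract describes.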
