Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Identification of novel DNA repair proteins via primary sequence, secondary structure, and homology

Author: A Politi
B Alberts
C Leslie
CH Wu
CS Yu
E Friedberg
E Sonoda
H Interthal
H Klein
H Shen
HM Berman
I Miller
I Vergara
J Brown
J Cheng
J Demšar
J Shawe-Taylor
JB Brown
K Chou
K Fujishima
K Nitiss
K Takemoto
K-J Park
L Mariño-Ramírez
L Wen
M Bhasin
M Kasahara
N Cristianini
N Dong
P Jowsey
R Wood
S Johnson
SF Altschul
T Dietterich
T Hubbard
T Jaakkola
T Joachims
Tatsuya Akutsu
The Gene Ontology Consortium
TK Hazra
TS Dexheimer
U Dery
W Ewens
W Li
Y El-Manzalawy
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: DNA repair is the general term for the collection of critical mechanisms which repair many forms of DNA damage such as methylation or ionizing radiation. DNA repair has mainly been studied in experimental and clinical situations, and relatively few information-based approaches to extracting new DNA repair knowledge exist. As a first step, automatic detection of DNA repair proteins in genomes via informatics techniques is desirable; however, there are many forms of DNA repair and it is not a straightforward process to identify and classify repair proteins with a single optimal method. We perform a study of the ability of homology and machine learning-based methods to identify and classify DNA repair proteins, as well as scan vertebrate genomes for the presence of novel repair proteins. Combinations of primary sequence polypeptide frequency, secondary structure, and homology information are used as feature information for input to a Support Vector Machine (SVM).

Results: We identify that SVM techniques are capable of identifying portions of DNA repair protein datasets without admitting false positives; at low levels of false positive tolerance, homology can also identify and classify proteins with good performance. Secondary structure information provides improved performance compared to using primary structure alone. Furthermore, we observe that machine learning methods incorporating homology information perform best when data is filtered by some clustering technique. Analysis by applying these methodologies to the scanning of multiple vertebrate genomes confirms a positive correlation between the size of a genome and the number of DNA repair protein transcripts it is likely to contain, and simultaneously suggests that all organisms have a non-zero minimum number of repair genes. In addition, the scan result clusters several organisms' repair abilities in an evolutionarily consistent fashion. Analysis also identifies several functionally unconfirmed proteins that are highly likely to be involved in the repair process. A new web service, INTREPED, has been made available for the immediate search and annotation of DNA repair proteins in newly sequenced genomes.

Conclusion: Despite complexity due to a multitude of repair pathways, combinations of sequence, structure, and homology with Support Vector Machines offer good methods in addition to existing homology searches for DNA repair protein identification and functional annotation. Most importantly, this study has uncovered relationships between the size of a genome and a genome's available repair repertoire, and offers a number of new predictions as well as a prediction service, both of which reduce the search time and cost for novel repair genes and proteins.
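One way to read the result about identifying proteins "without admitting false positives" is as sensitivity measured at a decision threshold placed above the highest-scoring negative example. The sketch below illustrates that reading only; the function name, scores, and labels are hypothetical and are not taken from the study.

```python
def recall_at_zero_false_positives(scores, labels):
    """Fraction of positives that score strictly above every negative.

    scores: classifier decision values (higher means more repair-like).
    labels: 1 for a known repair protein, 0 for a non-repair protein.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives:
        raise ValueError("need at least one positive example")
    # With no negatives at all, any threshold admits zero false positives.
    threshold = max(negatives) if negatives else float("-inf")
    recovered = sum(1 for s in positives if s > threshold)
    return recovered / len(positives)


# Hypothetical decision values for four repair and two non-repair proteins.
example_scores = [2.1, 1.4, 0.3, -0.2, 0.9, -1.5]
example_labels = [1, 1, 1, 1, 0, 0]
print(recall_at_zero_false_positives(example_scores, example_labels))
```

Relaxing the threshold below the top negative trades a small number of false positives for higher recall, which is the "false positive tolerance" the abstract refers to.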

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Applications of Support Vector Machines in Bioinformatics and Network Security

Author: Rehan Akbani
Turgay Korkmaz
Publication venue: IntechOpen
Publication date: 01/02/2010

IntechOpen

Exploiting structural and topological information to improve prediction of RNA-protein binding sites

Author: A Bradley
A del Sol
CW Cheng
E Jeong
G Amitai
H Tjong
HR Guy
I Selin
IH Witten
J Allersa
L Wang
M Kumar
M Terribilini
OT Kim
P Baldi
R Spriggs
RP Bahadur
S Altschul
S Kawashima
S Shazman
S Tanaka
Stefan R Maetschke
T Fawcett
T Kamada
W Kabsch
WH Press
Y Chen
Zheng Yuan
Publication venue: BioMed Central
Publication date: 01/01/2009

The breast and ovarian cancer susceptibility gene BRCA1 encodes a multifunctional tumor suppressor protein, BRCA1, which is involved in regulating cellular processes such as cell cycle, transcription, DNA repair, DNA damage response and chromatin remodeling. BRCA1 protein, located primarily in cell nuclei, interacts with multiple proteins and various DNA targets. It has been demonstrated that BRCA1 protein binds to damaged DNA and plays a role in the transcriptional regulation of downstream target genes. Although BRCA1 is a key protein in the repair of DNA double-strand breaks, its DNA-binding properties have not been reported in detail.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

Machine learning for regulatory analysis and transcription factor target prediction in yeast

Author: A Gasch
A Zien
AG Hinnebusch
B Balasubramanian
B Pina
C Harbison
CH Choi
Charles DeLisi
CJ Benham
CS Leslie
D Goodsell
DE Martin
DL Wheeler
Dustin T. Holloway
E Birney
EM Conlon
F Baldino
G Lanckriet
GD Stormo
H Bussemaker
H Mountain
H Yu
I Guyon
IT Lee
J Helden van
J Ihmels
J Mellor
J Qian
J Wu
JE Galagan
K Birnbaum
KJ Breslauer
KM Masters
KR Christie
M Kellis
M Pritsker
M Tompa
M Wang
MA Beer
Mark Kon
N Simonis
NA Kent
P Haverty
P Pavlidis
PF Cliften
RA Flickinger
S Aerts
S Elemento
S Hua
S Keles
S Mangan
S Satchwell
SJ Deminoff
T Acton
T Schneider
TD Schneider
TD Tullius
TS Furey
V Matys
W Wang
X-F Zheng
Z Zhu
Publication venue: Kluwer Academic Publishers
Publication date: 01/01/2006

High throughput technologies, including array-based chromatin immunoprecipitation, have rapidly increased our knowledge of transcriptional maps: the identity and location of regulatory binding sites within genomes. Still, the full identification of sites, even in lower eukaryotes, remains largely incomplete. In this paper we develop a supervised learning approach to site identification using support vector machines (SVMs) to combine 26 different data types. A comparison with the standard approach to site identification using position specific scoring matrices (PSSMs) for a set of 104 Saccharomyces cerevisiae regulators indicates that our SVM-based target classification is more sensitive (73% vs. 20%) when specificity and positive predictive value are the same. We have applied our SVM classifier for each transcriptional regulator to all promoters in the yeast genome to obtain thousands of new targets, which are currently being analyzed and refined to limit the risk of classifier over-fitting. For the purpose of illustration we discuss several results, including biochemical pathway predictions for Gcn4 and Rap1. For both transcription factors SVM predictions match well with the known biology of control mechanisms, and possible new roles for these factors are suggested, such as a function for Rap1 in regulating fermentative growth. We also examine the promoter melting temperature curves for the targets of YJR060W, and show that targets of this TF have potentially unique physical properties which distinguish them from other genes. The SVM output automatically provides the means to rank dataset features to identify important biological elements. We use this property to rank classifying k-mers, thereby reconstructing known binding sites for several TFs, and to rank expression experiments, determining the conditions under which Fhl1, the factor responsible for expression of ribosomal protein genes, is active.
We can see that targets of Fhl1 are differentially expressed in the chosen conditions as compared to the expression of average and negative set genes. SVM-based classifiers provide a robust framework for analysis of regulatory networks. Processing of classifier outputs can provide high quality predictions and biological insight into functions of particular transcription factors. Future work on this method will focus on increasing the accuracy and quality of predictions using feature reduction and clustering strategies. Since predictions have been made on only 104 TFs in yeast, new classifiers will be built for the remaining 100 factors which have available binding data.
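The comparison above is stated in terms of sensitivity, specificity, and positive predictive value. As a reminder of how these three quantities relate to a confusion matrix, here is a minimal sketch; the function name and the counts are hypothetical and are chosen only so that sensitivity comes out at 0.73, not drawn from the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and positive predictive value from counts.

    tp/fn: true targets predicted as targets / missed.
    tn/fp: non-targets predicted as non-targets / wrongly called targets.
    """
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each metric needs a non-zero denominator")
    return {
        "sensitivity": tp / (tp + fn),  # share of true targets recovered
        "specificity": tn / (tn + fp),  # share of non-targets rejected
        "ppv": tp / (tp + fp),          # share of predictions that are correct
    }


# Hypothetical counts for one regulator's classifier.
metrics = classification_metrics(tp=73, fp=10, tn=990, fn=27)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Holding specificity and positive predictive value fixed, as the abstract does, makes sensitivity the single number on which two methods can be compared.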

Crossref

Boston University Institutional Repository (OpenBU)

Springer - Publisher Connector

PubMed Central

A Comprehensive Analysis of MALDI-TOF Spectrometry Data

Author: Malgorzata Plechawska-Wojcik
Publication venue: IntechOpen
Publication date: 09/03/2012

IntechOpen

Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions

Author: A Ben-Hur
A Jaimovich
AK McCallum
AP Gasch
B Raghavachari
BA Shoemaker
C Stark
Dafydd Jones
Diego Martinez
E Quevillon
E Sprinzak
H Huang
H Yu
Harriett Platero
I Lee
IH Witten
J Qiu
J Sun
JH Fong
K Nigam
M Ashburner
M Mahdavi
M Werner-Washburne
Margaret Werner-Washburne
MG Kann
N Simonis
NJ Krogan
P Langley
P Uetz
R Jansen
R Jothi
R Li
S Li
S Martin
SM Gomez
Sushmita Roy
T Ito
T Joachims
TA Lasko
Terran Lane
Y Liu
Y Ofran
Publication venue: Public Library of Science
Publication date: 01/11/2009

Computational prediction of protein interactions typically uses protein domains as classifier features because they capture conserved information of interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) for protein interaction prediction. This simple feature, which is based on normalized counts of single amino acids or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. AAC performed on par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. AAC is a simple, yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new and validate existing interactions. More importantly, AAC alone performs on par with existing, but more complex, features, indicating the presence of sequence-level information that is predictive of interaction, but which is not necessarily restricted to domains.
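The AAC feature described above, normalized counts of single amino acids or of amino acid pairs, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: "pairs" is taken here to mean adjacent residues (dipeptides), and a protein pair is represented by concatenating the two proteins' vectors, neither of which the abstract specifies. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_features(sequence, k=1):
    """Normalized k-mer counts over the standard amino acid alphabet.

    k=1 gives a 20-dimensional single-residue composition;
    k=2 gives a 400-dimensional adjacent-pair composition (an assumption).
    k-mers containing non-standard residues are ignored.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def pair_features(seq_a, seq_b, k=1):
    """Feature vector for a candidate interaction (concatenation is an assumption)."""
    return aac_features(seq_a, k) + aac_features(seq_b, k)


# Toy sequences, not real proteins.
vector = pair_features("MKTAYIAKQR", "GSHMLEDPVA")
print(len(vector))
```

Because the feature depends only on the sequence, it can be computed for any protein, which is the property the abstract highlights for proteins lacking domain annotation.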

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central