
ScholarBank@NUS

Towards validating the hypothesis of phylogenetic profiling

Author: D Lin
EM Marcotte
J Handl
J Jäkel
J Seo
J Sun
J Sun
J Wu
M Pellegrini
Mazen Atwi
N Bolshakova
P Resnik
R Loganantharaj
Raja Loganantharaj
RL Tatusov
RL Tatusov
SF Altschul
SV Date
Publication venue: BioMed Central
Publication date: 01/11/2007

Positional clustering improves computational binding site detection and identifies novel cis-regulatory sites in mammalian GABA(A) receptor subunit genes

Author: Aerts
Anand
Ballas
Ballas
Blackwood
Boris E. Shakhnovich
Bosman
Brooks-Kayal
Bussemaker
Charles DeLisi
Daniel S. Roberts
Dawson
Dolan
Friberg
Frith
Gray
Harbison
Iyer
Kaplan
Kerr
Kirkness
Kuo
Lawrence
Lee
Lewin
Li
Liu
Macisaac
MacIsaac
Madhani
Morozov
Niehrs
Pellegrini
Perier
Pietrokovski
Purves
Reddy
Roberts
Roberts
Roth
Saffer
Shelley J. Russek
Siegel
Steiger
Stormo
Stormo
Swendeman
Temple
Therrien
Thiagalingam
Thijs
Timothy E. Reddy
Tompa
Treiman
Wall
Wasserman
Winderickx
Wingender
Wu
Publication venue: Oxford University Press
Publication date: 03/01/2007

Understanding transcription factor (TF) mediated control of gene expression remains a major challenge at the interface of computational and experimental biology. Computational techniques for predicting TF-binding site (TFBS) specificity are frequently unreliable. On the other hand, comprehensive experimental validation is difficult and time-consuming. We introduce a simple strategy that dramatically improves the robustness and accuracy of computational binding site prediction. First, we evaluate the rate of recurrence of computational TFBS predictions by commonly used sampling procedures. We find that the vast majority of results are biologically meaningless. However, clustering results based on nucleotide position improves predictive power. Additionally, we find that positional clustering increases robustness to long or imperfectly selected input sequences. Positional clustering can also be used as a mechanism to integrate results from multiple sampling approaches, improving accuracy over each one alone. Finally, we predict and validate regulatory sequences partially responsible for transcriptional control of the mammalian type A γ-aminobutyric acid receptor (GABA(A)R) subunit genes. Positional clustering is useful for improving computational binding site predictions, with potential application to improving our understanding of mammalian gene expression. In particular, predicted regulatory mechanisms in the mammalian GABA(A)R subunit gene family may open new avenues of research towards understanding this pharmacologically important neurotransmitter receptor system.
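
As a rough illustration of the positional clustering idea described in this abstract, the sketch below groups predicted site positions from repeated sampler runs and keeps only clusters that recur across runs. The function name `cluster_positions`, the `max_gap` window, and the `min_support` cutoff are assumptions made for this example; they are not taken from the paper.

```python
from typing import List, Tuple


def cluster_positions(predictions: List[Tuple[int, int]],
                      max_gap: int = 3,
                      min_support: int = 2) -> List[Tuple[int, int, int]]:
    """Group predicted site start positions that lie close together.

    predictions: (run_id, start_position) pairs from repeated sampler runs.
    max_gap and min_support are hypothetical parameters for this sketch.
    Returns (cluster_start, cluster_end, n_runs) for every cluster that is
    supported by at least `min_support` distinct runs.
    """
    clusters = []
    current = []  # predictions in the cluster currently being built
    for run_id, pos in sorted(predictions, key=lambda p: p[1]):
        # Start a new cluster when the gap to the previous position is too large.
        if current and pos - current[-1][1] > max_gap:
            clusters.append(current)
            current = []
        current.append((run_id, pos))
    if current:
        clusters.append(current)

    result = []
    for cluster in clusters:
        n_runs = len({run_id for run_id, _ in cluster})
        if n_runs >= min_support:
            result.append((cluster[0][1], cluster[-1][1], n_runs))
    return result


# Toy input: five sampler runs, positions given as sequence coordinates.
preds = [(0, 100), (1, 102), (2, 101), (0, 250), (3, 400), (4, 403)]
print(cluster_positions(preds))
```

In this toy input the isolated prediction at position 250 is discarded because only one run reports it, which mirrors the abstract's point that recurring positions are more trustworthy than single predictions.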

Boston University Institutional Repository (OpenBU)

False positive reduction in protein-protein interaction predictions using gene ontology annotations

Author: A Bairoch
A Valencia
C Alfarano
C von Mering
CM Deane
DR Rhodes
E Camon
E Camon
E Sprinzak
EM Marcotte
EM Marcotte
FM Couto
G Butland
GR Smith
H Wu
H Zhu
I Albert
IMA Nooren
J Janin
J Wu
J Yu
JBL Bard
JR Bock
JR Bock
K Tu
KV Brinda
L Lu
M Deng
M Hayashida
M Pellegrini
M Strong
Mahmoud A Mahdavi
O Carugo
P Bork
PW Lord
S Li
SL Lo
SV Date
T Dandekar
T Fujimori
T Yamada
The Gene Ontology Consortium
TR Hazbun
U Güldener
V van Noort
X Wu
XJ Zhou
Y Huang
Y Liu
Yen-Han Lin
Publication venue: BioMed Central
Publication date: 01/07/2007

Background: Many crucial cellular operations such as metabolism, signalling, and regulation are based on protein-protein interactions. However, the lack of robust protein-protein interaction information is a challenge. One reason for this lack is poor agreement between experimental findings and computational sets, which in turn comes from the large number of false positive predictions in computational approaches. Reducing false positive predictions and enhancing the true positive fraction of computationally predicted protein-protein interaction datasets, based on highly confident experimental results, has not been adequately investigated. Results: Gene Ontology (GO) annotations were used to reduce false positive protein-protein interaction (PPI) pairs resulting from computational predictions. Using experimentally obtained PPI pairs as a training dataset, eight top-ranking keywords were extracted from GO molecular function annotations. The sensitivity of these keywords is 64.21% in the yeast experimental dataset and 80.83% in the worm experimental dataset. The specificities, a measure of recovery power, of these keywords applied to four predicted PPI datasets for each studied organism are 48.32% and 46.49% (averaged over the four datasets) in yeast and worm, respectively. Based on the eight top-ranking keywords and co-localization of interacting proteins, a set of two knowledge rules was deduced and applied to remove false positive protein pairs. The 'strength', a measure of the improvement provided by the rules, was defined based on the signal-to-noise ratio and used to measure the applicability of the knowledge rules to the predicted PPI datasets. Depending on the PPI-predicting method employed, the strength varies between two and ten times that of randomly removing protein pairs from the datasets.
Conclusion: Gene Ontology annotations, along with the deduced knowledge rules, could be used to partially remove falsely predicted PPI pairs. Removal of false positives from predicted datasets increases the true positive fractions of the datasets, improves the robustness of predicted pairs as compared to random protein pairing, and eventually results in better overlap with experimental results.

Public Library of Science (PLOS)

Detection of Biochemical Pathways by Probabilistic Matching of Phyletic Vectors

Author: A Filloux
Arcady Mushegian
B Papp
B Snel
Berend Snel
C Brochier-Armanet
C Goh
C von Mering
David M. Kristensen
EV Koonin
G Glazko
G Glazko
GV Glazko
H Romero
Hua Li
J Wu
J Wu
JM Berg
JO Korbel
K Jim
K Liolios
LE Bingle
M Levesque
M Pellegrini
Michael K. Coleman
P Forterre
PR Kensche
R Jothi
R Liu
R Sadreyev
RL Tatusov
SF Altschul
SL Bardy
U Brandt
Publication venue: Public Library of Science
Publication date: 24/04/2009

A phyletic vector, also known as a phyletic (or phylogenetic) pattern, is a binary representation of the presences and absences of orthologous genes in different genomes. Joint occurrence of two or more genes in many genomes results in closely similar binary vectors representing these genes, and this similarity between gene vectors may be used as a measure of functional association between genes. Better understanding of quantitative properties of gene co-occurrences is needed for systematic studies of gene function and evolution. We used the probabilistic iterative algorithm Psi-square to find groups of similar phyletic vectors. An extended Psi-square algorithm, in which pseudocounts are implemented, shows better sensitivity in identifying proteins with known functional links than our earlier hierarchical clustering approach. At the same time, the specificity of inferring functional associations between genes in prokaryotic genomes is strongly dependent on the pathway: phyletic vectors of the genes involved in energy metabolism and in de novo biosynthesis of the essential precursors tend to be lumped together, whereas cellular modules involved in secretion, motility, assembly of cell surfaces, biosynthesis of some coenzymes, and utilization of secondary carbon sources tend to be identified with much greater specificity. It appears that the network of gene coinheritance in prokaryotes contains a giant connected component that encompasses most biosynthetic subsystems, along with a series of more independent modules involved in cell interaction with the environment.
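
To make the definition of a phyletic vector concrete, the sketch below builds binary presence/absence vectors for two genes and scores their similarity. The Jaccard coefficient is used here only as a simple stand-in; it is not the Psi-square algorithm named in the abstract, and the genome and gene names are invented for the example.

```python
from typing import List, Sequence, Set


def phyletic_vector(genomes_with_gene: Set[str], genomes: List[str]) -> List[int]:
    """Return 1 for each genome that contains an ortholog of the gene, else 0."""
    return [1 if g in genomes_with_gene else 0 for g in genomes]


def jaccard(u: Sequence[int], v: Sequence[int]) -> float:
    """Shared presences divided by genomes where either gene is present."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


# Hypothetical genomes and gene occurrences, for illustration only.
genomes = ["g1", "g2", "g3", "g4", "g5"]
gene_a = phyletic_vector({"g1", "g2", "g4"}, genomes)
gene_b = phyletic_vector({"g1", "g2", "g5"}, genomes)

print(gene_a, gene_b, jaccard(gene_a, gene_b))
```

A higher score means the two genes tend to be present and absent in the same genomes, which is the co-occurrence signal the abstract treats as evidence of functional association.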

Inferring modules of functionally interacting proteins using the Bond Energy Algorithm

Author: Morett Enrique
Vallejo Edgar E
Watanabe Ryosuke LA
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Non-homology based methods such as phylogenetic profiles are effective for predicting functional relationships between proteins with no considerable sequence or structure similarity. Those methods rely heavily on traditional similarity metrics defined on pairs of phylogenetic patterns. However, proteins do not interact exclusively in pairs, as the final biological function of a protein in the cellular context is often held by a group of proteins. To accurately infer modules of functionally interacting proteins, both direct and indirect relationships must be considered. In this paper, we used the Bond Energy Algorithm (BEA) to predict functionally related groups of proteins. With BEA we create clusters of phylogenetic profiles based on the associations of the surrounding elements of the analyzed data, using a metric that considers linked relationships among elements in the data set. Results: Using phylogenetic profiles obtained from the Cluster of Orthologous Groups of Proteins (COG) database, we conducted a series of clustering experiments using BEA to predict (upper level) relationships between profiles. We evaluated our results by comparing them with COG's functional categories and, further, with the experimentally determined functional relationships between proteins provided by the DIP and ECOCYC databases. Our results demonstrate that BEA is capable of predicting meaningful modules of functionally related proteins. BEA outperforms traditionally used clustering methods, such as k-means and hierarchical clustering, by predicting functional relationships between proteins with higher accuracy. Conclusion: This study shows that the linked relationships of phylogenetic profiles obtained by BEA are useful for detecting functional associations between profiles and for extending functional modules not found by traditional methods. BEA is capable of detecting relationships among phylogenetic patterns by linking them through a common element shared in a group. Additionally, we discuss how the proposed method may become more powerful if other criteria for classifying different levels of protein functional interactions, such as gene neighborhood or protein fusion information, are provided.
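
The following is a simplified sketch of the greedy placement step commonly associated with the Bond Energy Algorithm, applied to rows that stand for phylogenetic profiles. It is not the authors' implementation: the bond is taken to be the inner product of two binary profiles, the seed profile is simply the first one, and the names `bond` and `bea_order` are assumptions for this example.

```python
from typing import List, Sequence


def bond(u: Sequence[int], v: Sequence[int]) -> int:
    """Bond between two profiles: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def bea_order(profiles: List[List[int]]) -> List[int]:
    """Greedily order profiles so that strongly bonded ones end up adjacent.

    At each step, every unplaced profile is tried at every insertion point,
    and the (profile, position) pair with the largest gain in total
    neighbour bond is chosen.
    """
    order = [0]  # seed with the first profile (an arbitrary choice)
    remaining = list(range(1, len(profiles)))
    while remaining:
        best = None  # (gain, profile_index, insert_index)
        for cand in remaining:
            for idx in range(len(order) + 1):
                left = order[idx - 1] if idx > 0 else None
                right = order[idx] if idx < len(order) else None
                gain = 0
                if left is not None:
                    gain += bond(profiles[left], profiles[cand])
                if right is not None:
                    gain += bond(profiles[cand], profiles[right])
                if left is not None and right is not None:
                    # Inserting between two neighbours breaks their bond.
                    gain -= bond(profiles[left], profiles[right])
                if best is None or gain > best[0]:
                    best = (gain, cand, idx)
        _, cand, idx = best
        order.insert(idx, cand)
        remaining.remove(cand)
    return order


# Toy profiles: 0 and 2 are identical, as are 1 and 3.
profiles = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
order = bea_order(profiles)
print(order)
```

After reordering, groups of related profiles appear as contiguous runs, and candidate modules can be read off by cutting the ordering where the bond between neighbours drops.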

Red Mexicana de Repositorios Institucionales

An improved method for identifying functionally linked proteins using phylogenetic profiles

Author: Cokus Shawn
Mizutani Sayaka
Pellegrini Matteo
Publication venue: BioMed Central
Publication date: 01/01/2007

Comparative assessment of performance and genome dependence among phylogenetic profiling methods

Author: DeLisi Charles
Gustafson Adam M
Mellor Joseph
Snitkin Evan S
Wu Jie
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The rapidly increasing speed with which genome sequence data can be generated will be accompanied by an exponential increase in the number of sequenced eukaryotes. With the increasing number of sequenced eukaryotic genomes comes a need for bioinformatic techniques to aid in functional annotation. Ideally, genome context based techniques such as proximity, fusion, and phylogenetic profiling, which have been so successful in prokaryotes, could be utilized in eukaryotes. Here we explore the application of phylogenetic profiling, a method that exploits the evolutionary co-occurrence of genes in the assignment of functional linkages, to eukaryotic genomes. RESULTS: In order to evaluate the performance of phylogenetic profiling in eukaryotes, we assessed the relative performance of commonly used profile construction techniques and genome compositions in predicting functional linkages in both prokaryotic and eukaryotic organisms. When predicting linkages in E. coli with a prokaryotic profile, the use of continuous values constructed from transformed BLAST bit-scores performed better than profiles composed of discretized E-values; the use of discretized E-values resulted in more accurate linkages when using S. cerevisiae as the query organism. Extending this analysis by incorporating several eukaryotic genomes in profiles containing a majority of prokaryotes resulted in similar overall accuracy, but with a surprising reduction in pathway diversity among the most significant linkages. Furthermore, the application of phylogenetic profiling using profiles composed of only eukaryotes resulted in the loss of the strong correlation between common KEGG pathway membership and profile similarity score. Profile construction methods, orthology definitions, ontology and domain complexity were explored as possible sources of the poor performance of eukaryotic profiles, but with no improvement in results. 
CONCLUSION: Given the current set of completely sequenced eukaryotic organisms, phylogenetic profiling using profiles generated from any of the commonly used techniques was found to yield extremely poor results. These findings imply genome-specific requirements for constructing functionally relevant phylogenetic profiles, and suggest that differences in the evolutionary history between different kingdoms might generally limit the usefulness of phylogenetic profiling in eukaryotes.

Boston University Institutional Repository (OpenBU)

Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling

Author: Gill Ryan T
Hunter Lawrence
Karimpour-Fard Anis
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The use of computational methods for predicting protein interaction networks will continue to grow with the number of fully sequenced genomes available. The Co-Conservation method, also known as the Phylogenetic profiles method, is a well-established computational tool for predicting functional relationships between proteins. Results: Here, we examined how various aspects of this method affect the accuracy and topology of protein interaction networks. We have shown that the choice of reference genome influences the number of predictions involving proteins of previously unknown function, the accuracy of predicted interactions, and the topology of predicted interaction networks. We show that while such results are relatively insensitive to the E-value threshold used in defining homologs, predicted interactions are influenced by the similarity metric that is employed. We show that differences in predicted protein interactions are biologically meaningful, where judicious selection of reference genomes, or use of a new scoring scheme that explicitly considers reference genome relatedness, produces known protein interactions as well as predicted protein interactions involving coordinated biological processes that are not accessible using currently available databases. Conclusion: These studies should prove valuable for future studies seeking to further improve phylogenetic profiling methodologies, as well as for efforts to efficiently employ such methods to develop new biological insights.
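
The role of the E-value threshold in defining homologs can be sketched as follows: a protein's profile entry for a reference genome is set to 1 when its best hit in that genome has an E-value at or below the threshold. The function name, the genome names, and the threshold values are assumptions for illustration; the abstract does not state which thresholds were tested.

```python
from typing import Dict, List, Optional


def binary_profile(best_evalues: Dict[str, Optional[float]],
                   reference_genomes: List[str],
                   threshold: float = 1e-5) -> List[int]:
    """Build a presence/absence profile from best-hit E-values.

    best_evalues maps a reference genome to the best-hit E-value of the
    query protein in that genome, or None if no hit was found. Genomes
    missing from the mapping are treated as having no hit.
    """
    profile = []
    for genome in reference_genomes:
        e_value = best_evalues.get(genome)
        present = e_value is not None and e_value <= threshold
        profile.append(1 if present else 0)
    return profile


# Hypothetical best-hit E-values for one query protein.
hits = {"genomeA": 1e-40, "genomeB": 1e-3, "genomeC": None}
refs = ["genomeA", "genomeB", "genomeC", "genomeD"]

strict = binary_profile(hits, refs, threshold=1e-5)
loose = binary_profile(hits, refs, threshold=1e-2)
print(strict, loose)
```

Only borderline hits such as the one in `genomeB` change with the threshold, which gives some intuition for why profiles can be fairly stable across reasonable cutoffs.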

Public Library of Science (PLOS)

Inference of Functional Relations in Predicted Protein Networks with a Machine Learning Approach

Author: A Enright
A Valencia
Alfonso Valencia
Beatriz García-Jiménez
C Alfarano
C Drummond
CM Bishop
Cv Mering
Cv Mering
D Juan
David Juan
DE Rumelhart
E Frank
E Morett
EA León
Eduardo Andrés-León
EM Marcotte
F Pazos
F Pazos
F Pazos
G Butland
GF Cooper
GH John
GI Webb
H Hermjakob
Iakes Ezkurdia
IH Witten
IM Keseler
J Wu
JG Cleary
L Breiman
L Salwinski
LJ Lu
M Arifuzzaman
M Kanehisa
M Pellegrini
M Sahami
N Friedman
P Bowers
R Hoffmann
RC Edgar
RR Bouckaert
SF Altschul
Shin-Han Shiu
T Dandekar
T Sato
Y Freund
Y Qi
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: Molecular biology is currently facing the challenging task of functionally characterizing the proteome. The large number of possible protein-protein interactions and complexes, the variety of environmental conditions and cellular states in which these interactions can be reorganized, and the multiple ways in which a protein can influence the function of others require the development of experimental and computational approaches to analyze and predict functional associations between proteins as part of their activity in the interactome. Methodology/Principal Findings: We have studied the possibility of constructing a classifier that combines the output of several protein interaction prediction methods. The AODE (Averaged One-Dependence Estimators) machine learning algorithm is a suitable choice in this case: it provides better results than the individual prediction methods, and it performs better than the other alternative methods tested in this experimental setup. To illustrate the potential use of this new AODE-based Predictor of Protein InterActions (APPIA) when analyzing high-throughput experimental data, we show how it helps to filter the results of published high-throughput proteomic studies, ranking functionally related pairs in a significant way. Availability: All the predictions of the individual methods and of the combined APPIA predictor, together with the datasets of functional associations used, are available at http://ecid.bioinfo.cnio.es/. Conclusions: We propose a strategy that integrates the main current computational techniques used to predict functional associations into a unified classifier system, specifically focusing on the evaluation of poorly characterized protein pairs. We selected the AODE classifier as the appropriate tool to perform this task. AODE is particularly useful for extracting valuable information from large, unbalanced, and heterogeneous data sets. The combination of the information provided by five interaction prediction methods with some simple sequence features in APPIA is useful in establishing reliability values and helpful for prioritizing functional interactions that can be further experimentally characterized.
This work was funded by the BioSapiens (grant number LSHG-CT-2003-503265) and the Experimental Network for Functional Integration (ENFIN) Networks of Excellence (contract number LSHG-CT-2005-518254), by Consolider BSC (grant number CSD2007-00050) and by the project “Functions for gene sets” from the Spanish Ministry of Education and Science (BIO2007-66855). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas