Search CORE

121 research outputs found

Clustering of cognate proteins among distinct proteomes derived from multiple links to a single seed sequence

Author: Barbosa-Silva Adriano
Ortega J Miguel
Satagopam Venkata P
Schneider Reinhard
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background Modern proteomes evolved by modification of pre-existing ones. It is extremely important to comparative biology that related proteins be identified as members of the same cognate group, since a characterized putative homolog could be used to find clues about the function of uncharacterized proteins from the same group. Typically, databases of related proteins focus on those from completely-sequenced genomes. Unfortunately, relatively few organisms have had their genomes fully sequenced; accordingly, many proteins are ignored by the currently available databases of cognate proteins, despite the high amount of important genes that are functionally described only for these incomplete proteomes. Results We have developed a method to cluster cognate proteins from multiple organisms beginning with only one sequence, through connectivity saturation with that Seed sequence. We show that the generated clusters are in agreement with some other approaches based on full genome comparison. Conclusion The method produced results that are as reliable as those produced by conventional clustering approaches. Generating clusters based only on individual proteins of interest is less time consuming than generating clusters for whole proteomes. </p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Repository and Bibliography - Luxembourg

Evaluating Ortholog Prediction Algorithms in a Yeast Model Clade

Author: A Alexeyenko
A Goffeau
A Kuzniar
A Rokas
AM Altenhoff
Antonis Rokas
B Dujon
BN Kent
C Dessimoz
C Vogel
Cecile Fairhead
CEV Storm
CEV Storm
CM Zmasek
CP Kurtzman
DA Fitzpatrick
DP Mindell
DP Wall
DP Wall
DR Scannell
EV Koonin
EV Koonin
EW Sayers
F Chen
F Chen
F Lemoine
FS Dietrich
I Wapinski
I Wapinski
J Ehrlich
JC Chiu
JL Gordon
K Liolios
KH Wolfe
KP Byrne
KP O'Brien
L Li
L Salichos
LA Mirny
LB Koski
Leonidas Salichos
M Kellis
M Remm
MP Cummings
O Akerborg
P Bork
P Bork
P Cliften
R Overbeek
RL Tatusov
RL Tatusov
RR Sokal
S Grossetete
SF Altschul
SF Altschul
T Hulsen
TF DeLuca
V van Noort
WM Fitch
Publication venue: Public Library of Science
Publication date: 13/04/2011
Field of study

RSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser. of all algorithms dramatically increased in these traps.) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant

Public Library of Science (PLOS)

Crossref

PubMed Central

InParanoid 7: new algorithms and tools for eukaryotic orthology analysis

Author: Altschul
Chen
D. N. Messina
Dolinski
E. L. L. Sonnhammer
G. Ostlund
Gabaldon
Heiges
Hulsen
K. Forslund
Kuzniar
Lassmann
O. Frings
S. Roopra
Sonnhammer
Sonnhammer
T. Kostler
T. Schmitt
Vanacova
Wang
Zmasek
Publication venue: Oxford University Press
Publication date: 01/01/2010
Field of study

The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment, but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, the response times have been improved and the site now sports a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare among these resources due to a lack of standardized source data and incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters

Crossref

Publikationer från Stockholms universitet

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

MDC Repository

InParanoid 6: eukaryotic ortholog clusters with inparalogs

Author: A.-C. Berglund
Dessimoz
E. L. L. Sonnhammer
E. Sjolund
Fitch
G. Ostlund
Hulsen
O'Brien
Sonnhammer
Tatusov
Publication venue: Oxford University Press
Publication date
Field of study

The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available ‘complete’ eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters

Crossref

PubMed Central

Proteinortho: Detection of (Co-)orthologs in large-scale analysis

Author: A Alexeyenko
A Force
A Nakabachi
A Schneider
AE Hirsh
AJ Enright
C Lanczos
D Cornaz
DM Kristensen
E Pruesse
EV Koonin
IK Jordan
J Hopcroft
JP McCutcheon
L Li
Lydia Steiner
M Fiedler
M Fiedler
M Remm
M Sikdar
Manja Marz
Marcus Lechner
MC Rivera
P Bork
Peter F Stadler
RL Tatusov
S Guattery
SM van Dongen
Sonja J Prohaska
Sven Findeiß
TJ Hubbard
WM Fitch
Z Fu
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practise. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. Results The program <monospace>Proteinortho</monospace> described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply <monospace>Proteinortho</monospace> to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Conclusions <monospace>Proteinortho</monospace> significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Fraunhofer-ePrints

PubMed Central

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

Inferring Hierarchical Orthologous Groups

Author: Train Clément
Publication venue: Université de Lausanne, Faculté de biologie et médecine
Publication date: 01/01/2019
Field of study

The reconstruction of ancestral evolutionary histories is the cornerstone of most phylogenetic analyses. Many applications are possible once the evolutionary history is unveiled, such as identifying taxonomically restricted genes (genome barcoding), predicting the function of unknown genes based on their evolutionary related genes gene ontologies, identifying gene losses and gene gains among gene families, or pinpointing the time in evolution where particular gene families emerge (sometimes referred to as “phylostratigraphy”). Typically, the reconstruction of the evolutionary histories is limited to the inference of evolutionary relationships (homology, orthology, paralogy) and basic clustering of these orthologs. In this thesis, we adopted the concept of Hierarchical Orthology Groups (HOGs), introduced a decade ago, and proposed several improvements both to improve their inference and to use them in biological analyses such as the aforementioned applications. In addition, HOGs are a powerful framework to investigate ancestral genomes since HOGs convey information regarding gene family evolution (gene losses, gene duplications or gene gains). In this thesis, an ancestral genome at a given taxonomic level denotes the last common ancestor genome for the related taxon and its hypothetical ancestral gene composition and gene order (synteny). The ancestral genes composition and ancestral synteny for a given ancestral genome provides valuable information to study the genome evolution in terms of genomic rearrangement (duplication, translocation, deletion, inversion) or of gene family evolution (variation of the gene function, accelerate gene evolution, duplication rich clade). This thesis identifies three major open challenges that composed my three research arcs. First, inferring HOGs is complex and computationally demanding meaning that robust and scalable algorithms are mandatory to generate good quality HOGs in a reasonable time. Second, benchmarking orthology clustering without knowing the true evolutionary history is a difficult task, which requires appropriate benchmark strategies. And third, the lack of tools to handle HOGs limits their applications. In the first arc of the thesis, I proposed two new algorithm refinements to improve orthology inference in order to produce orthologs less sensitive to gene fragmentations and imbalances in the rate of evolution among paralogous copies. In addition, I introduced version 2.0 of the GETHOGs 2.0 algorithm, which infers HOGs in a bottom up fashion, and which has been shown to be both faster and more accurate. In the second arc, I proposed new strategies to benchmark the reconstruction of gene families using detailed cases studies based on evidence from multiple sequence alignments along with reconstructed gene trees, and to benchmark orthology using a simulation framework that provides full control of the evolutionary genomic setup. This work highlights the main challenges in current methods. Third, I created pyHam (python HOG analysis method), iHam (interactive HOG analysis method) and GTM (Graph - Tree - Multiple sequence alignment)—a collection of tools to process, manipulate and visualise HOGs. pyHam offers an easy way to handle and work with HOGs using simple python coding. Embedded at its heart are two visualisation tools to synthesise HOG-derived information: iHam that allow interactive browsing of HOG structure and a tree based visualisation called tree profile that pinpoints evolutionary events induced by the HOGs on a species tree. In addition, I develop GTM an interactive web based visualisation tool that combine for a given gene family (or set of genes) the related sequences, gene tree and orthology graph. In this thesis, I show that HOGs are a useful framework for phylogenetics, with considerable work done to produce robust and scalable inferences. Another important aspect is that our inferences are benchmarked using manual case studies and automated verification using simulation or reference Quest for Orthologs Benchmarks. Lastly, one of the major advances was the conception and implementation of tools to manipulate and visualise HOG. Such tools have already proven useful when investigating HOGs for developmental reasons or for downstream analysis. Ultimately, the HOG framework is amenable to integration of all aspects which can reasonably be expected to have evolved along the history of genes and ancestral genome reconstruction. -- La reconstruction de l'histoire évolutive ancestrale est la pierre angulaire de la majorité des analyses phylogénétiques. Nombreuses sont les applications possibles une fois que l'histoire évolutive est révélée, comme l'identification de gènes restreints taxonomiquement (barcoding de génome), la prédiction de fonction pour les gènes inconnus en se basant sur les ontologies des gènes relatifs evolutionnairement, l'identification de la perte ou de l'apparition de gènes au sein de familles de gènes ou encore pour dater au cours de l'évolution l'apparition de famille de gènes (phylostratigraphie). Généralement, la reconstruction de l'histoire évolutive se limite à l'inférence des relations évolutives (homologie, orthologie, paralogie) ainsi qu'à la construction de groupes d’orthologues simples. Dans cette thèse, nous adoptons le concept des groupes hiérarchiques d’orthologues (HOGs en anglais pour Hierarchical Orthology Groups), introduit il y a plus de 10 ans, et proposons plusieurs améliorations tant bien au niveau de leurs inférences que de leurs utilisations dans les analyses biologiques susmentionnées. Cette thèse a pour but d'identifier les trois problématiques majeures qui composent mes trois axes de recherches. Premièrement, l'inférence des HOGs est complexe et nécessite une puissance computationnelle importante ce qui rend obligatoire la création d'algorithmes robustes et efficients dans l'espace temps afin de maintenir une génération de résultats de qualité rigoureuse dans un temps raisonnable. Deuxièmement, le contrôle de la qualité du groupement des orthologues est une tâche difficile si on ne connaît l'histoire évolutive réelle ce qui nécessite la mise en place de stratégies de contrôle de qualité adaptées. Tertio, le manque d'outils pour manipuler les HOGs limite leur utilisation ainsi que leurs applications. Dans le premier axe de ma thèse, je propose deux nouvelles améliorations de l'algorithme pour l'inférence des orthologues afin de pallier à la sensibilité de l'inférence vis à vis de la fragmentation des gènes et de l'asymétrie du taux d'évolution au sein de paralogues. De plus, j'introduis la version 2.0 de l'algorithme GETHOGs qui utilise une nouvelle approche de type 'bottom-up' afin de produire des résultats plus rapides et plus précis. Dans le second axe, je propose de nouvelles stratégies pour contrôler la qualité de la reconstruction des familles de gènes en réalisant des études de cas manuels fondés sur des preuves apportées par des alignement multiples de séquences et des reconstructions d'arbres géniques, et aussi pour contrôler la qualité de l'orthologie en simulant l'évolution de génomes afin de pouvoir contrôler totalement le matériel génétique produit. Ce travail met en avant les principales problématiques des méthodes actuelles. Dans le dernier axe, je montre pyHam, iHam et GTM - une panoplie d'outils que j’ai créée afin de faciliter la manipulation et la visualisation des HOGs en utilisant un programmation simple en python. Deux outils de visualisation sont directement intégrés au sein de pyHam afin de pouvoir synthétiser l'information véhiculée par les HOGs: iHam permet d’interactivement naviguer dans les HOGs ainsi qu’une autre visualisation appelée “tree profile” utilisant un arbre d'espèces où sont localisés les événements révolutionnaires contenus dans les HOGs. En sus, j'ai développé GTM un outil interactif web qui combine pour une famille de gènes donnée (ou un ensemble de gènes) leurs séquences alignées, leur arbre de gène ainsi que le graphe d'orthologie en relation. Dans cette thèse, je montre que le concept des HOGs est utile à la phylogénétique et qu'un travail considérable a été réalisé dans le but d'améliorer leur inférences de façon robuste et rapide. Un autre point important est que la qualité de nos inférences soit contrôlée en réalisant des études de cas manuellement ou en utilisant le Quest for Orthologs Benchmark qui est une référence dans le contrôle de la qualité de l’orthologie. Dernièrement, une des avancée majeure proposée est la conception et l'implémentation d'outils pour visualiser et manipuler les HOGs. Ces outils s'avèrent déjà utilisés tant pour l'étude des HOGs dans un but d'amélioration de leur qualité que pour leur utilisation dans des analyses biologiques. Pour conclure, on peut noter que tous les aspects qui semblent avoir évolué en relation avec l'histoire évolutive des gènes ou des génomes ancestraux peuvent être intégrés au concept des HOGs

Serveur académique lausannois

EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data

Author: Linard Benjamin
Nguyen Ngoc Hoan
Poch Olivier
Prosdocimi Francisco
Thompson Julie D.
Publication venue: Libertas Academica
Publication date: 01/12/2011
Field of study

Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (eg, sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes

CiteSeerX

Crossref

HAL-Inserm

Directory of Open Access Journals

PubMed Central

MultiMSOAR 2.0: An Accurate Tool to Identify Ortholog Groups among Multiple Genomes

Author: A Alexeyenko
A Vashist
A Vilella
AC Berglund
AE Ivliev
AJ Enright
B Vernot
D Durand
DL Wheeler
F Boyer
G Shi
Guanqun Shi
H Li
HW Kuhn
J Huerta-Cepas
L Goodstadt
L Li
M Remm
M Semon
Meng-Chih Peng
MV Han
P Pevzner
RL Tatusov
S Hannenhalli
SF Altschul
Stein Aerts
Tao Jiang
TF DeLuca
V Kann
V Shoja
WJ Kent
WM Fitch
YP Deniélou
Z Fu
Z Fu
Z Jiang
Z Yang
Publication venue: Public Library of Science
Publication date: 01/01/2011
Field of study

The identification of orthologous genes shared by multiple genomes plays an important role in evolutionary studies and gene functional analyses. Based on a recently developed accurate tool, called MSOAR 2.0, for ortholog assignment between a pair of closely related genomes based on genome rearrangement, we present a new system MultiMSOAR 2.0, to identify ortholog groups among multiple genomes in this paper. In the system, we construct gene families for all the genomes using sequence similarity search and clustering, run MSOAR 2.0 for all pairs of genomes to obtain the pairwise orthology relationship, and partition each gene family into a set of disjoint sets of orthologous genes (called super ortholog groups or SOGs) such that each SOG contains at most one gene from each genome. For each such SOG, we label the leaves of the species tree using 1 or 0 to indicate if the SOG contains a gene from the corresponding species or not. The resulting tree is called a tree of ortholog groups (or TOGs). We then label the internal nodes of each TOG based on the parsimony principle and some biological constraints. Ortholog groups are finally identified from each fully labeled TOG. In comparison with a popular tool MultiParanoid on simulated data, MultiMSOAR 2.0 shows significantly higher prediction accuracy. It also outperforms MultiParanoid, the Roundup multi-ortholog repository and the Ensembl ortholog database in real data experiments using gene symbols as a validation tool. In addition to ortholog group identification, MultiMSOAR 2.0 also provides information about gene births, duplications and losses in evolution, which may be of independent biological interest. Our experiments on simulated data demonstrate that MultiMSOAR 2.0 is able to infer these evolutionary events much more accurately than a well-known software tool Notung. The software MultiMSOAR 2.0 is available to the public for free

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals

Author: A Alexeyenko
A Kuzniar
AI Su
AJ Vilella
AM Schnoes
Andrey Rzhetsky
B Rost
B Sennblad
BE Engelhardt
BY Liao
BY Liao
CB Bridges
CB Bridges
CL McGrath
CM Zmasek
D Lee
DL Des Marais
DM Martin
E Zuckerkandl
ELL Sonnhammer
EV Koonin
G Glazko
G Shi
HJ Muller
JA Eisen
JA Tennessen
K Dolinski
KD Makova
L Huminiecki
M Goodman
M Kimura
M Lynch
Matthew W. Hahn
MV Han
MV Han
MW Hahn
N Goldman
Nathan L. Nehrt
P Katz
P Radivojac
Predrag Radivojac
R Rentzsch
RA Studer
RA Studer
RD Chen
RL Tatusov
RS Datta
S Addou
S Mika
S Ohno
SG Stephens
T Gabaldon
T Gabaldon
T Hawkins
T Hulsen
W-H Li
WH Li
WM Fitch
WM Fitch
Wyatt T. Clark
ZD Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2011
Field of study

A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the “ortholog conjecture”). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity

Author: Hernandez Ryan D.
Maher M. Cyrus
Publication venue
Publication date: 18/04/2014
Field of study

Ortholog detection (OD) is a critical step for comparative genomic analysis of protein-coding sequences. In this paper, we begin with a comprehensive comparison of four popular, methodologically diverse OD methods: MultiParanoid, Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to significantly outperform one another 12-30% of the time. This high complementarity motivates the presentation of the first tool for integrating methodologically diverse OD methods. We term this program MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization. Relative to component and competing methods, we demonstrate that MOSAIC more than quintuples the number of alignments for which all species are present, while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, we demonstrate that this improvement in alignment quality yields 40-280% more confidently aligned sites. Combined, these factors translate to higher estimated levels of overall conservation, while at the same time allowing for the detection of up to 180% more positively selected sites. MOSAIC is available as python package. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC

arXiv.org e-Print Archive

FigShare