768 research outputs found

TreeKO: a duplication-aware algorithm for the comparison of phylogenetic trees

Author: Alberich
Altenhoff
Bansal
Beiko
Bille
Bininda-Emonds
Bordewich
Burleigh
Carstens
Castresana
Creevey
de Vienne
Degnan
Estabrook
Felsenstein
Fitzpatrick
Gabaldón
Goodman
Gordon
Gouret
Huerta-Cepas
Ma
Marcet-Houben
Marina Marcet-Houben
Page
Puigbo
Rasmussen
Retief
Robinson
Sicheritz-Ponten
Soria-Carrasco
Toni Gabaldón
Valiente
Wang
Wehe
Wu
Publication venue: Oxford University Press

Comparisons of tree topologies provide relevant information in evolutionary studies. Most existing methods share the drawback of requiring a complete and exact mapping of terminal nodes between the compared trees. This severely limits the scope of genome-wide analyses, since trees containing duplications are pruned arbitrarily or discarded. To overcome this, we have developed treeKO, an algorithm that enables the comparison of tree topologies, even in the presence of duplication and loss events. To do so, treeKO recursively splits gene trees into pruned trees containing only orthologs, and then computes a distance based on the combined analyses of all pruned tree comparisons. In addition, treeKO implements the possibility of computing phylome support values and reconciliation-based measures such as the number of inferred duplication and loss events.
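The recursive splitting step described in the abstract can be illustrated with a minimal sketch. This is not treeKO's implementation. It assumes gene trees given as nested 2-tuples, leaf names of the hypothetical form `<species>_<gene id>`, and a simple species-overlap rule for calling duplication nodes (a node is treated as a duplication when its two subtrees share at least one species). The function names are illustrative only.

```python
from itertools import product


def species_of(leaf):
    # Assumed leaf naming convention: "<species>_<gene id>", e.g. "A_1".
    return leaf.split("_", 1)[0]


def species_set(tree):
    # Collect the species found under a node (a leaf is a plain string).
    if isinstance(tree, str):
        return {species_of(tree)}
    left, right = tree
    return species_set(left) | species_set(right)


def split_at_duplications(tree):
    """Return every pruned tree that contains no duplication node.

    Assumption: a node is a duplication if its children share a species.
    At such a node each child lineage is kept separately; at a
    speciation node the pruned children are recombined pairwise.
    """
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    left_trees = split_at_duplications(left)
    right_trees = split_at_duplications(right)
    if species_set(left) & species_set(right):
        # Duplication: keep the two paralogous lineages apart.
        return left_trees + right_trees
    # Speciation: combine every pruned left tree with every pruned right tree.
    return [(l, r) for l, r in product(left_trees, right_trees)]


gene_tree = ((("A_1", "A_2"), "B_1"), "C_1")
pruned = split_at_duplications(gene_tree)
```

For the example tree, the duplication of species A yields two pruned trees, each holding a single A copy alongside B and C. A distance to a reference tree would then be computed on each pruned tree and combined, which this sketch does not attempt.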

Crossref

PubMed Central

Phylogenetics from paralogs

Author: Hellmuth Marc
Lechner Markus
Lenhof Hans-Peter
Middendorf Martin
Stadler Peter F.
Wieseke Nicolas
Publication venue: Fakultät 6 - Naturwissenschaftlich-Technische Fakultät I. Fachrichtung 6.2 - Informatik
Publication date: 01/01/2014

Motivation: Sequence-based phylogenetic approaches heavily rely on initial data sets to be composed of orthologous sequences only. Paralogs are treated as a dangerous nuisance that has to be detected and removed. Recent advances in mathematical phylogenetics, however, have indicated that gene duplications can also convey meaningful phylogenetic information, provided orthologs and paralogs can be distinguished with a degree of certainty. Results: We demonstrate that plausible phylogenetic trees can be inferred from paralogy information only. To this end, tree-free estimates of orthology, the complement of paralogy, are first corrected to conform to cographs and then translated into equivalent event-labeled gene phylogenies. A certain subset of the triples displayed by these trees translates into constraints on the species trees. While the resolution is very poor for individual gene families, we observe that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees of several groups of eubacteria. The novel method introduced here relies on solving three intertwined NP-hard optimization problems: the cograph editing problem, the maximum consistent triple set problem, and the least resolved tree problem. Implemented as an Integer Linear Program, paralogy-based phylogenies can be computed exactly for up to some twenty species and their complete protein complements. Availability: The ILP formulation is implemented in the software ParaPhylo using IBM ILOG CPLEX (TM) Optimizer 12.6 and is freely available from http://pacosy.informatik.uni-leipzig.de/paraphyl
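The requirement that an orthology estimate "conform to cographs" can be made concrete with a small check. A cograph is a graph that contains no induced path on four vertices (P4). The sketch below is a brute-force test of that property, not the ILP used by ParaPhylo and not a cograph editing routine. The function name and the edge-list input format are assumptions made for illustration.

```python
from itertools import permutations


def is_cograph(vertices, edges):
    """Return True if the undirected graph has no induced P4.

    Brute force over ordered 4-tuples of vertices, so it is only
    suitable for small gene families.
    """
    adjacent = {frozenset(edge) for edge in edges}

    def linked(u, v):
        return frozenset((u, v)) in adjacent

    for a, b, c, d in permutations(vertices, 4):
        path = linked(a, b) and linked(b, c) and linked(c, d)
        chords = linked(a, c) or linked(a, d) or linked(b, d)
        if path and not chords:
            # a-b-c-d is an induced path on four vertices.
            return False
    return True


genes = ["g1", "g2", "g3", "g4"]
path_relation = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
star_relation = [("g1", "g2"), ("g1", "g3"), ("g1", "g4")]
```

Here `path_relation` is itself a P4, so it fails the test and would need at least one edge edit, whereas `star_relation` already is a cograph.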

Universaar

Acronym

Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

Author: Conte Matthieu G
Droc Gaetan
Gaillard Sylvain
Perin Christophe
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs

Author: Eddy Sean R
Zmasek Christian M
Publication venue: BioMed Central
Publication date: 01/01/2002

BACKGROUND: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication). The utility of phylogenetic information in high-throughput genome annotation ("phylogenomics") is widely recognized, but existing approaches are either manual or not explicitly based on phylogenetic trees. RESULTS: Here we present RIO (Resampled Inference of Orthologs), a procedure for automated phylogenomics using explicit phylogenetic inference. RIO analyses are performed over bootstrap resampled phylogenetic trees to estimate the reliability of orthology assignments. We also introduce supplementary concepts that are helpful for functional inference. RIO has been implemented as a Perl pipeline connecting several C and Java programs. It is available at http://www.genetics.wustl.edu/eddy/forester/. A web server is at http://www.rio.wustl.edu/. RIO was tested on the Arabidopsis thaliana and Caenorhabditis elegans proteomes. CONCLUSION: The RIO procedure is particularly useful for the automated detection of first representatives of novel protein subfamilies. We also describe how some orthologies can be misleading for functional inference.

CiteSeerX

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Phylogenetics in plant biotechnology: principles, obstacles and opportunities for the resource poor

Author: Muigai AWT
Ochieng JW
Ude GN
Publication venue: 'African Journals Online (AJOL)'
Publication date: 22/07/2010

Phylogenetic inference has become routine for most studies of genetic variation among plant taxa. However, inferring phylogenies can be confounded by both biological and computational or statistical complexities, resulting in misleading evolutionary hypotheses. This is particularly critical because the “true tree” can only truly be known in exceptional circumstances. Moreover, selecting appropriate marker(s), characters, sample sizes and reconstruction methods poses a challenge to most evolutionary geneticists. Textbooks are generic (and sometimes outdated), and in resource-poor labs they may be altogether inaccessible. In this review, we take the worker through the low-down on reconstructing a phylogeny, review the enigmatic biological and computational problems, and examine cases where cheaper markers and extremely small sample sizes can recover a reliable phylogeny.

AJOL - African Journals Online

Analysis of Gene Order Evolution Beyond Single-Copy Genes

Author: A Bergeron
A Siepel
A Xu
B Arden
B Ma
B Moret
B Vernot
C Chauve
C Zheng
C. Chauve
CM Zmasek
D Bader
D Bertrand
D Durand
D Fulkerson
D Sankoff
D Soltis
E Eichler
E Lyons
F Murat
F. Murat
G Blanc
G Blin
G Bourque
G Fertin
G Glusman
G Landau
G Shi
G Tesler
G Watterson
H Gavranovic
H Gavranović
I Wapinski
J Bowers
J Cotton
J Demuth
J Gordon
J Mixtacki
J Nadeau
J Salse
J-P Doyon
K Chen
K O’Brien
K Wolfe
L Zhang
M Alekseyev
M Goodman
M Hahn
M Lajoie
M Lynch
M Muffato
M Sanderson
M Shannon
N El-Mabrouk
O Elemento
O Eulenstein
O Tremblay-Savard
P Bonizzoni
P Gorecki
P Pevzner
Q Zhu
R Guigó
R Hoberman
R LaRue
R Page
R Tatusov
R Warren
S Angibaud
S Hannenhalli
S Pham
S Schwartz
S Yancopoulos
T Blomme
T Uno
T Vinař
V Bafna
V Shoja
W Fitch
W Li
WJ Kent
Z Adam
Z Fu
Z Yang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction

Author: Adams
Altschul
Arvestad
Butler
Chen
Ciccarelli
Clark
Dehal
Doyon
Dujon
Durand
Edgar
Eisen
Felsenstein
Gao
Gascuel
Hahn
Hasegawa
Huerta-Cepas
Kellis
Li
M. D. Rasmussen
M. Kellis
Massey
Noonan
Page
Rannala
Richards
Rokas
Ronquist
Saitou
Sanderson
Shimodaira
Wapinski
Wolfe
Zmasek
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010

Supplementary tables S1, sections 2.1–2.3, and figures S1–S11 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Recent sequencing and computing advances have enabled phylogenetic analyses to expand to both entire genomes and large clades, thus requiring more efficient and accurate methods designed specifically for the phylogenomic context. Here, we present SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss (DL) rates, speciation times, and correlated substitution rate variation across both species and loci. We have implemented and applied this method on two clades of fully sequenced species, 12 Drosophila and 16 fungal genomes, as well as simulated phylogenies, and find dramatic improvements in reconstruction accuracy as compared with the most popular existing methods, including those that take the species tree into account. We find that reconstruction inaccuracies of traditional phylogenetic methods overestimate the number of DL events by as much as 2–3-fold, whereas our method achieves significantly higher accuracy. We feel that the results and methods presented here will have many important implications for future investigations of gene evolution. National Science Foundation (U.S.) (CAREER award NSF 0644282)

CiteSeerX

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central

Parsimonious Inference of Hybridization in the Presence of Incomplete Lineage Sorting

Author: Arnold
Bansal
Barton
Beiko
Bloomquist
Bordewich
Cranston
Degnan
Edwards
Felsenstein
Hobolth
Hudson
Huelsenbeck
Joly
Kingman
Kubatko
Kuo
Liu
Luay Nakhleh
MacLeod
Maddison
Mallet
Meng
Nakhleh
Pollard
R. Matthew Barnett
Rambaut
Rannala
Rasmussen
Rieseberg
Rokas
Swofford
Syring
Takuno
Than
White
Yu
Yun Yu
Publication date: 01/01/2013

Hybridization plays an important evolutionary role in several groups of organisms. A phylogenetic approach to detect hybridization entails sequencing multiple loci across the genomes of a group of species of interest, reconstructing their gene trees, and taking their differences as indicators of hybridization. However, methods that follow this approach mostly ignore population effects, such as incomplete lineage sorting (ILS). Given that hybridization occurs between closely related organisms, ILS may very well be at play and, hence, must be accounted for in the analysis framework. To address this issue, we present a parsimony criterion for reconciling gene trees within the branches of a phylogenetic network, and a local search heuristic for inferring phylogenetic networks from collections of gene-tree topologies under this criterion. This framework enables phylogenetic analyses while accounting for both hybridization and ILS. Further, we propose two techniques for incorporating information about uncertainty in gene-tree estimates. Our simulation studies demonstrate the good performance of our framework in terms of identifying the location of hybridization events, as well as estimating the proportions of genes that underwent hybridization. Our framework also handles large data sets efficiently in our experiments. Further, in analyzing a yeast data set, we demonstrate issues that arise when analyzing real data sets. While a probabilistic approach was recently introduced for this problem, and while parsimonious reconciliations have accuracy issues under certain settings, our parsimony framework provides a much more computationally efficient technique for this type of analysis. Our framework now allows for genome-wide scans for hybridization, while also accounting for ILS.

Crossref

PubMed Central

DSpace at Rice University