63 research outputs found
A General Framework for Gene Tree Correction Based on Duplication-Loss Reconciliation
Due to the key role played by gene trees and species phylogenies in biological studies, it is essential to have as much confidence as possible in the available trees. As phylogenetic tools are error-prone, it is common to use a correction method to improve an initial tree. Various correction methods exist. In this paper we focus on those based on the Duplication-Loss reconciliation model. The polytomy resolution approach consists in contracting weakly supported branches and then refining the resulting non-binary tree so as to minimize a reconciliation distance with the given species tree. The supertree approach, on the other hand, takes as input a set of separate subtrees, obtained either for separate orthology groups or by removing the upper branches of an initial tree down to a certain level, and amalgamates them in an optimal way that preserves the topology of the initial trees. The two classes of problems have always been considered as two separate fields, based on apparently different models. In this paper we give a unifying view showing that these two classes of problems are in fact special cases of a more general problem that we call LabelGTC, whose input includes a 0-1 edge-labelled gene tree to be corrected. Considering a tree as a set of triplets, we also formulate the TripletGTC Problem, whose input includes a set of gene triplets that should be preserved in the corrected tree. These two general models make it possible to unify, understand and compare the principles of the duplication-loss reconciliation-based tree correction approaches. We show that LabelGTC is a special case of TripletGTC. We then develop appropriate algorithms for handling these two general correction problems.
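As a rough illustration of the triplet view mentioned in this abstract, the sketch below lists the rooted triplets displayed by a tree and checks whether a candidate tree preserves a required set of them. The nested-tuple tree encoding and the names `leaves`, `triplets` and `preserves` are assumptions made for this example; they are not the paper's formulation of TripletGTC or its algorithm.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def triplets(tree):
    """Return the rooted triplets ab|c displayed by the tree.

    A triplet is stored as (frozenset({a, b}), c): a and b share a more
    recent common ancestor with each other than either does with c.
    Three leaves hanging from different children of a polytomy yield no
    triplet, since their relationship is unresolved.
    """
    if isinstance(tree, str):
        return set()
    result = set()
    child_leaves = [leaves(child) for child in tree]
    for i, child in enumerate(tree):
        result |= triplets(child)
        # Leaves of this subtree that lie outside child i.
        outside = set().union(*(child_leaves[:i] + child_leaves[i + 1:]))
        for a, b in combinations(sorted(child_leaves[i]), 2):
            for c in outside:
                result.add((frozenset((a, b)), c))
    return result


def preserves(tree, required):
    """Check that every required triplet is displayed by the tree."""
    return required <= triplets(tree)
```

For instance, `preserves(((("a", "b"), "c"), "d"), {(frozenset(("a", "b")), "d")})` asks whether the caterpillar tree on a, b, c, d keeps a and b together relative to d.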
Reconstructing the History of Syntenies Through Super-Reconciliation
Classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is clearly not suited for genes grouped in syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation model, which extends the traditional Duplication-Loss model to the reconciliation of a set of trees, accounting for segmental duplications and losses. From a complexity point of view, we show that the associated decision problem is NP-hard. We then give an exact exponential-time algorithm for this problem, assess its time efficiency on simulated datasets, and give a proof of concept on the opioid receptor genes.
Evolution through segmental duplications and losses : A Super-Reconciliation approach
The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists in inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem from a single gene tree to a set of trees, accounting for segmental duplications and losses. The existence of a Super-Reconciliation depends on the consistency of the individual gene trees. In addition, when rearrangements are ignored, existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, while still minimizing only segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
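For readers unfamiliar with the traditional Duplication-Loss reconciliation of a single gene tree that this abstract extends, the following is a minimal sketch of the standard LCA-mapping approach for binary trees. It is not the Super-Reconciliation algorithm. The nested-tuple encoding, the `species_of` callback and the function name `reconcile` are assumptions made for this example.

```python
def reconcile(gene_tree, species_tree, species_of):
    """Count (duplications, losses) under LCA reconciliation.

    Assumptions of this sketch: both trees are binary nested tuples with
    string leaves, species names are unique, and species_of maps a gene
    leaf label to its species name.
    """
    parent, depth = {}, {}

    def index(node, par, d):
        # Record parent and depth of every species-tree node.
        parent[node], depth[node] = par, d
        if not isinstance(node, str):
            for child in node:
                index(child, node, d + 1)

    index(species_tree, None, 0)

    def lca(x, y):
        # Lift the deeper node, then climb both until they meet.
        while depth[x] > depth[y]:
            x = parent[x]
        while depth[y] > depth[x]:
            y = parent[y]
        while x != y:
            x, y = parent[x], parent[y]
        return x

    dups = losses = 0

    def visit(node):
        nonlocal dups, losses
        if isinstance(node, str):
            return species_of(node)
        left, right = (visit(child) for child in node)
        here = lca(left, right)
        if here == left or here == right:
            # Duplication: the node maps to the same species as a child.
            dups += 1
            losses += (depth[left] - depth[here]) + (depth[right] - depth[here])
        else:
            # Speciation: each child should sit exactly one level below.
            losses += (depth[left] - depth[here] - 1) + (depth[right] - depth[here] - 1)
        return here

    visit(gene_tree)
    return dups, losses
```

A call such as `reconcile((("a1", "c1"), ("b1", "c2")), (("A", "B"), "C"), lambda g: g[0].upper())` reconciles a four-gene tree with a three-species tree, taking the species from the first letter of each gene label.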
Genome-scale phylogenetic analysis finds extensive gene transfer among Fungi
Although the role of lateral gene transfer is well recognized in the
evolution of bacteria, it is generally assumed that it has had less influence
among eukaryotes. To explore this hypothesis we compare the dynamics of genome
evolution in two groups of organisms: Cyanobacteria and Fungi. Ancestral
genomes are inferred in both clades using two types of methods. First, Count, a
gene tree-unaware method that models gene duplications, gains and losses to
explain the observed numbers of genes present in a genome. Second, ALE, a more
recent gene tree-aware method that reconciles gene trees with a species tree
using a model of gene duplication, loss, and transfer. We compare their merits
and their ability to quantify the role of transfers, and assess the impact of
taxonomic sampling on their inferences. We present what we believe is
compelling evidence that gene transfer plays a significant role in the
evolution of Fungi.
A roadmap for global synthesis of the plant tree of life
Providing science and society with an integrated, up-to-date, high quality, open, reproducible and sustainable plant tree of life would be a huge service that is now coming within reach. However, synthesizing the growing body of DNA sequence data in the public domain and disseminating the trees to a diverse audience are often not straightforward due to numerous informatics barriers. While big synthetic plant phylogenies are being built, they remain static and become quickly outdated as new data are published and tree-building methods improve. Moreover, the body of existing phylogenetic evidence is hard to navigate and access for non-experts. We propose that our community of botanists, tree builders, and informaticians should converge on a modular framework for data integration and phylogenetic analysis, allowing easy collaboration, updating, data sourcing and flexible analyses. With support from major institutions, this pipeline should be re-run at regular intervals, storing trees and their metadata long-term. Providing the trees to a diverse global audience through user-friendly front ends and application development interfaces should also be a priority. Interactive interfaces could be used to solicit user feedback and thus improve data quality and to coordinate the generation of new data. We conclude by outlining a number of steps that we suggest the scientific community should take to achieve global phylogenetic synthesis
Algorithmes de construction et correction d'arbres de gènes par la réconciliation [Algorithms for gene tree construction and correction through reconciliation]
Genes encode the biological functions of all living organisms and are the basic molecular units of heredity.
In order to explain
the diversity of species that can be observed today,
it is essential to understand how genes evolve.
To do this, the past has to be recreated by inferring their phylogeny,
i.e. a gene tree depicting the kinship relationships between
the coding regions of living beings.
Traditional phylogenetic inference methods have been developed primarily to construct species trees
and are solely based on DNA sequences.
Genes, however, are rich in information, and only a few known
reconstruction methods make use of their specific properties.
In particular, the history of a gene family in terms of duplications and losses,
obtained by the reconciliation of a gene tree with a species tree,
may allow us to detect weaknesses in a tree and improve it.
In this thesis, reconciliation is applied
to the construction and correction of gene trees from three different angles:
1) We address the problem of resolving a non-binary gene tree.
In particular, we present a linear time algorithm that solves
a polytomy based on reconciliation.
2) We propose a new gene tree correction approach based on orthology and paralogy relations.
Polynomial-time algorithms are presented for the following problems:
modify a gene tree so that it contains a given set of orthologous genes,
and validate a set of partial orthology and paralogy relations.
3) We show how reconciliation can be used to "combine" multiple gene trees.
Specifically, we study the problem of choosing a gene supertree
based on its reconciliation cost.
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.
Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
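Modèles et algorithmes pour la segmentation de séquences biologiques et la reconstruction de leurs histoires évolutives [Models and algorithms for the segmentation of biological sequences and the reconstruction of their evolutionary histories]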
Computing is increasingly used to solve problems in a wide range of fields. With the growth of biological data generated by high-throughput experimental techniques, bioinformatics steps in to exploit these masses of data and contribute to advancing knowledge in the biological sciences. Bioinformatics is an interdisciplinary field whose aim is to study and solve computational problems arising from the biological sciences. One of the enduring problems studied in bioinformatics is the reconstruction of the evolutionary history of genomes, which essentially implies that of genes. Genes carry genetic information and are the basic units of heredity. Today, a large number of diseases, such as cancers, have a genetic basis. A good understanding of gene evolution would help to better understand the processes involved in these diseases and thus to treat them better. In addition, knowledge of gene evolution is useful for the prediction and annotation of new genes. It has been shown that eukaryotic genes undergo alternative splicing, which allows genes to produce several different transcripts in order to diversify functionally. This doctoral thesis is set in that context. The objective of the thesis is to define efficient and accurate models and algorithms for the segmentation of biological sequences and the reconstruction of their evolutionary histories while accounting for alternative splicing. In this thesis, I contributed to scientific knowledge by introducing and formalizing models of transcript and gene evolution. We proposed two algorithms for the segmentation of alternative transcripts. We also proposed a tool for simulating the evolution of biological sequences and a tool for visualizing coevolution.
For each of the proposed models and algorithms, we developed applications to make our tools easy to use.