A format for phylogenetic placements
We have developed a unified format for phylogenetic placements, that is,
mappings of environmental sequence data (e.g. short reads) into a phylogenetic
tree. We are motivated to do so by the growing number of tools for computing
and post-processing phylogenetic placements, and the lack of an established
standard for storing them. The format is lightweight, versatile, and extensible,
and is based on the JSON format, which can be parsed by most modern programming
languages. Our format is already implemented in several tools for computing and
post-processing parsimony- and likelihood-based phylogenetic placements, and
has worked well in practice. We believe that establishing a standard format for
analyzing read placements at this early stage will lead to a more efficient
development of powerful and portable post-analysis tools for the growing
applications of phylogenetic placement.
Comment: Documents version 3 of the forma
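Because the format is JSON-based, a placement file can be read with a standard JSON parser. The following is a minimal sketch only: the key names ("tree", "fields", "placements", "p", "n"), the field labels, and the sample values are assumptions made for illustration and are not stated in the abstract. It picks, for each read, the edge with the highest weight ratio.

```python
import json

# Hypothetical placement document. The key names, field labels, and all
# numeric values below are illustrative assumptions, not taken from the abstract.
doc_text = """
{
  "version": 3,
  "tree": "((A:0.2{0},B:0.09{1}):0.7{2},C:0.5{3}){4};",
  "fields": ["edge_num", "likelihood", "like_weight_ratio"],
  "placements": [
    {"n": ["read_1"], "p": [[1, -2578.16, 0.78], [0, -2580.15, 0.22]]},
    {"n": ["read_2"], "p": [[3, -1234.50, 1.0]]}
  ]
}
"""


def best_placements(text):
    """Map each read name to the edge number of its highest-weight placement."""
    doc = json.loads(text)
    # Column positions are looked up by name, so the column order may vary.
    fields = doc["fields"]
    edge_idx = fields.index("edge_num")
    weight_idx = fields.index("like_weight_ratio")

    best = {}
    for placement in doc["placements"]:
        # Each row in "p" is one candidate location on the tree.
        top = max(placement["p"], key=lambda row: row[weight_idx])
        for name in placement["n"]:
            best[name] = top[edge_idx]
    return best


print(best_placements(doc_text))
```

Looking up columns through the "fields" list rather than by fixed position keeps the reader independent of column order, which suits a format described as extensible.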
Topological network alignment uncovers biological function and phylogeny
Sequence comparison and alignment has had an enormous impact on our
understanding of evolution, biology, and disease. Comparison and alignment of
biological networks will likely have a similar impact. Existing network
alignments use information external to the networks, such as sequence, because
no good algorithm for purely topological alignment has yet been devised. In
this paper, we present a novel algorithm based solely on network topology that
can be used to align any two networks. We apply it to biological networks to
produce by far the most complete topological alignments of biological networks
to date. We demonstrate that both species phylogeny and detailed biological
function of individual proteins can be extracted from our alignments.
Topology-based alignments have the potential to provide a completely new,
independent source of phylogenetic information. Our alignment of the
protein-protein interaction networks of two very different species, yeast and
human, indicates that even distant species share a surprising amount of network
topology with each other, suggesting broad similarities in internal cellular
wiring across all life on Earth.
Comment: Algorithm explained in more details. Additional analysis adde
The inference of gene trees with species trees
Molecular phylogeny has focused mainly on improving models for the
reconstruction of gene trees based on sequence alignments. Yet, most
phylogeneticists seek to reveal the history of species. Although the histories
of genes and species are tightly linked, they are seldom identical, because
genes duplicate, are lost or horizontally transferred, and because alleles can
co-exist in populations for periods that may span several speciation events.
Building models describing the relationship between gene and species trees can
thus improve the reconstruction of gene trees when a species tree is known, and
vice-versa. Several approaches have been proposed to solve the problem in one
direction or the other, but in general neither gene trees nor species trees are
known. Only a few studies have attempted to jointly infer gene trees and
species trees. In this article we review the various models that have been used
to describe the relationship between gene trees and species trees. These models
account for gene duplication and loss, transfer or incomplete lineage sorting.
Some of them consider several types of events together, but none exists
currently that considers the full repertoire of processes that generate gene
trees along the species tree. Simulations as well as empirical studies on
genomic data show that combining gene tree-species tree models with models of
sequence evolution improves gene tree reconstruction. In turn, these better
gene trees provide a better basis for studying genome evolution or
reconstructing ancestral chromosomes and ancestral gene sequences. We predict
that gene tree-species tree methods that can deal with genomic data sets will
be instrumental to advancing our understanding of genomic evolution.
Comment: Review article in relation to the "Mathematical and Computational
Evolutionary Biology" conference, Montpellier, 201
Inference of Ancestral Recombination Graphs through Topological Data Analysis
The recent explosion of genomic data has underscored the need for
interpretable and comprehensive analyses that can capture complex phylogenetic
relationships within and across species. Recombination, reassortment and
horizontal gene transfer constitute examples of pervasive biological phenomena
that cannot be captured by tree-like representations. Starting from hundreds of
genomes, we are interested in the reconstruction of potential evolutionary
histories leading to the observed data. Ancestral recombination graphs
represent potential histories that explicitly accommodate recombination and
mutation events across orthologous genomes. However, they are computationally
costly to reconstruct, usually being infeasible for more than a few tens of
genomes. Recently, Topological Data Analysis (TDA) methods have been proposed
as robust and scalable methods that can capture the genetic scale and frequency
of recombination. We build upon previous TDA developments for detecting and
quantifying recombination, and present a novel framework that can be applied to
hundreds of genomes and can be interpreted in terms of minimal histories of
mutation and recombination events, quantifying the scales and identifying the
genomic locations of recombinations. We implement this framework in a software
package, called TARGet, and apply it to several examples, including small
migration between different populations, human recombination, and horizontal
evolution in finches inhabiting the Galápagos Islands.
Comment: 33 pages, 12 figures. The accompanying software, instructions and
example files used in the manuscript can be obtained from
https://github.com/RabadanLab/TARGe