9 research outputs found

    Reconstruction of time-consistent species trees

    Background: The history of gene families, which are equivalent to event-labeled gene trees, can to some extent be reconstructed from empirically estimated evolutionary event-relations containing pairs of orthologous, paralogous or xenologous genes. The question then arises as to whether inferred event-labeled gene trees are “biologically feasible”, which is the case if one can find a species tree with which the gene tree can be reconciled in a time-consistent way. Results: In this contribution, we consider event-labeled gene trees that contain speciations, duplications as well as horizontal gene transfer (HGT) and we assume that the species tree is unknown. Although many problems become NP-hard as soon as HGT and time-consistency are involved, we show, in contrast, that the problem of finding a time-consistent species tree for a given event-labeled gene tree can be solved in polynomial time. We provide a cubic-time algorithm to decide whether a “time-consistent” species tree for a given event-labeled gene tree exists and, in the affirmative case, to construct the species tree within the same time complexity.

    Reconciling event-labeled gene trees with MUL-trees and species networks

    Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be “folded up” to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Gene Family Histories: Theory and Algorithms

    Detailed gene family histories and reconciliations with species trees are a prerequisite for studying associations between genetic and phenotypic innovations. Even though the true evolutionary scenarios are usually unknown, they impose certain constraints on the mathematical structure of data obtained from simple yes/no questions in pairwise comparisons of gene sequences. Recent advances in this field have led to the development of methods for reconstructing (aspects of) the scenarios on the basis of such relation data, which can most naturally be represented by graphs on the set of considered genes. We provide here novel characterizations of best match graphs (BMGs), which capture the notion of (reciprocal) best hits based on sequence similarities. BMGs provide the basis for the detection of orthologous genes (genes that diverged after a speciation event). There are two main sources of error in pipelines for orthology inference based on BMGs. Firstly, measurement errors in the estimation of best matches from sequence similarity in general lead to violations of the characteristic properties of BMGs. The second issue concerns the reconstruction of the orthology relation from a BMG. We show how to correct estimated BMGs to mathematically valid ones and how much information about orthologs is contained in BMGs. We then discuss implicit methods for horizontal gene transfer (HGT) inference that focus on pairs of genes that have diverged only after the divergence of the two species in which the genes reside. This situation defines the edge set of an undirected graph, the later-divergence-time (LDT) graph. We explore the mathematical structure of LDT graphs and show how much information about all HGT events is contained in such LDT graphs.
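The notion of (reciprocal) best hits mentioned in this abstract can be illustrated with a minimal sketch. It is a simplification that works directly on raw pairwise similarity scores and is not the method of the thesis. The function names `best_hits` and `reciprocal_best_hits`, the gene identifiers and the scores are all hypothetical.

```python
from collections import defaultdict


def best_hits(similarity, species_of):
    """Map each gene to its highest-scoring genes in every other species.

    similarity: dict of dicts, similarity[x][y] is a score for genes x and y
    species_of: dict mapping each gene to the species it resides in
    """
    hits = defaultdict(set)
    for x, sx in species_of.items():
        # Highest score seen so far for each foreign species.
        top = {}
        for y, sy in species_of.items():
            if sy == sx:
                continue
            score = similarity[x][y]
            if sy not in top or score > top[sy]:
                top[sy] = score
        # Keep every gene that attains the top score (ties are kept).
        for y, sy in species_of.items():
            if sy != sx and similarity[x][y] == top[sy]:
                hits[x].add(y)
    return hits


def reciprocal_best_hits(hits):
    """Unordered pairs {x, y} where y is a best hit of x and x is a best hit of y."""
    return {frozenset((x, y)) for x in hits for y in hits[x] if x in hits.get(y, ())}


# Toy data (hypothetical): two genes in species A, one gene in species B.
species_of = {"a1": "A", "a2": "A", "b1": "B"}
similarity = {
    "a1": {"a2": 0.8, "b1": 0.9},
    "a2": {"a1": 0.8, "b1": 0.5},
    "b1": {"a1": 0.9, "a2": 0.5},
}
hits = best_hits(similarity, species_of)
rbh = reciprocal_best_hits(hits)
```

In this toy input both `a1` and `a2` have `b1` as their best hit in species B, but only `a1` is the best hit of `b1`, so the directed best-hit relation is not symmetric and only the pair `a1`/`b1` is reciprocal.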

    Computational approaches to discovering differentiation genes in the peripheral nervous system of Drosophila melanogaster

    In the common fruit fly, Drosophila melanogaster, neural cell fate specification is triggered by a group of conserved transcriptional regulators known as proneural factors. Proneural factors induce neural fate in uncommitted neuroectodermal progenitor cells, in a process that culminates in sensory neuron differentiation. While the role of proneural factors in early fate specification has been described, less is known about the transition between neural specification and neural differentiation. The aim of this thesis is to use computational methods to improve the understanding of terminal neural differentiation in the Peripheral Nervous System (PNS) of Drosophila. To provide an insight into how proneural factors coordinate the developmental programme leading to neural differentiation, expression profiling covering the first 3 hours of PNS development in Drosophila embryos had been previously carried out by Cachero et al. [2011]. The study revealed a time-course of gene expression changes from specification to differentiation and suggested a cascade model, whereby proneural factors regulate a group of intermediate transcriptional regulators which are in turn responsible for the activation of specific differentiation target genes. In this thesis, I propose to select potentially important differentiation genes from the transcriptional data in Cachero et al. [2011] using a novel approach centred on protein interaction network-driven prioritisation. This is based on the insight that biological hypotheses supported by diverse data sources can represent stronger candidates for follow-up studies. Specifically, I propose the usage of protein interaction network data because of documented transcriptome-interactome correlations, which suggest that differentially expressed genes encode products that tend to belong to functionally related protein interaction clusters. Experimental protein interaction data is, however, remarkably sparse. To increase the informative power of protein-level analyses, I develop a novel approach to augment publicly available protein interaction datasets using functional conservation between orthologous proteins across different genomes, to predict interologs (interacting orthologs). I implement this interolog retrieval methodology in a collection of open-source software modules called Bio::Homology::InterologWalk, the first generalised framework using web-services for “on-the-fly” interolog projection. Bio::Homology::InterologWalk works with homology data for any of the hundreds of genomes in Ensembl and Ensembl Genomes Metazoa, and with experimental protein interaction data curated by EBI IntAct. It generates putative protein interactions and optionally collates meta-data into a prioritisation index that can be used to help select interologs with high experimental support. The methodology proposed represents a significant advance over existing interolog data sources, which are restricted to specific biological domains with fixed underlying data sources often only accessible through basic web-interfaces. Using Bio::Homology::InterologWalk, I build interolog models in Drosophila sensory neurons and, guided by the transcriptome data, find evidence implicating a small set of genes in a conserved sensory neuronal specialisation dynamic, the assembly of the ciliary dendrite in mechanosensory neurons. Using network community-finding algorithms I obtain functionally enriched communities, which I analyse using an array of novel computational techniques. The ensuing datasets lead to the elucidation of a cluster of interacting proteins encoded by the target genes of one of the intermediate transcriptional regulators of neurogenesis and ciliogenesis, fd3F. These targets are validated in vivo and result in improved knowledge of the important target genes activated by the transcriptional cascade, suggesting a scenario for the mechanisms orchestrating the ordered assembly of the cilium during differentiation.
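The general idea of interolog projection described in this abstract (transferring a known interaction between two proteins to their orthologs in another genome) can be sketched as follows. This is an illustrative, in-memory sketch only: it does not use Bio::Homology::InterologWalk, Ensembl or IntAct, and the function name `project_interologs` and all protein identifiers are hypothetical.

```python
def project_interologs(interactions, orthologs):
    """Predict interactions in a target species from a source species.

    interactions: iterable of (protein_a, protein_b) pairs known to interact
                  in the source species
    orthologs:    dict mapping a source protein to the set of its orthologs
                  in the target species
    Returns a set of unordered putative interacting pairs in the target species.
    """
    predicted = set()
    for a, b in interactions:
        # Every ortholog of a is paired with every ortholog of b;
        # pairs with a missing ortholog on either side yield nothing.
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(frozenset((target_a, target_b)))
    return predicted


# Toy data (hypothetical identifiers): one source interaction whose second
# partner has two orthologs in the target genome, plus one interaction whose
# second partner has no known ortholog.
interactions = [("srcA", "srcB"), ("srcA", "srcC")]
orthologs = {"srcA": {"tgtA"}, "srcB": {"tgtB1", "tgtB2"}}
interologs = project_interologs(interactions, orthologs)
```

A one-to-many orthology relation multiplies the number of putative pairs, which is one reason a prioritisation step such as the one the abstract mentions is useful in practice.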

    Evolutionary Genomics

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Relative Timing of Intron Gain and a New Marker for Phylogenetic Analyses

    Despite decades of effort by molecular systematists, the trees of life of eukaryotic organisms still remain partly unresolved or in conflict with each other. An ever increasing number of fully-sequenced genomes of various eukaryotes allows gene and species phylogenies to be considered at genome scale. However, such phylogenomics-based approaches also revealed that more taxa and more and more gene sequences are not the ultimate solution to fully resolve these conflicts, and that there is a need for sequence-independent phylogenetic meta-characters that are derived from genome sequences. Spliceosomal introns are characteristic features of eukaryotic nuclear genomes. The relatively rare changes of spliceosomal intron positions have already been used as genome-level markers for the estimation of both intron evolution and phylogenies, though with variable success. In this thesis, a specific subset of these changes is introduced and established as a novel phylogenetic marker, termed near intron pair (NIP). These characters are inferred from homologous genes that contain mutually-exclusive intron presences at pairs of coding sequence (CDS) positions in close proximity. The idea that NIPs are powerful characters is based on the assumption that both very small exons and multiple intron gains at the same position are rare. To obtain sufficient numbers of NIP character data from genomic and alignment data sets in a consistent and flexible way, the implementation of a computational pipeline was a main goal of this work. Starting from orthologous (or, more generally, homologous) gene datasets comprising genomic sequences and corresponding CDS transcript annotations, the multiple alignment generation is an integral part of this pipeline. The alignment can be calculated at the amino acid level utilizing external tools (e.g. transAlign) and results in a codon alignment via back-translation. Guided by the multiple alignment, the positionally homologous intron positions should become apparent when mapped individually for each transcript. The pipeline proceeds at this stage to output portions of the intron-annotated alignment that contain at least one candidate for a NIP character. In a subsequent pipeline script, these collected so-called NIP region files are finally converted to binary state characters representing valid NIPs, depending on quality filter constraints concerning, e.g., the amino acid alignment conservation around intron loci and splice sites, to name a few. The pipeline tools allow the researcher to derive NIP character matrices that can be used for tree inference, e.g., using the maximum parsimony approach. In a first NIP-based application, the phylogenetic position of major orders of holometabolous insects (more specifically: the Coleoptera-Hymenoptera-Mecopterida trifurcation) was evaluated in a cladistic sense. As already suggested during a study on the eIF2gamma gene based on two NIP cases (Krauss et al. 2005), the genome-scale evaluation supported Hymenoptera as sister group to an assemblage of Coleoptera and Mecopterida, in agreement with other studies, but contradicting the previously established view. As part of the genome paper describing a new species of twisted-wing parasites (Strepsiptera), the NIP method was employed to help resolve their phylogenetic position within (holometabolous) insects. Together with analyses of sequence patterns and a further meta-character, it revealed twisted-wing parasites as being the closest relatives of the mega-diverse beetles. NIP-based reconstructions of the metazoan tree covering a broad selection of representative animal species also identified some weaknesses of the NIP approach, which may suffer, e.g., from alignment/ortholog prediction artifacts (depending on the depth of the range of taxa) and systematic biases (long branch attraction artifacts, due to unequal evolutionary rates of intron gain/loss and the use of the maximum parsimony method). In a further study, the identification of NIPs within the recently diverged genus Drosophila could be utilized to characterize recent intron gain events that apparently involved several cases of intron sliding and tandem exon duplication, although the mechanisms of gain for the majority of cases could not be elucidated. Finally, the NIP marker could be established as a novel phylogenetic marker, in particular as a complementary means of exploring the wealth of genome data for phylogenetic purposes and of addressing open questions of intron evolution.
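The core condition for a NIP candidate stated in this abstract (two intron positions in close proximity whose presences are mutually exclusive across homologous genes) can be sketched as below. This is not the thesis pipeline. It assumes intron positions have already been mapped to shared alignment coordinates, it omits all quality filters, and the name `candidate_nips`, the gene labels and the distance threshold of 50 are assumptions made for illustration.

```python
from itertools import combinations


def candidate_nips(intron_positions, max_distance=50):
    """Return pairs of nearby intron positions that never co-occur in one gene.

    intron_positions: dict mapping a gene to the set of its intron positions,
                      expressed in common alignment coordinates
    max_distance:     assumed proximity threshold (not taken from the source)
    """
    all_positions = sorted(set().union(*intron_positions.values()))
    candidates = []
    for p, q in combinations(all_positions, 2):
        if q - p > max_distance:
            continue  # not in close proximity
        # Mutual exclusivity: no single gene may carry both introns.
        if any(p in pos and q in pos for pos in intron_positions.values()):
            continue
        candidates.append((p, q))
    return candidates


# Toy data (hypothetical): three homologous genes.
introns = {
    "gene1": {100, 400},
    "gene2": {120, 400},
    "gene3": {100, 130},
}
nips = candidate_nips(introns)
```

Here positions 100 and 130 are close but co-occur in `gene3`, so they are rejected, while the pairs (100, 120) and (120, 130) never co-occur in one gene and remain candidates.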