The Bourque Distances for Mutation Trees of Cancers
Mutation trees are rooted trees of arbitrary node degree in which each node is labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutational history of tumours. Classical tree metrics such as the popular Robinson-Foulds distance are of limited use for the comparison of mutation trees. One reason is that mutation trees inferred with different methods or for different patients often contain different sets of mutation labels. Here, we generalize the Robinson-Foulds distance into a set of distance metrics called Bourque distances for comparing mutation trees. A connection between the Robinson-Foulds distance and the nearest neighbor interchange distance is also presented.
The generalized Robinson-Foulds distance for phylogenetic trees
The Robinson-Foulds (RF) distance, one of the most widely used metrics for comparing phylogenetic trees, is intuitive, has a natural interpretation in terms of common splits, and can be computed in linear time. However, it has a very low resolution, and it may become trivial for phylogenetic trees with overlapping taxa, that is, phylogenetic trees that share some but not all of their leaf labels. In this article, we study the properties of the Generalized Robinson-Foulds (GRF) distance, a recently proposed metric for comparing any structures that can be described by multisets of multisets of labels, when applied to rooted phylogenetic trees with overlapping taxa, which are described by sets of clusters, that is, by sets of sets of labels. We show that the GRF distance has a very high resolution, that it can also be computed in linear time, and that it is not (uniformly) equivalent to the RF distance. This research was partially supported by the Spanish Ministry of Science, Innovation and Universities and the European Regional Development Fund through project PGC2018-096956-B-C43 (FEDER/MICINN/AEI), and by the Agency for Management of University and Research Grants (AGAUR) through grant 2017-SGR-786 (ALBCOM). Peer reviewed. Postprint (published version).
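As an illustration of the cluster-based view described in this abstract, the following is a minimal sketch of the classical RF distance between two rooted trees, taken here as the size of the symmetric difference of their cluster sets. The tree representation (a dictionary mapping each internal node to its children), the function names, and the convention of counting only the clusters of internal nodes are assumptions made for this example; it is not the GRF distance, it is not the authors' implementation, and it is not optimized to run in linear time.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: internal node -> list of children.
# Any node that does not appear as a key is treated as a leaf label.
Tree = Dict[str, List[str]]


def clusters(tree: Tree, root: str) -> Set[FrozenSet[str]]:
    """Collect the cluster (set of leaf labels below) of every internal node."""
    found: Set[FrozenSet[str]] = set()

    def leaves_below(node: str) -> FrozenSet[str]:
        children = tree.get(node, [])
        if not children:
            # A leaf contributes only its own label and no cluster.
            return frozenset([node])
        below = frozenset().union(*(leaves_below(c) for c in children))
        found.add(below)
        return below

    leaves_below(root)
    return found


def rf_distance(t1: Tree, r1: str, t2: Tree, r2: str) -> int:
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(t1, r1) ^ clusters(t2, r2))


# Two rooted trees on the same leaves: ((a,b),c) and (a,(b,c)).
tree_one: Tree = {"r": ["x", "c"], "x": ["a", "b"]}
tree_two: Tree = {"r": ["a", "y"], "y": ["b", "c"]}

print(rf_distance(tree_one, "r", tree_two, "r"))
```

In this example the two trees share the root cluster {a, b, c} and differ in the clusters {a, b} and {b, c}. When two trees share only part of their leaf labels, few or no clusters coincide under this definition, which is the situation the abstract describes as the RF distance becoming trivial.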
Evolutionary genomics : statistical and computational methods
This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.
Differential evolution of non-coding DNA across eukaryotes and its close relationship with complex multicellularity on Earth
Here, I elaborate on the hypothesis that complex multicellularity (CM, sensu Knoll) is a major evolutionary transition (sensu Szathmary), which has convergently evolved a few times in Eukarya only: within red and brown algae, plants, animals, and fungi. Paradoxically, CM seems to correlate with the expansion of non-coding DNA (ncDNA) in the genome rather than with genome size or the total number of genes. Thus, I investigated the correlation between genome and organismal complexities across 461 eukaryotes under a phylogenetically controlled framework. To that end, I introduce the first formal definitions and criteria to distinguish ‘unicellularity’, ‘simple’ (SM) and ‘complex’ multicellularity. Rather than using the limited available estimations of unique cell types, the 461 species were classified according to our criteria by reviewing their life cycle and body plan development from literature. Then, I investigated the evolutionary association between genome size and 35 genome-wide features (introns and exons from protein-coding genes, repeats and intergenic regions) describing the coding and ncDNA complexities of the 461 genomes. To that end, I developed ‘GenomeContent’, a program that systematically retrieves massive multidimensional datasets from gene annotations and calculates over 100 genome-wide statistics. R scripts coupled to parallel computing were created to calculate >260,000 phylogenetically controlled pairwise correlations. As previously reported, both repetitive and non-repetitive DNA are found to scale strongly and positively with genome size across most eukaryotic lineages. In contrast to previous studies, I demonstrate that changes in the length and repeat composition of introns are only weakly or moderately associated with changes in genome size at the global phylogenetic scale, while changes in intron abundance (within and across genes) are either not or only very weakly associated with changes in genome size.
Our evolutionary correlations are robust to different phylogenetic regression methods, uncertainties in the tree of eukaryotes, variations in genome size estimates, and randomly reduced datasets. Then, I investigated the correlation between the 35 genome-wide features and the cellular complexity of the 461 eukaryotes with phylogenetic Principal Component Analyses. Our results endorse a genetic distinction between SM and CM in Archaeplastida and Metazoa, but not so clearly in Fungi. Remarkably, complex multicellular organisms and their closest ancestral relatives are characterized by high intron-richness, regardless of genome size. Finally, I argue why and how a vast expansion of non-coding RNA (ncRNA) regulators rather than of novel protein regulators can promote the emergence of CM in Eukarya. As a proof of concept, I co-developed a novel ‘ceRNA-motif pipeline’ for the prediction of “competing endogenous” ncRNAs (ceRNAs) that regulate microRNAs in plants. We identified three candidate ceRNA motifs: MIM166, MIM171 and MIM159/319, which were found to be conserved across land plants and to be potentially involved in diverse developmental processes and stress responses. Collectively, the findings of this dissertation support our hypothesis that CM on Earth is a major evolutionary transition promoted by the expansion of two major ncDNA classes, introns and regulatory ncRNAs, which might have boosted the irreversible commitment of cell types in certain lineages by canalizing the timing and kinetics of the eukaryotic transcriptome.
Cover page
Abstract
Acknowledgements
Index
1. The structure of this thesis
1.1. Structure of this PhD dissertation
1.2. Publications of this PhD dissertation
1.3. Computational infrastructure and resources
1.4. Disclosure of financial support and information use
1.5. Acknowledgements
1.6. Author contributions and use of impersonal and personal pronouns
2. Biological background
2.1. The complexity of the eukaryotic genome
2.2. The problem of counting and defining “genes” in eukaryotes
2.3. The “function” concept for genes and “dark matter”
2.4. Increases of organismal complexity on Earth through multicellularity
2.5. Multicellularity is a “fitness transition” in individuality
2.6. The complexity of cell differentiation in multicellularity
3. Technical background
3.1. The Phylogenetic Comparative Method (PCM)
3.2. RNA secondary structure prediction
3.3. Some standards for genome and gene annotation
4. What is in a eukaryotic genome? GenomeContent provides a good answer
4.1. Background
4.2. Motivation: an interoperable tool for data retrieval of gene annotations
4.3. Methods
4.4. Results
4.5. Discussion
5. The evolutionary correlation between genome size and ncDNA
5.1. Background
5.2. Motivation: estimating the relationship between genome size and ncDNA
5.3. Methods
5.4. Results
5.5. Discussion
6. The relationship between non-coding DNA and Complex Multicellularity
6.1. Background
6.2. Motivation: How to define and measure complex multicellularity across eukaryotes?
6.3. Methods
6.4. Results
6.5. Discussion
7. The ceRNA motif pipeline: regulation of microRNAs by target mimics
7.1. Background
7.2. A revisited protocol for the computational analysis of Target Mimics
7.3. Motivation: a novel pipeline for ceRNA motif discovery
7.4. Methods
7.5. Results
7.6. Discussion
8. Conclusions and outlook
8.1. Contributions and lessons for the bioinformatics of large-scale comparative analyses
8.2. Intron features are evolutionarily decoupled among themselves and from genome size throughout Eukarya
8.3. “Complex multicellularity” is a major evolutionary transition
8.4. Role of RNA throughout the evolution of life and complex multicellularity on Earth
9. Supplementary Data
Bibliography
Curriculum Scientiae
Selbständigkeitserklärung (declaration of authorship)