A Simple Characterization of the Minimal Obstruction Sets for Three-State Perfect Phylogenies
Lam, Gusfield, and Sridhar (2009) showed that a set of three-state characters
has a perfect phylogeny if and only if every subset of three characters has a
perfect phylogeny. They also gave a complete characterization of the sets of
three three-state characters that do not have a perfect phylogeny. However, it
is not clear from their characterization how to find a subset of three
characters that does not have a perfect phylogeny without testing all triples
of characters. In this note, we build upon their result by giving a simple
characterization of when a set of three-state characters does not have a
perfect phylogeny that can be inferred from testing all pairs of characters.
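To give a sense of why a pairwise test matters, the following minimal sketch counts how many subsets would have to be examined when testing all triples versus all pairs of characters. It does not implement the characterization itself, which the abstract does not spell out; the function name `subsets_to_test` and the example size of 100 characters are illustrative assumptions.

```python
from math import comb


def subsets_to_test(num_characters: int, subset_size: int) -> int:
    """Number of character subsets of the given size (hypothetical helper)."""
    return comb(num_characters, subset_size)


# Illustrative input size only; not taken from the paper.
n = 100
triples = subsets_to_test(n, 3)  # subsets examined when testing all triples
pairs = subsets_to_test(n, 2)    # subsets examined when testing all pairs
print(f"{n} characters: {triples} triples vs {pairs} pairs")
```

The number of triples grows cubically in the number of characters, while the number of pairs grows quadratically, so the gap widens quickly as more characters are added.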
Topology of Reticulate Evolution
The standard representation of evolutionary relationships is a bifurcating tree. However, many types of genetic exchange, collectively referred to as reticulate evolution, involve processes that cannot be modeled as trees. Increasing genomic data has pointed to the prevalence of reticulate processes, particularly in microorganisms, and underscored the need for new approaches to capture and represent the scale and frequency of these events.
This thesis contains results from applying new techniques from applied and computational topology, under the heading topological data analysis, to the problem of characterizing reticulate evolution in molecular sequence data. First, we develop approaches for analyzing sequence data using topology. We propose new topological constructions specific to molecular sequence data that generalize standard constructions such as Vietoris-Rips. We draw on previous work in phylogenetic networks and use homology to provide a quantitative measure of reticulate events. We develop methods for performing statistical inference using topological summary statistics.
Next, we apply our approach to several types of molecular sequence data. First, we examine the mosaic genome structure in phages. We recover inconsistencies in existing morphology-based taxonomies, use a network approach to construct a genome-based representation of phage relationships, and identify conserved gene families within phage populations. Second, we study influenza, a common human pathogen. We capture widespread patterns of reassortment, including nonrandom cosegregation of segments and barriers to subtype mixing. In contrast to traditional influenza studies, which focus on the phylogenetic branching patterns of only the two surface-marker proteins, we use whole-genome data to represent influenza molecular relationships. Using this representation, we identify unexpected relationships between divergent influenza subtypes. Finally, we examine a set of pathogenic bacteria. We use two sources of data to measure rates of reticulation in both the core genome and the mobile genome across a range of species. Network approaches are used to represent the population of S. aureus and analyze the spread of antibiotic resistance genes. The presence of antibiotic resistance genes in the human microbiome is investigated.
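The idea of using homology as a quantitative measure of reticulate events can be illustrated with a small sketch. It builds a plain Vietoris-Rips complex on Hamming distances at a single scale and computes the first Betti number, which counts independent loops. This is an assumption-laden toy, not the thesis's method: it uses the standard Vietoris-Rips construction rather than the sequence-specific generalizations the abstract mentions, it works with coefficients in GF(2), and all function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gf2_rank(rows: list[int]) -> int:
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row holding that pivot
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # clear the leading bit and keep reducing
    return len(pivots)


def first_betti_number(seqs: list[str], eps: int) -> int:
    """b1 of the Vietoris-Rips complex at scale eps (GF(2) coefficients)."""
    n = len(seqs)
    # Edges join sequences whose Hamming distance is at most eps.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if hamming(seqs[i], seqs[j]) <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}
    # A triangle is filled in whenever all three of its edges are present.
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in edge_index for p in combinations(t, 2))]
    # Boundary maps, one bitmask row per simplex.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [sum(1 << edge_index[p] for p in combinations(t, 2))
          for t in triangles]
    # b1 = dim ker(d1) - rank(d2) = (#edges - rank(d1)) - rank(d2)
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


# Toy data: three sequences that fit a tree, then all four two-site types.
tree_like = ["00", "01", "11"]
reticulate = ["00", "01", "11", "10"]
print(first_betti_number(tree_like, eps=1))
print(first_betti_number(reticulate, eps=1))
```

In this toy setting the tree-like sample has no loop, while the sample containing all four two-site types forms a four-cycle at scale 1, the kind of signal that a tree cannot represent. A fuller analysis would track such loops across all scales (persistent homology) rather than at one fixed value of `eps`.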
Network and Algebraic Topology of Influenza Evolution
Evolution is a force that has molded human existence since the human lineage diverged from chimpanzees about 5.4 million years ago. An influenza virus, which replicates every six hours, undergoes an equivalent number of generations in only a hundred years. The fast replication times of influenza, coupled with its high mutation rate, make the virus a perfect model to study real-time evolution at a mega-Darwin scale, more than a million times faster than human evolution. While recent developments in high-throughput sequencing provide an optimal opportunity to dissect their genetic evolution, a concurrent growth in computational tools is necessary to analyze the large influx of complex genomic data. In my thesis, I present novel computational methods to examine different aspects of influenza evolution.
I first focus on seasonal influenza, particularly the problems that hamper public health initiatives to combat the virus. I introduce two new approaches: 1. The q2-coefficient, a method of quantifying pathogen surveillance, and 2. FluGraph, a technique that employs network topology to track the spread of seasonal influenza around the world.
The second chapter of my thesis examines how mutations and reassortment combine to alter the course of influenza evolution towards pandemic formation. I highlight inherent deficiencies in the current phylogenetic paradigm for analyzing evolution and offer a novel methodology based on algebraic topology that comprehensively reconstructs both vertical and horizontal evolutionary events. I apply this method to viruses, with emphasis on influenza, but foresee broader application to cancer cells, bacteria, eukaryotes, and other taxa.
Retrotransposon mediated genomic fluidity in the human and chimpanzee lineages
LINE-1s (Long interspersed elements or L1s) and Alus are highly successful non-long terminal repeat retrotransposons with copy numbers of ~520,000 and >1 million within the human genome, respectively. They are associated with human genetic variation and genomic rearrangement. Although they are abundant throughout primate genomes, their propagation strategy remains poorly understood. The recently released human and chimpanzee draft genome sequences provide the opportunity to compare the human genome with the chimpanzee genome. Thus, we were able to assess how these elements expanded in primate genomes and how they create genomic instability during their integration into the host genome as well as subsequent post-insertion recombination between elements. To understand the expansion of Alu elements, we first analyzed the evolutionary history of the AluYb lineage, which is one of the most active Alu lineages in the human genome. We suggest that the evolutionary success of Alu elements is driven at least in part by “stealth driver” elements that maintain low retrotransposition activity over extended periods of time and occasionally produce short-lived hyperactive copies responsible for the formation and remarkable expansion of Alu elements within the genome. Second, we conducted a detailed characterization of chimpanzee-specific L1 subfamily diversity. Our results showed that L1 elements have experienced different evolutionary fates in the human and chimpanzee lineages. These differential evolutionary paths may be the result of random variation or the product of competition between L1 subfamily lineages. Third, we report 50 deletion events in human and chimpanzee genomes directly linked to the insertion of L1 elements, resulting in the loss of ~18 kb of human genomic sequence and ~15 kb of chimpanzee genomic sequence. This study provides the basis for developing models of the mechanisms for small and large L1 insertion-mediated deletions.
Fourth, we analyzed the magnitude of Alu recombination-mediated deletions in the human lineage subsequent to the human-chimpanzee divergence. We identified 492 human-specific deletions (for a total of ~400 kb of sequence) attributable to this process. The majority of the deletions coincide with known or predicted genes, which implicates this process in creating a substantial portion of the genomic differences between humans and chimpanzees.