    Mining SARS-CoV-2 Phylogenetic Trees to Estimate Circulating Infections and Patterns of Migration

    The SARS-CoV-2 pandemic led to the formation of very large databases of viral genomic data. These databases contain information on the transmission dynamics, emergence, and evolution of SARS-CoV-2. However, extracting this information from sequences is difficult, as most methods of analyzing viral genomes were developed for smaller data sets. Therefore, my objective was to develop new, fast estimators of the number of infections (I) and the rate of migration based on simple features of SARS-CoV-2 phylogenies. I simulated pathogen evolution using a susceptible-exposed-infectious-recovered (SEIR) model of pathogen spread and reconstructed evolution using CoVizu. For simulations of I, I varied the total number of infections at the time a final sample was obtained. For simulations of migration rates, I simulated independent groups of infections and varied the rates of movement between these groups. I then extracted summary statistics from the simulation output and developed general linear models (GLMs) and Markov models to predict I and migration rates, respectively. I evaluated the models using validation data and real SARS-CoV-2 data. The GLMs formulated to predict I showed significant promise, especially when there were fewer than 1 million infections. The Markov models developed to predict migration rates were less successful. However, the simulation pipeline formulated to test the Markov models may be used for further development of efficient methods to estimate migration rates. This research will help inform public health officials about SARS-CoV-2 spread between countries and about emerging variants that may become variants of concern. Additionally, the algorithms are flexible and, with new training, may be applied to future outbreaks of novel viral pathogens.

    Simultaneous Reconstruction of Duplication Episodes and Gene-Species Mappings

    We present a novel problem, called MetaEC, which aims to infer gene-species assignments in a collection of gene trees with missing labels by minimizing the size of duplication episode clustering (EC). This problem is particularly relevant in metagenomics, where incomplete data often poses a challenge to the accurate reconstruction of gene histories. To solve MetaEC, we propose a polynomial-time dynamic programming (DP) formulation that verifies the existence of a set of duplication episodes from a predefined set of episode candidates. We then demonstrate how to use DP to design an algorithm that solves MetaEC. Although the algorithm is exponential in the worst case, we introduce a heuristic modification of the algorithm that provides a solution with the knowledge that it is exact. To evaluate our method, we perform two computational experiments on simulated and empirical data containing whole genome duplication events, showing that our algorithm is able to accurately infer the corresponding events.

    Probability Metrics for Tropical Spaces of Different Dimensions

    The problem of comparing probability distributions is at the heart of many tasks in statistics and machine learning, and the most classical comparison methods assume that the distributions occur in spaces of the same dimension. Recently, a new geometric solution has been proposed to address this problem when the measures live in Euclidean spaces of differing dimensions. Here, we study the same problem of comparing probability distributions of different dimensions in the tropical geometric setting, which is becoming increasingly relevant in computations and applications involving complex, geometric data structures. Specifically, we construct a Wasserstein distance between measures on different tropical projective tori (the focal metric spaces in both theory and applications of tropical geometry) via tropical mappings between probability measures. We prove equivalence of the directionality of the maps, whether starting from the lower dimensional space and mapping to the higher dimensional space or vice versa. As an important practical implication, our work provides a framework for comparing probability distributions on the spaces of phylogenetic trees with different leaf sets.
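
    For orientation, the ground metric usually placed on a tropical projective torus is the tropical metric, d(x, y) = max_i(x_i - y_i) - min_i(x_i - y_i). The sketch below illustrates only that metric, not the Wasserstein construction of the paper; the function name `tropical_distance` is an assumption made for this example.

```python
def tropical_distance(x, y):
    """Tropical metric between two points given as equal-length sequences.

    Points of the tropical projective torus are vectors taken modulo
    adding the same constant to every coordinate, so the distance must
    not change under such a shift.
    """
    diffs = [a - b for a, b in zip(x, y)]
    return max(diffs) - min(diffs)


# Shifting every coordinate of x by 5 leaves the distance unchanged.
d_original = tropical_distance([0, 1, 3], [0, 0, 0])
d_shifted = tropical_distance([5, 6, 8], [0, 0, 0])
```

    Both calls return the same value, which reflects the invariance that makes the metric well defined on the quotient space.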

    Leveraging Constraints Plus Dynamic Programming for the Large Dollo Parsimony Problem

    The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for classic parsimony problems where the input consists of binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem in polynomial time for the Dollo criterion score. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example in analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. Thus, we implement our algorithm in a software package, called Dollo-CDP, and evaluate its utility in the context of species phylogenetics using both simulated and real data sets. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not Fitch parsimony.
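
    The Dollo criterion described above allows a character to be gained at most once but lost any number of times. As a rough illustration only (this is not the Dollo-CDP algorithm, which searches a constrained tree space), the sketch below scores one binary character on a single fixed rooted tree by counting the losses implied by a single gain. The nested-tuple tree encoding and the names `count_ones` and `dollo_losses` are assumptions made for this example, and missing values are not handled.

```python
def count_ones(node, states):
    """Number of leaves below `node` that carry state 1."""
    if isinstance(node, str):          # a leaf is represented by its name
        return states[node]
    return sum(count_ones(child, states) for child in node)


def dollo_losses(tree, states):
    """Losses required under Dollo parsimony for one binary character.

    The single gain is placed at the deepest node whose subtree contains
    every leaf in state 1. Each maximal subtree below that node with no
    state-1 leaf then costs exactly one loss.
    """
    total = count_ones(tree, states)
    if total == 0:
        return 0                       # character never gained, nothing lost

    # Walk down while one child still contains all state-1 leaves.
    node = tree
    while not isinstance(node, str):
        heir = next((c for c in node if count_ones(c, states) == total), None)
        if heir is None:
            break
        node = heir

    def losses(n):
        if count_ones(n, states) == 0:
            return 1                   # whole subtree lost the character once
        if isinstance(n, str):
            return 0                   # a leaf that still has the character
        return sum(losses(c) for c in n)

    return losses(node)


# Hypothetical five-taxon tree and character.
example_tree = ((("A", "B"), "C"), ("D", "E"))
example_states = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}
example_losses = dollo_losses(example_tree, example_states)
```

    In this example the gain sits on the branch above the clade containing A, B and C, so only the loss in B is counted. The repeated calls to `count_ones` keep the code short at the price of quadratic time; caching the counts per node would make it linear.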

    Multi-trophic Interactions and Long-term Volunteer Collected Data: Networks of plant-caterpillar-parasitoid interactions across time, space, and a changing climate

    The preservation of ecological complexity is an important goal for ecologists as communities respond to global change. Inherent to these efforts is the quantification and evaluation of the multiple dimensions of biodiversity, including well-studied metrics of taxonomic, phylogenetic, and functional diversity. Studies on multi-trophic systems have primarily focused on taxonomic diversity, yet recent efforts have highlighted the importance of examining an underutilized biodiversity metric: interaction diversity, or the richness and abundance of the unique links connecting species. My dissertation research contributes to understanding spatial and temporal variation in the diversity of plant-caterpillar-parasitoid interactions. A central theme of my dissertation research is the use of long-term citizen science data from sites across the Americas to understand how interaction diversity changes across latitudinal, climate, disturbance, and seasonal gradients. My research in tropical forests documented the impacts of climate change. I found that increases in extreme precipitation events caused reductions in interaction and species diversity, with associated losses in an important ecosystem function: biological control of herbivores by their natural enemies. In a temperate fire-adapted forest, I provided evidence for the scale-dependent nature of interaction diversity and its implications for how diversity is maintained in frequently disturbed systems. To understand spatial and temporal variation in interactions, I evaluated patterns in the beta-diversity of interactions and its components. Using this methodology, I found evidence of latitudinal patterns in the turnover of interactions, providing support for the idea that interactions are more variable in tropical than temperate regions. In the Brazilian Cerrado and Yucatan Peninsula, Mexico, I found that seasonal variation in interaction diversity is primarily a consequence of seasonally constant species rewiring their interactions rather than seasonal differences in species composition. Finally, an important goal for ecology is to develop effective methods that increase the public's awareness of and action toward biodiversity conservation. I fielded over 300 citizen scientists on research expeditions that contribute to the collection and rearing of these long-term data and administered surveys to understand the impact of different team models. Based on these surveys, multiple team models are effective for achieving diverse objectives, and corporate teams are particularly valuable for sustainability partnerships. Together, this body of research provides evidence that interaction diversity uniquely contributes to broad patterns of biodiversity and ecosystem structure. Further, novel partnerships with various citizen science team models are an effective and efficient method to engage a diverse public audience interested in the preservation of biodiversity.

    Chromosome rearrangements and population genomics

    Chromosome rearrangements result in changes to the physical linkage and order of sequences in the genome. Although we have known about these mutations for more than a century, we still lack a detailed understanding of how they become fixed and what their effect is on other evolutionary processes. Analysing genome sequences provides a way to address this knowledge gap. In this thesis I compare genome assemblies and use population genomic inference to gain a better understanding of the role that chromosome rearrangements play in evolution. I focus on butterflies in the genus Brenthis, where chromosome numbers are known to vary between species. In chapter 2, I present a genome assembly of Brenthis ino and show that its genome has been shaped by many chromosome rearrangements, including a Z-autosome fusion that is still segregating. In chapter 3, I investigate how synteny information in genome sequences can be used to infer ancestral linkage groups and inter-chromosomal rearrangements, implementing the methods in a command-line tool. In chapter 4, I test whether chromosome fissions and fusions have acted as barriers to gene flow between B. ino and its sister species B. daphne. I find that chromosomes involved in rearrangements have experienced less post-divergence gene flow than the rest of the genome, suggesting that rearrangements have promoted speciation. Finally, in chapter 5, I investigate how chromosome rearrangements have become fixed in B. ino, B. daphne, and a third species, B. hecate. I show that genetic drift is unlikely to be a strong enough force to have fixed very underdominant rearrangements, and that there is only weak evidence that chromosome fusions have become fixed through positive natural selection. In summary, this work provides methods for researching chromosome evolution as well as new results about how rearrangements evolve and impact the speciation process.

    Alternating Minimization for Regression with Tropical Rational Functions

    We propose an alternating minimization heuristic for regression over the space of tropical rational functions with fixed exponents. The method alternates between fitting the numerator and denominator terms via tropical polynomial regression, which is known to admit a closed-form solution. We study the behavior of the alternating minimization method experimentally and find that the heuristic provides a reasonable approximation of the input data. Our work is motivated by applications to ReLU neural networks, a popular class of network architectures in the machine learning community that is closely related to tropical rational functions.
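
    To make the closed-form step concrete, the sketch below fits a univariate tropical (max-plus) polynomial f(x) = max_i(w_i + a_i x) with fixed exponents a_i, using the standard greatest-subsolution formula from max-plus algebra, w_i = min_j(y_j - a_i x_j). That this is exactly the closed form the authors rely on is an assumption, the alternating scheme for rational functions is not reproduced, and the names `tropical_fit` and `tropical_eval` are hypothetical.

```python
def tropical_fit(xs, ys, exponents):
    """Largest coefficients w such that f(x_j) <= y_j at every data point.

    For each fixed exponent a_i, the coefficient is
    w_i = min_j (y_j - a_i * x_j).
    """
    return [min(y - a * x for x, y in zip(xs, ys)) for a in exponents]


def tropical_eval(coeffs, exponents, x):
    """Evaluate the tropical polynomial max_i (w_i + a_i * x)."""
    return max(w + a * x for w, a in zip(coeffs, exponents))


# Hypothetical data sampled from f(x) = max(1, 2x - 1).
xs = [0, 1, 2]
ys = [1, 1, 3]
exponents = [0, 2]

coeffs = tropical_fit(xs, ys, exponents)
fitted = [tropical_eval(coeffs, exponents, x) for x in xs]
```

    Because the data here come from a tropical polynomial with the same exponents, the fit reproduces them exactly. On noisy data this formula yields a fit that never exceeds the observations, and adding half of the largest residual to every coefficient is a common way to balance the error in the max norm.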

    Genomic analysis of two phlebotomine sand fly vectors of Leishmania from the New and Old World.

    Phlebotomine sand flies are of global significance as important vectors of human disease, transmitting bacterial, viral, and protozoan pathogens, including the kinetoplastid parasites of the genus Leishmania, the causative agents of devastating diseases collectively termed leishmaniasis. More than 40 pathogenic Leishmania species are transmitted to humans by approximately 35 sand fly species in 98 countries, with hundreds of millions of people at risk around the world. No approved efficacious vaccine exists for leishmaniasis, and available therapeutic drugs are toxic, expensive, or both, or the parasites are becoming resistant to the more recently developed drugs. Therefore, sand fly and/or reservoir control are currently the most effective strategies to break transmission. To better understand the biology of sand flies, including the mechanisms involved in their vectorial capacity, insecticide resistance, and population structures, we sequenced the genomes of two geographically widespread and important sand fly vector species: Phlebotomus papatasi, a vector of Leishmania parasites that cause cutaneous leishmaniasis (distributed in Europe, the Middle East and North Africa), and Lutzomyia longipalpis, a vector of Leishmania parasites that cause visceral leishmaniasis (distributed across Central and South America). We categorized and curated genes involved in processes important to their roles as disease vectors, including chemosensation, blood feeding, circadian rhythm, immunity, and detoxification, as well as mobile genetic elements. We also defined gene orthology and observed micro-synteny among the genomes. Finally, we present the genetic diversity and population structure of these species in their respective geographical areas. These genomes will be a foundation on which to base future efforts to prevent vector-borne transmission of Leishmania parasites.

    ACARORUM CATALOGUS IX. Acariformes, Acaridida, Schizoglyphoidea (Schizoglyphidae), Histiostomatoidea (Histiostomatidae, Guanolichidae), Canestrinioidea (Canestriniidae, Chetochelacaridae, Lophonotacaridae, Heterocoptidae), Hemisarcoptoidea (Chaetodactylidae, Hyadesiidae, Algophagidae, Hemisarcoptidae, Carpoglyphidae, Winterschmidtiidae)

    The 9th volume of the series Acarorum Catalogus contains lists of mites of 13 families, 225 genera and 1268 species of the superfamilies Schizoglyphoidea, Histiostomatoidea, Canestrinioidea and Hemisarcoptoidea. Most of these mites live on insects or other animals (as parasites, phoretics or commensals); some inhabit rotting plant material, dung or fungi. Mites of the families Chetochelacaridae and Lophonotacaridae are specialised to live with myriapods (Diplopoda). The peculiar aquatic or intertidal mites of the families Hyadesiidae and Algophagidae are also included.

    Artisanal food productions of animal origin: exploring food safety in the age of Whole Genome Sequencing

    The artisanal food chain is enriched by a wide diversity of local food productions with delightful organoleptic characteristics and valuable nutritional properties. Despite their increasing worldwide popularity and appeal, artisanal facilities face several food safety challenges because their processing conditions are less standardized. In this scenario, recent advances in molecular typing and genomic surveillance (e.g., Whole Genome Sequencing [WGS]) represent an unprecedented solution capable of inferring sources of contamination and contributing to food safety along the artisanal food continuum. The overall objective of this PhD thesis was to explore potential microbial hazards among different artisanal food productions of animal origin (dairy and meat-derived) typical of the food culture and heritage of Mediterranean countries. Three studies were carried out to: 1) compare the seasonal variability of microbiological quality and the potential occurrence of microbial hazards in two batches of Italian artisanal fermented dairy and meat productions; 2) investigate the genetic relationships, virulome and resistome of foodborne pathogens isolated from dairy and meat-derived productions located in Italy, Spain, Portugal and Morocco; 3) investigate the population structure, virulome, resistome and mobilome of Klebsiella spp. isolates collected in study 1, including an extended range of public sequences.