30 research outputs found

    A Natural Language Processing Pipeline to extract phenotypic data from formal taxonomic descriptions with a focus on flagellate plants

    Assembling large-scale phenotypic datasets for evolutionary and biodiversity studies of plants can be extremely difficult and time-consuming. New semi-automated Natural Language Processing (NLP) pipelines can extract phenotypic data from taxonomic descriptions, and their performance can be enhanced by incorporating information from ontologies, like the Plant Ontology (PO) and the Plant Trait Ontology (TO). These ontologies are powerful tools for comparing phenotypes across taxa for large-scale evolutionary and ecological analyses, but they are largely focused on terms associated with flowering plants. We describe a bottom-up approach to identify terms from flagellate plants (including bryophytes, lycophytes, ferns, and gymnosperms) that can be added to existing plant ontologies. We first parsed a large corpus of electronic taxonomic descriptions using the Explorer of Taxon Concepts tool (http://taxonconceptexplorer.org/) and identified terms specific to flagellate plants that were missing from the existing ontologies. We extracted new structure and trait terms, and we are currently incorporating the missing structure terms into the PO and modifying the definitions of existing terms to expand their coverage to flagellate plants. We will incorporate trait terms into the TO in the near future.

    The Plant Ontology facilitates comparisons of plant development stages across species

    The Plant Ontology (PO) is a community resource consisting of standardized terms, definitions, and logical relations describing plant structures and development stages, augmented by a large database of annotations from genomic and phenomic studies. This paper describes the structure of the ontology and the design principles we used in constructing PO terms for plant development stages. It also provides details of the methodology and rationale behind our revision and expansion of the PO to cover development stages for all plants, particularly the land plants (bryophytes through angiosperms). As a case study to illustrate the general approach, we examine variation in gene expression across embryo development stages in Arabidopsis and maize, demonstrating how the PO can be used to compare patterns of expression across stages and in developmentally different species. Although many genes appear to be active throughout embryo development, we identified a small set of uniquely expressed genes for each stage of embryo development and also between the two species. Evaluating the different sets of genes expressed during embryo development in Arabidopsis or maize may inform future studies of the divergent developmental pathways observed in monocotyledonous versus dicotyledonous species. The PO and its annotation database make plant data for any species more discoverable and accessible through common formats, thus providing support for applications in plant pathology, image analysis, and comparative development and evolution.

    Insights on Reticulate Evolution in Ferns, with Special Emphasis on the Genus Ceratopteris

    The history of life is often viewed as an evenly branching tree; however, in reality it is more like a tangled hedgerow. Many groups of organisms are known to have such a net-like or reticulate evolutionary history, but it is particularly common in ferns and lycophytes (also known as pteridophytes). This dissertation investigates how net-like evolution affects different groups of ferns, with a special emphasis on the model species C-fern (Ceratopteris richardii, also called the antler or water sprite fern). Genomic data are used to understand hybridization, cryptic species and reticulate evolution in two groups of ferns. The C-fern is shown to be a potential hybrid species, which has important implications for future research using this model organism.

    Shell morphological diversification patterns and molecular systematics of the testate amoebae orders Arcellinida and Euglyphida

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología. Defense date: 09-03-2023. To infer the general patterns that govern biodiversity, it is necessary to have a good representation of the taxa that compose it, and this includes the smallest organisms as well. Although it can be argued that knowledge of certain groups of plants and animals may be insufficient, there is a clear knowledge gap for protists, especially those in soil and fresh water. To address this gap, this thesis focuses on a particular group of protists that live mainly in continental ecosystems, the testate amoebae. Doing so, however, requires filling some gaps in knowledge and developing specific protocols for the rapid and efficient study of biodiversity in these taxa. The absence of such protocols greatly limits their study, as well as their potential applications. Testate amoebae are a paraphyletic group of amoeboid protists that share a self-constructed “shell” or test. These organisms constitute orders within very distantly related eukaryotic "supergroups": Arcellinida in Amoebozoa, Euglyphida and Thecofilosea in Rhizaria, and Amphitremida in Stramenopiles (=Heterokonta). Within each group, these organisms differ in the composition and shape of their shells, which form the basis of their taxonomy and systematics. Intuitively, researchers have classified these organisms by assuming that similar shell morphologies should share a common ancestor. This assumption rests on the hypothesis that shells are under neutral selection, and it rules out the possibility of evolutionary convergence between species or clades. 
However, molecular barcoding has challenged classical morphology-based systematics and taxonomy, revealing patterns of shell morphological diversification that are far more complex and tangled than previously thought. These results underline the need to apply a molecular approach to characterize taxa and establish the relationships among them. For the moment, however, almost all available molecular data come from a single infraorder within Arcellinida, the Hyalospheniformes. In Euglyphida, only the infraorder Euglyphina has been (relatively) well sampled molecularly. The first objective of this thesis is to expand the molecular database for testate amoebae, focusing on Arcellinida and Euglyphida, by recovering the 18S rRNA, COI and NADH gene regions. Among the genes used, the nuclear 18S rRNA gene was the most conserved. It was also the most useful for reconstructing deeper relationships, although too conserved to discriminate between species. For this reason, we focused on the fast-evolving mitochondrial COI gene, which provides good resolution at the species level. Following the principles of integrative taxonomy, we also obtained (in addition to the molecular sequences) data on location, ecology and shell morphology. This thesis includes the first molecular data for testate amoebae from the Iberian Peninsula, from freshwater environments, soils and marine sediments. It also includes the first molecular data for genera such as Plagiopyxis and Trigonopyxis. 
These databases will serve as a foundation for future studies and will be fundamental to answering the two questions that structure this thesis: 1) "How does shell morphology evolve in testate amoebae?": Understanding diversification patterns in testate amoebae is essential for clarifying their taxonomy and systematics, as well as for applying their functional traits in ecological analyses. Here we focus on the families Cyphoderiidae (Euglyphida) and Arcellidae (Arcellinida), and on other Arcellinida taxa. We assess the phylogenetic relationships among taxa based on molecular data and “map” shell morphologies and the ecology of the organisms onto the phylogenetic trees. Our results show correlations between environments and morphotypes, providing several cases of convergent patterns. This suggests that some shell traits may be under positive selection. 2) "How can molecular data be generated quickly and efficiently in Arcellinida?": Obtaining molecular data from testate amoebae has always been a major problem because of the difficulties of working with these (mostly) unculturable organisms. Consequently, obtaining molecular data on testate amoebae is costly in time and money, which largely explains why they remain relatively understudied compared with other groups of protists. To solve this problem, we designed a specific protocol for obtaining environmental DNA data from Arcellinida, based on the available data. With this Arcellinida-specific molecular protocol, hundreds of environmental sequences are expected to be obtained using high-throughput sequencing techniques. This will enable large-scale ecological and biogeographical experiments, as well as bioindication studies, all of which require considerable amounts of data that were impossible to obtain in the past. 
This thesis provides a new, comprehensive perspective on the evolutionary history and morphological diversification of the shells of the extant orders Arcellinida and Euglyphida, highlighting the importance of including protists such as testate amoebae when drawing general conclusions that apply to eukaryotes or to biodiversity in general.

    Differential evolution of non-coding DNA across eukaryotes and its close relationship with complex multicellularity on Earth

    Here, I elaborate on the hypothesis that complex multicellularity (CM, sensu Knoll) is a major evolutionary transition (sensu Szathmary), which has convergently evolved a few times in Eukarya only: within red and brown algae, plants, animals, and fungi. Paradoxically, CM seems to correlate with the expansion of non-coding DNA (ncDNA) in the genome rather than with genome size or the total number of genes. Thus, I investigated the correlation between genome and organismal complexities across 461 eukaryotes under a phylogenetically controlled framework. To that end, I introduce the first formal definitions and criteria to distinguish ‘unicellularity’, ‘simple’ (SM) and ‘complex’ multicellularity. Rather than using the limited available estimations of unique cell types, the 461 species were classified according to our criteria by reviewing their life cycle and body plan development from literature. Then, I investigated the evolutionary association between genome size and 35 genome-wide features (introns and exons from protein-coding genes, repeats and intergenic regions) describing the coding and ncDNA complexities of the 461 genomes. To that end, I developed ‘GenomeContent’, a program that systematically retrieves massive multidimensional datasets from gene annotations and calculates over 100 genome-wide statistics. R-scripts coupled to parallel computing were created to calculate >260,000 phylogenetically controlled pairwise correlations. As previously reported, both repetitive and non-repetitive DNA are found to scale strongly and positively with genome size across most eukaryotic lineages. In contrast to previous studies, I demonstrate that changes in the length and repeat composition of introns are only weakly or moderately associated with changes in genome size at the global phylogenetic scale, while changes in intron abundance (within and across genes) are either not or only very weakly associated with changes in genome size. 
Our evolutionary correlations are robust to: different phylogenetic regression methods, uncertainties in the tree of eukaryotes, variations in genome size estimates, and randomly reduced datasets. Then, I investigated the correlation between the 35 genome-wide features and the cellular complexity of the 461 eukaryotes with phylogenetic Principal Component Analyses. Our results endorse a genetic distinction between SM and CM in Archaeplastida and Metazoa, but not so clearly in Fungi. Remarkably, complex multicellular organisms and their closest ancestral relatives are characterized by high intron-richness, regardless of genome size. Finally, I argue why and how a vast expansion of non-coding RNA (ncRNA) regulators rather than of novel protein regulators can promote the emergence of CM in Eukarya. As a proof of concept, I co-developed a novel ‘ceRNA-motif pipeline’ for the prediction of “competing endogenous” ncRNAs (ceRNAs) that regulate microRNAs in plants. We identified three candidate ceRNA motifs: MIM166, MIM171 and MIM159/319, which were found to be conserved across land plants and to be potentially involved in diverse developmental processes and stress responses. Collectively, the findings of this dissertation support our hypothesis that CM on Earth is a major evolutionary transition promoted by the expansion of two major ncDNA classes, introns and regulatory ncRNAs, which might have boosted the irreversible commitment of cell types in certain lineages by canalizing the timing and kinetics of the eukaryotic transcriptome. Table of contents: Cover page Abstract Acknowledgements Index 1. The structure of this thesis 1.1. Structure of this PhD dissertation 1.2. Publications of this PhD dissertation 1.3. Computational infrastructure and resources 1.4. Disclosure of financial support and information use 1.5. Acknowledgements 1.6. Author contributions and use of impersonal and personal pronouns 2. Biological background 2.1. The complexity of the eukaryotic genome 2.2. 
The problem of counting and defining “genes” in eukaryotes 2.3. The “function” concept for genes and “dark matter” 2.4. Increases of organismal complexity on Earth through multicellularity 2.5. Multicellularity is a “fitness transition” in individuality 2.6. The complexity of cell differentiation in multicellularity 3. Technical background 3.1. The Phylogenetic Comparative Method (PCM) 3.2. RNA secondary structure prediction 3.3. Some standards for genome and gene annotation 4. What is in a eukaryotic genome? GenomeContent provides a good answer 4.1. Background 4.2. Motivation: an interoperable tool for data retrieval of gene annotations 4.3. Methods 4.4. Results 4.5. Discussion 5. The evolutionary correlation between genome size and ncDNA 5.1. Background 5.2. Motivation: estimating the relationship between genome size and ncDNA 5.3. Methods 5.4. Results 5.5. Discussion 6. The relationship between non-coding DNA and Complex Multicellularity 6.1. Background 6.2. Motivation: How to define and measure complex multicellularity across eukaryotes? 6.3. Methods 6.4. Results 6.5. Discussion 7. The ceRNA motif pipeline: regulation of microRNAs by target mimics 7.1. Background 7.2. A revisited protocol for the computational analysis of Target Mimics 7.3. Motivation: a novel pipeline for ceRNA motif discovery 7.4. Methods 7.5. Results 7.6. Discussion 8. Conclusions and outlook 8.1. Contributions and lessons for the bioinformatics of large-scale comparative analyses 8.2. Intron features are evolutionarily decoupled among themselves and from genome size throughout Eukarya 8.3. “Complex multicellularity” is a major evolutionary transition 8.4. Role of RNA throughout the evolution of life and complex multicellularity on Earth 9. Supplementary Data Bibliography Curriculum Scientiae Selbständigkeitserklärung (declaration of authorship)

    ACARORUM CATALOGUS IX. Acariformes, Acaridida, Schizoglyphoidea (Schizoglyphidae), Histiostomatoidea (Histiostomatidae, Guanolichidae), Canestrinioidea (Canestriniidae, Chetochelacaridae, Lophonotacaridae, Heterocoptidae), Hemisarcoptoidea (Chaetodactylidae, Hyadesiidae, Algophagidae, Hemisarcoptidae, Carpoglyphidae, Winterschmidtiidae)

    The 9th volume of the series Acarorum Catalogus contains lists of mites of 13 families, 225 genera and 1268 species of the superfamilies Schizoglyphoidea, Histiostomatoidea, Canestrinioidea and Hemisarcoptoidea. Most of these mites live on insects or other animals (as parasites, phoretics or commensals); some inhabit rotten plant material, dung or fungi. Mites of the families Chetochelacaridae and Lophonotacaridae are specialised to live with myriapods (Diplopoda). The peculiar aquatic or intertidal mites of the families Hyadesiidae and Algophagidae are also included.

    Summer Research Fellowship Project Descriptions 2022

    A summary of research done by Smith College’s 2021 Summer Research Fellowship (SURF) Program participants. Ever since its 1967 start, SURF has been a cornerstone of Smith’s science education. Participants are supervised by faculty mentor-advisors drawn from the Clark Science Center and connected to its eighteen science, mathematics, and engineering departments and programs and associated centers and units. At summer’s end, SURF participants were asked to summarize their research experiences for this publication.

    Expanding the omics repertoire for model studies on a Chlorella-infecting giant virus

    Viruses are the most abundant biological entities in aquatic ecosystems. As top-down controls of plankton abundance and diversity, they are intrinsically linked to biogeochemical cycling, and by proxy, to global climate change. It is thus of great interest for researchers to understand the mechanics of viral infection and persistence among ecologically important phytoplankton assemblages. Viruses which infect eukaryotic algae are observed with diverse nucleic acid types, structures, and sizes, though most isolates to date bear large, dsDNA genomes comprised of genes normally only seen in cellular organisms. The Chlorella viruses are the model system for studying these entities, with many of the ‘omics’ approaches having been used to characterize the biology of this system. Here, we present data generated from epigenomic (i.e. DNA methylation) and metabolomic experiments of the prototype Chlorella virus, PBCV-1. In order to ask questions about virus DNA methylation, we first established a novel protocol for cryopreservation of PBCV-1 to control against epigenomic and genetic drift. This allowed for a baseline characterization of the DNA methylome profile in the prototype chlorovirus, PBCV-1, using PacBio’s single-molecule, real-time (SMRT) sequencing software. The results of this study suggest the possibility of widespread epigenomic modifications, and that DNA methylation by viral restriction-modification associated enzymes is incomplete. Most instances of missing methylation marks are represented as hemimethylated palindromes, which are protected against the types of restriction enzymes encoded by these viruses and thus might represent an epigenomic regulatory function in the virus. Finally, we conducted a non-targeted metabolomics study of PBCV-1 infected Chlorella cells to make some of the first inferences of how viral infection alters the metabolic profile of this host system. 
Altogether, this work helps to distinguish the baseline epigenomic and metabolomic profiles of the Chlorella-PBCV-1 virus system for future comparison with more ecologically informative treatments (e.g. competition, sub-optimal light, nutrient limitation). This work will help to uncover general trends specific to algal-giant virus interactions that distinguish them from phage-bacteria systems.