16 research outputs found

    GinJinn: An object‐detection pipeline for automated feature extraction from herbarium specimens

    Premise: The generation of morphological data in evolutionary, taxonomic, and ecological studies of plants using herbarium material has traditionally been a labor‐intensive task. Recent progress in machine learning using deep artificial neural networks (deep learning) for image classification and object detection has facilitated the establishment of a pipeline for the automatic recognition and extraction of relevant structures in images of herbarium specimens. Methods and Results: We implemented an extendable pipeline based on state‐of‐the‐art deep‐learning object‐detection methods to collect leaf images from herbarium specimens of two species of the genus Leucanthemum. Using 183 specimens as the training data set, our pipeline extracted one or more intact leaves in 95% of the 61 test images. Conclusions: We establish GinJinn as a deep‐learning object‐detection tool for the automatic recognition and extraction of individual leaves or other structures from herbarium specimens. Our pipeline offers greater flexibility and a lower entrance barrier than previous image‐processing approaches based on hand‐crafted features.

    Long‐read genotyping with SLANG (Simple Long‐read loci Assembly of Nanopore data for Genotyping)

    Premise: Most phylogenomic library preparation methods and bioinformatic analysis tools in restriction site–associated DNA sequencing (RADseq)/genotyping-by-sequencing (GBS) studies are designed for use with Illumina data. The lack of alternative bioinformatic pipelines hinders the exploration of long-read multi-locus data from other sequencing platforms. The Simple Long-read loci Assembly of Nanopore data for Genotyping (SLANG) pipeline enables locus assembly, orthology estimation, and single-nucleotide polymorphism (SNP) calling using Nanopore-sequenced multi-locus data. Methods and Results: Two test libraries (Leucanthemum spp., Senecio spp.; Compositae) were prepared using an amplified fragment length polymorphism (AFLP)-based method to reduce genome complexity, then Nanopore-sequenced, and analyzed with SLANG. We identified 704 and 448 orthologous loci with 12,368 and 10,048 SNPs, respectively. The constructed phylogenetic networks were identical to a GBS network produced using Leucanthemum Illumina data and were consistent with Senecio species circumscriptions based on morphology. Conclusions: SLANG identifies orthologous loci and extracts SNPs from long-read multi-locus Nanopore data for phylogenetic inference, population genetics, or phylogeographical studies. Combined with an AFLP-based library preparation, SLANG provides an easily scalable, cost-effective, and affordable alternative to Illumina-based RADseq/GBS procedures.

    Nano‐Strainer: A workflow for the identification of single‐copy nuclear loci for plant systematic studies, using target capture kits and Oxford Nanopore long reads

    In modern plant systematics, target enrichment enables simultaneous analysis of hundreds of genes. However, when dealing with reticulate or polyploidization histories, few markers may suffice, but often are required to be single-copy, a condition that is not necessarily met with commercial capture kits. Also, large genome sizes can render target capture ineffective, so that amplicon sequencing would be preferable; however, knowledge about suitable loci is often missing. Here, we present a comprehensive workflow for the identification of putative single-copy nuclear markers in a genus of interest, by mining a small dataset from target capture using a few representative taxa. The proposed pipeline assesses sequence variability contained in the data from targeted loci and assigns reads to their respective genes, via a combined BLAST/clustering procedure. Cluster consensus sequences are then examined based on four pre-defined criteria presumably indicative for absence of paralogy. This is done by calculating four specialized indices; loci are ranked according to their performance in these indices, and top-scoring loci are considered putatively single- or low copy. The approach can be applied to any probe set. As it relies on long reads, the present contribution also provides template workflows for processing Nanopore-based target capture data. Obtained markers are further tested and then entered into amplicon sequencing. For the detection of possibly remaining paralogy in these data, which might occur in groups with rampant paralogy, we also employ the long-read assembly tool Canu. In diploid representatives of the young Compositae genus Leucanthemum, characterized by high levels of polyploidy, our approach resulted in successful amplification of 13 loci. Modifications to remove traces of paralogy were made in seven of these. A species tree from the markers correctly reproduced main relationships in the genus, however, at low resolution. 
    The presented workflow has the potential to valuably support phylogenetic research, for example in polyploid plant groups.

    The Warps and Wefts of a Polyploidy Complex: Integrative Species Delimitation of the Diploid Leucanthemum (Compositae, Anthemideae) Representatives

    Species delimitation—owing to the paramount role of the species rank in evolutionary, ecological, and nature conservation studies—is an essential contribution of taxonomy to biodiversity research. In an ‘integrative taxonomy’ approach to species delimitation on the diploid level, we searched for evolutionary significant units (the warps and wefts) that gave rise to the polyploid complex of European ox-eye daisies (Leucanthemum; Compositae-Anthemideae). Species discovery and validation methods based on genetic, ecological, geographical, and morphometric datasets were applied to test the currently accepted diploid morpho-species, i.e., morphologically delimited species, in Leucanthemum. Novel approaches were taken in the analyses of RADseq data (consensus clustering), morphometrics of reconstructed leaf silhouettes from digitized herbarium specimens, and quantification of species-distribution overlaps. We show that 17 of the 20 Leucanthemum morpho-species are supported by genetic evidence. The taxonomic rank of the remaining three morpho-species was resolved by combining genealogic, ecologic, geographic, and morphologic data in the framework of von Wettstein’s morpho-geographical species concept. We herewith provide a methodological pipeline for the species delimitation in an ‘integrative taxonomy’ fashion using sources of evidence from genealogical, morphological, ecological, and geographical data in the philosophy of De Queiroz’s “Unified Species Concept”

    Automated extraction of seed morphological traits from images

    The description of biological objects, such as seeds, mainly relies on manual measurements of few characteristics, and on visual classification of structures, both of which can be subjective, error-prone and time-consuming. Image analysis tools offer means to address these shortcomings, but we currently lack a method capable of automatically handling seeds from different taxa with varying morphological attributes and obtaining interpretable results. Here, we provide a simple image acquisition and processing protocol and introduce Traitor, an open-source software available as a command-line interface (CLI), which automates the extraction of seed morphological traits from images. The workflow for trait extraction consists of scanning seeds against a high-contrast background, correcting image colours, and analysing images with the software. Traitor is capable of processing hundreds of images of varied taxa simultaneously with just three commands, and without a need for training, manual fine-tuning or thresholding. The software automatically detects each object in the image and extracts size measurements, traditional morphometric descriptors widely used by scientists and practitioners, standardised shape coordinates, and colorimetric measurements. The method was tested on a dataset comprising 91,667 images of seeds from 1228 taxa. Traitor's extracted average length and width values closely matched the average manual measurements obtained from the same collection (concordance correlation coefficient of 0.98). Further, we used a large image dataset to demonstrate how Traitor's output can be used to obtain representative seed colours for taxa, determine the phylogenetic signal of seed colour, and build objective classification categories for shape with high levels of visual interpretability. Our approach increases productivity and allows for large-scale analyses that would otherwise be unfeasible.
    Traitor enables the acquisition of data that are readily comparable across different taxa, opening new avenues to explore functional relevance of morphological traits and to advance on new tools for seed identification.

    GinJinn2: Object detection and segmentation for ecology and evolution

    Collection and preparation of empirical data still represent one of the most important, but also expensive steps in ecological and evolutionary/systematic research. Modern machine learning approaches, however, have the potential to automate a variety of tasks, which until recently could only be performed manually. Unfortunately, the application of such methods by researchers outside the field is hampered by technical difficulties. Here, we present GinJinn2, a user-friendly toolbox for deep learning-based object detection and instance segmentation on image data. Besides providing a convenient command-line interface to existing software libraries, it comprises several additional tools for data handling, pre- and postprocessing, and building advanced analysis pipelines. We demonstrate the application of GinJinn2 for biological purposes using four exemplary analyses, namely the evaluation of seed mixtures, detection of insects on glue traps, segmentation of stomata and extraction of leaf silhouettes from herbarium specimens. GinJinn2, by providing a coding-free environment, will enable users with a primary background in biology to apply deep learning-based methods for object detection and segmentation in order to automate feature extraction from image data

    Picks in the Fabric of a Polyploidy Complex: Integrative Species Delimitation in the Tetraploid Leucanthemum Mill. (Compositae, Anthemideae) Representatives

    Based on the results of a preceding species-delimitation analysis for the diploid representatives of the genus Leucanthemum (Compositae, Anthemideae), the present study aims at the elaboration of a specific and subspecific taxonomic treatment of the tetraploid members of the genus. Following an integrative taxonomic approach, species-level decisions on eight predefined morphotaxon hypotheses were based on genetic/genealogical, morphological, ecological, and geographical differentiation patterns. ddRADseq fingerprinting and SNP-based clustering revealed genetic integrity for six of the eight morphotaxa, with no clear differentiation patterns observed between the widespread L. ircutianum subsp. ircutianum and the N Spanish (Cordillera Cantábrica) L. cantabricum and the S French L. delarbrei subsp. delarbrei (northern Massif Central) and L. meridionale (western Massif Central). The inclusion of differentiation patterns in morphological (leaf dissection and shape), ecological (climatological and edaphic niches), and geographical respects (pair-wise tests of sympatry vs. allopatry) together with the application of a procedural protocol for species-rank decisions (the ‘Wettstein tesseract’) led to the proposal of an acknowledgement of the eight predefined morphotaxon hypotheses as six species (two of them with two subspecies). Nomenclatural consequences following from these results are drawn and lead to the following new combinations: Leucanthemum delarbrei subsp. meridionale (Legrand) Oberpr., T.Ott & Vogt, comb. nov. and Leucanthemum ruscinonense (Jeanb. & Timb.-Lagr.) Oberpr., T.Ott & Vogt, comb. et stat. nov.
