16 research outputs found

    A demonstration of unsupervised machine learning in species delimitation

    One major challenge to delimiting species with genetic data is successfully differentiating population structure from species-level divergence, an issue exacerbated in taxa inhabiting naturally fragmented habitats. Many fields of science are now using machine learning, and in evolutionary biology supervised machine learning has recently been used to infer species boundaries. These supervised methods require training data with associated labels. Conversely, unsupervised machine learning (UML) uses inherent data structure and does not require user-specified training labels, potentially providing more objectivity in species delimitation. In the context of integrative taxonomy, we demonstrate the utility of three UML approaches (random forests, variational autoencoders, t-distributed stochastic neighbor embedding) for species delimitation in an arachnid taxon with high population genetic structure (Opiliones, Laniatores, Metanonychus). We find that UML approaches successfully cluster samples according to species-level divergences and not high levels of population structure, while model-based validation methods severely over-split putative species. UML offers intuitive data visualization in two-dimensional space, the ability to accommodate various data types, and has potential in many areas of systematic and evolutionary biology. We argue that machine learning methods are ideally suited for species delimitation and may perform well in many natural systems and across taxa with diverse biological characteristics.

    An approach using ddRADseq and machine learning for understanding speciation in Antarctic Antarctophilinidae gastropods

    Sampling impediments and paucity of suitable material for molecular analyses have precluded the study of speciation and radiation of deep-sea species in Antarctica. We analyzed barcodes together with genome-wide single nucleotide polymorphisms obtained from double digestion restriction site-associated DNA sequencing (ddRADseq) for species in the family Antarctophilinidae. We also reevaluated the fossil record associated with this taxon to provide further insights into the origin of the group. Novel approaches to identify distinctive genetic lineages, including unsupervised machine learning variational autoencoder plots, were used to establish species hypothesis frameworks. In this sense, three undescribed species and a complex of cryptic species were identified, suggesting allopatric speciation connected to geographic or bathymetric isolation. We further observed that the shallow waters around the Scotia Arc and on the continental shelf in the Weddell Sea present high endemism and diversity. In contrast, likely due to the glacial pressure during the Cenozoic, a deep-sea group with fewer species emerged expanding over great areas in the South-Atlantic Antarctic Ridge. Our study agrees on how diachronic paleoclimatic and current environmental factors shaped Antarctic communities both at the shallow and deep-sea levels, promoting Antarctica as the center of origin for numerous taxa such as gastropod mollusks.

    An enhanced target-enrichment bait set for Hexacorallia provides phylogenomic resolution of the staghorn corals (Acroporidae) and close relatives

    Targeted enrichment of genomic DNA can profoundly increase the phylogenetic resolution of clades and inform taxonomy. Here, we redesign a custom bait set previously developed for the cnidarian class Anthozoa to more efficiently target and capture ultraconserved elements (UCEs) and exonic loci within the subclass Hexacorallia. We test this enhanced bait set (targeting 2476 loci) on 99 specimens of scleractinian corals spanning both the "complex" (Acroporidae, Agariciidae) and "robust" (Fungiidae) clades. Focused sampling in the staghorn corals (genus Acropora) highlights the ability of sequence capture to inform the taxonomy of a clade previously deficient in molecular resolution. A mean of 1850 (± 298) loci were captured per taxon (955 UCEs, 894 exons), and a 75% complete concatenated alignment of 96 samples included 1792 loci (991 UCEs, 801 exons) and approximately 1.87 million base pairs. Maximum likelihood and Bayesian analyses recovered robust molecular relationships and revealed that species-level relationships within Acropora are incongruent with traditional morphological groupings. Both UCE and exon datasets delineated six well-supported clades within Acropora. The enhanced bait set will facilitate investigations of the evolutionary history of many important groups of reef corals, particularly where previous molecular marker development has been unsuccessful.