73 research outputs found

    A New Method for Species Identification via Protein-Coding and Non-Coding DNA Barcodes by Combining Machine Learning with Bioinformatic Methods

    Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) that address this issue for species assignment by both coding and non-coding sequences, taking advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although in that case the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75–100%. The new methods also obtained a 96.29% success rate (95% CI: 91.62–98.40%) for 484 rust fungi queries and a 98.50% success rate (95% CI: 96.60–99.37%) for 1,094 brown algae queries, both using ITS barcodes.

    Phylogenetic Reconstruction and DNA Barcoding for Closely Related Pine Moth Species (Dendrolimus) in China with Multiple Gene Markers

    Unlike distinct species, closely related species pose a great challenge for phylogeny reconstruction and species identification with DNA barcoding because their genetic variation often overlaps. We tested a sibling species group of pine moth pests in China with a standard cytochrome c oxidase subunit I (COI) gene and two alternative internal transcribed spacer (ITS) genes (ITS1 and ITS2). Five different phylogenetic/DNA barcoding analysis methods (Maximum likelihood (ML)/Neighbor-joining (NJ), “best close match” (BCM), Minimum distance (MD), and BP-based method (BP)), representing commonly used methodology (tree-based and non-tree based) in the field, were applied to both single-gene and multiple-gene analyses. Our results demonstrated clear reciprocal species monophyly for three relatively distantly related species, Dendrolimus superans, D. houi, and D. kikuchii, as recovered by both single and multiple genes, while the phylogenetic relationship of three closely related species, D. punctatus, D. tabulaeformis, and D. spectabilis, could not be resolved with the traditional tree-building methods. Additionally, we find that the standard COI barcode outperforms the two nuclear ITS genes, whatever the method used. On average, the COI barcode achieved a success rate of 94.10–97.40%, while ITS1 and ITS2 obtained a success rate of 64.70–81.60%, indicating that ITS genes are less suitable for species identification in this case. We propose the use of an overall success rate of species identification that takes both sequencing success and assignment success into account, since species identification success rates with multiple-gene barcoding systems were generally overestimated, especially by tree-based methods, where only successfully sequenced DNA sequences were used to construct a phylogenetic tree. Non-tree based methods, such as the MD, BCM, and BP approaches, presented advantages over tree-based methods by reporting the overall success rates with statistical significance. In addition, our results indicate that the most closely related species, D. punctatus, D. tabulaeformis, and D. spectabilis, may still be in the process of incomplete lineage sorting, with occasional hybridizations occurring among them.

    DNA Barcoding Bromeliaceae: Achievements and Pitfalls

    Background: DNA barcoding has been successfully established in animals as a tool for organismal identification and taxonomic clarification. Slower nucleotide substitution rates in plant genomes have made the selection of a DNA barcode for land plants a much more difficult task. The Plant Working Group of the Consortium for the Barcode of Life (CBOL) recommended the two-marker combination rbcL/matK as a pragmatic solution to a complex trade-off between universality, sequence quality, discrimination, and cost. Methodology/Principal Findings: It is expected that a system based on any one, or a small number, of plastid genes will fail within certain taxonomic groups with low amounts of plastid variation, while performing well in others. We tested the effectiveness of the proposed CBOL Plant Working Group barcoding markers for land plants in identifying 46 bromeliad species, a group rich in endemic species from the endangered Brazilian Atlantic Rainforest. Although we obtained high quality sequences with the suggested primers, species discrimination in our data set was only 43.48%. Addition of a third marker, trnH–psbA, did not show significant improvement. This species identification failure in Bromeliaceae could also be seen in the analysis of the GenBank matK data set. Sequence divergence in Bromeliaceae was almost three times lower than that observed for Asteraceae and Orchidaceae. This low variation rate also resulted in poorly resolved tree topologies. Among the three Bromeliaceae subfamilies sampled, Tillandsioideae was the only one recovered as a monophyletic group with high bootstrap value (98.6%). Species paraphyly was a common feature in our sampling. Conclusions/Significance: Our results show that although DNA barcoding is an important tool for biodiversity assessment, it tends to fail in taxonomically complicated and recently diverged plant groups, such as Bromeliaceae. Additional research might be needed to develop markers capable of discriminating species in these complex botanical groups.

    The Application of DNA Barcodes for the Identification of Marine Crustaceans from the North Sea and Adjacent Regions

    In recent years DNA barcoding has become a popular method of choice for molecular specimen identification. Here we present a comprehensive DNA barcode library of various crustacean taxa found in the North Sea, one of the most extensively studied marine regions of the world. Our data set includes 1,332 barcodes covering 205 species, including taxa of the Amphipoda, Copepoda, Decapoda, Isopoda, Thecostraca, and others. This dataset represents the most extensive DNA barcode library of the Crustacea in terms of species number to date. By using the Barcode of Life Data Systems (BOLD), unique BINs were identified for 198 (96.6%) of the analyzed species. Six species were characterized by two BINs (2.9%), and three BINs were found for the amphipod species Gammarus salinus Spooner, 1947 (0.4%). Intraspecific distances with values higher than 2.2% were revealed for 13 species (6.3%). Exceptionally high distances of up to 14.87% between two distinct but monophyletic clusters were found for the parasitic copepod Caligus elongatus Nordmann, 1832, supporting the results of previous studies that indicated the existence of an overlooked sea louse species. In contrast to these high distances, haplotype-sharing was observed for two decapod spider crab species, Macropodia parva Van Noort & Adema, 1985 and Macropodia rostrata (Linnaeus, 1761), underlining the need for a taxonomic revision of both species. In summary, our study confirms the application of DNA barcodes as a highly effective identification system for the analyzed marine crustaceans of the North Sea and represents an important milestone for modern biodiversity assessment studies using barcode sequence

    Systematic and Evolutionary Insights Derived from mtDNA COI Barcode Diversity in the Decapoda (Crustacea: Malacostraca)

    Background: Decapods are the most recognizable of all crustaceans and comprise a dominant group of benthic invertebrates of the continental shelf and slope, including many species of economic importance. Of the 17635 morphologically described Decapoda species, only 5.4% are represented by COI barcode region sequences. It therefore remains a challenge to compile regional databases that identify and analyse the extent and patterns of decapod diversity throughout the world. Methodology/Principal Findings: We contributed 101 decapod species from the North East Atlantic, the Gulf of Cadiz and the Mediterranean Sea, of which 81 species represent novel COI records. Within the newly-generated dataset, 3.6% of the species barcodes conflicted with the assigned morphological taxonomic identification, highlighting both the apparent taxonomic ambiguity among certain groups, and the need for an accelerated and independent taxonomic approach. Using the combined COI barcode projects from the Barcode of Life Database, we provide the most comprehensive COI data set so far examined for the Order (1572 sequences of 528 species, 213 genera, and 67 families). Patterns within families show a general predicted molecular hierarchy, but the scale of divergence at each taxonomic level appears to vary extensively between families. The range values of mean K2P distance observed were: within species 0.285% to 1.375%, within genus 6.376% to 20.924% and within family 11.392% to 25.617%. Nucleotide composition varied greatly across decapods, ranging from 30.8 % to 49.4 % GC content. Conclusions/Significance: Decapod biological diversity was quantified by identifying putative cryptic species allowing a rapid assessment of taxon diversity in groups that have until now received limited morphological and systematic examination. We highlight taxonomic groups or species with unusual nucleotide composition or evolutionary rates. 
Such data are relevant to strategies for conservation of existing decapod biodiversity, as well as elucidating the mechanisms and constraints shaping the patterns observed. Funding: FCT - SFRH/BD/25568/2006; EC FP6 - GOCE-CT-2005-511234 HERMES; FCT - PTDC/MAR/69892/2006 LusomarBo

    Classification of Plant Associated Bacteria Using RIF, a Computationally Derived DNA Marker

    A DNA marker that distinguishes plant associated bacteria at the species level and below was derived by comparing six sequenced genomes of Xanthomonas, a genus that contains many important phytopathogens. This DNA marker comprises a portion of the dnaA replication initiation factor (RIF). Unlike the rRNA genes, dnaA is a single copy gene in the vast majority of sequenced bacterial genomes, and amplification of RIF requires genus-specific primers. In silico analysis revealed that RIF has equal or greater ability to differentiate closely related species of Xanthomonas than the widely used ribosomal intergenic spacer region (ITS). Furthermore, in a set of 263 Xanthomonas, Ralstonia and Clavibacter strains, the RIF marker was directly sequenced in both directions with a success rate approximately 16% higher than that for ITS. RIF frameworks for Xanthomonas, Ralstonia and Clavibacter were constructed using 682 reference strains representing different species, subspecies, pathovars, races, hosts and geographic regions, and contain a total of 109 different RIF sequences. RIF sequences showed subspecific groupings but did not place strains of X. campestris or X. axonopodis into currently named pathovars nor R. solanacearum strains into their respective races, confirming previous conclusions that pathovar and race designations do not necessarily reflect genetic relationships. The RIF marker also was sequenced for 24 reference strains from three genera in the Enterobacteriaceae: Pectobacterium, Pantoea and Dickeya. RIF sequences of 70 previously uncharacterized strains of Ralstonia, Clavibacter, Pectobacterium and Dickeya matched, or were similar to, those of known reference strains, illustrating the utility of the frameworks to classify bacteria below the species level and rapidly match unknown isolates to reference strains. 
The RIF sequence frameworks are available at the online RIF database, RIFdb, and can be queried for diagnostic purposes with RIF sequences obtained from unknown strains in both chromatogram and FASTA format.

    Reexamination of the species assignment of Diacavolinia pteropods using DNA barcoding

    © The Author(s), 2013. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS ONE 8 (2013): e53889, doi:10.1371/journal.pone.0053889. Thecosome pteropods (Mollusca, Gastropoda) are an ecologically important, diverse, and ubiquitous group of holoplanktonic animals that are the focus of intense research interest due to their external aragonite shell and vulnerability to ocean acidification. Characterizing the response of these animals to low pH and other environmental stressors has been hampered by continued uncertainty in their taxonomic identification. An example of this confusion in species assignment is found in the genus Diacavolinia. All members of this genus were originally identified as a single species, Cavolinia longirostris, but over the past fifty years the taxonomy has been revisited multiple times; currently the genus comprises 22 different species. This study examines five species of Diacavolinia, including four sampled in the Northeast Atlantic (78 individuals) and one from the Eastern tropical North Pacific (15 individuals). Diacavolinia were identified to species based on morphological characteristics according to the current taxonomy, photographed, and then used to determine the sequence of the “DNA barcoding” region of the cytochrome c oxidase subunit I (COI). Specimens from the Atlantic, despite distinct differences in shell morphology, showed polyphyly and a genetic divergence of <3% (K2P distance) whereas the Pacific and Atlantic samples were more distant (~19%). Comparisons of Diacavolinia spp. with other Cavolinia spp. reveal larger distances (~24%). These results indicate that specimens from the Atlantic comprise a single monophyletic species and suggest possible species-level divergence between Atlantic and Pacific populations. The findings support the maintenance of Diacavolinia as a separate genus, yet emphasize the inadequacy of our current taxonomic understanding of pteropods. They highlight the need for accurate species identifications to support estimates of biodiversity, range extent and natural exposure of these planktonic calcifiers to environmental variability; furthermore, the apparent variation of the pteropod shell may have implications for our understanding of the species’ sensitivity to ocean acidification. This material is based upon work supported by the National Science Foundation under Grant Number OCE-0928801. AEM was funded through the WHOI Postdoctoral Scholarship. Support to LBB was provided by the College of Liberal Arts & Sciences, University of Connecticut; and by the Census of Marine Life/Alfred P. Sloan Foundation.

    Unprecedented within-species chromosome number cline in the Wood White butterfly Leptidea sinapis and its significance for karyotype evolution and speciation

    Background: Species generally have a fixed number of chromosomes in the cell nuclei while between-species differences are common and often pronounced. These differences could have evolved through multiple speciation events, each involving the fixation of a single chromosomal rearrangement. Alternatively, marked changes in the karyotype may be the consequence of within-species accumulation of multiple chromosomal fissions/fusions, resulting in highly polymorphic systems with the subsequent extinction of intermediate karyomorphs. Although this mechanism of chromosome number evolution is possible in theory, it has not been well documented. Results: We present the discovery of exceptional intraspecific variability in the karyotype of the widespread Eurasian butterfly Leptidea sinapis. We show that within this species the diploid chromosome number gradually decreases from 2n = 106 in Spain to 2n = 56 in eastern Kazakhstan, resulting in a 6000 km-wide cline that originated recently (8,500 to 31,000 years ago). Remarkably, intrapopulational chromosome number polymorphism exists, the chromosome number range overlaps between some populations separated by hundreds of kilometers, and chromosomal heterozygotes are abundant. We demonstrate that this karyotypic variability is intraspecific because in L. sinapis a broad geographical distribution is coupled with a homogenous morphological and genetic structure. Conclusions: The discovered system represents the first clearly documented case of explosive chromosome number evolution through intraspecific and intrapopulation accumulation of multiple chromosomal changes. Leptidea sinapis may be used as a model system for studying speciation by means of chromosomally-based suppressed recombination mechanisms, as well as clinal speciation, a process that is theoretically possible but difficult to document. 
The discovered cline seems to represent a narrow time-window of the very first steps of species formation linked to multiple chromosomal changes that have occurred explosively. This case offers a rare opportunity to study this process before drift, dispersal, selection, extinction and speciation erase the traces of microevolutionary events and just leave the final picture of a pronounced interspecific chromosomal difference

    The second internal transcribed spacer of nuclear ribosomal DNA as a tool for Latin American anopheline taxonomy: a critical review
