Wolbachia and DNA barcoding insects: patterns, potential and problems
Wolbachia is a genus of bacterial endosymbionts that impacts the breeding systems of their hosts. Wolbachia can confuse the patterns of mitochondrial variation, including DNA barcodes, because it influences the pathways through which mitochondria are inherited. We examined the extent to which these endosymbionts are detected in routine DNA barcoding, assessed their impact upon the insect sequence divergence and identification accuracy, and considered the variation present in Wolbachia COI. Using both standard PCR assays (Wolbachia surface coding protein – wsp), and bacterial COI fragments we found evidence of Wolbachia in insect total genomic extracts created for DNA barcoding library construction. When >2 million insect COI trace files were examined on the Barcode of Life Datasystem (BOLD) Wolbachia COI was present in 0.16% of the cases. It is possible to generate Wolbachia COI using standard insect primers; however, that amplicon was never confused with the COI of the host. Wolbachia alleles recovered were predominantly Supergroup A and were broadly distributed geographically and phylogenetically. We conclude that the presence of the Wolbachia DNA in total genomic extracts made from insects is unlikely to compromise the accuracy of the DNA barcode library; in fact, the ability to query this DNA library (the database and the extracts) for endosymbionts is one of the ancillary benefits of such a large scale endeavor – for which we provide several examples. It is our conclusion that regular assays for Wolbachia presence and type can, and should, be adopted by large scale insect barcoding initiatives. While COI is one of the five multi-locus sequence typing (MLST) genes used for categorizing Wolbachia, there is limited overlap with the eukaryotic DNA barcode region
A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset
In an effort to catalog insect biodiversity, we propose a new large dataset
of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is
taxonomically classified by an expert, and also has associated genetic
information including raw nucleotide barcode sequences and assigned barcode
index numbers, which are genetically-based proxies for species classification.
This paper presents a curated million-image dataset, primarily to train
computer-vision models capable of providing image-based taxonomic assessment.
However, the dataset also presents compelling characteristics, the study of
which would be of interest to the broader machine learning community. Driven by
the biological nature inherent to the dataset, a characteristic long-tailed
class-imbalance distribution is exhibited. Furthermore, taxonomic labelling is
a hierarchical classification scheme, presenting a highly fine-grained
classification problem at lower levels. Beyond spurring interest in
biodiversity research within the machine learning community, progress on
creating an image-based taxonomic classifier will also further the ultimate
goal of all BIOSCAN research: to lay the foundation for a comprehensive survey
of global biodiversity. This paper introduces the dataset and explores the
classification task through the implementation and analysis of a baseline
classifier
BOLD v4: A Centralized Bioinformatics Platform for DNA-based Biodiversity Data
BOLD, the Barcode of Life Data System, supports the acquisition, storage,
validation, analysis, and publication of DNA barcodes, activities requiring the
integration of molecular, morphological, and distributional data. Its pivotal
role in curating the reference library of DNA barcodes, coupled with its data
management and analysis capabilities, makes it a central resource for
biodiversity science. It enables rapid, accurate identification of specimens
and also reveals patterns of genetic diversity and evolutionary relationships
among taxa. Launched in 2005, BOLD has become an increasingly powerful tool for
advancing understanding of planetary biodiversity. It currently hosts 17
million specimen records and 14 million barcodes that provide coverage for more
than a million species from every continent and ocean. The platform has the
long-term goal of providing a consistent, accurate system for identifying all
species of eukaryotes. BOLD's integrated analytical tools, full data lifecycle
support, and secure collaboration framework distinguish it from other
biodiversity platforms. BOLD v4 brought enhanced data management and analysis
capabilities as well as novel functionality for data dissemination and
publication. Its next version will include features to strengthen its utility
to the research community, governments, industry, and society-at-large
Species-Level Para- and Polyphyly in DNA Barcode Gene Trees: Strong Operational Bias in European Lepidoptera
The proliferation of DNA data is revolutionizing all fields of systematic research. DNA barcode sequences, now available for millions of specimens and several hundred thousand species, are increasingly used in algorithmic species delimitations. This is complicated by occasional incongruences between species and gene genealogies, as indicated by situations where conspecific individuals do not form a monophyletic cluster in a gene tree. In two previous reviews, non-monophyly has been reported as being common in mitochondrial DNA gene trees. We developed a novel web service “Monophylizer” to detect non-monophyly in phylogenetic trees and used it to ascertain the incidence of species non-monophyly in COI (a.k.a. cox1) barcode sequence data from 4977 species and 41,583 specimens of European Lepidoptera, the largest data set of DNA barcodes analyzed in this regard. Particular attention was paid to accurate species identification to ensure data integrity. We investigated the effects of tree-building method, sampling effort, and other methodological issues, all of which can influence estimates of non-monophyly. We found a 12% incidence of non-monophyly, a value significantly lower than that observed in previous studies. Neighbor joining (NJ) and maximum likelihood (ML) methods yielded almost equal numbers of non-monophyletic species, but 24.1% of these cases of non-monophyly were only found by one of these methods. Non-monophyletic species tend to show either low genetic distances to their nearest neighbors or exceptionally high levels of intraspecific variability. Cases of polyphyly in COI trees arising as a result of deep intraspecific divergence are negligible, as the detected cases reflected misidentifications or methodological errors. Taking into consideration variation in sampling effort, we estimate that the true incidence of non-monophyly is ∼23%, but with operational factors still being included.
Within the operational factors, we separately assessed the frequency of taxonomic limitations (presence of overlooked cryptic and oversplit species) and identification uncertainties. We observed that operational factors are potentially present in more than half (58.6%) of the detected cases of non-monophyly. Furthermore, we observed that in about 20% of non-monophyletic species and entangled species, the lineages involved are either allopatric or parapatric, conditions where species delimitation is inherently subjective and particularly dependent on the species concept that has been adopted. These observations suggest that species-level non-monophyly in COI gene trees is less common than previously supposed, with many cases reflecting misidentifications, the subjectivity of species delimitation or other operational factors.
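To illustrate what detecting species-level non-monophyly in a gene tree involves, here is a minimal Python sketch. It is not the Monophylizer implementation. The nested-tuple tree encoding, the "species|specimen_id" leaf labels, and the function names are assumptions made for this example. A species is counted as monophyletic when some clade of the rooted tree contains all of its specimens and nothing else, so the result depends on how the tree is rooted.

```python
from collections import defaultdict


def clade_leaf_sets(tree):
    """Collect the leaf set of every clade in a rooted tree.

    Assumed encoding (hypothetical, for illustration only):
    a leaf is a string "species|specimen_id";
    an internal node is a tuple of child subtrees.
    """
    clades = set()

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.add(leaves)
        return leaves

    all_leaves = walk(tree)
    return clades, all_leaves


def non_monophyletic_species(tree):
    """Return the sorted species whose specimens do not form one exclusive clade."""
    clades, all_leaves = clade_leaf_sets(tree)
    specimens_by_species = defaultdict(set)
    for leaf in all_leaves:
        species = leaf.split("|", 1)[0]
        specimens_by_species[species].add(leaf)
    return sorted(
        species
        for species, specimens in specimens_by_species.items()
        if frozenset(specimens) not in clades
    )


# Toy rooted tree: species A forms its own clade; B and C are entangled.
toy_tree = ((("A|1", "A|2"), "B|1"), (("C|1", "B|2"), "C|2"))
print(non_monophyletic_species(toy_tree))
```

In the toy tree, the two specimens of A share an exclusive clade, while no clade holds exactly the specimens of B or of C, so both are reported. A species represented by a single specimen is always monophyletic under this definition, which is one reason sampling effort affects the estimated incidence.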
A reference library for Canadian invertebrates with 1.5 million barcodes, voucher specimens, and DNA samples
The synthesis of this dataset was enabled by funding from the Canada Foundation for Innovation, from Genome Canada through Ontario Genomics, from NSERC, and from the Ontario Ministry of Research, Innovation and Science in support of the International Barcode of Life project. It was also enabled by philanthropic support from the Gordon and Betty Moore Foundation and from Ann McCain Evans and Chris Evans. The release of the data on GGBN was supported by a GGBN – Global Genome Initiative Award and we thank G. Droege, L. Loo, K. Barker, and J. Coddington for their support. Our work depended heavily on the analytical capabilities of the Barcode of Life Data Systems (BOLD, www.boldsystems.org). We also thank colleagues at the CBG for their support, including S. Adamowicz, S. Bateson, E. Berzitis, V. Breton, V. Campbell, A. Castillo, C. Christopoulos, J. Cossey, C. Gallant, J. Gleason, R. Gwiazdowski, M. Hajibabaei, R. Hanner, K. Hough, P. Janetta, A. Pawlowski, S. Pedersen, J. Robertson, D. Roes, K. Seidle, M. A. Smith, B. St. Jacques, A. Stoneham, J. Stahlhut, R. Tabone, J. Topan, S. Walker, and C. Wei. For bioblitz-related assistance, we are grateful to D. Ireland, D. Metsger, A. Guidotti, J. Quinn and other members of Bioblitz Canada and Ontario Bioblitz. For our work in Canada’s national parks, we thank S. Woodley and J. Waithaka for their lead role in organizing permits and the many Parks Canada staff who facilitated specimen collections, including M. Allen, D. Amirault-Langlais, J. Bastick, C. Belanger, C. Bergman, J.-F. Bisaillon, S. Boyle, J. Bridgland, S. Butland, L. Cabrera, R. Chapman, J. Chisholm, B. Chruszcz, D. Crossland, H. Dempsey, N. Denommee, T. Dobbie, C. Drake, J. Feltham, A. Forshner, K. Forster, S. Frey, L. Gardiner, P. Giroux, T. Golumbia, D. Guedo, N. Guujaaw, S. Hairsine, E. Hansen, C. Harpur, S. Hayes, J. Hofman, S. Irwin, B. Johnston, V. Kafa, N. Kang, P. Langan, P. Lawn, M. Mahy, D. Masse, D. Mazerolle, C. McCarthy, I. McDonald, J. McIntosh, C. McKillop, V. Minelga, C. Ouimet, S. Parker, N. Perry, J. Piccin, A. Promaine, P. Roy, M. Savoie, D. Sigouin, P. Sinkins, R. Sissons, C. Smith, R. Smith, H. Stewart, G. Sundbo, D. Tate, R. Tompson, E. Tremblay, Y. Troutet, K. Tulk, J. Van Wieren, C. Vance, G. Walker, D. Whitaker, C. White, R. Wissink, C. Wong, and Y. Zharikov. For our work near Canada’s ports in Vancouver, Toronto, Montreal, and Halifax, we thank R. Worcester, A. Chreston, M. Larrivee, and T. Zemlak, respectively. Many other organizations improved coverage in the reference library by providing access to specimens – they included the Canadian National Collection of Insects, Arachnids and Nematodes, Smithsonian Institution’s National Museum of Natural History, the Canadian Museum of Nature, the University of Guelph Insect Collection, the Royal British Columbia Museum, the Royal Ontario Museum, the Pacific Forestry Centre, the Northern Forestry Centre, the Lyman Entomological Museum, the Churchill Northern Studies Centre, and rare Charitable Research Reserve. We also thank the many taxonomic specialists who identified specimens, including A. Borkent, B. Brown, M. Buck, C. Carr, T. Ekrem, J. Fernandez Triana, C. Guppy, K. Heller, J. Huber, L. Jacobus, J. Kjaerandsen, J. Klimaszewski, D. Lafontaine, J-F. Landry, G. Martin, A. Nicolai, D. Porco, H. Proctor, D. Quicke, J. Savage, B. C. Schmidt, M. Sharkey, A. Smith, E. Stur, A. Tomas, J. Webb, N. Woodley, and X. Zhou. We also thank K. Kerr and T. Mason for facilitating collections at Toronto Zoo and D. Iles for servicing the trap at Wapusk National Park. This paper contributes to the University of Guelph’s Food from Thought research program supported by the Canada First Research Excellence Fund.
The Barcode of Life Data System (BOLD; www.boldsystems.org) [8] was used as the primary workbench for creating, storing, analyzing, and validating the specimen and sequence records and the associated data resources [48].
The BOLD platform has a private, password-protected workbench for the steps from specimen data entry to data validation (see details in Data Records), and a public data portal for the release of data in various formats. The latter is accessible through an API (http://www.boldsystems.org/index.php/resources/api?type=webservices) that can also be controlled through R [75] with the package ‘bold’ [76].
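As a rough illustration of programmatic access to the public data portal, the sketch below only builds a query URL and sends no request. The base path `API_Public`, the `specimen` endpoint, and the `taxon`, `geo`, and `format` parameter names are assumptions based on the BOLD v4 web-services documentation and should be verified against the API page linked above. The helper name `bold_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base path of the BOLD v4 public API; verify before use.
BOLD_API_BASE = "http://www.boldsystems.org/index.php/API_Public"


def bold_query_url(endpoint, **params):
    """Build a query URL for an assumed BOLD public API endpoint (no request is sent)."""
    # Drop unset parameters and sort the rest so the URL is reproducible.
    query = urlencode(sorted((k, v) for k, v in params.items() if v is not None))
    return f"{BOLD_API_BASE}/{endpoint}?{query}"


# Hypothetical query: specimen records for one order from one country, as TSV.
url = bold_query_url("specimen", taxon="Lepidoptera", geo="Canada", format="tsv")
print(url)
```

Keeping URL construction separate from the network call makes the query easy to inspect and log before any data are downloaded.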
A molecular-based identification resource for the arthropods of Finland
Publisher Copyright: © 2021 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd. To associate specimens identified by molecular characters to other biological knowledge, we need reference sequences annotated by Linnaean taxonomy. In this study, we (1) report the creation of a comprehensive reference library of DNA barcodes for the arthropods of an entire country (Finland), (2) publish this library, and (3) deliver a new identification tool for insects and spiders, as based on this resource. The reference library contains mtDNA COI barcodes for 11,275 (43%) of 26,437 arthropod species known from Finland, including 10,811 (45%) of 23,956 insect species. To quantify the improvement in identification accuracy enabled by the current reference library, we ran 1000 Finnish insect and spider species through the Barcode of Life Data system (BOLD) identification engine. Of these, 91% were correctly assigned to a unique species when compared to the new reference library alone, 85% were correctly identified when compared to BOLD with the new material included, and 75% with the new material excluded. To capitalize on this resource, we used the new reference material to train a probabilistic taxonomic assignment tool, FinPROTAX, scoring high success. For the full-length barcode region, the accuracy of taxonomic assignments at the level of classes, orders, families, subfamilies, tribes, genera, and species reached 99.9%, 99.9%, 99.8%, 99.7%, 99.4%, 96.8%, and 88.5%, respectively. The FinBOL arthropod reference library and FinPROTAX are available through the Finnish Biodiversity Information Facility (www.laji.fi) at https://laji.fi/en/theme/protax. Overall, the FinBOL investment represents a massive capacity-transfer from the taxonomic community of Finland to all sectors of society.
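The coverage percentages quoted in the abstract above can be checked from the species counts it reports. The short Python sketch below does only that arithmetic; the function name `coverage_percent` is introduced here for illustration.

```python
def coverage_percent(barcoded, known):
    """Share of known species with a reference barcode, rounded to a whole percent."""
    return round(100 * barcoded / known)


# Species counts as reported in the abstract.
arthropod_coverage = coverage_percent(11_275, 26_437)
insect_coverage = coverage_percent(10_811, 23_956)
print(arthropod_coverage, insect_coverage)
```

Both values agree with the rounded figures given in the text (43% of arthropods, 45% of insects).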
mBRAVE: The Multiplex Barcode Research And Visualization Environment
Widespread interest in the study of metabarcoding has resulted in data proliferation and the development of a multitude of powerful computational tools. Yet consistent and reproducible interpretation of the data remains challenging. The integration of different data types, software tools, and analytical parameters poses a barrier to scaling research. Further, though the majority of the necessary tools for performing these analyses are already implemented, there is limited support for high throughput analysis due to the requirement for heavy computational capacity. As a result of these complexities, many researchers lack the time, training, or infrastructure to work with larger datasets.
mBRAVE, the Multiplex Barcode Research And Visualization Environment, is a cloud-based data storage and analytics platform with standardized pipelines and a sophisticated web interface for transforming raw high-throughput sequencing (HTS) data into biological insights. mBRAVE integrates common analytical methods and links to the Barcode of Life Data (BOLD) System for reference datasets, presenting users with the ability to analyze large volumes of data, without requiring special technical training. mBRAVE's cloud architecture provides centralized and automated storage and compute capacity, thereby reducing the burden on individual researchers.
The mBRAVE platform seeks to alleviate the main informatic challenges faced by the metabarcoding research community: the storage and consistent interpretation of HTS data. It is now available for researcher use at www.mbrave.net
