459 research outputs found

    Approaching the taxonomic affiliation of unidentified sequences in public databases – an example from the mycorrhizal fungi

    BACKGROUND: During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public databases such as GenBank increases exponentially, only a minuscule fraction of all organisms have been sequenced, leaving taxon sampling a momentous problem for sequence-based taxonomic identification. When querying GenBank with a set of unidentified sequences, a considerable proportion typically lack fully identified matches, forming an ever-mounting pile of sequences that the researcher will have to monitor manually in the hope that new, clarifying sequences have been submitted by other researchers. To alleviate these concerns, a project to automatically monitor select unidentified sequences in GenBank for taxonomic progress through repeated local BLAST searches was initiated. Mycorrhizal fungi – a field where species identification often is prohibitively complex – and the much used ITS locus were chosen as test bed. RESULTS: A Perl script package called emerencia is presented. On a regular basis, it downloads select sequences from GenBank, separates the identified sequences from those insufficiently identified, and performs BLAST searches between these two datasets, storing all results in an SQL database. On the accompanying web service, users can monitor the taxonomic progress of insufficiently identified sequences over time, either through active searches or by signing up for e-mail notification upon disclosure of better matches. Other search categories, such as listing all insufficiently identified sequences (and their present best fully identified matches) publication-wise, are also available. 
DISCUSSION: The ever-increasing use of DNA sequences for identification purposes largely falls back on the assumption that public sequence databases contain a thorough sampling of taxonomically well-annotated sequences. Taxonomy, held by some to be an old-fashioned trade, has accordingly never been more important. emerencia does not automate the taxonomic process, but it does allow researchers to focus their efforts elsewhere than countless manual BLAST runs and arduous sieving of BLAST hit lists. The emerencia system is available on an open source basis for local installation with any organism and gene group as targets.
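
The separation step described in the RESULTS above (identified versus insufficiently identified sequences) might look roughly like the following sketch. It is written in Python for illustration only; emerencia itself is a Perl package, and the abstract does not state its actual criteria. The token list, the function names `is_fully_identified` and `partition`, and the binomial-name check are all assumptions made for this example.

```python
import re

# Hypothetical markers of an incomplete identification. The criteria that
# emerencia really applies are not given in the abstract.
UNCERTAIN_TOKENS = {
    "sp", "sp.", "cf", "cf.", "aff", "aff.",
    "uncultured", "unidentified", "environmental", "fungal", "fungus",
}


def is_fully_identified(organism: str) -> bool:
    """Return True if the organism field looks like a full binomial name."""
    words = organism.strip().split()
    if len(words) < 2:
        return False  # genus only, or empty
    if any(word.lower() in UNCERTAIN_TOKENS for word in words):
        return False
    genus, epithet = words[0], words[1]
    return bool(re.fullmatch(r"[A-Z][a-z]+", genus)) and bool(
        re.fullmatch(r"[a-z-]+", epithet)
    )


def partition(records):
    """Split (accession, organism) pairs into identified / insufficient lists."""
    identified, insufficient = [], []
    for accession, organism in records:
        target = identified if is_fully_identified(organism) else insufficient
        target.append(accession)
    return identified, insufficient
```

In a pipeline of this kind, the two lists would then serve as the database and query sets for the local BLAST searches.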

    galaxieEST: addressing EST identity through automated phylogenetic analysis

    BACKGROUND: Research involving expressed sequence tags (ESTs) is intricately coupled to the existence of large, well-annotated sequence repositories. Comparatively complete and satisfactory annotated public sequence libraries are, however, available only for a limited range of organisms, rendering the absence of sequences and gene structure information a tangible problem for those working with taxa lacking an EST or genome sequencing project. Paralogous genes belonging to the same gene family but distinguished by derived characteristics are particularly prone to misidentification and erroneous annotation; high but incomplete levels of sequence similarity are typically difficult to interpret and have formed the basis of many unsubstantiated assumptions of orthology. In these cases, a phylogenetic study of the query sequence together with the most similar sequences in the database may be of great value to the identification process. In order to facilitate this laborious procedure, a project to employ automated phylogenetic analysis in the identification of ESTs was initiated. RESULTS: galaxieEST is an open source Perl-CGI script package designed to complement traditional similarity-based identification of EST sequences through employment of automated phylogenetic analysis. It uses a series of BLAST runs as a sieve to retrieve nucleotide and protein sequences for inclusion in neighbour joining and parsimony analyses; the output includes the BLAST output, the results of the phylogenetic analyses, and the corresponding multiple alignments. galaxieEST is available as an on-line web service for identification of fungal ESTs and for download / local installation for use with any organism group at . 
CONCLUSIONS: By addressing sequence relatedness in addition to similarity, galaxieEST provides an integrative view on EST origin and identity, which may prove particularly useful in cases where similarity searches return one or more pertinent, but not full, matches and additional information on the query EST is needed.

    Intraspecific ITS Variability in the Kingdom Fungi as Expressed in the International Sequence Databases and Its Implications for Molecular Species Identification

    The internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit is the most popular locus for species identification and subgeneric phylogenetic inference in sequence-based mycological research. The region is known to show certain variability even within species, although its intraspecific variability is often held to be limited and clearly separated from interspecific variability. The existence of such a divide between intra- and interspecific variability is implicitly assumed by automated approaches to species identification, but whether intraspecific variability indeed is negligible within the fungal kingdom remains contentious. The present study estimates the intraspecific ITS variability in all fungi presently available to the mycological community through the international sequence databases. Substantial differences were found within the kingdom, and the results are not easily correlated to the taxonomic affiliation or nutritional mode of the taxa considered. No single unifying yet stringent upper limit for intraspecific variability, such as the canonical 3% threshold, appears to be applicable with the desired outcome throughout the fungi. Our results caution against simplified approaches to automated ITS-based species delimitation and reiterate the need for taxonomic expertise in the translation of sequence data into species names.
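
To make the fixed-threshold idea concrete, the sketch below computes the largest pairwise dissimilarity among sequences assigned to one species, which can then be compared with a cut-off such as 3%. This is an illustration under stated assumptions, not the method of the study: the sequences are assumed to be pre-aligned to equal length, the distance is a plain uncorrected p-distance that skips gap columns, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def max_intraspecific_distance(aligned_sequences) -> float:
    """Largest pairwise p-distance within one species (0.0 for < 2 sequences)."""
    return max(
        (p_distance(a, b) for a, b in combinations(aligned_sequences, 2)),
        default=0.0,
    )


# Assumed cut-off for illustration; the abstract argues that no single
# value works across all fungi.
THRESHOLD = 0.03
```

A species whose `max_intraspecific_distance` exceeds `THRESHOLD` would be split by a naive threshold-based approach, which is the kind of outcome the abstract cautions against.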

    Unbiased probabilistic taxonomic classification for DNA barcoding

    Motivation: When targeted to a barcoding region, high-throughput sequencing can be used to identify species or operational taxonomical units from environmental samples, and thus to study the diversity and structure of species communities. Although there are many methods which provide confidence scores for assigning taxonomic affiliations, it is not straightforward to translate these values to unbiased probabilities. We present a probabilistic method for taxonomical classification (PROTAX) of DNA sequences. Given a pre-defined taxonomical tree structure that is partially populated by reference sequences, PROTAX decomposes the probability of one to the set of all possible outcomes. PROTAX accounts for species that are present in the taxonomy but that do not have reference sequences, the possibility of unknown taxonomical units, as well as mislabeled reference sequences. PROTAX is based on a statistical multinomial regression model, and it can utilize any kind of sequence similarity measures or the outputs of other classifiers as predictors. Results: We demonstrate the performance of PROTAX by using as predictors the output from BLAST, the phylogenetic classification software TIPP, and the RDP classifier. We show that PROTAX improves the predictions of the baseline implementations of TIPP and RDP classifiers, and that it is able to combine complementary information provided by BLAST and TIPP, resulting in accurate and unbiased classifications even with very challenging cases such as 50% mislabeling of reference sequences.

    Tidying up international nucleotide sequence databases

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi.

    Mining metadata from unidentified ITS sequences in GenBank: A case study in Inocybe (Basidiomycota)

    BACKGROUND: The lack of reference sequences from well-identified mycorrhizal fungi often poses a challenge to the inference of taxonomic affiliation of sequences from environmental samples, and many environmental sequences are thus left unidentified. Such unidentified sequences belonging to the widely distributed ectomycorrhizal fungal genus Inocybe (Basidiomycota) were retrieved from GenBank and divided into species that were identified in a phylogenetic context using a reference dataset from an ongoing study of the genus. The sequence metadata of the unidentified Inocybe sequences stored in GenBank, as well as data from the corresponding original papers, were compiled and used to explore the ecology and distribution of the genus. In addition, the relative occurrence of Inocybe was contrasted to that of other mycorrhizal genera. RESULTS: Most species of Inocybe were found to have less than 3% intraspecific variability in the ITS2 region of the nuclear ribosomal DNA. This cut-off value was used jointly with phylogenetic analysis to delimit and identify unidentified Inocybe sequences to species level. A total of 177 unidentified Inocybe ITS sequences corresponding to 98 species were recovered, 32% of which were successfully identified to species level in this study. These sequences account for an unexpectedly large proportion of the publicly available unidentified fungal ITS sequences when compared with other mycorrhizal genera. Eight Inocybe species were reported from multiple hosts and some even from hosts forming arbutoid or orchid mycorrhizae. Furthermore, Inocybe sequences have been reported from four continents and in climate zones ranging from cold temperate to equatorial climate. Out of the 19 species found in more than one study, six were found in both Europe and North America and one was found in both Europe and Japan, indicating that at least many north temperate species have a wide distribution.
CONCLUSION: Although DNA-based species identification and circumscription are associated with practical and conceptual difficulties, they also offer new possibilities and avenues for research. Metadata assembly holds great potential to synthesize valuable information from community studies for use in a species and taxonomy-oriented framework.

    Phenotypic and transcriptomic acclimation of the green microalga Raphidocelis subcapitata to high environmental levels of the herbicide diflufenican

    Herbicide pollution poses a worldwide threat to plants and freshwater ecosystems. However, how organisms develop tolerance to these chemicals, and what trade-off costs that tolerance carries, remain largely unknown. This study aims to investigate the physiological and transcriptional mechanisms underlying the acclimation of the green microalgal model species Raphidocelis subcapitata (Selenastraceae) towards the herbicide diflufenican, and the fitness costs associated with tolerance development. Algae were exposed for 12 weeks (corresponding to 100 generations) to diflufenican at the two environmental concentrations 10 and 310 ng/L. The monitoring of growth, pigment composition, and photosynthetic performance throughout the experiment revealed an initial dose-dependent stress phase (week 1) with an EC50 of 397 ng/L, followed by a time-dependent recovery phase during weeks 2 to 4. After week 4, R. subcapitata was acclimated to diflufenican exposure with a similar growth rate, content of carotenoids, and photosynthetic performance as the unexposed control algae. This acclimation state of the algae was explored in terms of tolerance acquisition, changes in the fatty acids composition, diflufenican removal rate, cell size, and changes in mRNA gene expression profile, revealing potential fitness costs associated with acclimation, such as up-regulation of genes related to cell division, structure, morphology, and reduction of cell size. Overall, this study demonstrates that R. subcapitata can quickly acclimate to environmental but toxic levels of diflufenican; however, the acclimation is associated with trade-off expenses that result in smaller cell size.

    When mycologists describe new species, not all relevant information is provided (clearly enough)

    Taxonomic mycology struggles with what seems to be a perpetual shortage of resources. Logically, fungal taxonomists should therefore leverage every opportunity to highlight and visualize the importance of taxonomic work, the usefulness of taxonomic data far beyond taxonomy, and the integrative and collaborative nature of modern taxonomy at large. Is mycology really doing that, though? In this study, we went through ten years' worth (2009-2018) of species descriptions of extant fungal taxa - 1,097 studies describing at most ten new species - in five major mycological journals plus one plant journal. We estimated the frequency at which a range of key words, illustrations, and concepts related to ecology, geography, taxonomy, molecular data, and data availability were provided with the descriptions. We also considered a range of science-demographical aspects such as gender bias and the rejuvenation of taxonomy and taxonomists as well as public availability of the results. Our results show that the target audience of fungal species descriptions appears to be other fungal taxonomists, because many aspects of the new species were presented only implicitly, if at all. Although many of the parameters we estimated show a gradual, and in some cases marked, change for the better over time, they still paint a somewhat bleak picture of mycological taxonomy as a male-dominated field where the wants and needs of an extended target audience are often not understood or even considered. This study hopes to leave a mark on the way fungal species are described by putting the focus on ways in which fungal taxonomy can better anticipate the end users of species descriptions - be they mycologists, other researchers, the public at large, or even algorithms. In the end, fungal taxonomy, too, is likely to benefit from such measures.

    Incorporating molecular data in fungal systematics: a guide for aspiring researchers

    The last twenty years have witnessed molecular data emerge as a primary research instrument in most branches of mycology. Fungal systematics, taxonomy, and ecology have all seen tremendous progress and have undergone rapid, far-reaching changes as disciplines in the wake of continual improvement in DNA sequencing technology. A taxonomic study that draws from molecular data involves a long series of steps, ranging from taxon sampling through the various laboratory procedures and data analysis to the publication process. All steps are important and influence the results and the way they are perceived by the scientific community. The present paper provides a reflective overview of all major steps in such a project with the purpose to assist research students about to begin their first study using DNA-based methods. We also take the opportunity to discuss the role of taxonomy in biology and the life sciences in general in the light of molecular data. While the best way to learn molecular methods is to work side by side with someone experienced, we hope that the present paper will serve to lower the learning threshold for the reader. Comment: Submitted to Current Research in Environmental and Applied Mycology - comments most welcome.