Use of <i>recA</i> as an alternative phylogenetic marker in the family <i>Vibrionaceae</i>
This study analysed the usefulness of recA gene sequences as an alternative phylogenetic and/or identification marker for vibrios. The recA sequences suggest that the genus Vibrio is polyphyletic. The high heterogeneity observed within vibrios was congruent with former polyphasic taxonomic studies on this group. Photobacterium species clustered together and apparently nested within vibrios, while Grimontia hollisae was apart from other vibrios. Within the vibrios, Vibrio cholerae and Vibrio mimicus clustered apart from the other genus members. Vibrio harveyi- and Vibrio splendidus-related species formed compact separated groups. On the other hand, species related to Vibrio tubiashii appeared scattered in the phylogenetic tree. The pairs Vibrio coralliilyticus and Vibrio neptunius, Vibrio nereis and Vibrio xuii and V. tubiashii and Vibrio brasiliensis clustered completely apart from each other. There was a correlation of 0·58 between recA and 16S rDNA pairwise similarities. Strains of the same species have at least 94 % recA sequence similarity. recA gene sequences are much more discriminatory than 16S rDNA. For 16S rDNA similarity values above 98 % there was a wide range of recA similarities, from 83 to 99 %.
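The comparison of pairwise similarities between two markers can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length, uses simple percent identity as the similarity measure, and uses Pearson correlation; the abstract does not state which similarity measure or correlation coefficient the study used. The strain names and the two toy alignments (`rec_a`, `rrs`) are hypothetical and are not data from the study.

```python
from itertools import combinations
from math import sqrt


def percent_identity(a, b):
    """Percentage of identical positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def pairwise_similarities(alignment):
    """Map each unordered strain pair to its percent identity."""
    names = sorted(alignment)
    return {
        (p, q): percent_identity(alignment[p], alignment[q])
        for p, q in combinations(names, 2)
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy alignments for four strains (not real recA or 16S data).
rec_a = {"s1": "ATGGCTATCG", "s2": "ATGGCTATCA",
         "s3": "ATGACTTTCA", "s4": "TTGACTTACA"}
rrs = {"s1": "GGCTACCTTG", "s2": "GGCTACCTTG",
       "s3": "GGCTACCTTA", "s4": "GGCTTCCTTA"}

sim_rec_a = pairwise_similarities(rec_a)
sim_rrs = pairwise_similarities(rrs)
pairs = sorted(sim_rec_a)  # same strain pairs in both dictionaries
r = pearson([sim_rec_a[p] for p in pairs], [sim_rrs[p] for p in pairs])
print(f"correlation between marker similarities: {r:.2f}")
```

In this toy data the second marker varies over a narrower range than the first, which mirrors the qualitative point of the abstract: a slowly evolving marker can show high similarity across pairs that a more variable marker separates clearly.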
Data shopping in an open marketplace: Introducing the Ontogrator web application for marking up data using ontologies and browsing using facets
In the future, we hope to see an open and thriving data market in which users can find and select data from a wide range of data providers. In such an open access market, data are products that must be packaged accordingly. Increasingly, eCommerce sellers present heterogeneous product lines to buyers using faceted browsing. Using this approach we have developed the Ontogrator platform, which allows for rapid retrieval of data in a way that would be familiar to any online shopper. Using Knowledge Organization Systems (KOS), especially ontologies, Ontogrator uses text mining to mark up data and faceted browsing to help users navigate, query and retrieve data. Ontogrator offers the potential to impact scientific research in two major ways: 1) by significantly improving the retrieval of relevant information; and 2) by significantly reducing the time required to compose standard database queries and assemble information for further research. Here we present a pilot implementation developed in collaboration with the Genomic Standards Consortium (GSC) that includes content from the StrainInfo, GOLD, CAMERA, SILVA and PubMed databases. This implementation demonstrates the power of ontogration and highlights that the usefulness of this approach is fully dependent on both the quality of data and the KOS (ontologies) used. Ideally, the use and further expansion of this collaborative system will help to surface issues associated with the underlying quality of annotation and could lead to a systematic means for accessing integrated data resources.
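The combination of term-based markup and faceted browsing described above can be sketched roughly as follows. This is not Ontogrator's implementation: the `FACETS` vocabulary, the keyword-matching `annotate` function (a crude stand-in for text mining against an ontology), and the sample records are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini vocabulary: facet -> term -> synonyms to look for in text.
FACETS = {
    "habitat": {
        "marine": ["sea", "ocean", "marine"],
        "soil": ["soil", "rhizosphere"],
    },
    "source": {
        "host-associated": ["gut", "host"],
        "free-living": ["water", "sediment"],
    },
}


def annotate(text):
    """Tag free text with every term that has a synonym occurring in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags = {}
    for facet, terms in FACETS.items():
        hits = {term for term, synonyms in terms.items() if words & set(synonyms)}
        if hits:
            tags[facet] = hits
    return tags


def filter_records(records, selection):
    """Keep records that match every selected facet value (AND across facets)."""
    return [
        rec for rec in records
        if all(value in rec["tags"].get(facet, set())
               for facet, value in selection.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each term of one facet."""
    return Counter(term for rec in records for term in rec["tags"].get(facet, ()))


# Hypothetical record descriptions, marked up once at load time.
descriptions = [
    "Isolated from ocean sediment",
    "Rhizosphere soil sample",
    "Fish gut isolate from marine aquaculture",
]
records = [{"text": d, "tags": annotate(d)} for d in descriptions]

marine = filter_records(records, {"habitat": "marine"})
print(len(marine), dict(facet_counts(marine, "source")))
```

After each selection, the remaining facet counts are recomputed on the filtered set, which is what lets a user narrow results step by step without writing a query. The sketch also shows why result quality depends on the vocabulary: a record whose text uses no listed synonym is silently left untagged.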
From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification
<p>Abstract</p> <p>Background</p> <p>Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification.</p> <p>Results</p> <p>In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are trained for the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model.</p> <p>Conclusions</p> <p>FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. 
Secondly, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.</p>
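The idea of learning each binary split of a fixed tree from feature data can be illustrated with a minimal sketch. It assumes the tree is already given (in the paper it is inferred from 16S rRNA gene sequences) and swaps in a nearest-centroid rule at each split instead of the Random Forest models the authors use, purely to keep the example dependency-free. The species names, the tree shape, and the two-feature profiles are hypothetical.

```python
from math import dist

# Hypothetical fixed binary tree: nested pairs, leaves are species names.
TREE = (("sp_A", "sp_B"), "sp_C")


def leaves(node):
    """List the species below a node."""
    if isinstance(node, str):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(node, samples):
    """Learn one left/right classifier per internal node.

    samples is a list of (features, species) pairs; every species in the
    tree is assumed to have at least one sample.
    """
    if isinstance(node, str):
        return node
    left, right = node
    left_species, right_species = set(leaves(left)), set(leaves(right))
    left_rows = [x for x, y in samples if y in left_species]
    right_rows = [x for x, y in samples if y in right_species]
    return {
        "centroids": (centroid(left_rows), centroid(right_rows)),
        "children": (train(left, samples), train(right, samples)),
    }


def predict(model, x):
    """Walk from the root to a leaf, taking the nearer centroid at each split."""
    while not isinstance(model, str):
        c_left, c_right = model["centroids"]
        side = 0 if dist(x, c_left) <= dist(x, c_right) else 1
        model = model["children"][side]
    return model


# Hypothetical two-feature profiles (stand-ins for FAME peak fractions).
samples = [
    ([0.9, 0.1], "sp_A"), ([0.8, 0.2], "sp_A"),
    ([0.6, 0.4], "sp_B"), ([0.5, 0.5], "sp_B"),
    ([0.1, 0.9], "sp_C"), ([0.2, 0.8], "sp_C"),
]
model = train(TREE, samples)
print(predict(model, [0.58, 0.42]))
```

Because every prediction is a path through the tree, a misclassification can be traced to the specific split where it went wrong, which is the kind of per-split evaluation the hierarchical structure makes possible.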
Microbiological Common Language (MCL): a standard for electronic information exchange in the Microbial Commons
Although Biological Resource Centers (BRCs) traditionally have open catalogs of their holdings, it is quite cumbersome to access meta-information about microorganisms electronically due to the variety of access methods used by those catalogs. Therefore, we propose Microbiological Common Language (MCL), aimed at standardizing the electronic exchange of meta-information about microorganisms. Its application ranges from representing the online catalog of a single collection to accessing the results of StrainInfo integration and ad hoc use in other contexts. The abstract model of the standard precisely defines the elements of the standard, which enables implementation using a variety of representation technologies. Currently, XML and RDF/XML implementations are readily available. MCL is an open standard, and therefore greatly encourages input from the microbiological community.