221 research outputs found
Whole-genome bacterial classification : whole genome taxonomic reclassification and a universal prokaryotic identifier scheme
Binomial taxonomic nomenclature is problematic for prokaryotes. Much existing classification has origins in polyphasic and phenotypic classifications that do not reflect relatedness at the molecular level, and is under continual active revision that results in widespread confusion in literature, databases, and historical collections. Taxonomic classification nevertheless remains central to many areas of significant public impact, including development of political policy for legislation, and border control that aims to reduce disease risks to agriculture from pathogenic bacteria. To meet policy goals effectively with diagnostic tools and associate disease risk with identity, historical classifications need to be revisited. We use measures of genomic relatedness and a graph decomposition approach to subdivide enterobacterial plant pathogens into groupings (cliques) based only on inherent properties of their complete genomes. These groups require no arbitrary thresholding and are stable to introduction of new sequences. They are capable of providing a basis for universal indexing of prokaryotes and probabilistic estimates of risk conditioned on clique membership.
Metagenomic analysis of the gut microbiome of the common black slug Arion ater in search of novel lignocellulose degrading enzymes
Some eukaryotes are able to gain access to well-protected carbon sources in plant biomass by exploiting microorganisms in the environment or harbored in their digestive system. One is the land pulmonate Arion ater, which takes advantage of a gut microbial consortium that can break down the widely available, but difficult to digest, carbohydrate polymers in lignocellulose, enabling it to digest a broad range of fresh and partially degraded plant material efficiently. This ability is considered one of the major factors that have enabled A. ater to become one of the most widespread plant pest species in Western Europe and North America. Using metagenomic techniques we have characterized the bacterial diversity and functional capability of the gut microbiome of this notorious agricultural pest. Analysis of gut metagenomic community sequences identified abundant populations of known lignocellulose-degrading bacteria, along with well-characterized bacterial plant pathogens. This also revealed a repertoire of more than 3,383 carbohydrate active enzymes (CAZymes) including multiple enzymes associated with lignin degradation, demonstrating a microbial consortium capable of degradation of all components of lignocellulose. This would allow A. ater to make extensive use of plant biomass as a source of nutrients through exploitation of the enzymatic capabilities of the gut microbial consortia. From this metagenome assembly we also demonstrate the successful amplification of multiple predicted gene sequences from metagenomic DNA subjected to whole genome amplification and expression of functional proteins, facilitating the low cost acquisition and biochemical testing of the many thousands of novel genes identified in metagenomics studies. These findings demonstrate the importance of studying gastropod microbial communities: firstly, with respect to understanding links between feeding and evolutionary success and, secondly, as sources of novel enzymes with biotechnological potential, such as CAZymes that could be used in the production of biofuel.
Comprehensive evaluation of CAZyme prediction tools in fungal and bacterial species
Carbohydrate Active enZymes (CAZymes) are pivotal in pathogen recognition, signalling, structure and energy metabolism. CAZy is the most comprehensive CAZyme database, cataloguing CAZymes into sequence-based CAZy families. The CAZyme prediction tools dbCAN, CUPP and eCAMI annotate CAZymes with CAZy families. However, these tools have not been independently evaluated on a common high-quality dataset. Additionally, previous evaluations did not assess the binary classification of CAZymes/non-CAZymes, or the multilabel classification of CAZymes to multiple CAZy families.
cazy_webscraper : for creating a local CAZy database
Carbohydrate Active enZymes (CAZymes) are pivotal in pathogen recognition, signalling, structure and energy metabolism. CAZy (www.cazy.org) is the most comprehensive CAZyme database, but it does not provide methods for automating data retrieval or submitting sequences for annotation. cazy_webscraper retrieves user-specified datasets from CAZy, producing a local SQL database enabling thorough interrogation of the data. cazy_webscraper can also retrieve protein sequences from GenBank and download structure files from RCSB PDB.
Toward collaborative open data science in metabolomics using Jupyter Notebooks and cloud computing
Background: A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns relating to reproduction and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community such a framework also needs to be inclusive and intuitive for both computational novices and experts alike. Aim of Review: To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science. Key Scientific Concepts of Review: This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, GitHub data repository, and Binder cloud computing platform.
Improved and extended multilocus sequence typing (MLST) scheme for Streptomyces reveals complex taxonomic structure
Streptomyces species produce over 60% of all clinically-approved bioactive compounds. Continuing discoveries of new natural products suggest that Streptomyces genomes are a promising potential source for novel antibiotics. Comparative genomics and pangenomics are powerful tools for inferring genes involved in the synthesis of novel antibiotics from closely related genomic sequences. Current Streptomyces taxonomy is contested, making correct application of these approaches more difficult. MLST is used for genomic classification by comparing internal sequence fragments of multiple loci. The current Streptomyces MLST scheme comprises six markers and 236 sequence types (STs; only two new STs were reported since 2016). With the recent increase in sequenced Streptomyces we can now ask: (i) what resolution does MLST offer; (ii) does it reveal useful information about the structure of Streptomyces taxonomy; and (iii) does the current marker set adequately discriminate between species (or other useful groups), or could we improve it with a different set of markers? We extended the current scheme to include all available Streptomyces genomes, identifying over 600 novel STs. Using average nucleotide identity, we observed that the scheme diverged from taxonomy and nomenclature, and inadequately captured species diversity and phylogeny: (i) multiple species were found to share a single ST; (ii) multiple distinct STs were required to describe some genomic species; and (iii) some named species were split across unconnected groups of STs in the minimum spanning tree. Here we demonstrate that the extended MLST scheme provides quantitative motivation for reclassification within Streptomyces, and an improved marker scheme.
Absence of curli in soil-persistent Escherichia coli is mediated by a C-di-GMP signaling defect and suggests evidence of biofilm-independent niche specialization
Escherichia coli is commonly viewed as a gastrointestinal commensal or pathogen although an increasing body of evidence suggests that it can persist in non-host environments as well. Curli are a major component of biofilm in many enteric bacteria including E. coli and are important for adherence to different biotic and abiotic surfaces. In this study we investigated curli production in a unique collection of soil-persistent E. coli isolates and examined the role of curli formation in environmental persistence. Although most soil-persistent E. coli were curli-positive, 10% of isolates were curli-negative (17 out of 170). Curli-producing E. coli (COB583, COB585, and BW25113) displayed significantly more attachment to quartz sand than the curli-negative strains. Long-term soil survival experiments indicated that curli production was not required for long-term survival in live soil (over 110 days), as a curli-negative mutant BW25113ΔcsgB had similar survival compared to wild type BW25113. Mutations in two genes associated with c-di-GMP metabolism, dgcE and pdeR, correlated with loss of curli in eight soil-persistent strains, although this did not significantly impair their survival in soil compared to curli-positive strains. Overall, the data indicate that curli-deficient and biofilm-defective strains, which also have a defect in attachment to quartz sand, are able to reside in soil for long periods of time, thus pointing to the possibility that niches may exist in the soil that can support long-term survival independently of biofilm formation.
16S rRNA phylogeny and clustering is not a reliable proxy for genome-based taxonomy in Streptomyces
Although Streptomyces is one of the most extensively studied genera of bacteria, its taxonomy remains contested and the genus is suspected to contain significant species-level misclassification. Resolving the classification of Streptomyces would benefit many areas of study and applied microbiology that rely heavily on having an accurate ground truth classification of similar and dissimilar organisms, including comparative genomics-based searches for novel antimicrobials in the fight against the ongoing antimicrobial resistance (AMR) crisis. To attempt a resolution, we investigate taxonomic conflicts between 16S rRNA and whole genome classifications using all available 48,981 full-length 16S rRNA Streptomyces sequences from the combined SILVA, Greengenes, Ribosomal Database Project (RDP) and NCBI (National Center for Biotechnology Information) databases, and 2,276 publicly available Streptomyces genome assemblies. We construct a 16S gene tree for 14,239 distinct Streptomyces 16S rRNA sequences, identifying three major lineages of Streptomyces, and find that existing taxonomic classifications are inconsistent with the tree topology. We also use these data to delineate 16S and whole genome landscapes for Streptomyces, finding that 16S and whole-genome classifications of Streptomyces strains are frequently in disagreement, and in particular that 16S zero-radius Operational Taxonomic Units (zOTUs) are often inconsistent with Average Nucleotide Identity (ANI)-based taxonomy. Our results strongly imply that 16S rRNA sequence data do not map to taxonomy sufficiently well to delineate Streptomyces species reliably, and we propose that alternative markers should instead be adopted by the community for classification and metabarcoding. As much of current Streptomyces taxonomy has been determined or supported by historical 16S sequence data and may in parts be in error, we also propose that reclassification of the genus by alternative approaches is required.
- …