    The Natural Product Domain Seeker NaPDoS: A Phylogeny Based Bioinformatic Tool to Classify Secondary Metabolite Gene Diversity

    New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes. The sequence tags correspond to PKS-derived ketosynthase domains and NRPS-derived condensation domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites, while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.
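The idea of comparing a domain sequence tag against a database of characterized references can be illustrated with a toy sketch. This is not the NaPDoS implementation: the `Reference` type, the `classify_tag` function, the 85% identity cutoff, and the use of simple percent identity on pre-aligned sequences are all assumptions made for illustration, standing in for the database search and phylogenetic placement the abstract describes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Reference:
    """Hypothetical database entry: a characterized domain with a known class."""
    name: str
    domain_class: str
    aligned_seq: str  # assumed to be pre-aligned to the query (equal length)


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def classify_tag(
    query: str, references: List[Reference], min_identity: float = 85.0
) -> Tuple[str, str, float]:
    """Return (label, closest reference name, identity to that reference).

    A close match inherits the reference's class; a distant best match is
    flagged as a candidate new lineage instead of being forced into a class.
    The cutoff value is an illustrative assumption, not a published threshold.
    """
    best = max(references, key=lambda r: percent_identity(query, r.aligned_seq))
    identity = percent_identity(query, best.aligned_seq)
    if identity >= min_identity:
        label = best.domain_class
    else:
        label = "unassigned (candidate new lineage)"
    return label, best.name, identity


# Toy reference set with made-up sequences and class names.
refs = [
    Reference("ref_A", "type I modular KS", "MKTAYIAKQR"),
    Reference("ref_B", "type II KS", "MKSGYLLKQW"),
]
```

Calling `classify_tag("MKTAYIAKQW", refs)` would assign the query to the class of `ref_A`, while a query with no close reference is reported as unassigned. A real analysis would replace percent identity with a proper similarity search and tree placement.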

    Comparative genome structure, secondary metabolite, and effector coding capacity across Cochliobolus pathogens.

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.

    A free-standing condensation enzyme catalyzing ester bond formation in C-1027 biosynthesis

    Nonribosomal peptide synthetases (NRPSs) catalyze the biosynthesis of many biologically active peptides and typically are modular, with each extension module minimally consisting of a condensation, an adenylation, and a peptidyl carrier protein domain responsible for incorporation of an amino acid into the growing peptide chain. C-1027 is a chromoprotein antitumor antibiotic whose enediyne chromophore consists of an enediyne core, a deoxy aminosugar, a benzoxazolinate, and a β-amino acid moiety. Bioinformatics analysis suggested that the activation and incorporation of the β-amino acid moiety into C-1027 follows an NRPS mechanism whereby biosynthetic intermediates are tethered to the peptidyl carrier protein SgcC2. Here, we report the biochemical characterization of SgcC5, an NRPS condensation enzyme that catalyzes ester bond formation between the SgcC2-tethered (S)-3-chloro-5-hydroxy-β-tyrosine and (R)-1-phenyl-1,2-ethanediol, a mimic of the enediyne core. SgcC5 uses (S)-3-chloro-5-hydroxy-β-tyrosyl-SgcC2 as the donor substrate and exhibits regiospecificity for the C-2 hydroxyl group of the enediyne core mimic as the acceptor substrate. Remarkably, SgcC5 is also capable of catalyzing amide bond formation, albeit with significantly reduced efficiency, between (S)-3-chloro-5-hydroxy-β-tyrosyl-(S)-SgcC2 and (R)-2-amino-1-phenyl-1-ethanol, an alternative enediyne core mimic bearing an amine at its C-2 position. Thus, SgcC5 is capable of catalyzing both ester and amide bond formation, providing an evolutionary link between amide- and ester-forming condensation enzymes.

    Classification of the adenylation and acyl-transferase activity of NRPS and PKS systems using ensembles of substrate specific hidden Markov models

    There is a growing interest in the non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) of microbes, fungi and plants because they can produce bioactive peptides such as antibiotics. The ability to identify the substrate specificity of the enzyme's adenylation (A) and acyl-transferase (AT) domains is essential to rationally deduce or engineer new products. Here we report on a Hidden Markov Model (HMM)-based ensemble method to predict the substrate specificity with high quality. We collected a new reference set of experimentally validated sequences. An initial classification based on alignment and Neighbor Joining was performed in line with most of the previously published prediction methods. We then created and tested single substrate-specific HMMs and found that their use significantly improved correct identification for A as well as for AT domains. A major advantage of HMMs is that they remove the dependency on multiple sequence alignment and residue selection that hampers the alignment-based clustering methods. Using our models we obtained a high prediction quality for the substrate specificity of the A domains, similar to two recently published tools that make use of HMMs or Support Vector Machines (NRPSsp and NRPS predictor2, respectively). Moreover, replacement of the single substrate-specific HMMs by ensembles of models caused a clear increase in prediction quality. We argue that the superiority of the ensemble over the single model is caused by the way substrate specificity evolves for the studied systems. It is likely that this also holds true for other protein domains. The ensemble predictor has been implemented in a simple web-based tool that is available at http://www.cmbi.ru.nl/NRPS-PKS-substrate-predictor/
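The difference between a single substrate-specific model and an ensemble of models per substrate can be sketched as follows. This is a minimal illustration under stated assumptions, not the published predictor: simple position-specific score tables stand in for profile HMMs (no insert or delete states), the ensemble score is taken as the maximum over its members (the abstract does not state the aggregation rule), and all names and score values are hypothetical.

```python
from typing import Dict, List, Tuple

# A profile is one score table per sequence position: residue -> score.
# This is a simplified stand-in for a profile HMM, used only for illustration.
Profile = List[Dict[str, float]]


def score_profile(seq: str, profile: Profile, default: float = -1.0) -> float:
    """Sum per-position scores; residues absent from a column get `default`."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(col.get(res, default) for res, col in zip(seq, profile))


def ensemble_score(seq: str, ensemble: List[Profile]) -> float:
    """Score of a substrate's ensemble: the best score among its member models.

    Taking the maximum is an assumption; it lets each member capture one
    sequence subfamily that shares the same substrate specificity.
    """
    return max(score_profile(seq, profile) for profile in ensemble)


def predict_substrate(
    seq: str, ensembles: Dict[str, List[Profile]]
) -> Tuple[str, Dict[str, float]]:
    """Return the top-scoring substrate and the per-substrate ensemble scores."""
    scores = {sub: ensemble_score(seq, ens) for sub, ens in ensembles.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy ensembles over two-residue "domains"; values are made up.
toy_ensembles: Dict[str, List[Profile]] = {
    "Ala": [
        [{"D": 2.0}, {"L": 2.0}],  # member model for one subfamily
        [{"D": 2.0}, {"V": 2.0}],  # member model for a second subfamily
    ],
    "Phe": [
        [{"D": 2.0}, {"A": 2.0}],
    ],
}
```

In this toy setup, `predict_substrate("DV", toy_ensembles)` selects "Ala" because the second member of that ensemble matches the query well even though the first does not, which is the kind of case where a single averaged model could lose signal.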