    ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes

    Correct annotation of translation initiation sites (TISs) is essential for both experimental and bioinformatics studies of the prokaryotic translation initiation mechanism, as well as for understanding gene regulation and gene structure. Here we describe a comprehensive database, ProTISA, which collects TISs confirmed through a variety of available evidence for prokaryotic genomes, including Swiss-Prot experimental records, literature, conserved domain hits and sequence alignment between orthologous genes. Moreover, by combining the predictions from our recently developed TIS post-processor, ProTISA provides a refined annotation for the public database RefSeq. Furthermore, the database annotates the potential regulatory signals associated with translation initiation at the TIS upstream region. As of July 2007, ProTISA includes 440 microbial genomes with more than 390 000 confirmed TISs. The database is available at http://mech.ctb.pku.edu.cn/protis

    Draft Genome Sequence of the Marine Streptomyces sp. Strain PP-C42, Isolated from the Baltic Sea

    Streptomyces, a branch of aerobic Gram-positive bacteria, represents the largest genus of actinobacteria. The streptomycetes are characterized by a complex secondary metabolism and produce over two-thirds of the clinically used natural antibiotics today. Here we report the draft genome sequence of Streptomyces sp. strain PP-C42, isolated from the marine environment. A subset of unique genes and gene clusters for diverse secondary metabolites as well as antimicrobial peptides (AMPs) could be identified from the genome, showing great promise as a source for novel bioactive compounds.

    Understanding the impact of antibiotic therapies on the respiratory tract resistome: A novel pooled-template metagenomic sequencing strategy

    Determining the effects of antimicrobial therapies on airway microbiology at a population-level is essential. Such analysis allows, for example, surveillance of antibiotic-induced changes in pathogen prevalence, the emergence and spread of antibiotic resistance, and the transmission of multi-resistant organisms. However, current analytical strategies for understanding these processes are limited. Culture- and PCR-based assays for specific microbes require the a priori selection of targets, while antibiotic sensitivity testing typically provides no insight into either the molecular basis of resistance, or the carriage of resistance determinants by the wider commensal microbiota. Shotgun metagenomic sequencing provides an alternative approach that allows the microbial composition of clinical samples to be described in detail, including the prevalence of resistance genes and virulence traits. While highly informative, the application of metagenomics to large patient cohorts can be prohibitively expensive. Using sputum samples from a randomised placebo-controlled trial of erythromycin in adults with bronchiectasis, we describe a novel, cost-effective strategy for screening patient cohorts for changes in resistance gene prevalence. By combining metagenomic screening of pooled DNA extracts with validatory quantitative PCR-based analysis of candidate markers in individual samples, we identify population-level changes in the relative abundance of specific macrolide resistance genes. This approach has the potential to provide an important adjunct to current analytical strategies, particularly within the context of antimicrobial clinical trials

    ClustScan: an integrated program package for the semi-automatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structures

    The program package ‘ClustScan’ (Cluster Scanner) is designed for rapid, semi-automatic, annotation of DNA sequences encoding modular biosynthetic enzymes including polyketide synthases (PKS), non-ribosomal peptide synthetases (NRPS) and hybrid (PKS/NRPS) enzymes. The program displays the predicted chemical structures of products as well as allowing export of the structures in a standard format for analyses with other programs. Recent advances in understanding of enzyme function are incorporated to make knowledge-based predictions about the stereochemistry of products. The program structure allows easy incorporation of additional knowledge about domain specificities and function. The results of analyses are presented to the user in a graphical interface, which also allows easy editing of the predictions to incorporate user experience. The versatility of this program package has been demonstrated by annotating biochemical pathways in microbial, invertebrate animal and metagenomic datasets. The speed and convenience of the package allows the annotation of all PKS and NRPS clusters in a complete Actinobacteria genome in 2–3 man hours. The open architecture of ClustScan allows easy integration with other programs, facilitating further analyses of results, which is useful for a broad range of researchers in the chemical and biological sciences

    VIGOR, an annotation program for small viral genomes

    Background: The decrease in cost for sequencing and improvement in technologies has made it easier and more common for the re-sequencing of large genomes as well as parallel sequencing of small genomes. It is possible to completely sequence a small genome within days and this increases the number of publicly available genomes. Among the types of genomes being rapidly sequenced are those of microbial and viral genomes responsible for infectious diseases. However, accurate gene prediction is a challenge that persists for decoding a newly sequenced genome. Therefore, accurate and efficient gene prediction programs are highly desired for rapid and cost effective surveillance of RNA viruses through full genome sequencing. Results: We have developed VIGOR (Viral Genome ORF Reader), a web application tool for gene prediction in influenza virus, rotavirus, rhinovirus and coronavirus subtypes. VIGOR detects protein coding regions based on sequence similarity searches and can accurately detect genome specific features such as frame shifts, overlapping genes, embedded genes, and can predict mature peptides within the context of a single polypeptide open reading frame. Genotyping capability for influenza and rotavirus is built into the program. We compared VIGOR to previously described gene prediction programs, ZCURVE_V, GeneMarkS and FLAN. The specificity and sensitivity of VIGOR are greater than 99% for the RNA viral genomes tested. Conclusions: VIGOR is a user friendly web-based genome annotation program for five different viral agents, influenza, rotavirus, rhinovirus, coronavirus and SARS coronavirus. This is the first gene prediction program for rotavirus and rhinovirus for public access. VIGOR is able to accurately predict protein coding genes for the above five viral types and has the capability to assign function to the predicted open reading frames and genotype influenza virus. The prediction software was designed for performing high throughput annotation and closure validation in a post-sequencing production pipeline.

    Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all compute that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
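The general idea of incremental clustering named in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how sequences are compared, so the k-mer Jaccard similarity, the `threshold` value, and all function names below are assumptions chosen for illustration. The point shown is only that each new sequence is compared against one representative per existing cluster, which avoids an all-against-all comparison.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def incremental_cluster(sequences, clusters=None, threshold=0.5, k=3):
    """Greedy incremental clustering (hypothetical stand-in, see lead-in).

    Each sequence is compared only with the representative of every
    existing cluster. It joins the first cluster whose representative is
    similar enough; otherwise it seeds a new cluster. Passing an existing
    `clusters` list updates it in place with the new sequences.
    """
    clusters = [] if clusters is None else clusters
    for seq in sequences:
        profile = kmers(seq, k)
        for cluster in clusters:
            if jaccard(profile, cluster["rep_kmers"]) >= threshold:
                cluster["members"].append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters.append({"rep_kmers": profile, "members": [seq]})
    return clusters


clusters = incremental_cluster(["MKTAYIAKQR", "MKTAYIAKQK", "GGGGSSSSPP"])
# Later batches reuse the existing clusters instead of reclustering everything.
clusters = incremental_cluster(["MKTAYIAKQL"], clusters)
```

With n sequences and c clusters, this sketch performs on the order of n × c comparisons instead of n × n, at the cost of results that depend on input order.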

    Origin of Saxitoxin Biosynthetic Genes in Cyanobacteria

    BACKGROUND: Paralytic shellfish poisoning (PSP) is a potentially fatal syndrome associated with the consumption of shellfish that have accumulated saxitoxin (STX). STX is produced by microscopic marine dinoflagellate algae. Little is known about the origin and spread of saxitoxin genes in these under-studied eukaryotes. Fortuitously, some freshwater cyanobacteria also produce STX, providing an ideal model for studying its biosynthesis. Here we focus on saxitoxin-producing cyanobacteria and their non-toxic sisters to elucidate the origin of genes involved in the putative STX biosynthetic pathway. METHODOLOGY/PRINCIPAL FINDINGS: We generated a draft genome assembly of the saxitoxin-producing (STX+) cyanobacterium Anabaena circinalis ACBU02 and searched for 26 candidate saxitoxin-genes (named sxtA to sxtZ) that were recently identified in the toxic strain Cylindrospermopsis raciborskii T3. We also generated a draft assembly of the non-toxic (STX-) sister Anabaena circinalis ACFR02 to aid the identification of saxitoxin-specific genes. Comparative phylogenomic analyses revealed that nine putative STX genes were horizontally transferred from non-cyanobacterial sources, whereas one key gene (sxtA) originated in STX+ cyanobacteria via two independent horizontal transfers followed by fusion. In total, of the 26 candidate saxitoxin-genes, 13 are of cyanobacterial provenance and are monophyletic among the STX+ taxa, four are shared amongst STX+ and STX- cyanobacteria, and the remaining nine genes are specific to STX+ cyanobacteria. CONCLUSIONS/SIGNIFICANCE: Our results provide evidence that the assembly of STX genes in ACBU02 involved multiple HGT events from different sources followed presumably by coordination of the expression of foreign and native genes in the common ancestor of STX+ cyanobacteria. The ability to produce saxitoxin was subsequently lost multiple independent times resulting in a nested relationship of STX+ and STX- strains among Anabaena circinalis strains

    Gene prediction in metagenomic fragments: A large scale machine learning approach

    Background: Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of most fragments (for example, Sanger sequencing yields fragments of 700 bp on average) and their unknown phylogenetic origin require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results: We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion: Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA fragments. In particular, the combination of linear discriminants and neural networks is promising and should be considered for integration into metagenomic analysis pipelines. The data sets can be downloaded from the URL provided (see Availability and requirements section).
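The two-stage structure described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the published algorithm: only monocodon usage is used in stage one (dicodon usage and translation initiation site features are omitted), a single logistic unit stands in for the artificial neural network in stage two, and every weight, the 700 bp length scaling, and all function names are hypothetical.

```python
import math
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def codon_frequencies(orf):
    """Relative frequency of each in-frame codon in an ORF."""
    orf = orf.upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]
    total = len(codons)
    return {c: n / total for c, n in Counter(codons).items()} if total else {}


def linear_score(freqs, weights, bias=0.0):
    """Stage 1: linear discriminant over codon frequencies."""
    return bias + sum(weights.get(c, 0.0) * f for c, f in freqs.items())


def coding_probability(features, weights, bias=0.0):
    """Stage 2 stand-in: one logistic unit instead of a full neural network."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_orf(orf, fragment, codon_weights, combiner_weights, combiner_bias=0.0):
    """Combine the stage-1 score with ORF length and fragment GC content."""
    stage1 = linear_score(codon_frequencies(orf), codon_weights)
    # Length is scaled by 700 bp, an arbitrary normalisation for this sketch.
    features = [stage1, len(orf) / 700.0, gc_content(fragment)]
    return coding_probability(features, combiner_weights, combiner_bias)


# Hypothetical weights; in practice both stages would be trained on labelled data.
p = score_orf("ATGAAATAA", "ATGAAATAAGC", {"ATG": 1.0}, [1.0, 1.0, 1.0])
```

The returned value lies between 0 and 1 and can be thresholded to classify a candidate open reading frame as coding or non-coding.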