456 research outputs found

    mGene.web: a web service for accurate computational gene finding

    We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures, including untranslated regions, in an increasing number of organisms. With mGene.web, users can also train the system with their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web; it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).

    Universal architecture of bacterial chemoreceptor arrays

    Chemoreceptors are key components of the high-performance signal transduction system that controls bacterial chemotaxis. Chemoreceptors are typically localized in a cluster at the cell pole, where interactions among the receptors in the cluster are thought to contribute to the high sensitivity, wide dynamic range, and precise adaptation of the signaling system. Previous structural and genomic studies have produced conflicting models, however, for the arrangement of the chemoreceptors in the clusters. Using whole-cell electron cryo-tomography, here we show that chemoreceptors of different classes and in many different species representing several major bacterial phyla are all arranged into a highly conserved, 12-nm hexagonal array consistent with the proposed “trimer of dimers” organization. The various observed lengths of the receptors confirm current models for the methylation, flexible bundle, signaling, and linker sub-domains in vivo. Our results suggest that the basic mechanism and function of receptor clustering is universal among bacterial species and was thus conserved during evolution

    ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes

    Correct annotation of translation initiation sites (TISs) is essential for both experimental and bioinformatics studies of the prokaryotic translation initiation mechanism, as well as for understanding gene regulation and gene structure. Here we describe a comprehensive database, ProTISA, which collects TISs confirmed through a variety of available evidence for prokaryotic genomes, including Swiss-Prot experimental records, literature, conserved domain hits and sequence alignment between orthologous genes. Moreover, by combining the predictions from our recently developed TIS post-processor, ProTISA provides a refined annotation for the public database RefSeq. Furthermore, the database annotates the potential regulatory signals associated with translation initiation in the region upstream of the TIS. As of July 2007, ProTISA includes 440 microbial genomes with more than 390 000 confirmed TISs. The database is available at http://mech.ctb.pku.edu.cn/protis

    Comparative analysis of an experimental subcellular protein localization assay and in silico prediction methods

    The subcellular localization of a protein can provide important information about its function within the cell. As eukaryotic cells and particularly mammalian cells are characterized by a high degree of compartmentalization, most protein activities can be assigned to particular cellular compartments. The categorization of proteins by their subcellular localization is therefore one of the essential goals of the functional annotation of the human genome. We previously performed a subcellular localization screen of 52 proteins encoded on human chromosome 21. In the current study, we compared the experimental localization data to the in silico results generated by nine leading software packages with different prediction resolutions. The comparison revealed striking differences between the programs in the accuracy of their subcellular protein localization predictions. Our results strongly suggest that the recently developed predictors utilizing multiple prediction methods tend to perform significantly better than purely sequence-based or homology-based predictors.

    GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies

    Ergatis: a web interface and scalable software system for bioinformatics workflows

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users

    Experimental determination of translational start sites resolves uncertainties in genomic open reading frame predictions – application to Mycobacterium tuberculosis

    Correct identification of translational start sites is important for understanding protein function and transcriptional regulation. The annotated translational start sites contained in genome databases are often predicted using bioinformatics and are rarely verified experimentally, and so are not all accurate. Therefore, we devised a simple approach for determining translational start sites using a combination of epitope tagging and frameshift mutagenesis. This assay was used to determine the start sites of three Mycobacterium tuberculosis proteins: LexA, SigC and Rv1955. We were able to show that proteins may begin before or after the predicted site. We also found that a small, non-annotated open reading frame upstream of Rv1955 was expressed as a protein, which we have designated Rv1954A. This approach is readily applicable to any bacterial species for which plasmid transformation can be achieved

    Bacterial Lifestyle in a Deep-sea Hydrothermal Vent Chimney Revealed by the Genome Sequence of the Thermophilic Bacterium Deferribacter desulfuricans SSM1

    The complete genome sequence of the thermophilic sulphur-reducing bacterium, Deferribacter desulfuricans SSM1, isolated from a hydrothermal vent chimney has been determined. The genome comprises a single circular chromosome of 2 234 389 bp and a megaplasmid of 308 544 bp. Many genes encoded in the genome are most similar to the genes of sulphur- or sulphate-reducing bacterial species within Deltaproteobacteria. The reconstructed central metabolisms showed a heterotrophic lifestyle primarily driven by C1 to C3 organics, e.g. formate, acetate, and pyruvate, and also suggested that the inability of autotrophy via a reductive tricarboxylic acid cycle may be due to the lack of ATP-dependent citrate lyase. In addition, the genome encodes numerous genes for chemoreceptors, chemotaxis-like systems, and signal transduction machineries. These signalling networks may be linked to this bacterium's versatile energy metabolisms and may provide ecophysiological advantages for D. desulfuricans SSM1 thriving in the physically and chemically fluctuating environments near hydrothermal vents. This is the first genome sequence from the phylum Deferribacteres.