
    REPARATION: ribosome profiling assisted (re-)annotation of bacterial genomes

    Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen out spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive, experimentally supported map of their translation landscapes. In all cases, we identified hundreds of novel (small) ORFs, including variants of previously annotated ORFs, and REPARATION predicted >70% of all annotated protein-coding ORFs (or their variants) to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it uses experimental translation evidence to intrinsically perform de novo ORF delineation in bacterial genomes, irrespective of the sequence features linked to open reading frames.
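    REPARATION starts by evaluating all possible ORFs in the genome. The minimal Python sketch below illustrates only that enumeration step and is not REPARATION's own code: it assumes the bacterial start codons ATG, GTG and TTG, the three standard stop codons, and a hypothetical `min_codons` length cutoff. The Ribo-seq features, the machine learning classifier and the growth-curve thresholds are not modeled.

```python
from typing import List, NamedTuple

# Assumed codon sets: bacterial starts ATG/GTG/TTG and the standard stops.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    strand: str  # "+" or "-"
    frame: int   # 0, 1 or 2, relative to the strand being scanned
    start: int   # 0-based offset of the start codon on that strand
    end: int     # exclusive offset just past the stop codon on that strand


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(genome: str, min_codons: int = 10) -> List[Orf]:
    """Enumerate every start-to-stop ORF in all six reading frames.

    Every in-frame start codon upstream of a stop is reported, so
    N-terminal variants sharing one stop codon appear as separate
    candidates. min_codons (hypothetical cutoff) counts the stop codon.
    """
    plus = genome.upper()
    orfs: List[Orf] = []
    for strand, seq in (("+", plus), ("-", reverse_complement(plus))):
        for frame in range(3):
            open_starts: List[int] = []  # starts not yet closed by a stop
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon in STOP_CODONS:
                    for s in open_starts:
                        if (i + 3 - s) // 3 >= min_codons:
                            orfs.append(Orf(strand, frame, s, i + 3))
                    open_starts = []
                elif codon in START_CODONS:
                    open_starts.append(i)
    return orfs
```

    For example, `find_orfs("ATGTTGAAATAG", min_codons=3)` reports two candidates on the forward strand, one from the ATG and one from the in-frame TTG, both ending at the same TAG; keeping such variants is what lets a downstream classifier choose among alternative start sites.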

    PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

    As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance on 12-species _Drosophila_ genome alignments exceeds that of all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE.
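    PhyloCSF's decision rests on comparing how likely the alignment is under a coding versus a non-coding codon model. The Python sketch below shows only that final comparison as a generic log-likelihood ratio; it is not PhyloCSF's interface. It assumes the per-codon-site likelihoods under each model are already available (computing them requires phylogenetic codon models evaluated over the species tree, not shown), treats sites as independent, and reports the score in decibans (10 times the base-10 log ratio). `classify` and its zero threshold are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_likelihood_ratio_db(site_likelihoods: Sequence[Tuple[float, float]]) -> float:
    """Sum per-codon-site log-likelihood ratios, in decibans (10*log10).

    site_likelihoods holds (L_coding, L_noncoding) for each codon column.
    Assuming independent sites, the alignment likelihood under each model
    is a product over sites, so its log ratio is a sum; summing logs also
    avoids the underflow that multiplying raw likelihoods would cause.
    """
    score = 0.0
    for l_coding, l_noncoding in site_likelihoods:
        score += 10.0 * (math.log10(l_coding) - math.log10(l_noncoding))
    return score


def classify(site_likelihoods: Sequence[Tuple[float, float]], threshold_db: float = 0.0) -> str:
    """Hypothetical decision rule: call 'coding' when the score exceeds the threshold."""
    if log_likelihood_ratio_db(site_likelihoods) > threshold_db:
        return "coding"
    return "non-coding"
```

    A single site that is ten times more likely under the coding model contributes +10 decibans; in practice the threshold would be calibrated on alignments of known coding and non-coding regions rather than fixed at zero.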

    BPhyOG: An interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes

    Background: Overlapping genes (OGs) in bacterial genomes are pairs of adjacent genes whose coding sequences overlap partly or entirely. With the rapid accumulation of sequence data, many OGs in bacterial genomes have now been identified, and they may prove to be a consistent feature across all microbial genomes. Our previous work suggests that OGs can serve as robust markers at the whole-genome level for the construction of phylogenies. Biologists need an online, interactive web server for inferring phylogenies in order to analyze phylogenetic relationships among a set of bacterial genomes of interest. Description: BPhyOG is an online interactive server for reconstructing the phylogenies of completely sequenced bacterial genomes on the basis of their shared overlapping genes. It provides two tree-reconstruction methods: Neighbor Joining (NJ) and Unweighted Pair-Group Method using Arithmetic averages (UPGMA). Users can apply the desired method to generate phylogenetic trees, which are based on an evolutionary distance matrix for the selected genomes. The distance between two genomes is defined by the normalized number of their shared OG pairs. BPhyOG also allows users to browse the OGs that were used to infer the phylogenetic relationships. It provides detailed annotation for each OG pair and the features of the component genes through hyperlinks. Users can also retrieve each of the homologous OG pairs that have been determined among 177 genomes. It is a useful tool for analyzing the tree of life and overlapping genes from a genomic standpoint. Conclusion: BPhyOG is a useful interactive web server for genome-wide inference of any potential evolutionary relationship among the genomes selected by users. It currently includes 177 completely sequenced bacterial genomes containing 79,855 OG pairs, with their annotations and homologous OG pairs integrated comprehensively. The reliability of its phylogenies, complemented by annotations, makes BPhyOG a powerful web server for genomic and genetic studies. It is freely available at http://cmb.bnu.edu.cn/BPhyOG.
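    BPhyOG builds trees from a distance matrix derived from the normalized number of shared OG pairs. The exact normalization is not stated here, so the Python sketch below assumes a hypothetical one (1 minus the shared OG-pair count divided by the smaller genome's OG-pair count) and represents each genome as a set of identifiers for homologous OG pairs. It then clusters with UPGMA, one of the two methods the server offers; neighbor joining is not shown, and the function returns only the tree topology as nested tuples, without branch lengths. `og_distance` and `upgma` are illustrative names, not BPhyOG code.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def og_distance(a: Set[str], b: Set[str]) -> float:
    """Hypothetical normalization: 1 - shared / min(|a|, |b|)."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 1.0
    return 1.0 - len(a & b) / smaller


def upgma(genomes: Dict[str, Set[str]]) -> Tree:
    """UPGMA clustering over og_distance; returns topology as nested tuples."""
    names = sorted(genomes)
    # cluster id -> (subtree, number of genomes in the cluster)
    clusters: Dict[int, Tuple[Tree, int]] = {i: (n, 1) for i, n in enumerate(names)}
    # distance between two clusters, keyed by the unordered pair of ids
    dist: Dict[FrozenSet[int], float] = {
        frozenset((i, j)): og_distance(genomes[names[i]], genomes[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[pair]
        for k in clusters:
            # UPGMA update: size-weighted average of the two old distances
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree
```

    With three toy genomes where A and B share three of four OG pairs and C shares only one with each, `upgma` joins A and B first and attaches C last, giving `("C", ("A", "B"))`.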

    On the extent and role of the small proteome in the parasitic eukaryote Trypanosoma brucei

    Background: Although technical advances in genomics and proteomics research have yielded a better understanding of the coding capacity of a genome, one major remaining challenge is the identification of all expressed proteins, especially those fewer than 100 amino acids in length. Such information is particularly relevant to human pathogens such as Trypanosoma brucei, the causative agent of African trypanosomiasis, since it will provide further insight into the parasite's biology and life cycle. Results: Starting with 993 T. brucei transcripts previously shown by RNA sequencing not to coincide with annotated coding sequences (CDS), homology searches revealed that 173 predicted short open reading frames in these transcripts are conserved across kinetoplastids, with 13 also conserved in representative eukaryotes. Mining mass spectrometry data sets revealed 42 transcripts encoding at least one matching peptide. RNAi-induced down-regulation of these 42 transcripts showed seven to be essential in insect-form trypanosomes, with two also required for the bloodstream life cycle stage. To validate the specificity of the RNAi results, each lethal phenotype was rescued by co-expressing an RNAi-resistant construct of the corresponding CDS. These previously non-annotated essential small proteins localized to a variety of cell compartments, including the cell surface, mitochondria, nucleus and cytoplasm, suggesting the diverse biological roles they are likely to play in T. brucei. We also provide evidence that one of these small proteins is required for replicating the kinetoplast (mitochondrial) DNA. Conclusions: Our studies highlight the presence and significance of small proteins in a protist and expose potential new targets to block the survival of trypanosomes in the insect vector and/or the mammalian host.

    Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships

    The detection of orthologs is a key approach in genomics, useful for understanding gene evolution and phylogenetic relationships and essential for gene function prediction. However, reliable annotation of the encoded protein regions is still a limiting factor in genomics, mainly due to the lack of confirmatory experimental evidence at the proteome level. Nevertheless, current ortholog collections are generally based on protein sequence comparisons, even though large transcriptome sequence collections are available. We developed Transcriptologs, a method for predicting orthologs based on similarities between translated fragments of messenger RNAs from two species. We implemented a procedure to extend BLAST-based alignments and to define orthologs using the Bidirectional Best Hit approach. Results from a test case on Arabidopsis thaliana and Sorghum bicolor transcript collections showed that, in some cases, Transcriptologs outperformed a classical protein-based analysis in terms of alignment quality, revealing similarities that were otherwise not detectable.
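    Transcriptologs defines orthologs with the Bidirectional Best Hit approach. The Python sketch below illustrates only that final step, assuming hits have already been reduced to (query, subject, bit score) triples, for example parsed from BLAST tabular output (`-outfmt 6`, whose last default column is the bit score). The alignment-extension procedure Transcriptologs applies beforehand is not modeled, and the function names and identifiers in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

# A hit: (query_id, subject_id, bit_score)
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

    If two species-A transcripts both hit the same species-B transcript best, only the one that B's transcript hits best in return is kept as an ortholog pair; the other is dropped, which is what makes BBH stricter than one-way best hits.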