12 research outputs found

    Micropeptides as non-classical bioactive peptides in Eukaryotes, a ribosome profiling centered approach

    Micropeptides are a new class of bioactive peptides encoded by small open reading frames. Until recently they were very often overlooked in large-scale annotation projects. Nevertheless, a handful of micropeptides have already been discovered and, after functional characterization, linked to important embryological and morphogenetic functions in plants and animals. Systematic and/or genome-wide searches for small open reading frames have already been carried out in Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster, uncovering hundreds of coding and transcribed sequences. Such searches had not yet been performed in eukaryotes with larger genomes, such as mammals, because of their computational complexity. Genome-wide annotation projects were made possible by the rapid development of inexpensive, high-throughput sequencing technologies. The very recent development of the ribosome profiling strategy enabled, for the first time, the genome-wide study of the translation process. Because translating ribosomes protect an mRNA fragment, their position can be determined. Consequently, open reading frames can be predicted by combining the positional information of initiating and elongating ribosomes. In this doctoral research, a genome-wide in silico identification strategy, relying on several conservation features, was combined with experimental information in the form of ribosome profiling data to identify new and possibly coding small open reading frames in the Mus musculus genome. In this way, dozens of highly conserved, ribosome-bound and possibly coding small open reading frames could be identified. In addition, two tools were developed that facilitate the analysis of ribosome sequencing data. PROTEOFORMER is the first publicly available bioinformatics analysis pipeline specifically aimed at processing ribosome profiling data. RIBOsORF, by contrast, is specifically aimed at identifying small open reading frames in these ribosome profiling data and contains several filter modules that can assess the possible peptide-coding properties of these small open reading frames. The results and tools obtained in this doctoral research will facilitate further progress in the micropeptide research field. The discovery and further functional validation of micropeptides can, and more than likely will, have a significant impact on both our biological knowledge and the medical world

    sORFs.org : a repository of small ORFs identified by ribosome profiling

    With the advent of ribosome profiling, a next-generation sequencing technique providing a ‘snap-shot’ of translated mRNA in a cell, many short open reading frames (sORFs) were identified. Follow-up studies revealed the existence of functional peptides, so-called micropeptides, translated from these sORFs, indicating a new class of bio-active peptides. Over the last few years, several micropeptides exhibiting important cellular functions were discovered. However, ribosome occupancy does not necessarily imply an actual function of the translated peptide, which has led to the development of various tools assessing the coding potential of sORFs. Here, we introduce sORFs.org (http://www.sorfs.org), a novel database for sORFs identified using ribosome profiling. Starting from ribosome profiling data, sORFs.org identifies sORFs, incorporates state-of-the-art tools and metrics, and stores the results in a public database. Two query interfaces are provided: a default one enabling quick lookup of sORFs and a BioMart interface providing advanced query and export possibilities. At present, sORFs.org harbors 263 354 sORFs that demonstrate ribosome occupancy, originating from three different cell lines: HCT116 (human), E14_mESC (mouse) and S2 (fruit fly). sORFs.org aims to provide an extensive sORF database accessible to researchers with limited bioinformatics knowledge, thus enabling easy integration into personal projects

    Mass spectrometry and ribosome profiling, a perfect combination towards a more comprehensive identification strategy of true in vivo protein forms

    An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS). Recently, a strategy termed ribosome profiling, based on deep sequencing of ribosome-protected mRNA fragments and indirectly monitoring protein synthesis, has been described. In contrast to the protein databases routinely employed in proteomics searches, RIBO-seq derived data gives a more representative expression state and accounts for sequence variation information and alternative translation initiation. To verify the potential of ribosome profiling in providing us with a true snapshot of the translational landscape, we devised a proteogenomic approach generating a database of translation products based on ribosome profiling experiments. The raw and untreated RIBO-seq data is analyzed for both splice isoforms and single nucleotide polymorphisms, thereby taking into account transcriptional variation. In addition, RIBO-seq data for translation start site discovery (treated with harringtonine, lactimidomycin or puromycin) is used to obtain a genome-wide blueprint of all possible translation initiation sites, thereby taking into account translational variation. By adding protein-DB annotation to the genomic RIBO-seq derived data, followed by in silico translation, a protein database is constructed that reflects the full complexity of the proteome. Using a first version of our proteogenomic approach on an undifferentiated mouse embryonic stem cell line (E14), we could demonstrate an increase of the overall protein identification rate by 2.5% as compared to only searching UniProtKB-SwissProt. Furthermore, identification of N-terminal COFRADIC data resulted in the detection of 16 alternative start sites giving rise to N-terminally extended protein variants, besides the identification of four translated uORFs

    Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs

    Background: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e.g. coded from small Open Reading Frames, sORFs) is very difficult, as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs < 100 AAs) have been discovered in different organisms such as Arabidopsis thaliana, Saccharomyces cerevisiae, and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome with unprecedented scrutiny, for example in search of this type of small ORF. Results: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs, which were subsequently analyzed for their coding potential, based on evolutionary conservation at the AA level, and ranked using a Support Vector Machine (SVM) learning model. The ranked sORFs were finally overlapped with ribosome profiling data, hinting at sORF translation. All candidates were visually inspected using an in-house developed genome browser. In this way dozens of highly conserved sORFs targeted by ribosomes, putatively encoding micropeptides, were identified in the mouse genome. Conclusion: Our combined genome-wide approach leads to the prediction of a comprehensive but manageable set of putatively coding sORFs, a very important first step towards the identification of a new class of bioactive peptides, called micropeptides
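The genome-wide scan step described above can be illustrated with a minimal sketch. This is not the tool used in the study; it assumes a plain forward-strand scan for ATG-to-stop ORFs below 100 AAs, and the function name `find_sorfs` and its parameters are hypothetical.

```python
# Minimal sketch of a forward-strand sORF scan (illustrative only; not the
# scanner used by the authors). An sORF is taken here to be an ATG followed
# in-frame by a stop codon, encoding fewer than 100 amino acids.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, min_aa=2, max_aa=99):
    """Return (start, end, frame) tuples; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # codons before the stop, ATG included
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, frame))
                start = None
    return hits


print(find_sorfs("CCATGAAATTTTAGGG"))
```

A real scan would also cover the reverse strand and, as the abstract notes, would need a downstream coding-potential filter, since most ORFs found this way occur by chance.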

    PROTEOFORMER: deep proteome coverage through ribosome profiling and MS integration

    An increasing number of studies integrate mRNA sequencing data into MS-based proteomics to complement the translation product search space. However, several factors, including extensive regulation of mRNA translation and the need for three- or six-frame translation, impede the use of mRNA-seq data for the construction of a protein sequence search database. With that in mind, we developed the PROTEOFORMER tool, which automatically processes data from the recently developed ribosome profiling method (sequencing of ribosome-protected mRNA fragments), resulting in genome-wide visualization of ribosome occupancy. Our tool also includes a translation initiation site calling algorithm allowing the delineation of the open reading frames (ORFs) of all translation products. A complete protein synthesis-based sequence database can thus be compiled for mass spectrometry-based identification. This approach increases the overall protein identification rates by 3% and 11% (improved and new identifications) for human and mouse, respectively, and enables proteome-wide detection of 5'-extended proteoforms, upstream ORF translation and near-cognate translation start sites. The PROTEOFORMER tool is available as a stand-alone pipeline and has been implemented in the Galaxy framework for ease of use
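As an illustration of how a called translation initiation site delineates an ORF, the sketch below translates a transcript from a given start position to the first in-frame stop codon. It is not PROTEOFORMER code: the function name `translate_from_tis` is hypothetical, the standard genetic code is assumed, and the start codon is assumed to be decoded as methionine even when it is near-cognate (e.g. CTG).

```python
# Minimal sketch of ORF delineation from a called start site (illustrative
# only; not taken from PROTEOFORMER). Assumes the standard genetic code.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_tis(seq, tis):
    """Translate seq from 0-based position tis to the first in-frame stop.

    Returns the peptide without the stop, or None if no stop codon is reached.
    The initiating codon is emitted as 'M' regardless of its sequence
    (an assumption made here for near-cognate starts).
    """
    seq = seq.upper()
    peptide = []
    for i in range(tis, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            return "".join(peptide)
        peptide.append("M" if i == tis else aa)
    return None


print(translate_from_tis("AACTGGCTAAATGA", 2))
```

Starting from a measured initiation site avoids the three- or six-frame translation mentioned above, because the reading frame is fixed by the start position.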

    Little things make big things happen: A summary of micropeptide encoding genes

    Classical bioactive peptides are cleaved from larger precursor proteins and are targeted toward the secretory pathway by means of an N-terminal signaling sequence. In contrast, micropeptides encoded by small open reading frames lack such a signaling sequence and are immediately released into the cytoplasm after translation. Over the past few years many such non-canonical genes (including open reading frames, ORFs, smaller than 100 AAs) have been discovered and functionally characterized in different eukaryotic organisms. Furthermore, in silico approaches enabled the prediction of many more putatively coding small ORFs in the genomes of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and Mus musculus. However, questions remain as to what the functional role of this new class of eukaryotic genes might be, and how widespread they are. In the future, approaches integrating in silico, conservation-based prediction with a combination of genomic, proteomic and functional validation methods will prove indispensable for answering these open questions

    Micro peptides as a new class of bio-active peptides in Eukaryotes

    Background: For a long time it was assumed that protein-coding genes were at least 100 AA in length. In addition, algorithms for detecting coding sequences with very short open reading frame (ORF) lengths are less reliable, since true coding ORFs can be buried in a pile of ‘junk’ ORFs formed by chance. However, in recent years many of these non-canonical (< 100 AA in length) genes were discovered in different organisms such as Arabidopsis, Saccharomyces, and Drosophila. Here, the resulting small peptides (micro-peptides) are translated directly from their small open reading frames (smORFs). Recently, a first evolutionarily conserved micro-peptide (polished rice or tarsal-less, Drosophila) has also been functionally characterized, playing a role in early developmental stages. Thanks to advances in sequencing, bioinformatics tools and computing power, it is now possible to scan the genomes of different species at ever greater depth, e.g. in a search for this type of small peptide. Methods: Using bioinformatics methods, we performed a systematic search for putatively functional smORFs in both the Drosophila melanogaster and Mus musculus genomes. Our search pipeline consists of several steps. First, we scanned for smORFs with the sORFfinder tool, using a hidden Markov model to predict the coding potential of possible open reading frames genome-wide. Secondly, we checked for transcriptional evidence using public or in-house RNA-seq and/or ribosome profiling data of specific embryonic (and larval) stages. Thirdly, the pattern of conservation of the detected smORFs was investigated using the UCSC multiple alignments (containing 14 insects for Drosophila melanogaster and 29 vertebrates for Mus musculus). The ratio of synonymous versus non-synonymous mutations, the ORF length, start and stop codon conservation, and the number of existing alignments were examined. A customized scoring algorithm, built on all derived properties, allowed us to rank these predicted micro-peptides. Results: Based on the aforementioned pipeline, a list of putative micro-peptides with high coding potential was obtained. All predicted micro-peptides are highly conserved at both the DNA and AA level, and moreover have a favorable synonymous versus non-synonymous mutation rate. In addition, they are supported by experimental evidence in the form of (bidirectional) RNA-seq data, ribosome profiling data, or Ensembl ncRNA gene annotations. Specific research effort is needed to gather experimental evidence for the translation and functionality of smORFs (e.g. using genetic manipulation and/or ribosome profiling studies). Conclusion: Micro-peptide research is still in its infancy. The combination of the analyses led us to postulate the existence of many new functional smORFs in both fruit fly and mouse. We strongly believe that micro-peptides have important functions and are, in the same way as microRNAs, an important but long-overlooked class of bio-active molecules

    Deep proteome coverage based on ribosome profiling aids mass spectrometry-based protein and peptide discovery and provides evidence of alternative translation products and near-cognate translation initiation events

    An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS) instrumentation. Recently, a strategy termed ribosome profiling (or RIBO-seq), based on deep sequencing of ribosome-protected mRNA fragments and indirectly monitoring protein synthesis, has been described. We devised a proteogenomic approach constructing a custom protein sequence search space, built from both SwissProt and RIBO-seq derived translation products, applicable for MS/MS spectrum identification. To record the impact of using the constructed deep proteome database we performed two alternative MS-based proteomic strategies: (I) a regular shotgun proteomic approach and (II) an N-terminal COFRADIC approach. While the former technique gives an overall assessment at the protein and peptide level, the latter technique, which specifically enables the isolation of N-terminal peptides, is well suited to validating the RIBO-seq derived (alternative) translation initiation site profile. We demonstrate that this proteogenomic approach increases the overall protein identification rate by 2.5% (e.g. new protein products, new protein splice variants, SNP variant proteins, and N-terminally extended forms of known proteins) as compared to only searching UniProtKB-SwissProt. Furthermore, using this custom database, identification of N-terminal COFRADIC data resulted in the detection of 16 alternative start sites giving rise to N-terminally extended protein variants, besides the identification of four translated uORFs. Notably, the characterization of these new translation products revealed the use of multiple near-cognate (non-AUG) start codons. As deep sequencing techniques are becoming more standard, less expensive, and widespread, we anticipate that mRNA-seq and especially custom-tailored RIBO-seq will become indispensable in the MS-based protein or peptide identification process

    Micropeptides, the next best thing after micro-RNA?: combining in silico prediction and ribosome profiling in a genome-wide search for novel micropeptides

    Introduction: It was long assumed that proteins are at least 100 amino acids (AAs) long. Moreover, the detection of short translation products (e.g. coded from small Open Reading Frames, sORFs) is very difficult, as the short length makes it hard to distinguish true coding ORFs from ORFs occurring by chance. Nevertheless, over the past few years many such non-canonical genes (with ORFs < 100 AAs) have been discovered in different organisms such as Arabidopsis thaliana, Saccharomyces cerevisiae, and Drosophila melanogaster. Thanks to advances in sequencing, bioinformatics and computing power, it is now possible to scan the genome with unprecedented scrutiny, for example in search of this type of small ORF. Methods: Using bioinformatics methods, we performed a systematic search for putatively functional sORFs in the Mus musculus genome. A genome-wide scan detected all sORFs, which were subsequently analyzed for their coding potential, based on evolutionary conservation at the AA level using UCSC multiple species alignments, and ranked using a Support Vector Machine (SVM) learning model. The ranked sORFs were finally overlapped with ribosome profiling data proving sORF translation. All candidates were visually inspected using an in-house developed genome browser. Preliminary Data: The genome-wide search for sORFs with sORFfinder resulted in the prediction of 2,414,589 single-exon sORFs with high coding potential, out of a total pool of 40,704,347 sORFs. To assess their peptide-coding potential, all sORFs were analyzed using a UCSC multi-species alignment of 8 vertebrate species. For each sORF a number of basic peptide conservation characteristics were deduced and gathered. We used an SVM approach to classify the sORFs into a coding and a non-coding group based on all aforementioned characteristics. After training the SVM on four fifths of the data and testing it on the remainder, we reached a correct classification for up to 93% of the test set, with a false positive rate not exceeding 4%. Even with very stringent parameters, this genome-wide in silico prediction approach gives rise to hundreds, even thousands, of possibly interesting sequences. Therefore we reanalyzed ribosome profiling data obtained from a mouse embryonic stem cell (mESC) sample, uniquely mapping the reads to sORFs located in intergenic or ncRNA regions. Retaining only those sORFs that overlap with ribosome profiles at their start position in the harringtonine-treated sample data and that have a sequence coverage of at least 75% in the untreated sample data led to a set of 221 intergenic sORFs and 489 sORFs located in ncRNA regions. Looking only at lincRNA sORFs, since the data point to their expression in these regions, further decreases the sample size to 33 sORFs. All sORFs are made accessible through an in-house developed H2G2 genome browser. Next to the sORF information, static visualization tracks are added depicting genomic annotation from Ensembl, phastCons conservation scores and other relevant information. Experimental ribosome profiling data are incorporated using individual tracks for every analysis on the different samples (with or without harringtonine treatment)
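The ribosome profiling filter described above can be sketched as follows. This is an assumption-laden illustration, not the authors' code: "sequence coverage" is interpreted here as the fraction of sORF positions covered by at least one read in the untreated sample, and the names `coverage_fraction` and `keep_sorf` are hypothetical.

```python
# Minimal sketch of the ribosome profiling filter (illustrative only).
# Assumption: coverage = fraction of sORF positions with >= 1 read in the
# untreated sample; the abstract does not define the measure precisely.

def coverage_fraction(counts):
    """Fraction of positions with at least one read; 0.0 for an empty profile."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c > 0) / len(counts)


def keep_sorf(untreated_counts, harringtonine_start_reads, min_coverage=0.75):
    """Keep an sORF only if it has start-site signal in the harringtonine-treated
    sample and sufficient coverage in the untreated sample."""
    has_start_signal = harringtonine_start_reads > 0
    return has_start_signal and coverage_fraction(untreated_counts) >= min_coverage


# Per-position read counts over a toy 4-nt region, plus start-site read count.
print(keep_sorf([1, 2, 0, 3], harringtonine_start_reads=5))
```

Requiring both signals mirrors the two-sample design in the abstract: the harringtonine-treated sample marks initiation, while the untreated sample shows elongating ribosomes along the ORF.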