5 research outputs found

    Computational methods for ribosome profiling data analysis

    Since the introduction of the ribosome profiling technique in 2009, its popularity has greatly increased. It is widely used for the comprehensive assessment of gene expression and for studying the mechanisms of regulation at the translational level. As the number of ribosome profiling datasets continues to grow, so too does the need for reliable software that can answer the biological questions the technique can address. This review describes the computational methods and tools that have been developed to analyze ribosome profiling data at the different stages of the process. It starts with the initial routine processing of raw data and continues with more specific tasks such as the identification of translated open reading frames, differential gene expression analysis, and the evaluation of local or global codon decoding rates. The review pinpoints the challenges associated with each step and explains how they are currently addressed. In addition, it provides an extensive, albeit incomplete, list of publicly available software applicable to each step, which may be a useful starting point for those new to ribosome profiling analysis. The outline of current challenges in ribosome profiling data analysis may inspire computational biologists to search for novel, potentially superior, solutions that will improve and expand the bioinformatician's toolbox for ribosome profiling data analysis.

    Development of an online computational platform for the analysis of protein synthesis and detection of novel translated regions

    Ribosome profiling is a technique that allows us to capture and sequence mRNA fragments protected by ribosome complexes. Mapping these ribosome-protected fragments (RPFs) back to a genome or transcriptome provides information on the precise location of elongating ribosomes. These data can then be used to detect novel translated regions, translational pausing and differentially translated genes. Chapter 2 describes the development of Trips-Viz, an interactive online platform for the exploration and visualisation of RPFs mapped to the transcriptomes of various organisms. It allows users to rapidly aggregate and visualise ribosome profiling data at the level of a single transcript, enabling visual detection of translated open reading frames. Trips-Viz also allows users to rapidly assess data quality through various meta-information plots, and to detect and visualise transcripts that are differentially expressed or translated between two conditions. These analyses can be carried out through a GUI, so users do not need any prior coding or command-line experience. Chapter 3 describes the major updates made to Trips-Viz since its original publication. These include the addition of mass spectrometry data: several thousand human mass spectrometry datasets have been processed and the detected peptides mapped to the human transcriptome in the same manner as ribosome profiling data. This allows users to corroborate the evidence from the ribosome profiling data and provides information on whether a translated ORF is capable of producing a stable protein product. The detection of differential expression and translation has also been improved with the inclusion of the DESeq2 and anota2seq software. A method for the automatic detection of translated ORFs was also added, which allows users to find translated uORFs, nested ORFs and downstream ORFs in a relatively timely manner. Other improvements include the addition of help videos that guide users through navigating and interacting with the Trips-Viz user interface. Finally, incorporating the relevant scripts into RiboGalaxy made it easier for users to upload their own data and transcriptomes to Trips-Viz without any requirement for command-line expertise.
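    The idea that mapped RPFs reveal translated reading frames can be illustrated with a minimal sketch. It is not taken from Trips-Viz; the function name `frame_counts` and the 12-nucleotide 5'-end-to-P-site offset are assumptions chosen for illustration, and real analyses usually estimate the offset per read length.

```python
from collections import Counter


def frame_counts(five_prime_ends, cds_start, offset=12):
    """Count footprints per reading frame relative to an annotated start codon.

    five_prime_ends: iterable of 0-based transcript positions of RPF 5' ends.
    cds_start: 0-based position of the first nucleotide of the start codon.
    offset: assumed distance from the 5' end to the P-site (illustrative value).
    """
    counts = Counter()
    for five_prime in five_prime_ends:
        p_site = five_prime + offset
        # Python's % always returns a non-negative result, so reads that
        # map upstream of the start codon are still assigned a frame 0..2.
        counts[(p_site - cds_start) % 3] += 1
    return [counts[0], counts[1], counts[2]]


# Toy example: five footprints on a transcript whose CDS starts at position 100.
example = frame_counts([88, 91, 94, 95, 97], cds_start=100)
```

    A strong excess of counts in one frame is the kind of triplet periodicity that supports translation of an ORF in that frame.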

    An analysis of translation heterogeneity in ribosome profiling data

    Proteins are responsible for virtually all functions performed within a cell and in its surroundings. The control of gene expression, which determines the amount, localisation and timing of protein production in the cell, is one of the central processes in the regulation of cellular physiology and function. Any disturbance in this complex system can have major consequences for cellular integrity, sometimes leading to incurable diseases. The translation of messenger RNA into a protein product is the last step of gene expression. It can be regulated in many ways, both intrinsic and extrinsic to the transcript sequence. It is also the costliest cellular process in terms of energy. Ribosome profiling (Ribo-Seq) is one of the recent and promising technologies that have made it possible to better study the mechanisms of translation regulation. Its results, however, have been shown to vary in reproducibility and to contain noise of unknown origin. This work presents a strategy for separating signals of biological origin from those of technical origin. This is done by computing consensus profiles of ribosome density derived from a comparative analysis of several Ribo-Seq experiments in yeast (Saccharomyces cerevisiae). The biological signals captured by the consensus profiles agree with known signatures of ribosomal pausing, such as mRNA folding strength and amino acid charge. Strikingly, the strategy also enabled the identification of differentially transcribed (DT) sequences. These sequences affect the kinetics of translation elongation and show an over-representation of codons associated with modifications of transfer RNAs (tRNAs). They are also involved in the maintenance of cellular homeostasis, with a marked presence in genes involved in ribosome biosynthesis as well as in mRNAs with precise translation sub-localization, particularly in mitochondria and the endoplasmic reticulum (ER). In addition to demonstrating the possibilities of discovery offered by the Ribo-Seq technique, this study presents evidence of the dynamic and heterogeneous nature of the translation process in the eukaryotic cell. It also showcases the diverse regulatory mechanisms of translation and the role of information directly encoded in the sequence in the general optimization of cellular homeostasis.
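    The consensus-profile idea can be sketched as follows. The abstract does not specify how the consensus is computed, so this is only one plausible reading: each experiment's per-codon profile for a gene is scaled by its own mean, and the per-position median is taken across experiments. The function name `consensus_profile` and both of these choices are assumptions.

```python
from statistics import median


def consensus_profile(profiles):
    """Combine per-codon ribosome density profiles of one gene across experiments.

    profiles: list of equal-length lists of per-codon footprint counts,
              one list per Ribo-Seq experiment.
    Returns the per-position median of the mean-normalized profiles.
    """
    if not profiles:
        raise ValueError("at least one profile is required")
    length = len(profiles[0])
    normalized = []
    for profile in profiles:
        if len(profile) != length:
            raise ValueError("all profiles must have the same length")
        mean = sum(profile) / length
        if mean == 0:
            continue  # skip experiments with no coverage of this gene
        # Scaling by the mean removes differences in sequencing depth.
        normalized.append([count / mean for count in profile])
    if not normalized:
        raise ValueError("no experiment has coverage for this gene")
    # The median across experiments damps peaks seen in only one dataset.
    return [median(column) for column in zip(*normalized)]


# Toy example: the peak at the third codon is supported by two of three experiments.
example = consensus_profile([[1, 1, 4], [2, 2, 8], [3, 3, 0]])
```

    Under this reading, peaks that recur across experiments survive in the consensus, while experiment-specific spikes are treated as technical noise.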

    A genetic code expansion: investigation of UGA stop codon redefinition in selenoproteins

    After the genetic code was deciphered in the 1960s, Francis Crick formulated the ‘frozen accident’ hypothesis (Crick, 1968) to describe the origins of the genetic code as universal and resistant to change or evolution. Coincidentally, evidence of the dynamic nature of genetic decoding emerged through a series of experimental observations that presented various exceptions to what were known as the standard rules of decoding. It is now widely understood, and supported by evidence, that the genetic code is constantly evolving and can be altered by various organisms, with possible implications for entire genomes or specific mRNAs. The incorporation of the 21st amino acid, selenocysteine (Sec), in selenoproteins in response to the UGA translation ‘terminator’ codon is an example of a gene-specific expansion of the code. This thesis deals primarily with two unique cases of UGA recoding. The first case is the synthesis of selenophosphate synthetase 1 (SPS1) (Chapter 2), whereby an unknown amino acid is inserted in response to a UGA codon in the hymenopteran honeybee, Apis mellifera, which lacks the machinery for Sec incorporation. The various attempts to characterize the amino acid inserted at this position by novel methods are described. Chapter 3 describes the first extensive evolutionary analysis of the selenium-transporting protein selenoprotein P (SELENOP) in invertebrates, with focused characterization in a mollusc, the Pacific oyster, Crassostrea gigas. This unique case presented an unprecedentedly high Sec content (46 Sec) in the C-terminal domain of its SELENOP, highlighting an extreme deviation from the standard genetic code read-out. It was shown that a supplemented heterologous system was able to facilitate translation of oyster SelenoP mRNA up to the third or fourth Sec codon position of the distal region but was inadequate to produce the full-length protein. Further, the Sec-dedicated protein factor, the oyster SECIS binding protein 2 (SBP2), was characterized and its potential for processive Sec incorporation tested. Specific mRNA structures in the 3′ UTR, termed Selenocysteine Insertion Sequence (SECIS) elements, are essential for the recoding of UGA to specify selenocysteine instead of termination. While previously known multi-Sec-codon SelenoP genes have two functionally distinct SECISes, the two in C. gigas showed no distinction in vitro. In Chapter 4, the in vivo selenium regulation of selenoproteins in C. gigas was investigated by ribosome profiling. Total selenium levels in oyster tissues were found to increase up to 50-fold with supplementation, also resulting in an increase in mRNA abundance and translation. The translation of the full-length Pacific oyster SelenoP demonstrates an inefficient selenocysteine specification at UGA 1 (> 6%) and very high efficiency at the distal UGAs (UGAs 2 to 46). Additional genetic elements relevant to SelenoP translation include a leader ORF and an RNA structure termed the Initiation Stem Loop (ISL), which were found to potentially modulate ribosome progression in a selenium-dependent manner. It was further validated that selenocysteines were metabolically incorporated in response to UGAs during the synthesis of oyster SELENOP, as indicated by 75Selenium labelling experiments. These findings highlight the increasing understanding of the plasticity of the genetic code, as well as the ecological importance of selenium and its diverse utilization across species.

    Synonymous codons affect polysome spacing, protein production and protein folding stress: studies of bacterial translation using ribosome profiling

    The acquisition of protein secondary and tertiary structure depends on the primary sequence of amino acids. However, predicting a protein's folded structure is difficult even when its sequence is known. It has been suggested that, in addition to encoding the amino acid sequence, genes also encode kinetic information which regulates the ribosome's translation rate. This information might guide nascent protein folding during translation. With the advent of ribosome profiling, a high-throughput sequencing technique which quantifies ribosome density on mRNA, it is now possible to investigate this hypothesis in greater detail. Here, a new way to analyze ribosome profiling data is presented, confirming that ribosome profiling detects ribosome pauses at slow codons. This method is able to precisely determine the locations of the ribosome aminoacyl and peptidyl transfer sites within the ribosome footprint. Next, a simulation tool which models the progression of ribosomes along an mRNA is used to explore the effects of translation initiation and elongation rates on protein expression. This tool can be used to generate testable predictions for how changing the translation rate should affect various experimental observables, including ribosome density. New experimental data, collected from the bacterium Escherichia coli, demonstrate that the sequence of the firefly (Photinus pyralis) luciferase mRNA affects its ribosome occupancy. Importantly, ribosome occupancy is differentially influenced by synonymous codons. These data also show that luciferase (Luc) expression is controlled by the 15 codons immediately downstream of the start codon and that greater luciferase expression levels progressively activate the heat shock response. However, this response appears to saturate, suggesting that the overexpression of foreign proteins in E. coli readily overwhelms the endogenous chaperone system. This result demonstrates that expression level, rather than translation kinetics, determines the yield of folded luciferase protein in E. coli.