4 research outputs found

    Effects of using coding potential, sequence conservation and mRNA structure conservation for predicting pyrrolysine containing genes

    BACKGROUND: Pyrrolysine (the 22nd amino acid) is in certain organisms and under certain circumstances encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. The involvement of a predicted mRNA structure in the region downstream of UAG has been suggested, but the structure does not seem to be present in all pyrrolysine incorporating genes. RESULTS: We propose a strategy to predict pyrrolysine encoding genes in genomes of archaea and bacteria. We cluster open reading frames interrupted by the amber codon based on sequence similarity. We rank these clusters according to several features that may influence pyrrolysine translation. The ranking effects of different features are assessed and we propose a weighted combination of these features which best explains the currently known pyrrolysine incorporating genes. We devote special attention to the effect of structural conservation and provide further substantiation to support that structural conservation may be influential – but is not a necessary factor. Finally, from the weighted ranking, we identify a number of potentially pyrrolysine incorporating genes. CONCLUSIONS: We propose a method for prediction of pyrrolysine incorporating genes in genomes of bacteria and archaea leading to insights about the factors driving pyrrolysine translation and identification of new gene candidates. The method predicts known conserved genes with high recall and predicts several other promising candidates for experimental verification. The method is implemented as a computational pipeline which is available on request.
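
The first step named in the abstract above, collecting open reading frames interrupted by an in-frame amber codon, can be illustrated with a minimal sketch. This is not the authors' pipeline. The function name `amber_interrupted_orfs`, the `min_codons` parameter, the restriction to the forward strand, and the ATG-only start rule are all assumptions made here for illustration. Clustering and ranking are not covered.

```python
# Hypothetical sketch: TAG is deliberately treated as readable, so only
# TAA and TGA terminate an open reading frame.
TERMINATORS = {"TAA", "TGA"}


def amber_interrupted_orfs(seq, min_codons=3):
    """Return (start, end, amber_positions) for forward-strand ORFs that
    contain at least one in-frame TAG when TAG is read through.

    start/end are 0-based, end-exclusive DNA coordinates; the terminating
    codon is included. ORFs that run off the end of the sequence without
    a TAA/TGA are not reported.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None   # position of the current ATG, if an ORF is open
        ambers = []    # in-frame TAG positions seen inside the open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start, ambers = i, []
            elif codon == "TAG":
                ambers.append(i)
            elif codon in TERMINATORS:
                n_codons = (i + 3 - start) // 3
                if ambers and n_codons >= min_codons:
                    hits.append((start, i + 3, ambers))
                start = None
    return hits


if __name__ == "__main__":
    # ATG AAA TAG CCC TAA: one ORF with an in-frame amber codon at position 6
    print(amber_interrupted_orfs("ATGAAATAGCCCTAA"))
```

In a real survey both strands and alternative start codons would also need to be scanned; they are omitted here to keep the example short.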

    A computational method to predict genetically encoded rare amino acids in proteins

    In several natural settings, the standard genetic code is expanded to incorporate two additional amino acids with distinct functionality, selenocysteine and pyrrolysine. These rare amino acids can be overlooked inadvertently, however, as they arise by recoding at certain stop codons. We report a method for such recoding prediction from genomic data, using read-through similarity evaluation. A survey across a set of microbial genomes identifies almost all the known cases as well as a number of novel candidate proteins.

    Visualising ribosome profiling and using it for reading frame detection and exploration of eukaryotic translation initiation

    Ribosome profiling (ribo-seq) is a recently developed technique that provides genomewide information on protein synthesis (GWIPS) in vivo. The high resolution of ribo-seq is one of the exciting properties of this technique. In Chapter 2, I present a computational method that utilises the sub-codon precision and triplet periodicity of ribosome profiling data to detect transitions in the translated reading frame. Application of this method to ribosome profiling data generated for human HeLa cells allowed us to detect several human genes where the same genomic segment is translated in more than one reading frame. Since the initial publication of the ribosome profiling technique in 2009, there has been a proliferation of studies that have used the technique to explore various questions with respect to translation. A review of the many uses and adaptations of the technique is provided in Chapter 1. Indeed, owing to the increasing popularity of the technique and the growing number of published ribosome profiling datasets, we have developed GWIPS-viz (http://gwips.ucc.ie), a ribo-seq dedicated genome browser. Details on the development of the browser and its usage are provided in Chapter 3. One of the surprising findings of ribosome profiling of initiating ribosomes carried out in 3 independent studies, was the widespread use of non-AUG codons as translation initiation start sites in mammals. Although initiation at non-AUG codons in mammals has been documented for some time, the extent of non-AUG initiation reported by these ribo-seq studies was unexpected. In Chapter 4, I present an approach for estimating the strength of initiating codons based on the leaky scanning model of translation initiation. Application of this approach to ribo-seq data illustrates that initiation at non-AUG codons is inefficient compared to initiation at AUG codons. 
    In addition, our approach provides a probability of initiation score for each start site that allows its strength of initiation to be evaluated.
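
The idea of using triplet periodicity to detect a change in the translated reading frame, mentioned for Chapter 2 above, can be sketched roughly as follows. This is an illustration only and not the thesis method, which is not described in the abstract. It assumes `counts` is a per-nucleotide list of ribo-seq read counts, with each read already assigned to a single position and index 0 lying in sub-codon phase 0. The function names and the `window` parameter are hypothetical.

```python
def dominant_phase(counts):
    """Return the sub-codon phase (0, 1 or 2) holding the most reads.

    Ties resolve to the lowest phase.
    """
    totals = [sum(counts[p::3]) for p in range(3)]
    return max(range(3), key=lambda p: totals[p])


def frame_transitions(counts, window=30):
    """Compare the dominant phase of consecutive non-overlapping windows.

    Returns (window_start, previous_phase, new_phase) for every window
    whose dominant phase differs from that of the window before it.
    """
    if window <= 0 or window % 3:
        raise ValueError("window must be a positive multiple of 3")
    phases = []
    # Window starts are multiples of 3, so phases stay in absolute terms.
    for start in range(0, len(counts) - window + 1, window):
        phases.append((start, dominant_phase(counts[start:start + window])))
    return [
        (start, prev, cur)
        for (_, prev), (start, cur) in zip(phases, phases[1:])
        if prev != cur
    ]


if __name__ == "__main__":
    # Toy profile: phase 0 dominates the first 30 nt, phase 1 the next 30 nt
    profile = [9, 1, 0] * 10 + [0, 9, 1] * 10
    print(frame_transitions(profile))
```

Comparing only the largest phase total is a crude criterion; a real analysis would need a statistical test to separate genuine frame transitions from noise in sparsely covered regions.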

    Deciphering context-dependent amber suppression efficiency in mammalian cells with an expanded genetic code

    The genetic code of organisms can be expanded by introducing orthogonal translation systems (OTSs). One of the most commonly applied OTSs in mammalian cells is the archaeal pyrrolysyl-tRNA synthetase/tRNA_Pyl_CUA (PylRS/PylT) pair from Methanosarcina species. Thereby, usually in-frame amber stop codons (UAG) are suppressed to site-specifically incorporate non-canonical amino acids (ncAAs) into target proteins. These ncAAs can harbor unique chemical moieties, allowing to probe or engineer protein structure and function with high precision. To date, applicability of an expanded genetic code has been particularly advanced in bacteria by optimizing OTS components, modifying host translation, and developing mutually orthogonal translation systems. In mammalian cells, development of genetic code expansion tools has been largely focused on intrinsic properties of the OTS itself, for instance by engineering OTS components or tuning their expression levels. However, several-fold differences in ncAA incorporation efficiency are frequently observed between different amber stop codon positions within a target protein. These unpredictable variations in incorporation efficiencies substantially hamper the theoretical advantage of ncAAs to modify any user-defined site within a target protein. Here, applying a proteomics-based approach and fluorescent reporter system, we compute and validate a linear regression model that predicts ncAA incorporation efficiency in mammalian cells based on the nucleotide context. Thereby, we demonstrate that the immediate context directly modulates the competition between ncAA incorporation and termination at UAG. Moreover, our data support a molecular model in which the identity of up- and downstream nucleotides influences translational efficiency independent of amino acid and tRNA identity. Instead, base stacking of neighboring nucleotides might uniquely affect codon-anticodon base pairing during decoding of UAG. 
    Additionally, context-specific ribosomal pausing and speed could contribute to varying ncAA incorporation efficiency. Furthermore, treatment with aminoglycosides and inhibition of nonsense mediated decay are proposed to improve yields of ncAA-modified proteins in mammalian cells. Taken together, our strategy not only facilitates the applicability of an expanded genetic code in mammalian cells, but should also prove useful in further deciphering the molecular mechanisms that govern context effects in translational efficiency. A better general understanding of context effects in translation would in turn benefit synthetic expansion of the genetic code.
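
A linear model over nucleotide context, as described in the abstract above, is commonly built on one-hot encoded bases. The sketch below shows only that general shape and is not the published model. The function names, the feature layout, and the choice of flanking positions are assumptions made here, and no fitted coefficients are reproduced. Weights must be supplied by the caller, for example from a least-squares fit to measured incorporation efficiencies.

```python
BASES = "ACGU"


def one_hot_context(upstream, downstream):
    """Encode the nucleotides flanking a UAG codon as a flat 0/1 vector.

    Each base contributes four features in the order A, C, G, U.
    DNA input is accepted: T is converted to U.
    """
    features = []
    for base in (upstream + downstream).upper().replace("T", "U"):
        if base not in BASES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def predict_efficiency(features, weights, intercept=0.0):
    """Linear prediction: intercept plus the dot product of features and weights."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return intercept + sum(f * w for f, w in zip(features, weights))


if __name__ == "__main__":
    # One base upstream (A) and one downstream (G) of the UAG codon
    x = one_hot_context("A", "G")
    # Placeholder weights, invented purely to exercise the function
    toy_weights = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, -0.2, 0.0]
    print(x, predict_efficiency(x, toy_weights, intercept=1.0))
```

With one-hot features each weight reads directly as the contribution of a given base at a given position, which is what makes a linear model convenient for interpreting context effects.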