The Marvelous World of tRNAs: From Accurate Mapping to Chemical Modifications

Abstract

Since the discovery of transfer RNAs (tRNAs) as decoders of the genetic code, life science has transformed. Particularly, as soon as the importance of tRNAs in protein synthesis has been established, researchers recognized that the functionality of tRNAs in cellular regulation exceeds beyond this paradigm. A strong impetus for these discoveries came from advances in large-scale RNA sequencing (RNA-seq) and increasingly sophisticated algorithms. Sequencing tRNAs is challenging both experimentally and in terms of the subsequent computational analysis. In RNA-seq data analysis, mapping tRNA reads to a reference genome is an error-prone task. This is in particular true, as chemical modifications introduce systematic reverse transcription errors while at the same time the genomic loci are only approximately identical due to the post-transcriptional maturation of tRNAs. Additionally, their multi-copy nature complicates the precise read assignment to its true genomic origin. In the course of the thesis a computational workflow was established to enable accurate mapping of tRNA reads. The developed method removes most of the mapping artifacts introduced by simpler mapping schemes, as demonstrated by using both simulated and human RNA-seq data. Subsequently, the resulting mapping profiles can be used for reliable identification of specific chemical tRNA modifications with a false discovery rate of only 2%. For that purpose, computational analysis methods were developed that facilitates the sensitive detection and even classification of most tRNA modifications based on their mapping profiles. This comprised both untreated RNA-seq data of various species, as well as treated data of Bacillus subtilis that has been designed to display modifications in a specific read-out in the mapping profile. The discussion focuses on sources of artifacts that complicate the profiling of tRNA modifications and strategies to overcome them. Exemplary studies on the modification pattern of different human tissues and the developmental stages of Dictyostelium discoideum were carried out. These suggested regulatory functions of tRNA modifications in development and during cell differentiation. The main experimental difficulties of tRNA sequencing are caused by extensive, stable secondary structures and the presence of chemical modifications. Current RNA-seq methods do not sample the entire tRNA pool, lose short tRNA fragments, or they lack specificity for tRNAs. Within this thesis, the benchmark and improvement of LOTTE-seq, a method for specific selection of tRNAs for high-throughput sequencing, exhibited that the method solves the experimental challenges and avoids the disadvantages of previous tRNA-seq protocols. Applying the accurate tRNA mapping strategy to LOTTE-seq and other tRNA-specific RNA- seq methods demonstrated that the content of mature tRNAs is highest in LOTTE-seq data, ranging from 90% in Spinacia oleracea to 100% in D. discoideum. Additionally, the thesis addressed the fact that tRNAs are multi-copy genes that undergo concerted evolution which keeps sequences of paralogous genes effectively identical. Therefore, it is impossible to distinguish orthologs from paralogs by sequence similarity alone. Synteny, the maintenance of relative genomic positions, is helpful to disambiguate evolutionary relationships in this situation. During this thesis a workflow was computed for synteny-based orthology identification of tRNA genes. The workflow is based on the use of pre-computed genome-wide multiple sequence alignment blocks as anchors to establish syntenic conservation of sequence intervals. Syntenic clusters of concertedly evolving genes of different tRNA families are then subdivided and processed by cograph editing to recover their duplication histories. A useful outcome of this study is that it highlights the technical problems and difficulties associated with an accurate analysis of the evolution of multi-copy genes. To showcase the method, evolution of tRNAs in primates and fruit flies were reconstructed. In the last decade, a number of reports have described novel aspects of tRNAs in terms of the diversity of their genes. For example, nuclear-encoded mitochondrial-derived tRNAs (nm-tRNAs) have been reported whose presence provokes intriguing questions about their functionality. Within this thesis an annotation strategy was developed that led to the identification of 335 and 43 novel nm-tRNAs in human and mouse, respectively. Interestingly, downstream analyses showed that the localization of several nm-tRNAs in introns and the over-representation of conserved RNA-binding sites of proteins involved in splicing suggest a potential regulatory function of intronic nm-tRNAs in splicing

    Similar works