1,752 research outputs found

    Genome-Wide Approaches To Study RNA Secondary Structure

    The central hypothesis of molecular biology depicts RNA as an intermediary conveyor of genetic information. RNA is transcribed from DNA and translated to proteins, the molecular machines of the cell. However, many RNAs do not encode protein and instead function as molecular machines themselves. The most famous examples are ribosomal RNAs and transfer RNAs, which together form the core translational machinery of the cell. Many other non-coding RNAs have been discovered, including catalytic and regulatory RNAs. In many cases RNA function is tightly linked to its secondary structure, which is the collection of hydrogen bonds between complementary RNA sequences that drives these molecules into their three-dimensional structure. Over the last decade, technology for determining the sequence of DNA and RNA has advanced rapidly, making transcriptome-wide expression profiling fast and widely available. In this dissertation, I discuss recent efforts to leverage this powerful technology to study not just RNA expression but several other aspects of RNA function. In particular, I focus on three tightly linked aspects of RNA biology: RNA secondary structure, RNA cleavage, and regulatory small RNAs. I introduce a database for integrating, comparing, and contrasting techniques for determining RNA secondary structure, including a technique developed in my dissertation laboratory. Additionally, I discuss a newly improved technology capable of detecting RNA cleavage events. Finally, I integrate RNA secondary structure probing and RNA cleavage detection to interrogate a family of genes important for eukaryotic small RNA-mediated silencing. These diverse analyses are just a few examples of the vast promise offered by adapting RNA-sequencing technology to probe RNA function across many cellular processes.

    Towards the understanding of transcriptional and translational regulatory complexity

    Given that every cell carries the same genome, the observed phenotypic diversity can only arise from highly regulated mechanisms beyond the encoded DNA sequence. We investigated several mechanisms of protein biosynthesis and analyzed DNA methylation patterns, alternative translation sites, and genomic mutations. As chromatin states are determined by epigenetic modifications and nucleosome occupancy, we performed a structural superimposition of DNA methyltransferase 1 (DNMT1) and the nucleosome, which suggests that DNA methylation depends on the accessibility of nucleosome-bound DNA to DNMT1. Regarding translation, alternative non-AUG translation initiation was observed. We developed reliable prediction models to detect these alternative start sites in a given mRNA sequence. Our tool PreTIS provides initiation confidences for all frame-independent non-cognate and AUG starts. Beyond these innate factors, specific sequence variations can additionally affect a phenotype. We conducted a genome-wide analysis of millions of mutations and found an accumulation of SNPs next to transcription starts that could relate to a gene-specific regulatory signal. We also report similar conservation of canonical and alternative translation sites, highlighting the relevance of alternative mechanisms. Finally, our tool MutaNET automates variation analysis by scoring the impact of individual mutations on cell function while also integrating a gene regulatory network.
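To make the idea of scanning an mRNA for AUG and alternative start sites concrete, here is a minimal sketch. It is not the PreTIS method: it assumes, purely for illustration, that candidate alternative starts are codons differing from AUG at exactly one position, and it assigns no confidence scores. The function names `near_cognate_codons` and `candidate_starts` are hypothetical.

```python
from typing import List, Tuple

START = "AUG"
BASES = "ACGU"


def near_cognate_codons(start: str = START) -> List[str]:
    """Return every codon that differs from `start` at exactly one position."""
    codons = []
    for i, original in enumerate(start):
        for base in BASES:
            if base != original:
                codons.append(start[:i] + base + start[i + 1:])
    return codons


def candidate_starts(mrna: str) -> List[Tuple[int, str, int]]:
    """List (0-based position, codon, reading frame) for each AUG or
    one-mismatch codon, scanning all three frames.

    Assumption: a real predictor would score each hit using sequence
    context; this sketch only enumerates the candidates.
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA-style input too
    wanted = {START, *near_cognate_codons()}
    hits = []
    for pos in range(len(seq) - 2):
        codon = seq[pos:pos + 3]
        if codon in wanted:
            hits.append((pos, codon, pos % 3))
    return hits
```

For example, `candidate_starts("GGCUGAUGG")` reports a CUG at position 2 and an AUG at position 5, both in frame 2.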

    Deep learning methods for mining genomic sequence patterns

    With the growing availability of large-scale genomic datasets and advanced computational techniques, more and more data-driven computational methods have been developed to analyze genomic data and help solve incompletely understood biological problems. Among them, deep learning methods have been proposed to automatically learn and recognize the functional activity of DNA sequences from genomics data. Techniques for efficiently mining genomic sequence patterns will help to improve our understanding of gene regulation, and thus accelerate our progress toward using personal genomes in medicine. This dissertation focuses on the development of deep learning methods for mining genomic sequences. First, we compare the performance of deep learning models and traditional machine learning methods in recognizing various genomic sequence patterns. Through extensive experiments on both simulated data and real genomic sequence data, we demonstrate that an appropriate deep learning model can generally be built to recognize various genomic sequence patterns successfully. Next, we develop deep learning methods to help solve two specific biological problems: (1) inference of the polyadenylation code and (2) tRNA gene detection and functional prediction. Polyadenylation is a pervasive mechanism used by eukaryotes for regulating mRNA transcription, localization, and translation efficiency. Polyadenylation signals in plants are particularly noisy and challenging to decipher. A deep convolutional neural network approach, DeepPolyA, is proposed to predict poly(A) sites from Arabidopsis thaliana genomic sequences. It employs various deep neural network architectures and demonstrates its superiority in comparison with competing methods, including classical machine learning algorithms and several popular deep learning models. Transfer RNAs (tRNAs) represent a highly complex class of genes and play a central role in protein translation. 
tRNAscan-SE remains the de facto tool for identifying tRNA genes encoded in genomes. Despite its popularity and success, tRNAscan-SE is still not powerful enough to separate tRNAs from pseudo-tRNAs, and a significant number of false positives can be output as a result. To address this issue, tRNA-DL, a hybrid of a convolutional neural network and a recurrent neural network, is proposed. The proposed method is shown to substantially reduce the false positive rate of the state-of-the-art tRNA prediction tool tRNAscan-SE. Coupled with tRNAscan-SE, tRNA-DL can serve as a useful complementary tool for tRNA annotation. Taken together, the experiments and applications demonstrate the superiority of deep learning in automatic feature generation for characterizing genomic sequence patterns.
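As a rough illustration of how a convolutional layer can recognize a sequence pattern, the sketch below one-hot encodes a DNA string and slides a single filter over it. It is not DeepPolyA or tRNA-DL: the filter weights are set by hand from the motif AATAAA (chosen only as an example pattern), whereas a trained network would learn its filters from data. All function names are hypothetical, and the code uses only the standard library.

```python
from typing import List

ALPHABET = "ACGT"
Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """Encode a DNA string as a len(seq) x 4 matrix.
    Bases outside ACGT (e.g. N) become all-zero rows."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def motif_filter(motif: str) -> Matrix:
    """Hand-set filter scoring +1 per matching base.
    Assumption: stands in for weights a CNN would learn."""
    return one_hot(motif)


def conv1d_valid(x: Matrix, kernel: Matrix) -> List[float]:
    """Slide the kernel over x (stride 1, no padding), then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        score = sum(x[start + i][c] * kernel[i][c]
                    for i in range(width) for c in range(len(ALPHABET)))
        out.append(max(0.0, score))
    return out


def global_max_pool(activations: List[float]) -> float:
    """Keep the strongest response, regardless of where it occurred."""
    return max(activations) if activations else 0.0


activations = conv1d_valid(one_hot("GGAATAAACC"), motif_filter("AATAAA"))
best = global_max_pool(activations)
```

Here `activations` has five entries (one per window), and the window starting at index 2 matches the filter at all six positions, so `best` equals 6.0. Max pooling is what makes the detection independent of the motif's position in the input.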

    Deciphering transcriptional regulation in cancer cells and development of a new method to identify key transcriptional regulators and their target genes

    Cancer cells accumulate genetic changes during carcinogenesis. These changes range in scale from point mutations to large chromosomal aberrations. It is widely accepted that they dysregulate essential genetic programs that would normally prevent uncontrolled cellular division and growth. Transcription factors (TFs) are key proteins of gene regulation and are frequently associated with genetic pathologies, e.g. MYCN in neuroblastomas (NBs). Research on gene regulation, whether in general or condition-specific, is thus a central aspect of cancer research, and it is also the focus of my work. In a carcinogenesis model of NBs without MYCN-amplification, mutations of chromosome 11q (11q-CNA) are suspected to critically influence tumor development. We were able to refine this model by means of gene expression analysis on 11q-CNA in NBs with different clinical outcomes. Gene expression profiles of NBs with unfavorable progression differed significantly between tumors with and without 11q-CNA, whereas 11q-CNA in NBs with favorable outcome is apparently compensated by an as yet unknown mechanism. The TF-encoding gene CAMTA1 is located on the chromosomal region 1p, which is frequently deleted in NBs. In vitro experiments with ectopic induction of CAMTA1 yielded CAMTA1-regulated genes with different gene expression profiles that were functionally associated by enrichment analyses with cell cycle regulation and neuronal differentiation. The suggested role of CAMTA1 as a tumor suppressor gene was confirmed by additional in vivo experiments. Furthermore, we studied the effect of MYC and MYCN in NBs without MYCN-amplification and found that these TFs also strongly regulate a large number of common target genes according to their own gene expression in these tumors. Promoter analyses and chromatin immunoprecipitation additionally supported the regulation of the determined target genes by MYC/MYCN. 
The genome-wide application of promoter and enrichment analyses to gene expression data from mouse models enabled us to predict target TFs of Rage signaling. E2f1 and E2f4 were validated experimentally as components of the Rage-dependent gene regulatory network. Finally, we used our experience from gene expression analysis to develop a novel machine learning method to precisely predict TF-target gene relationships in humans. Our approach combined results from a genome-wide correlation meta-analysis of 4064 microarray gene expression profiles and promoter analyses of TF binding sites with known regulatory interactions between TFs and target genes. Our method outperformed other comparable methods in humans, as we addressed shortcomings of other algorithms specifically for higher eukaryotes, in particular the frequently (erroneously) assumed correlation between the mRNA expression of TFs and their target genes. We made our method freely available as a software package with multiple applications, such as the identification of key TFs in a multiplicity of cellular systems (e.g. cancer cells).

    Recipient Determinants Affecting Conjugational Promiscuity in Enterobacteriaceae


    Understanding Communication Signals during Mycobacterial Latency through Predicted Genome-Wide Protein Interactions and Boolean Modeling

    About 90% of the people infected with Mycobacterium tuberculosis carry latent bacteria that are believed to become activated upon immune suppression. One of the fundamental challenges in the control of tuberculosis is therefore to understand the molecular mechanisms involved in the onset of latency and/or reactivation. We have attempted to address this problem at the systems level by combining predicted functional protein-protein interactions, integration of functional interactions with large-scale gene expression studies, a predicted transcription regulatory network, and finally simulations with a Boolean model of the network. Initially, a prediction of genome-wide protein functional linkages was obtained based on genome-context methods using a Support Vector Machine. This set of protein functional linkages, along with gene expression data from the available models of latency, was employed to identify proteins involved in mediating switch signals during dormancy. We show that genes that are up- and down-regulated during dormancy are coordinately regulated not only under dormancy-like conditions but also under a variety of other experimental conditions. Their synchronized regulation indicates that they form a tightly regulated gene cluster and might form a latency regulon. Conservation of these genes across bacterial species suggests a unique evolutionary history that might be associated with M. tuberculosis dormancy. Finally, simulations with a Boolean model based on the regulatory network, with logical relationships derived from gene expression data, reveal a bistable switch suggesting alternating latent and actively growing states. Our analysis based on the interaction network therefore reveals a potential model of M. tuberculosis latency.
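A toy example may help show what a bistable switch looks like in a Boolean model. The sketch below is not the authors' network: it assumes a hypothetical two-node circuit in which a "growth" program and a "dormancy" program repress each other, and it enumerates the steady states under synchronous updates. The node names and rules are illustrative only.

```python
from itertools import product
from typing import Callable, Dict, List

State = Dict[str, bool]
Rules = Dict[str, Callable[[State], bool]]

# Hypothetical mutual-repression circuit (an assumption for illustration).
RULES: Rules = {
    "growth": lambda s: not s["dormancy"],
    "dormancy": lambda s: not s["growth"],
}


def step(state: State, rules: Rules) -> State:
    """Synchronous update: every node reads the same previous state."""
    return {node: rule(state) for node, rule in rules.items()}


def fixed_points(rules: Rules) -> List[State]:
    """Enumerate all 2^n states and keep those that map to themselves."""
    nodes = list(rules)
    found = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if step(state, rules) == state:
            found.append(state)
    return found


steady_states = fixed_points(RULES)
```

The circuit has exactly two steady states, one with only "dormancy" on and one with only "growth" on, which is the signature of a bistable switch. The remaining two states are not steady: under synchronous updating they map onto each other. Exhaustive enumeration is only practical for small networks, since the state space doubles with every node.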

    Quantitative modeling and statistical analysis of protein-DNA binding sites
