    Integrative omics analysis of Pseudomonas aeruginosa virus PA5oct highlights the molecular complexity of jumbo phages

    Pseudomonas virus vB_PaeM_PA5oct is proposed as a model jumbo bacteriophage to investigate phage-bacteria interactions and is a candidate for phage therapy applications. Combining hybrid sequencing, RNA-Seq and mass spectrometry allowed us to accurately annotate its 286,783 bp genome with 461 coding regions including four non-coding RNAs (ncRNAs) and 93 virion-associated proteins. PA5oct relies on the host RNA polymerase for the infection cycle and RNA-Seq revealed a gradual take-over of the total cell transcriptome from 21% in early infection to 93% in late infection. PA5oct is not organized into strictly contiguous regions of temporal transcription, but some genomic regions transcribed in early, middle and late phases of infection can be discriminated. Interestingly, we observe regions showing limited transcription activity throughout the infection cycle. We show that PA5oct upregulates specific bacterial operons during infection including operons pncA-pncB1-nadE involved in NAD biosynthesis, psl for exopolysaccharide biosynthesis and nap for periplasmic nitrate reductase production. We also observe a downregulation of T4P gene products suggesting mechanisms of superinfection exclusion. We used the proteome of PA5oct to position our isolate amongst other phages using a gene-sharing network. This integrative omics study illustrates the molecular diversity of jumbo viruses and raises new questions towards cellular regulation and phage-encoded hijacking mechanisms

    The translational landscape of fission-yeast meiosis and sporulation.

    Sexual development in Schizosaccharomyces pombe culminates in meiosis and sporulation. We used ribosome profiling to investigate the translational landscape of this process. We show that the translation efficiency of hundreds of genes is regulated in complex patterns, often correlating with changes in RNA levels. Ribosome-protected fragments show a three-nucleotide periodicity that identifies translated sequences and their reading frame. Using this property, we identified 46 new translated genes and found that 24% of noncoding RNAs are actively translated. We also detected 19 nested antisense genes, in which both DNA strands encode translated mRNAs. Finally, we identified 1,735 translated upstream open reading frames (ORFs) in leader sequences. In S. pombe, in contrast with Saccharomyces cerevisiae, sexual development is not accompanied by large increases in upstream ORF use, thus suggesting that this is an organism-specific adaptation, not a general feature of developmental processes. This work was funded by a Biotechnology and Biological Sciences Research Council (BBSRC) research grant to Juan Mata (BB/J007153/1). This is the accepted manuscript. The final version is available from Nature Publishing at http://www.nature.com/nsmb/journal/v21/n7/full/nsmb.2843.html

    Genomic data mining for the computational prediction of small non-coding RNA genes

    The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative-based methods require knowledge of closely related organisms in order to search for sequence and structural similarities. This approach imposes constraints on the type of ncRNAs, the organism, and the regions where the ncRNAs can be found. We have developed a novel approach for ncRNA gene prediction without the limitations of current comparative-based methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. The ability to identify ncRNAs automatically on a genome-wide scale makes the approach well suited to incorporation into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies. Ph.D. Committee Chair: Dr. G. Tong Zhou; Committee Member: Dr. Arthur Koblasz; Committee Member: Dr. Eberhard Voit; Committee Member: Dr. Xiaoli Ma; Committee Member: Dr. Ying X

    In silico prediction of active RNA genes in legumes

    Accumulating evidence suggests that non-coding RNAs (ncRNAs) play key roles in gene regulation and may form the basis of an inter-gene communication system. MicroRNAs (miRNAs) are a class of small non-coding RNAs found in both plants and animals that regulate the expression of other genes. Identifying and analyzing miRNAs enhances our understanding of the roles they play in this complex regulatory network. The work presented in this thesis constitutes the first large-scale prediction and characterization of both ncRNAs and miRNAs in the model legumes Medicago truncatula and Lotus japonicus, and provides a basis for further research on elucidating ncRNA function in legume genomics.

    Discovering cancer-associated transcripts by RNA sequencing

    High-throughput sequencing of poly-adenylated RNA (RNA-Seq) in human cancers shows remarkable potential to identify uncharacterized aspects of tumor biology, including gene fusions with therapeutic significance and disease markers such as long non-coding RNA (lncRNA) species. However, the analysis of RNA-Seq data places unprecedented demands upon computational infrastructures and algorithms, requiring novel bioinformatics approaches. To meet these demands, we present two new open-source software packages - ChimeraScan and AssemblyLine - designed to detect gene fusion events and novel lncRNAs, respectively. RNA-Seq studies utilizing ChimeraScan led to discoveries of new families of recurrent gene fusions in breast cancers and solitary fibrous tumors. Further, ChimeraScan was one of the key components of the repertoire of computational tools utilized in data analysis for MI-ONCOSEQ, a clinical sequencing initiative to identify potentially informative and actionable mutations in cancer patients’ tumors. AssemblyLine, by contrast, reassembles RNA sequencing data into full-length transcripts ab initio. In head-to-head analyses AssemblyLine compared favorably to existing ab initio approaches and unveiled abundant novel lncRNAs, including antisense and intronic lncRNAs disregarded by previous studies. Moreover, we used AssemblyLine to define the prostate cancer transcriptome from a large patient cohort and discovered myriad lncRNAs, including 121 prostate cancer-associated transcripts (PCATs) that could potentially serve as novel disease markers. Functional studies of two PCATs - PCAT-1 and SChLAP1 - revealed cancer-promoting roles for these lncRNAs. PCAT1, a lncRNA expressed from chromosome 8q24, promotes cell proliferation and represses the tumor suppressor BRCA2. SChLAP1, located in a chromosome 2q31 ‘gene desert’, independently predicts poor patient outcomes, including metastasis and cancer-specific mortality. 
Mechanistically, SChLAP1 antagonizes the genome-wide localization and regulatory functions of the SWI/SNF chromatin-modifying complex. Collectively, this work demonstrates the utility of ChimeraScan and AssemblyLine as open-source bioinformatics tools. Our applications of ChimeraScan and AssemblyLine led to the discovery of new classes of recurrent and clinically informative gene fusions, and established a prominent role for lncRNAs in coordinating aggressive prostate cancer, respectively. We expect that the methods and findings described herein will establish a precedent for RNA-Seq-based studies in cancer biology and assist the research community at large in making similar discoveries. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120814/1/mkiyer_1.pd

    A coding and non-coding transcriptomic perspective on the genomics of human metabolic disease

    Genome-wide association studies (GWAS), relying on hundreds of thousands of individuals, have revealed > 200 genomic loci linked to metabolic disease (MD). Loss of insulin sensitivity (IS) is a key component of MD and we hypothesized that discovery of a robust IS transcriptome would help reveal the underlying genomic structure of MD. Using 1,012 human skeletal muscle samples, detailed physiology and a tissue-optimized approach for the quantification of coding (> 18,000) and non-coding (> 15,000) RNA (ncRNA), we identified 332 fasting IS-related genes (CORE-IS). Over 200 had a proven role in the biochemistry of insulin and/or metabolism or were located at GWAS MD loci. Over 50% of the CORE-IS genes responded to clinical treatment, with 16 quantitatively tracking changes in IS across four independent studies (P = 0.0000053; negatively: AGL, G0S2, KPNA2, PGM2, RND3 and TSPAN9; positively: ALDH6A1, DHTKD1, ECHDC3, MCCC1, OARD1, PCYT2, PRRX1, SGCG, SLC43A1 and SMIM8). A network of ncRNAs was positively related to IS and interacted with RNA coding for viral response proteins (P < 1 × 10^−48), while reduced amino acid catabolic gene expression occurred without a change in expression of oxidative-phosphorylation genes. We illustrate that combining in-depth physiological phenotyping with robust RNA profiling methods identifies molecular networks that are highly consistent with the genetics and biochemistry of human metabolic disease.

    From tools and databases to clinically relevant applications in miRNA research

    While early research in particular focused on the small portion of the human genome that encodes proteins, it became apparent that molecules responsible for many key functions were also encoded in the remaining regions. Originally, non-coding RNAs, i.e., molecules that are not translated into proteins, were thought to be composed of only two classes (ribosomal RNAs and transfer RNAs). However, starting from the early 1980s many other non-coding RNA classes were discovered. In the past two decades, small non-coding RNAs (sncRNAs), and in particular microRNAs (miRNAs), have become essential molecules in biological and biomedical research. In this thesis, five aspects of miRNA research have been addressed. Starting from the development of advanced computational software to analyze miRNA data (1), an in-depth understanding of human and non-human miRNAs was generated and databases hosting this knowledge were created (2). In addition, the effects of technological advances were evaluated (3). We also contributed to the understanding of how miRNAs act in an orchestrated manner to target human genes (4). Finally, based on the insights gained from the tools and resources of the mentioned aspects, we evaluated the suitability of miRNAs as biomarkers (5). With the establishment of next-generation sequencing, the primary goal of this thesis was the creation of an advanced bioinformatics analysis pipeline for high-throughput miRNA sequencing data, primarily focused on human. Consequently, miRMaster, a web-based software solution to analyze hundreds of sequencing samples within a few hours, was implemented. The tool was implemented so that it could support different sequencing technologies and library preparation techniques. This flexibility allowed miRMaster to build a substantial user base, resulting in over 120,000 processed samples and 1.5 billion processed reads as of July 2021, and therefore laid the basis for the second goal of this thesis.
Indeed, the implementation of a feature allowing users to share their uploaded data contributed strongly to the generation of a detailed annotation of the human small non-coding transcriptome. This annotation was integrated into a new miRNA database, miRCarta, modelling thousands of miRNA candidates and corresponding read expression profiles. A subset of these candidates was then evaluated in the context of different diseases and validated. The thereby gained knowledge was subsequently used to validate additional miRNA candidates and to generate an estimate of the number of miRNAs in human. The large collection of samples, gathered over many years with miRMaster was also integrated into a web server evaluating miRNA arm shifts and switches, miRSwitch. Finally, we published an updated version of miRMaster, expanding its scope to other species and adding additional downstream analysis capabilities. The second goal of this thesis was further pursued by investigating the distribution of miRNAs across different human tissues and body fluids, as well as the variability of miRNA profiles over the four seasons of the year. Furthermore, small non-coding RNAs in zoo animals were examined and a tissue atlas of small non-coding RNAs for mice was generated. The third goal, the assessment of technological advances, was addressed by evaluating the new combinatorial probe-anchor synthesis-based sequencing technology published by BGI, analyzing the effect of RNA integrity on sequencing data, analyzing low-input library preparation protocols, and comparing template-switch based library preparation protocols to ligation-based ones. In addition, an antibody-based labeling sequencing chemistry, CoolMPS, was investigated. Deriving an understanding of the orchestrated regulation by miRNAs, the fourth goal of this thesis, was pursued in a first step by the implementation of a web server visualizing miRNA-gene interaction networks, miRTargetLink. 
Subsequently, miRPathDB, a database incorporating pathways affected by miRNAs and their targets was implemented, as well as miEAA 2.0, a web server offering quick miRNA set enrichment analyses in over 130,000 categories spanning 10 different species. In addition, miRSNPdb, a database evaluating the effects of single nucleotide polymorphisms and variants in miRNAs or in their target genes was created. Finally, the fifth goal of the thesis, the evaluation of the suitability of miRNAs as biomarkers for human diseases was tackled by investigating the expression profiles of miRNAs with machine learning. An Alzheimer's disease cohort with over 400 individuals was analyzed, as well as another neurodegenerative disease cohort with multiple time points of Parkinson's disease patients and healthy controls. Furthermore, a lung cancer cohort covering 3,000 individuals was examined to evaluate the suitability of an early detection test. In addition, we evaluated the expression profile changes induced by aging on a cohort of 1,334 healthy individuals and over 3,000 diseased patients. Altogether, the herein described tools, databases and research papers present valuable advances and insights into the miRNA research field and have been used and cited by the research community over 2,000 times as of July 2021.

    Long non-coding RNAs in the epigenetic regulation of oligodendrocyte differentiation

    Long non-coding RNAs (lncRNAs) constitute a heterogeneous class of RNAs with limited coding potential, united by an arbitrarily placed cutoff of >200 nt. The past decade has seen the emergence of lncRNAs as versatile regulators of gene expression, amidst skepticism regarding the biological usefulness of pervasive genomic transcription and its non-coding RNA products prevalent in most eukaryotes. A significant portion of lncRNAs operate in the development and functioning of the mammalian CNS. Oligodendrocytes (OLs) are the myelinating cells of the CNS that are essential for efficient saltatory conduction and axonal survival. They are derived from OL precursors (OPCs) and progress into transcriptomically heterogeneous OL sub-populations along the differentiation pathway to produce mature OLs, capable of myelination. These epigenetic transitions between different OL subpopulations are carefully regulated, spatially and temporally, by a network of transcription factors, chromatin modulators and lncRNAs. In demyelinating diseases like multiple sclerosis (MS), patients suffer immune-mediated attacks against myelin. Eventually, remyelination strategies fail due to deficits in OPC migration and OL differentiation at the site of lesions. Thus, understanding molecular mechanisms governing OL differentiation and myelination is crucial not only for understanding OL function in health but also in disease, in order to develop suitable therapeutic interventions. The investigations presented in this thesis explore the role of lncRNAs and RNA-binding proteins in neurodevelopment, particularly in embryonic stem cells (ESCs) and cells of the OL lineage. Article 1 provides a resource for the protein interactome of a key pioneering transcription factor, Sox2, in different nuclear fractions of mouse ESCs.
We found Sox2 to be a multifaceted regulator forming interactions with the HP1 family of proteins, whose members act as both activators and repressors in a context-dependent manner. In addition to interacting with RBPs involved in post-transcriptional processes, Sox2 also interacted with Rn7sk, a well-known ncRNA involved in the regulation of transcriptional elongation at promoters and enhancers. Although they did not influence each other's recruitment to the chromatin, this interaction opens up the possibility for ncRNA-mediated modulation of ES transcriptional programs dependent on Sox2. Article 2 draws important insights regarding lncRNAs from a broad transcriptomic resource established from single-cell as well as bulk RNA sequencing of OL lineage cells from different developmental stages. From a subset of lncRNAs which were found to be specific for certain OL subpopulations, we investigated the role of 2610035D17Rik in modulating the expression of its neighboring gene, Sox9, a transcription factor essential for OPC specification. We decoupled the role of the lncRNA transcript from its genomic locus using various loss-of-function strategies and found that the regulation of Sox9 was dependent on the regulatory elements and/or ongoing transcription at the 2610035D17Rik locus, rather than the transcript itself. In Article 4, we investigated a hitherto unexplored RNA-binding function of myelin gene expression factor 2 (Myef2), a known transcriptional repressor of myelin basic protein (MBP). To this end, we uncovered the RNA interactome of Myef2 in a mouse oligodendroglial cell line with individual nucleotide resolution CLIP (iCLIP) followed by sequencing. We show that Myef2 interacts with CUG motifs located within introns and 3′ UTRs of protein-coding genes, a finding which implicates Myef2 in post-transcriptional processes like splicing and RNA stability.
Finally, in Article 3 we have identified disease-specific transcriptomic profiles of OL lineage cells through single-cell RNA sequencing of OPCs and OLs derived from experimental autoimmune encephalomyelitis (EAE) mice, a model that recapitulates several aspects of MS. EAE-specific OPC and OL clusters were enriched for genes involved in antigen processing and presentation (MHC class I/II). We could demonstrate that OPCs can phagocytose myelin debris and MHC-II-expressing OPCs can activate memory and effector CD4-positive T cells. These findings show OL lineage cells as active participants in MS pathology rather than passive targets. Further, the findings of Article 2 implicate 2610035D17Rik as a regulator of immunomodulatory properties of oligodendroglia, as 2610035D17Rik KO cells showed reduced expression of IFNγ-responsive genes and elevated expression of those involved in antigen presentation, compared to the controls, following IFNγ stimulation.