8 research outputs found
Systematic characterization of cis-regulatory motifs at the transcriptome scale and links with RNA localization
The subcellular localization of RNA allows a rapid and spatially restricted deployment of protein and noncoding RNA activities. RNA trafficking is directed by sequence elements (primary subsequences, secondary structures), also called regulatory motifs, present in cis within the RNA molecule. These motifs are recognized by RNA-binding proteins that mediate the transport of transcripts to specific sites in the cell. Recent studies in the Drosophila embryo indicate that the majority of RNAs display an asymmetric subcellular localization, suggesting the existence of a complex "localization code". However, this may be an exceptional example, and it remained unknown until now whether a comparable prevalence of RNA localization is observable in standard cells grown in culture. In addition, readily available information about the topological distribution of motif instances across full transcriptomes has so far been lacking.
To obtain a broad overview of the extent and properties of RNA localization, we subjected Drosophila (D17) and human (HepG2) cells to biochemical fractionation to isolate the nuclear, cytosolic, membrane and insoluble fractions. We then performed deep sequencing on the extracted RNA and analyzed the proteins extracted from these fractions by mass spectrometry. We named this method CeFra-Seq. Through bioinformatics analyses, I then profiled the enrichment of various RNA biotypes (e.g. messenger RNA, long noncoding RNA, circular RNA) and proteins within the subcellular fractions. This revealed a high prevalence of asymmetric distribution among both coding and noncoding RNA species. An analysis of orthologous genes between fly and human also showed strong similarities, suggesting that the localization process is evolutionarily conserved. In addition, I observed distinct attributes (e.g. transcript size) among fraction-specific messenger RNA populations. Finally, I observed specific correlations and anti-correlations between defined groups of messenger RNAs and the proteins they encode.
To study motif topology and conservation, I created oRNAment, a database of putative RNA-binding protein binding site instances in coding and noncoding RNAs. Using protein binding motif data from RNAcompete and RNA Bind-n-Seq experiments, I developed an algorithm allowing the rapid identification of potential instances of these motifs in a complete transcriptome. I was thus able to catalog the instances of 453 motifs from 223 RNA-binding proteins for 525,718 transcripts in five species. The results were validated by comparing them with public eCLIP data.
I then used oRNAment to analyze in detail the topological aspects of these putative motif instances and their relative evolutionary conservation. This showed that most motifs are distributed in a similar fashion between species. In addition, I detected commonalities between subgroups of proteins that preferentially bind distinct biotypes or specific RNA regions. The presence of such patterns between species, whether similar or not, likely reflects the importance of their functions. Moreover, a more detailed analysis of the position of a motif among comparable transcriptomic regions in vertebrates suggests a syntenic conservation, to varying degrees, in all RNA biotypes. The regional topology of certain repeated motif instances also appears to be evolutionarily conserved and may be important for adequate binding of the protein. Finally, the results compiled with oRNAment made it possible to postulate a potential new role for the long noncoding RNA HELLPAR as an RNA-binding protein sponge.
The systematic characterization of localized RNAs and cis-regulatory motifs presented in this thesis demonstrates how the integration of information at a transcriptomic scale enables the assessment of the prevalence of asymmetry, the distinct characteristics and the evolutionary conservation of collections of RNAs.
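As a rough illustration of what scanning a transcript for putative motif instances can look like, the sketch below slides a position weight matrix along an RNA sequence and reports windows that score close to the best possible match. It is not the oRNAment algorithm, whose details are not given in the abstract. The matrix values, the uniform background, the 0.9 score threshold and the names `PPM`, `log_odds` and `scan` are all assumptions made for this example.

```python
import math

# Hypothetical 4-nt position probability matrix (one dict per motif position).
# The numbers are invented; they do not come from RNAcompete or RNA Bind-n-Seq.
PPM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "U": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "U": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "U": 0.05},
]
BACKGROUND = 0.25  # assumed uniform nucleotide background


def log_odds(ppm, background=BACKGROUND):
    """Convert probabilities to log2 odds against the background."""
    return [{b: math.log2(p / background) for b, p in col.items()} for col in ppm]


def scan(sequence, ppm, min_fraction=0.9):
    """Return (start, kmer, score) for windows scoring at least
    min_fraction of the maximum achievable score (assumed threshold)."""
    pwm = log_odds(ppm)
    max_score = sum(max(col.values()) for col in pwm)
    width = len(pwm)
    seq = sequence.upper().replace("T", "U")  # accept DNA or RNA alphabet
    hits = []
    for start in range(len(seq) - width + 1):
        kmer = seq[start:start + width]
        if any(b not in "ACGU" for b in kmer):
            continue  # skip windows containing ambiguous bases
        score = sum(col[b] for col, b in zip(pwm, kmer))
        if score >= min_fraction * max_score:
            hits.append((start, kmer, round(score, 3)))
    return hits


if __name__ == "__main__":
    for hit in scan("AAUGCAGGTGCATT", PPM):
        print(hit)
```

A transcriptome-wide catalog would repeat such a scan over every transcript and motif, which is why a fast implementation matters at the scale of hundreds of motifs and hundreds of thousands of transcripts.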
Analysis of the conditional correlation derived from the coevolution of a three-gene system using a maximum likelihood model
Protein-coding genes can often be grouped and integrated into functional modules with respect to an organelle. These modules may have components whose correlated evolution is conditional on a given phenotype. Genes linked to motility possess this characteristic, as they act in a cascade in response to external stimuli. Hyperthermophily, on the other hand, is linked to reverse gyrase, but no other element that can be associated with it with certainty is known. This may be due to an unresolved case of non-orthologous gene displacement. Using a bioinformatics approach, a mathematical model of conditional correlated evolution for three genes was developed and applied to the phyletic profiles of archaea. This made it possible to develop theories about the potential function of the flagellar gene FlaD/E and about the evolutionary history of the genes that are linked to it and that contributed to its formation. In addition, a theoretical evolutionary history was established for a ligase associated with hyperthermophily.
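To make the idea of a correlation that is conditional on a third character more concrete, the sketch below computes a stratified G-statistic (a likelihood-ratio test of independence) between two presence/absence profiles, separately within genomes that have or lack a phenotype. This is a deliberately simplified stand-in: it ignores the phylogeny entirely and is not the maximum likelihood model of the thesis. The gene names, the toy profiles and the function names are hypothetical.

```python
import math

# Toy phyletic profiles: one dict per genome, 1 = present, 0 = absent.
# "gene_a", "gene_b" and "phenotype" are hypothetical labels.
PROFILES = [
    {"gene_a": 1, "gene_b": 1, "phenotype": 1},
    {"gene_a": 1, "gene_b": 1, "phenotype": 1},
    {"gene_a": 0, "gene_b": 0, "phenotype": 1},
    {"gene_a": 0, "gene_b": 0, "phenotype": 1},
    {"gene_a": 1, "gene_b": 1, "phenotype": 0},
    {"gene_a": 1, "gene_b": 0, "phenotype": 0},
    {"gene_a": 0, "gene_b": 1, "phenotype": 0},
    {"gene_a": 0, "gene_b": 0, "phenotype": 0},
]


def g_statistic(pairs):
    """G-statistic for independence of two binary characters,
    given a list of (a, b) observations."""
    n = len(pairs)
    if n == 0:
        return 0.0
    counts = {}
    for a, b in pairs:
        counts[(a, b)] = counts.get((a, b), 0) + 1
    g = 0.0
    for (a, b), observed in counts.items():
        row = sum(c for (x, _), c in counts.items() if x == a)
        col = sum(c for (_, y), c in counts.items() if y == b)
        expected = row * col / n
        g += 2.0 * observed * math.log(observed / expected)
    return g


def conditional_g(profiles, gene_a, gene_b, condition):
    """G-statistic of gene_a vs gene_b within each state of `condition`."""
    result = {}
    for state in (0, 1):
        stratum = [(p[gene_a], p[gene_b]) for p in profiles if p[condition] == state]
        result[state] = g_statistic(stratum)
    return result


if __name__ == "__main__":
    print(conditional_g(PROFILES, "gene_a", "gene_b", "phenotype"))
```

In the toy data the two genes co-occur perfectly only in genomes that carry the phenotype, so the statistic is large in that stratum and zero in the other. A phylogeny-aware model is needed in practice because genomes are not independent observations.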
Data for the generation of RNA spatiotemporal distributions and interpretation of Chk1 and SLBP protein depletion phenotypes during Drosophila embryogenesis
The data presented in this article is related to the research article entitled "Biochemical Fractionation of Time-Resolved Drosophila Embryos Reveals Similar Transcriptomic Alterations in Replication Checkpoint and Histone mRNA Processing Mutants" (Lefebvre et al., 2017) [1]. This article provides a spatiotemporal transcriptomic analysis of early embryogenesis and shows that mutations in the checkpoint factor grapes/Chk1 and the histone mRNA processing factor SLBP selectively impair zygotic gene expression. Here, lists of transcripts enriched in early syncytial embryos, late blastoderm embryos, cytoplasmic and nuclear extracts of blastoderm embryos are made public, along with transcription factor motif occurrence for genes enriched in each context. In addition, extensive lists of genes down-regulated upon Chk1 and SLBP protein depletion in embryos are released to enable further analyses.
RBP Image Database: A resource for the systematic characterization of the subcellular distribution properties of human RNA binding proteins
RNA binding proteins (RBPs) are central regulators of gene expression implicated in all facets of RNA metabolism. As such, they play key roles in cellular physiology and disease etiology. Since different steps of post-transcriptional gene expression tend to occur in specific regions of the cell, including nuclear or cytoplasmic locations, defining the subcellular distribution properties of RBPs is an important step in assessing their potential functions. Here, we present the RBP Image Database, a resource that details the subcellular localization features of 301 RBPs in the human HepG2 and HeLa cell lines, based on the results of systematic immunofluorescence studies conducted using a highly validated collection of RBP antibodies and a panel of 12 markers for specific organelles and subcellular structures. The unique features of the RBP Image Database include: (i) hosting of comprehensive representative images for each RBP-marker pair, with ~250,000 microscopy images; (ii) a manually curated controlled vocabulary of annotation terms detailing the localization features of each factor; and (iii) a user-friendly interface allowing the rapid querying of the data by target or annotation. The RBP Image Database is freely available at https://rnabiology.ircm.qc.ca/RBPImage/.
A large-scale binding and functional map of human RNA-binding proteins.
Many proteins regulate the expression of genes by binding to specific regions encoded in the genome [1]. Here we introduce a new data set of RNA elements in the human genome that are recognized by RNA-binding proteins (RBPs), generated as part of the Encyclopedia of DNA Elements (ENCODE) project phase III. This class of regulatory elements functions only when transcribed into RNA, as they serve as the binding sites for RBPs that control post-transcriptional processes such as splicing, cleavage and polyadenylation, and the editing, localization, stability and translation of mRNAs. We describe the mapping and characterization of RNA elements recognized by a large collection of human RBPs in K562 and HepG2 cells. Integrative analyses using five assays identify RBP binding sites on RNA and chromatin in vivo, the in vitro binding preferences of RBPs, the function of RBP binding sites and the subcellular localization of RBPs, producing 1,223 replicated data sets for 356 RBPs. We describe the spectrum of RBP binding throughout the transcriptome and the connections between these interactions and various aspects of RNA biology, including RNA stability, splicing regulation and RNA localization. These data expand the catalogue of functional elements encoded in the human genome by the addition of a large set of elements that function at the RNA level by interacting with RBPs.
Perspectives on ENCODE
The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression [1]. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.
Expanded encyclopaedias of DNA elements in the human and mouse genomes
The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE [1] and Roadmap Epigenomics [2] data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.