8 research outputs found

    Systematic characterization of cis-regulatory motifs at the transcriptomic scale and links with RNA localization

    The subcellular localization of RNA allows rapid and spatially restricted deployment of protein and noncoding RNA activities. RNA trafficking is directed by sequence elements (primary subsequences, secondary structures), also called regulatory motifs, present in cis within the RNA molecule. These motifs are recognized by RNA-binding proteins that mediate the transport of transcripts to specific sites in the cell. Recent studies in the Drosophila embryo indicate that the majority of RNAs display an asymmetric subcellular localization, suggesting the existence of a complex "localization code". However, this may be an exceptional case, and it remained unknown until now whether a comparable prevalence of RNA localization is observable in standard cells grown in culture. In addition, readily available information about the topological distribution of motif instances across full transcriptomes has hitherto been lacking.

    To gain a broad overview of the extent and properties of RNA localization, we subjected Drosophila (D17) and human (HepG2) cells to biochemical fractionation to isolate the nuclear, cytosolic, membrane and insoluble fractions. We then deep-sequenced the extracted RNA and analyzed the proteins extracted from these fractions by mass spectrometry. We named this method CeFra-Seq. Through bioinformatics analyses, I then profiled the enrichment of various RNA biotypes (e.g. messenger RNA, long noncoding RNA, circular RNA) and proteins within the subcellular fractions. This revealed a high prevalence of asymmetric distribution among both coding and noncoding RNA species. An analysis of orthologous genes between fly and human also showed strong similarities, suggesting that the localization process is evolutionarily conserved. In addition, I observed distinct attributes (e.g. transcript size) among fraction-specific messenger RNA populations. Finally, I observed specific correlations and anti-correlations between defined groups of messenger RNAs and the proteins they encode.

    To study motif topology and conservation, I created oRNAment, a database of putative RNA-binding protein binding site instances in coding and noncoding RNAs. Using protein binding motif data from RNAcompete and RNA Bind-n-Seq experiments, I developed an algorithm that rapidly identifies potential instances of these motifs in a complete transcriptome. I was thus able to catalog the instances of 453 motifs from 223 RNA-binding proteins in 525,718 transcripts from five species. The results were validated by comparison with public eCLIP data. I then used oRNAment to analyze in detail the topological aspects of these putative motif instances and their relative evolutionary conservation. This showed that most motifs are distributed similarly between species. In addition, I detected commonalities between subgroups of proteins that preferentially bind distinct biotypes or specific RNA regions. The presence of such patterns, whether similar or not, between species likely reflects the importance of their functions. Moreover, a more detailed analysis of the position of a motif among comparable transcriptomic regions in vertebrates suggests a syntenic conservation, to varying degrees, in all RNA biotypes. The regional topology of certain repeated motif instances also appears to be evolutionarily conserved and may be important for adequate binding of the protein. Finally, the results compiled with oRNAment made it possible to postulate a potential new role for the long noncoding RNA HELLPAR as an RNA-binding protein sponge.

    The systematic characterization of localized RNAs and cis-regulatory motifs presented in this thesis demonstrates how integrating information at the transcriptomic scale makes it possible to assess the prevalence of asymmetry, the distinct characteristics and the evolutionary conservation of RNA collections.
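
    The transcriptome-wide search for motif instances mentioned in the abstract above can be illustrated with a minimal sketch. This is not the oRNAment algorithm, whose details are not given in the abstract. It assumes only that a motif is represented as a position weight matrix (PWM) of per-position base probabilities, and it reports every window whose probability, relative to the best possible match, reaches a threshold. The matrix values, the 0.5 threshold and the name scan_transcript are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical 4-position PWM (not taken from RNAcompete or RNA Bind-n-Seq):
# one dictionary of base probabilities per motif position.
PWM: List[Dict[str, float]] = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "U": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "U": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "U": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "U": 0.05},
]


def scan_transcript(
    seq: str, pwm: List[Dict[str, float]], threshold: float = 0.5
) -> List[Tuple[int, str, float]]:
    """Return (0-based start, window, relative score) for each putative instance.

    The relative score is the window's probability under the PWM divided by
    the probability of the best possible match, so it lies between 0 and 1.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA alphabets
    width = len(pwm)

    # Probability of the consensus (best-scoring) sequence.
    best = 1.0
    for column in pwm:
        best *= max(column.values())

    hits: List[Tuple[int, str, float]] = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = 1.0
        for base, column in zip(window, pwm):
            score *= column.get(base, 0.0)  # unknown bases (e.g. N) score 0
        relative = score / best
        if relative >= threshold:
            hits.append((start, window, relative))
    return hits


if __name__ == "__main__":
    for start, window, relative in scan_transcript("AAUGCAGGUGCA", PWM):
        print(start, window, round(relative, 3))
```

    A real resource would run such a scan over every transcript and every motif, and would likely need a faster search strategy than this exhaustive sliding window.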

    Analysis of conditional correlation derived from the coevolution of a three-gene system using a maximum likelihood model

    Protein-coding genes can often be grouped and integrated into functional modules associated with an organelle. The components of these modules may follow a correlated evolution that is conditional on a given phenotype. Genes linked to motility possess this characteristic, as they act in a cascade in response to external stimuli. Hyperthermophily, for its part, is linked to reverse gyrase, but no other element that can be associated with it with certainty is known. This may be due to an unresolved case of non-orthologous gene displacement. Using a bioinformatic approach, a mathematical model of conditional correlated evolution for three genes was developed and applied to the phyletic profiles of archaea. This made it possible to formulate hypotheses about the potential function of the flagellar gene FlaD/E and about the evolutionary history of the genes that are linked to it and that may have contributed to its formation. In addition, a theoretical evolutionary history was established for a ligase associated with hyperthermophily.
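
    The idea of a correlation between two genes that is conditional on a third can be illustrated on phyletic profiles with a much simpler stand-in than the maximum likelihood model described above. The sketch below only counts how often two genes agree (both present or both absent) among genomes where a third gene is present or absent. It ignores phylogeny entirely. The gene names, the profiles and the function name cooccurrence_given are hypothetical.

```python
from typing import Dict, List

# Hypothetical presence (1) / absence (0) profiles across eight genomes.
PROFILES: Dict[str, List[int]] = {
    "geneA": [1, 1, 1, 0, 0, 1, 0, 0],
    "geneB": [1, 1, 1, 0, 0, 0, 1, 0],
    "geneC": [1, 1, 1, 1, 0, 0, 0, 0],  # conditioning gene
}


def cooccurrence_given(
    a: List[int], b: List[int], c: List[int], c_state: int
) -> float:
    """Fraction of genomes with c == c_state in which a and b agree.

    "Agree" means both genes are present or both are absent in that genome.
    """
    selected = [(x, y) for x, y, z in zip(a, b, c) if z == c_state]
    if not selected:
        raise ValueError("no genome has the requested state for the conditioning gene")
    agreeing = sum(1 for x, y in selected if x == y)
    return agreeing / len(selected)


if __name__ == "__main__":
    a, b, c = PROFILES["geneA"], PROFILES["geneB"], PROFILES["geneC"]
    print("agreement when geneC is present:", cooccurrence_given(a, b, c, 1))
    print("agreement when geneC is absent: ", cooccurrence_given(a, b, c, 0))
```

    In this toy data the two genes agree in every genome that carries geneC but in only half of those that lack it, which is the kind of conditional pattern a likelihood model would test while accounting for shared ancestry.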

    Data for the generation of RNA spatiotemporal distributions and interpretation of Chk1 and SLBP protein depletion phenotypes during Drosophila embryogenesis

    The data presented in this article are related to the research article entitled “Biochemical Fractionation of Time-Resolved Drosophila Embryos Reveals Similar Transcriptomic Alterations in Replication Checkpoint and Histone mRNA Processing Mutants” (Lefebvre et al., 2017) [1]. This article provides a spatiotemporal transcriptomic analysis of early embryogenesis and shows that mutations in the checkpoint factor grapes/Chk1 and the histone mRNA processing factor SLBP selectively impair zygotic gene expression. Here, lists of transcripts enriched in early syncytial embryos, late blastoderm embryos, and cytoplasmic and nuclear extracts of blastoderm embryos are made public, along with transcription factor motif occurrences for genes enriched in each context. In addition, extensive lists of genes down-regulated upon Chk1 and SLBP protein depletion in embryos are released to enable further analyses.

    RBP Image Database: A resource for the systematic characterization of the subcellular distribution properties of human RNA binding proteins

    RNA binding proteins (RBPs) are central regulators of gene expression implicated in all facets of RNA metabolism. As such, they play key roles in cellular physiology and disease etiology. Since different steps of post-transcriptional gene expression tend to occur in specific regions of the cell, including nuclear or cytoplasmic locations, defining the subcellular distribution properties of RBPs is an important step in assessing their potential functions. Here, we present the RBP Image Database, a resource that details the subcellular localization features of 301 RBPs in the human HepG2 and HeLa cell lines, based on the results of systematic immuno-fluorescence studies conducted using a highly validated collection of RBP antibodies and a panel of 12 markers for specific organelles and subcellular structures. The unique features of the RBP Image Database include: (i) hosting of comprehensive representative images for each RBP-marker pair, with ~250,000 microscopy images; (ii) a manually curated controlled vocabulary of annotation terms detailing the localization features of each factor; and (iii) a user-friendly interface allowing the rapid querying of the data by target or annotation. The RBP Image Database is freely available at https://rnabiology.ircm.qc.ca/RBPImage/.

    Perspectives on ENCODE

    The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression [1]. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.

    Expanded encyclopaedias of DNA elements in the human and mouse genomes

    The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE [1] and Roadmap Epigenomics [2] data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.