43 research outputs found

    RNA-KG: An ontology-based knowledge graph for representing interactions involving RNA molecules

    The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to the patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are continuously produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph encompassing biological knowledge about RNAs gathered from more than 50 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts. To develop RNA-KG, we first identified, pre-processed, and characterized each data source; next, we built a meta-graph that provides an ontological description of the KG by representing all the bio-molecular entities and medical concepts of interest in this domain, as well as the types of interactions connecting them. Finally, we leveraged an instance-based semantically abstracted knowledge model to specify the ontological alignment according to which RNA-KG was generated. RNA-KG can be downloaded in different formats and also queried by a SPARQL endpoint. A thorough topological analysis of the resulting heterogeneous graph provides further insights into the characteristics of the "RNA world". RNA-KG can be both directly explored and visualized, and/or analyzed by applying computational methods to infer bio-medical knowledge from its heterogeneous nodes and edges. The resource can be easily updated with new experimental data, and specific views of the overall KG can be extracted according to the bio-medical problem to be studied

    Computational analyses of small silencing RNAs

    High-throughput sequencing is a powerful tool to study diverse aspects of biology and applies to genome, transcriptome, and small RNA profiling. Ever-increasing sequencing throughput and more specialized sequencing assays demand more sophisticated bioinformatics approaches. In this thesis, I present four studies for which I developed computational methods to handle high-throughput sequencing data to gain insights into biology. The first study describes the genome of High Five (Hi5) cells, originally derived from Trichoplusia ni eggs. The chromosome-level assembly (scaffold N50 = 14.2 Mb) contains 14,037 predicted protein-coding genes. Examination and curation of multiple gene families, pathways, and small RNA-producing loci reveal species- and order-specific features. The availability of the genome sequence, together with genome editing and single-cell cloning protocols, establishes Hi5 cells as a new tool for studying small RNAs. The second study focuses on just one type of piRNA, those produced at the pachytene stage of mammalian spermatogenesis. Despite their abundance, pachytene piRNAs are poorly understood. I find that pachytene piRNAs cleave transcripts of protein-coding genes and further target transcripts from other pachytene piRNA loci. Subsequently, systematic investigation of piRNA targeting by integrating different types of sequencing data uncovers the piRNA targeting rule. The third study describes computational procedures to map splicing branchpoints using high-throughput sequencing data. Screening >1.2 trillion RNA-seq reads identifies >140,000 branchpoints (BPs) for both human and mouse. These branchpoints are compiled into BPDB (BranchPoint DataBase) to provide a comprehensive branchpoint catalog. The final study combines novel experimental and computational procedures to handle PCR duplicates that are prevalent in high-throughput sequencing data. 
Incorporation of unique molecular identifiers (UMIs) to tag each read enables unambiguous identification of PCR duplicates. Both simulated and experimental datasets demonstrate that UMI incorporation increases the reproducibility of RNA-seq and small RNA-seq. Surveying 7 common variables in high-throughput sequencing reveals that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine the PCR duplicate frequency. Finally, I show that removing PCR duplicates without UMIs introduces substantial bias into data analysis.
    2020-12-11
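
The UMI idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis's pipeline: the read layout `(chrom, pos, strand, umi)`, the function name `count_molecules`, and the toy reads are all assumptions made for illustration. The sketch treats two reads as PCR duplicates only when both the mapping position and the UMI match, and contrasts that with position-only deduplication, which can also collapse distinct molecules that happen to map to the same place.

```python
def count_molecules(reads, use_umi=True):
    """Count distinct molecules among aligned reads.

    reads: iterable of (chrom, pos, strand, umi) tuples (hypothetical layout).
    With use_umi=True, reads are duplicates only if position AND UMI match.
    With use_umi=False, reads are duplicates whenever the position matches.
    """
    seen = set()
    for chrom, pos, strand, umi in reads:
        key = (chrom, pos, strand, umi) if use_umi else (chrom, pos, strand)
        seen.add(key)
    return len(seen)


# Toy data (made up): three reads share a position, two of them share a UMI.
reads = [
    ("chr1", 100, "+", "ACGT"),
    ("chr1", 100, "+", "ACGT"),  # same position and UMI: a PCR duplicate
    ("chr1", 100, "+", "TTAG"),  # same position, different UMI: a distinct molecule
    ("chr2", 500, "-", "GGCA"),
]

with_umi = count_molecules(reads, use_umi=True)
without_umi = count_molecules(reads, use_umi=False)
print(with_umi, without_umi)
```

In this toy case the position-only count is lower than the UMI-aware count, because a genuinely distinct molecule is discarded along with the true duplicate. Real tools also have to handle sequencing errors within UMIs, which this sketch ignores.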

    snoDB: An interconnected online database of human snoRNA

    RNA is more than just a transitory molecule between DNA and proteins. Beyond the scope of protein-coding RNAs lies a vast underexplored landscape of non-coding RNAs (ncRNA). These RNAs have been slowly uncovered since the 1960s, but it took until the turn of the century, and the advent of high-throughput RNA-sequencing methodologies, for us to finally see how dominated by ncRNAs the transcriptome really is. High-throughput experiments also exponentially expanded the amount of data on RNA and created a need for bioinformatics tools for their analysis and storage. One of the first, and most abundant, ncRNA types to be discovered was small nucleolar RNAs (snoRNAs). Canonically pegged as guides for the modification of pre-ribosomal RNAs, these highly conserved RNAs now boast a diverse list of crucial non-canonical roles, notably in gene expression, as well as being associated with a myriad of diseases and cancers. Considering the growing body of literature surrounding snoRNAs in humans, and their increasing connections to a broad range of fields of study, having an accessible and comprehensive assessment of these data has become essential. Unfortunately, existing online human snoRNA databases, snoRNABase, snOPY, and snoRNA Atlas, are either outdated or too narrow in scope, focusing almost exclusively on canonical snoRNA interactions and lacking expression data. As such, we have created snoDB: a modern, interactive database of human snoRNAs with curated data on non-canonical snoRNA interactions, expression data in a growing range of tissues and cell lines, and more. Unlike the old snoRNA databases, snoDB features extensive visualisation and filtering capabilities, allowing for its larger array of data to be selectively viewed in an interactive and customizable table. Expression data can be further visualised in interactive heatmaps thanks to snoDB’s sister tool: snoTHAW. snoDB also innovates by being much more interconnected with other resources. Data were gathered from over a dozen resources, including the RNAcentral database consortium, the largest database of ncRNA sequences, of which snoDB is now a part, and joined together in a relational PostgreSQL database. In addition, all resources are linked in-table wherever the data they provided appear, to help corroborate the data shown for transparency, as well as to grant access to interesting features housed on remote sites. Finally, snoDB is built to be easily maintainable, updatable and extensible to keep up with ongoing developments and ensure that the information it contains will contribute to snoRNA research for years to come

    Putting the Pieces Together: Exons and piRNAs: A Dissertation

    Analysis of gene expression has undergone a technological revolution. What was impossible 6 years ago is now routine. High-throughput DNA sequencing machines capable of generating hundreds of millions of reads allow, indeed force, a major revision toward the study of the genome’s functional output—the transcriptome. This thesis examines the history of DNA sequencing, measurement of gene expression by sequencing, isoform complexity driven by alternative splicing and mammalian piRNA precursor biogenesis. Examination of these topics is framed around development of a novel RNA-templated DNA-DNA ligation assay (SeqZip) that allows for efficient analysis of abundant, complex, and functional long RNAs. The discussion focuses on the future of transcriptome analysis, development and applications of SeqZip, and challenges presented to biomedical researchers by extremely large and rich datasets

    Disease Associated Mutations and Functional Variants that Significantly Disrupt RNA Structure

    Genome-Wide Association Studies (GWAS) have revealed a great many trait- and disease-associated Single Nucleotide Polymorphisms (SNPs) that fall in noncoding or intergenic regions of the human genome. This is congruent with the current understanding that many of these regions are actively transcribed, and that many transcripts and transcript regions that do not code for protein have important roles in the cell. In carrying out many transcripts’ functions, RNA structure plays a critical role. We hypothesized that a subset of noncoding disease-associated SNPs significantly change RNA structure. We developed a program called SNPfold to identify SNPs that cause significant RNA structural rearrangement and applied it to a set of 514 disease-associated SNPs in 350 unique noncoding regions of the human transcriptome. We identified six disease states (Hyperferritinemia Cataract Syndrome, β-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease, and Hypertension) where multiple SNPs significantly alter RNA structural ensembles. We then conducted Selective 2’ OH Acylation and Primer Extension (SHAPE) in order to confirm the predicted structure change caused by SNPs associated with Hyperferritinemia Cataract Syndrome (U22G and A56U in the FTL 5’ UTR). Both mutations are shown to disrupt the formation of an Iron Response Element stem-loop that is critical to translational regulation of the mRNA. We identified compensatory mutations that were able to restore these mutant structures to that of the wild-type FTL 5’ UTR. We then identified from human haplotype data several regions where SNP pairs inherited together conserve structure. Lastly, we explored the functional effect of common SNPs associated with change in RNA expression level by calculating the enrichment of their overlap with experimentally derived binding sites for 14 different RNA-binding proteins. 
Consistent with a subset of these SNPs altering structure in functionally important sites of mRNA transcripts, we identified several proteins where SNPs are enriched for proximal overlap. These results in their entirety indicate that both rare disease-associated and common SNPs that significantly change RNA structure are present in human populations, and that such a functional effect may account for a subset of phenotypic differences and complex disease propensities among individuals.
    Doctor of Philosophy

    ExoPRIME technology for exosomal miRNA analysis and identification of oxidative DNA damage-induced miRNA regulatory network in human astrocytes

    The high lipid content of the brain, coupled with its heavy oxygen dependence and relatively weak antioxidant system, makes it highly susceptible to oxidative DNA damage that contributes to neurodegeneration. This study assesses and compares the neurotoxic effects of proton and photon radiation on mitochondrial function and DNA repair capabilities of human astrocytes. Human astrocytes received either proton (0.5 Gy and 3 Gy), photon (0.5 Gy and 3 Gy), or sham-radiation treatment. The mRNA expression level of the human base-excision repair protein, 8-deoxyguanosine DNA glycosylase 1 (hOGG1) was determined via RT-qPCR. Radiation-induced changes in mitochondrial mass and oxidative activity were assessed using fluorescent imaging with MitoTracker™ Green FM and MitoTracker™ Orange CM-H2TMRos dyes, respectively. A significant increase in mitochondrial mass and levels of reactive oxygen species was observed after radiation treatment. This was accompanied by a decreased OGG1 mRNA expression. These results are indicative of a radiation-induced dose-dependent decrease in mitochondrial function, an increase in senescence and astrogliosis, and impairment of the DNA repair capabilities in healthy glial cells. Photon irradiation was associated with a more significant disruption in mitochondrial function and base-excision repair mechanisms in vitro in comparison to the same dose of proton treatment. This study further identifies specific ROS-responsive miRNAs that modulate the expression and activity of the DNA repair proteins in human astrocytes, which could lead to the development of targeted therapeutic strategies for neurological diseases. Oxidative DNA damage was established after treatment of human astrocytes with 10 μM sodium dichromate for 16 hours. Comet assay analysis indicated a significant increase in oxidized guanine lesions. PCR analysis confirmed that sodium dichromate reduced the mRNA expression levels of hOGG1. 
Small RNA-seq was performed on an Ion Torrent™ system and the differentially expressed miRNAs were identified using Partek Flow® software. The biologically significant miRNAs were selected using miRNet 2.0. Oxidative-stress-induced DNA damage was associated with a significant decrease in miRNA expression: 231 downregulated miRNAs and 2 upregulated miRNAs (p < 0.05; > 2-fold). In addition to identifying multiple miRNA-mRNA pairs involved in DNA repair processes, this study uncovered two novel miRNA-mRNA pair interactions: miR-1248:OGG1 and miR-103a:OGG1. Inhibition of miR-1248 and miR-103a via the transfection of their inhibitors restored the increased expression levels of hOGG1. Therefore, targeting the identified microRNAs could ameliorate the nuclear DNA damage caused by exposure to mutagens. The miRNA candidates identified in this study could serve as potential biomarkers and therapeutics for oxidative stress in the brain to reduce the incidence and improve the treatment of cancer and neurodegenerative disorders. In a parallel but closely related study, we report a direct, one-step exosome sampling technology for selective capture of CD63+ exosome subpopulations using an immune-affinity protocol. The ExoPRIME microprobe provides a Precise, Rapid, Inexpensive, Mild (non-invasive), and Efficient (i.e. PRIME) alternative to the conventional polymer precipitation-based methods by enriching a comparatively more homogeneous exosome population. The tool consists of an inert Serin™ stainless steel microneedle (300 μm in diameter × 30 mm in height), pre-coated with a thin-film polyelectrolyte layer that serves as a substrate for covalent bonding of biotin. A streptavidin-conjugated anti-CD63 antibody that selectively binds to the corresponding tetraspanin embedded in the lipid bilayer of exosomes was immobilized to the outer surface of the probe. 
The feasibility of the ExoPRIME technology was validated using two types of biological samples: conditioned astrocyte medium (CAM) and astrocyte-derived exosome suspension (EXO). The study investigated the impact of temperature (4°C and 22°C) and incubation duration (2 h and 16 h) on the capture efficiency of the ExoPRIME tool. A fluorescence-based enzymatic assay for exosome quantification was used to assess the probe’s exosome capture efficiency and the reproducibility of the technology. The low level of non-specific binding initially observed in non-functionalized microneedles was drastically minimized by blocking the ExoPRIME probe with 0.1% BSA. The ExoPRIME microprobe captured exponentially more exosomes than the non-functionalized microneedle, which indicates enrichment of CD63-expressing exosomes. A major advantage provided by the ExoPRIME technology over existing platforms is its applicability over a broad dynamic range of temperature and incubation parameters without compromising the purity and viability of exosomal cargoes. The loading capacity of the probe increased after incubation for 16 h at 4°C in exosome suspension (24 × 10^6 exosomes per probe), while the efficiency decreased 10-fold after 2 h at 4°C (24 × 10^5 exosomes per probe). The increase in temperature had an impact on the stability of the reagents that contributed to a 2-fold efficiency reduction after incubation in exosome suspension for 16 h at 22°C (12 × 10^6 exosomes per probe). However, the 2-hour room-temperature incubation (2 h at 22°C) of the ExoPRIME probe yielded an increased capture efficiency (12 × 10^6 exosomes per probe) when compared to the 2 h at 4°C incubation (24 × 10^5 exosomes per probe). These results suggest that lower temperatures with extended incubation times constitute the optimal parameters for ensuring high probe loading capacity. 
Another advantage of the ExoPRIME microprobe is that it captures antigen-specific subpopulations of exosomes directly from conditioned astrocyte medium (CAM), eliminating the requirements for additional filtration and pre-concentration, and thereby cutting down costs and handling time. Apart from the relatively reduced number of enriched exosomes, a phenomenon that could be attributed to the presence of various extracellular proteins and cellular debris, which could mask antibodies and compete physically with exosomes for binding, the CAM results are consistent with the trend obtained for EXO incubations. The capability to integrate different incubation times, temperatures, and biofluid types thus gives exosome researchers the flexibility to choose the combined parameters that best suit their purpose, a desirable feature in clinical and laboratory applications. The developed tool requires very low amounts of antibody, permits the use and reuse of minimal sample volumes (≤ 200 μL), can be multiplexed in arrays to diagnostically profile multiple exosome classes, and is amenable to integration into a lab-on-a-chip platform to achieve parallel, high-throughput isolation in a [semi]-automated workstation. Moreover, this platform could provide direct exosomal analysis of biological fluids since it can elegantly interface with existing picomolar-range nucleic acid assays to provide a clinical diagnostic tool at the point of care and facilitate fundamental studies of exosome function

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward