48 research outputs found

    Computational Analysis of the Post-Transcriptional Gene Regulatory Network.

    Across eukaryotic organisms, specific and coordinated interactions between protein-coding mRNAs, small regulatory RNAs, and a growing collection of RNA-binding proteins (RBPs) have emerged as major components orchestrating post-transcriptional gene regulation (PTGR). High-throughput sequencing technologies have dramatically accelerated our ability to probe the vast number of RNA:RNA and RBP:RNA interactions, which are the molecular drivers of PTGR. These efforts have generated an unprecedented quantity of data, placing higher demands on computational analyses to inform biological mechanisms and define underlying rules of PTGR. In this dissertation, I apply well-established computational tools and develop novel bioinformatic approaches to mine deep sequencing datasets to achieve the following aims. 1) I elucidate the biogenesis mechanism and downstream targets of the conserved piRNA class of small RNAs, which are required for fertility in Caenorhabditis elegans and higher metazoans. I define sex-specific piRNA subclasses that target unique sets of genes required for germline development. 2) I characterize the global dynamics of RBP:RNA interactions in the budding yeast Saccharomyces cerevisiae. I reveal that RBP binding explains over 40% of conservation at 3' untranslated regions, and I uncover pervasive binding of RBPs to not only single-stranded RNAs but also double-stranded RNAs, supporting a novel paradigm of RBPs targeting highly structured RNAs. Over one-third of RBP:RNA interactions are significantly altered under two environmental stress conditions, suggesting that PTGR is highly responsive to stress adaptation. 3) I identify RNA targets and propose biological mechanisms of PTGR for the conserved Pumilio family of RBPs in yeast. For example, I discovered a dual-regulatory mode of binding for Puf3p and Puf4p that is linked to both sequence and structure motifs. 
4) Finally, I propose PTGR mechanisms for PUF-9- and microRNA-mediated co-regulation of developmental timing in C. elegans, and for how LARP1 binding to translation machinery-encoding genes regulates mTOR signaling in human cell lines. Dysregulation of small RNA pathways and RBP-mediated processes has emerged as an important determinant in human disorders including cancer and neuromuscular disorders. Therefore, characterization of fundamental mechanisms of PTGR promises to enrich our understanding of the complex interactions governing eukaryotic gene expression and offers insights into the development of targeted therapies.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/111339/1/mafree_1.pd

    Pervasive and dynamic protein binding sites of the mRNA transcriptome in Saccharomyces cerevisiae

    Background: Protein-RNA interactions are integral components of nearly every aspect of biology, including regulation of gene expression, assembly of cellular architectures, and pathogenesis of human diseases. However, studies in the past few decades have only uncovered a small fraction of the vast landscape of the protein-RNA interactome in any organism, and even less is known about the dynamics of protein-RNA interactions under changing developmental and environmental conditions. Results: Here, we describe the gPAR-CLIP (global photoactivatable-ribonucleoside-enhanced crosslinking and immunopurification) approach for capturing regions of the untranslated, polyadenylated transcriptome bound by RNA-binding proteins (RBPs) in budding yeast. We report over 13,000 RBP crosslinking sites in untranslated regions (UTRs) covering 72% of protein-coding transcripts encoded in the genome, confirming 3' UTRs as major sites for RBP interaction. Comparative genomic analyses reveal that RBP crosslinking sites are highly conserved, and RNA folding predictions indicate that secondary structural elements are constrained by protein binding and may serve as generalizable modes of RNA recognition. Finally, 38% of 3' UTR crosslinking sites show changes in RBP occupancy upon glucose or nitrogen deprivation, with major impacts on metabolic pathways as well as mitochondrial and ribosomal gene expression. Conclusions: Our study offers an unprecedented view of the pervasiveness and dynamics of protein-RNA interactions in vivo.
    http://deepblue.lib.umich.edu/bitstream/2027.42/112318/1/13059_2012_Article_3050.pd

    The Caenorhabditis elegans HEN1 Ortholog, HENN-1, Methylates and Stabilizes Select Subclasses of Germline Small RNAs

    Small RNAs regulate diverse biological processes by directing effector proteins called Argonautes to silence complementary mRNAs. Maturation of some classes of small RNAs involves terminal 2′-O-methylation to prevent degradation. This modification is catalyzed by members of the conserved HEN1 RNA methyltransferase family. In animals, Piwi-interacting RNAs (piRNAs) and some endogenous and exogenous small interfering RNAs (siRNAs) are methylated, whereas microRNAs are not. However, the mechanisms that determine animal HEN1 substrate specificity have yet to be fully resolved. In Caenorhabditis elegans, a HEN1 ortholog has not been studied, but there is evidence for methylation of piRNAs and some endogenous siRNAs. Here, we report that the worm HEN1 ortholog, HENN-1 (HEN of Nematode), is required for methylation of C. elegans small RNAs. Our results indicate that piRNAs are universally methylated by HENN-1. In contrast, 26G RNAs, a class of primary endogenous siRNAs, are methylated in female germline and embryo, but not in male germline. Intriguingly, the methylation pattern of 26G RNAs correlates with the expression of distinct male and female germline Argonautes. Moreover, loss of the female germline Argonaute results in loss of 26G RNA methylation altogether. These findings support a model wherein methylation status of a metazoan small RNA is dictated by the Argonaute to which it binds. Loss of henn-1 results in phenotypes that reflect destabilization of substrate small RNAs: dysregulation of target mRNAs, impaired fertility, and enhanced somatic RNAi. Additionally, the henn-1 mutant shows a weakened response to RNAi knockdown of germline genes, suggesting that HENN-1 may also function in canonical RNAi. Together, our results indicate a broad role for HENN-1 in both endogenous and exogenous gene silencing pathways and provide further insight into the mechanisms of HEN1 substrate discrimination and the diversity within the Argonaute family.

    A unified data infrastructure to support large-scale rare disease research

    The Solve-RD project brings together clinicians, scientists, and patient representatives from 51 institutes spanning 15 countries to collaborate on genetically diagnosing ("solving") rare diseases (RDs). The project aims to significantly increase the diagnostic success rate by co-analysing data from thousands of RD cases, including phenotypes, pedigrees, exome/genome sequencing and multi-omics data. Here we report on the data infrastructure devised and created to support this co-analysis. This infrastructure enables users to store, find, connect, and analyse data and metadata in a collaborative manner. Pseudonymised phenotypic and raw experimental data are submitted to the RD-Connect Genome-Phenome Analysis Platform and processed through standardised pipelines. Resulting files and novel produced omics data are sent to the European Genome-phenome Archive, which adds unique file identifiers and provides long-term storage and controlled access services. MOLGENIS "RD3" and Cafe Variome "Discovery Nexus" connect data and metadata and offer discovery services, and secure cloud-based "Sandboxes" support multi-party data analysis. This proven infrastructure design provides a blueprint for other projects that need to analyse large amounts of heterogeneous data.
    3. Good health and well-bein

    GA4GH: International policies and standards for data sharing across genomic research and healthcare.

    The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.

    Community-Driven Data Analysis Training for Biology

    The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org. We developed an infrastructure that facilitates data analysis training in life sciences. It is an interactive learning platform tuned for current types of data and research problems. Importantly, it provides a means for community-wide content creation and maintenance and, finally, enables trainers and trainees to use the tutorials in a variety of situations, such as those where reliable Internet access is unavailable.

    Twist exome capture allows for lower average sequence coverage in clinical exome sequencing

    Background: Exome and genome sequencing are the predominant techniques in the diagnosis and research of genetic disorders. Sufficient, uniform and reproducible/consistent sequence coverage is a main determinant of the sensitivity for detecting single-nucleotide variants (SNVs) and copy number variants (CNVs). Here we compared the ability to obtain comprehensive exome coverage for recent exome capture kits and genome sequencing techniques. Results: We compared three different widely used enrichment kits (Agilent SureSelect Human All Exon V5, Agilent SureSelect Human All Exon V7 and Twist Bioscience) as well as short-read and long-read WGS. We show that the Twist exome capture significantly improves complete coverage and coverage uniformity across coding regions compared to other exome capture kits. Twist performance is comparable to that of both short- and long-read whole genome sequencing. Additionally, we show that even at a reduced average coverage of 70× there is only minimal loss in sensitivity for SNV and CNV detection. Conclusion: We conclude that exome sequencing with Twist represents a significant improvement and could be performed at lower sequence coverage compared to other exome capture techniques.

    A Solve-RD ClinVar-based reanalysis of 1522 index cases from ERN-ITHACA reveals common pitfalls and misinterpretations in exome sequencing

    Purpose: Within the Solve-RD project (https://solve-rd.eu/), the European Reference Network for Intellectual disability, TeleHealth, Autism and Congenital Anomalies aimed to investigate whether a reanalysis of exomes from unsolved cases based on ClinVar annotations could establish additional diagnoses. We present the results of the “ClinVar low-hanging fruit” reanalysis, reasons for the failure of previous analyses, and lessons learned. Methods: Data from the first 3576 exomes (1522 probands and 2054 relatives) collected from the European Reference Network for Intellectual disability, TeleHealth, Autism and Congenital Anomalies were reanalyzed by the Solve-RD consortium by evaluating for the presence of single-nucleotide variants and small insertions and deletions already reported as (likely) pathogenic in ClinVar. Variants were filtered according to frequency, genotype, and mode of inheritance and reinterpreted. Results: We identified causal variants in 59 cases (3.9%), 50 of them also raised by other approaches and 9 leading to new diagnoses, highlighting interpretation challenges: variants in genes not known to be involved in human disease at the time of the first analysis, misleading genotypes, or variants undetected by local pipelines (variants in off-target regions, low quality filters, low allelic balance, or high frequency). Conclusion: The “ClinVar low-hanging fruit” analysis represents an effective, fast, and easy approach to recover causal variants from exome sequencing data, thereby contributing to the reduction of the diagnostic deadlock.

    Probabilistic modeling of protein:RNA interaction data identifies functional Transcript States

    CSHL Genome Informatics 2015 poster
    UV-induced crosslinking and immunopurification of an RNA-binding protein (RBP) followed by deep sequencing of its bound RNAs (CLIP-seq and derivative protocols) is an increasingly popular method for identifying in vivo transcriptome-wide sites of RBP interactions at nucleotide resolution. Consequently, a large collection of published deep-sequencing datasets is available representing precise RNA interaction sites for hundreds of RBPs. Initial analyses of RBP:RNA interaction sites for individual RBPs have revealed important mechanistic insights into RBP-mediated post-transcriptional gene regulation. However, comprehensive integration of interaction data for multiple RBPs is lacking, resulting in an underappreciation of the importance of RBP:RNA interactions in the context of other factors.
    Inspired by the identification of chromatin states (e.g., promoters, enhancers) from ChIP-seq data of histone modifications, transcription factor binding, and RNA Pol II occupancy, we have identified Transcript States across the Saccharomyces cerevisiae transcriptome. First, we obtained empirical evidence of direct interactions between RNAs and over 80 yeast RBPs (represented by over 140 CLIP-seq experiments). Next, we transformed aligned read count data into a binarized matrix representing presence or absence of each RBP across the transcriptome. Finally, we built and trained a probabilistic hidden Markov model to learn Transcript States from the binarized data.
    Preliminary results revealed Transcript States associated with functional regulatory elements such as intron 5’ splice sites, 3’ splice sites, and branch points. Importantly, our approach can easily relearn Transcript States as additional CLIP-seq datasets become available. Additionally, we can apply our methods to learn Transcript States from CLIP-seq data for any organism.
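The binarization step described in this abstract might look roughly like the sketch below. It is only an illustration under stated assumptions: read counts are taken to be tallied per fixed-width transcript bin, an RBP is called present when its count reaches a threshold, and the function names (`binarize`, `observations`), the threshold value, and the example counts are all hypothetical rather than taken from the poster. The hidden Markov model training itself is not shown.

```python
from typing import Dict, List, Tuple


def binarize(counts: Dict[str, List[int]], threshold: int = 1) -> Dict[str, List[int]]:
    """Convert per-bin read counts into presence (1) / absence (0) calls per RBP.

    The threshold is an assumed cutoff, not a value reported in the abstract.
    """
    return {
        rbp: [1 if c >= threshold else 0 for c in row]
        for rbp, row in counts.items()
    }


def observations(binary: Dict[str, List[int]]) -> List[Tuple[int, ...]]:
    """Transpose the RBP-by-bin matrix into one presence/absence vector per bin.

    Each vector (RBPs in sorted name order) is the kind of observation a
    multivariate HMM could be trained on.
    """
    rbps = sorted(binary)
    return list(zip(*(binary[r] for r in rbps)))


# Hypothetical counts for two yeast RBPs across four transcript bins.
example_counts = {"Puf3p": [0, 5, 12, 0], "Puf4p": [3, 0, 7, 0]}
example_binary = binarize(example_counts, threshold=3)
example_obs = observations(example_binary)
```

Keeping one observation vector per bin mirrors how chromatin-state models consume binarized ChIP-seq tracks, which is the analogy the abstract draws.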