101 research outputs found

    When needles look like hay: How to find tissue-specific enhancers in model organism genomes

    A major prerequisite for the investigation of tissue-specific processes is the identification of cis-regulatory elements. No generally applicable technique is available to distinguish them from any other type of genomic non-coding sequence. Therefore, researchers often have to identify these elements by elaborate in vivo screens, testing individual regions until the right one is found. Here, based on many examples from the literature, we summarize how functional enhancers have been isolated from other elements in the genome and how they have been characterized in transgenic animals. Covering computational and experimental studies, we provide an overview of the global properties of cis-regulatory elements, such as their specific interactions with promoters and target gene distances. We describe conserved non-coding elements (CNEs) and their internal structure, nucleotide composition, binding site clustering and overlap, with a special focus on developmental enhancers. Conflicting data and unresolved questions on the nature of these elements are highlighted. Our comprehensive overview of the experimental shortcuts that have been found in the different model organism communities and the new field of high-throughput assays should help during the preparation phase of a screen for enhancers. The review is accompanied by a list of general guidelines for such a project.

    HNRNPA1 promotes recognition of splice site decoys by U2AF2 in vivo

    Alternative pre-mRNA splicing plays a major role in expanding the transcript output of human genes. This process is regulated, in part, by the interplay of trans-acting RNA binding proteins (RBPs) with myriad cis-regulatory elements scattered throughout pre-mRNAs. These molecular recognition events are critical for defining the protein-coding sequences (exons) within pre-mRNAs and directing spliceosome assembly on noncoding regions (introns). One of the earliest events in this process is recognition of the 3' splice site (3'ss) by U2 small nuclear RNA auxiliary factor 2 (U2AF2). Splicing regulators, such as the heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1), influence spliceosome assembly both in vitro and in vivo, but their mechanisms of action remain poorly described on a global scale. HNRNPA1 also promotes proofreading of 3'ss sequences through a direct interaction with the U2AF heterodimer. To determine how HNRNPA1 regulates U2AF-RNA interactions in vivo, we analyzed U2AF2 RNA binding specificity using individual-nucleotide resolution crosslinking immunoprecipitation (iCLIP) in control and HNRNPA1 overexpression cells. We observed changes in the distribution of U2AF2 crosslinking sites relative to the 3'ss of alternative cassette exons but not constitutive exons upon HNRNPA1 overexpression. A subset of these events shows a concomitant increase of U2AF2 crosslinking at distal intronic regions, suggesting a shift of U2AF2 to "decoy" binding sites. Of the many noncanonical U2AF2 binding sites, Alu-derived RNA sequences represented one of the most abundant classes of HNRNPA1-dependent decoys. We propose that one way HNRNPA1 regulates exon definition is to modulate the interaction of U2AF2 with decoy or bona fide 3'ss.

    AVADA improves automated genetic variant database construction directly from full-text literature

    Purpose: The primary literature on human genetic diseases includes descriptions of pathogenic variants that are essential for clinical diagnosis. Variant databases such as ClinVar and HGMD collect pathogenic variants by manual curation. We aimed to automatically construct a freely accessible database of pathogenic variants directly from full-text articles about genetic disease. Methods: AVADA (Automatically curated VAriant DAtabase) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic variants and genes in the full text of primary literature and converts them to genomic coordinates for rapid downstream use. Results: AVADA automatically curated almost 60% of pathogenic variants deposited in HGMD, a 4.4-fold improvement over the current state of the art in automated variant extraction. AVADA also contains more than 60,000 pathogenic variants that are in HGMD, but not in ClinVar. In a cohort of 245 diagnosed patients, AVADA correctly annotated 38 previously described diagnostic variants, compared to 43 using HGMD, 20 using ClinVar and only 13 (wholly subsumed by AVADA and ClinVar's) using the best automated abstracts-only based approach. Conclusion: AVADA is the first machine learning tool that automatically curates a variant database directly from full-text literature. AVADA is available upon publication at http://bejerano.stanford.edu/AVADA

    Characterization of the neural stem cell gene regulatory network identifies OLIG2 as a multifunctional regulator of self-renewal

    The gene regulatory network (GRN) that supports neural stem cell (NS cell) self-renewal has so far been poorly characterized. Knowledge of the central transcription factors (TFs), the noncoding gene regulatory regions that they bind to, and the genes whose expression they modulate will be crucial in unlocking the full therapeutic potential of these cells. Here, we use DNase-seq in combination with analysis of histone modifications to identify multiple classes of epigenetically and functionally distinct cis-regulatory elements (CREs). Through motif analysis and ChIP-seq, we identify several of the crucial TF regulators of NS cells. At the core of the network are TFs of the basic helix-loop-helix (bHLH), nuclear factor I (NFI), SOX, and FOX families, with CREs often densely bound by several of these different TFs. We use machine learning to highlight several crucial regulatory features of the network that underpin NS cell self-renewal and multipotency. We validate our predictions by functional analysis of the bHLH TF OLIG2. This TF makes an important contribution to NS cell self-renewal by concurrently activating pro-proliferation genes and preventing the untimely activation of genes promoting neuronal differentiation and stem cell quiescence. Wellcome Trust grants (WT095908, WT098051), FEBS Long-Term Fellowship, Medical Research Council Grant-in-Aid (U117570528).

    AMELIE speeds Mendelian diagnosis by matching patient phenotype and genotype to primary literature

    The diagnosis of Mendelian disorders requires labor-intensive literature research. Trained clinicians can spend hours looking for the right publication(s) supporting a single gene that best explains a patient’s disease. AMELIE (Automatic Mendelian Literature Evaluation) greatly accelerates this process. AMELIE parses all 29 million PubMed abstracts and downloads and further parses hundreds of thousands of full-text articles in search of information supporting the causality and associated phenotypes of most published genetic variants. AMELIE then prioritizes patient candidate variants for their likelihood of explaining any patient’s given set of phenotypes. Diagnosis of singleton patients (without relatives’ exomes) is the most time-consuming scenario, and AMELIE ranked the causative gene at the very top for 66% of 215 diagnosed singleton Mendelian patients from the Deciphering Developmental Disorders project. Evaluating only the top 11 AMELIE-scored genes of 127 (median) candidate genes per patient resulted in a rapid diagnosis in more than 90% of cases. AMELIE-based evaluation of all cases was 3 to 19 times more efficient than hand-curated database–based approaches. We replicated these results on a retrospective cohort of clinical cases from Stanford Children’s Health and the Manton Center for Orphan Disease Research. An analysis web portal with our most recent update, programmatic interface, and code is available at AMELIE.stanford.edu