176 research outputs found

    Coherent Functional Modules Improve Transcription Factor Target Identification, Cooperativity Prediction, and Disease Association

    Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.

    STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.

    Quantifying supercoiling-induced denaturation bubbles in DNA

    In both eukaryotic and prokaryotic DNA, sequences of 30-100 base-pairs rich in AT base-pairs have been identified at which the double helix preferentially unwinds. Such DNA unwinding elements are commonly associated with origins for DNA replication and transcription, and with chromosomal matrix attachment regions. Here we present a quantitative study of local DNA unwinding based on extensive single DNA plasmid imaging. We demonstrate that long-lived single-stranded denaturation bubbles exist in negatively supercoiled DNA, at the expense of partial twist release. Remarkably, we observe a linear relation between the degree of supercoiling and the bubble size, in excellent agreement with statistical modelling. Furthermore, we obtain the full distribution of bubble sizes and the opening probabilities at varying salt and temperature conditions. The results presented herein underline the important role of denaturation bubbles in negatively supercoiled DNA for biological processes such as transcription and replication initiation in vivo.

    SAIGE-GENE+ improves the efficiency and accuracy of set-based rare variant association tests

    Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) [...] SAIGE-GENE+ performs set-based rare variant association tests with improved type 1 error control and computational efficiency by collapsing ultra-rare variants and conducting multiple tests corresponding to different minor allele frequency cutoffs and annotations. Peer reviewed.

    Base-specific mutational intolerance near splice sites clarifies the role of nonessential splice nucleotides

    Variation in RNA splicing (i.e., alternative splicing) plays an important role in many diseases. Variants near 5' and 3' splice sites often affect splicing, but the effects of these variants on splicing and disease have not been fully characterized beyond the two "essential" splice nucleotides flanking each exon. Here we provide quantitative measurements of tolerance to mutational disruptions by position and reference allele-alternative allele combinations. We show that certain reference alleles are particularly sensitive to mutations, regardless of the alternative alleles into which they are mutated. Using public RNA-seq data, we demonstrate that individuals carrying such variants have significantly lower levels of the correctly spliced transcript, compared to individuals without them, and confirm that these specific substitutions are highly enriched for known Mendelian mutations. Our results propose a more refined definition of the "splice region" and offer a new way to prioritize and provide functional interpretation of variants identified in diagnostic sequencing and association studies. Peer reviewed.

    Evaluating drug targets through human loss-of-function genetic variation

    Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous 'knockout' humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development. Peer reviewed.

    Transcript expression-aware annotation improves rare variant interpretation

    The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)(1), we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the 'proportion expressed across transcripts', which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project(2) and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies. Peer reviewed.

    A structural variation reference for medical and population genetics

    Structural variants (SVs) rearrange large segments of DNA(1) and can have profound consequences in evolution and human disease(2,3). As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD)(4) have become integral in the interpretation of single-nucleotide variants (SNVs)(5). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage(6). We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings(7). This SV resource is freely distributed via the gnomAD browser(8) and will have broad utility in population genetics, disease-association studies, and diagnostic screening. Peer reviewed.

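    Structural-semantic analysis of eurysemants in the Ukrainian language (based on the lexical-semantic field "thing") [Структурно-семантичний аналіз еврісемантів української мови (на матеріалі лексико-семантичного поля "річ")]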

    This article examines the lexical-semantic features of eurysemants (words of broad meaning) in the Ukrainian language, classifies them semantically, and analyzes their structure by means of componential analysis. It presents a fragment of a hierarchically ordered paradigm of broad-meaning nouns in the lexical-semantic field "Thing", represented by the lexical-semantic groups "Object" ("Предмет") and "Matter" ("Справа").

    The mutational constraint spectrum quantified from variation in 141,456 humans

    Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes(1). Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. Peer reviewed.