342 research outputs found

    Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects

    As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not yet been observed in current cohorts. Our results quantify the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have a loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation. Funding: United States National Institutes of Health (U54DK105566, R01GM104371).

    Coherent Functional Modules Improve Transcription Factor Target Identification, Cooperativity Prediction, and Disease Association

    Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.

    Structural and semantic analysis of eurysemants in the Ukrainian language (based on the lexico-semantic field "thing")

    This article examines the lexico-semantic features of eurysemants (nouns with very broad meaning) in Ukrainian. The words are classified semantically, and their structure is analyzed by means of componential analysis. The article presents a fragment of a hierarchically ordered paradigm of broad-meaning nouns from the lexico-semantic field "Thing", represented by two lexico-semantic groups, "Предмет" ("Subject") and "Справа" ("Work").

    STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving, and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and takes 5–10 hours to process a full exome sequence, and costs approximately $30 and takes 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.

    Quantifying supercoiling-induced denaturation bubbles in DNA

    In both eukaryotic and prokaryotic DNA, sequences of 30-100 base pairs rich in AT base pairs have been identified at which the double helix preferentially unwinds. Such DNA unwinding elements are commonly associated with origins for DNA replication and transcription, and with chromosomal matrix attachment regions. Here we present a quantitative study of local DNA unwinding based on extensive single DNA plasmid imaging. We demonstrate that long-lived single-stranded denaturation bubbles exist in negatively supercoiled DNA, at the expense of partial twist release. Remarkably, we observe a linear relation between the degree of supercoiling and the bubble size, in excellent agreement with statistical modelling. Furthermore, we obtain the full distribution of bubble sizes and the opening probabilities at varying salt and temperature conditions. The results presented herein underline the important role of denaturation bubbles in negatively supercoiled DNA for biological processes such as transcription and replication initiation in vivo.

    Fast DEM collision checks on multicore nodes.

    Many particle simulations today rely on spherical or analytical particle shape descriptions. They find non-spherical, triangulated particle models computationally infeasible due to expensive collision detection. We propose a hybrid collision detection algorithm based upon an iterative solve of a minimisation problem that automatically falls back to a brute-force, comparison-based algorithm variant if the problem is ill-posed. Such a hybrid can exploit the vector facilities of modern chips, and it is well prepared for the arising manycore era. Our approach pushes the boundary where non-analytical particle shapes and the aligning of more accurate first-principles physics become manageable.

    SAIGE-GENE plus improves the efficiency and accuracy of set-based rare variant association tests

    Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) [...] SAIGE-GENE+ performs set-based rare variant association tests with improved type 1 error control and computational efficiency by collapsing ultra-rare variants and conducting multiple tests corresponding to different minor allele frequency cutoffs and annotations. Peer reviewed.

    An experience report on (auto-)tuning of mesh-based PDE solvers on shared memory systems.

    With the advent of manycore systems, shared memory parallelisation has gained importance in high performance computing. Once a code is decomposed into tasks or parallel regions, it becomes crucial to identify reasonable grain sizes, i.e. minimum problem sizes per task that make the algorithm expose a high concurrency at low overhead. Many papers do not detail what reasonable task sizes are, and consider their findings craftsmanship not worth discussion. We have implemented an autotuning algorithm, a machine learning approach, for a project developing a hyperbolic equation system solver. Autotuning here is important as the grid and task workload are multifaceted and change frequently during runtime. In this paper, we summarise our lessons learned. We infer tweaks and idioms for general autotuning algorithms and we clarify that such an approach does not free users completely from grain size awareness.

    Base-specific mutational intolerance near splice sites clarifies the role of nonessential splice nucleotides

    Variation in RNA splicing (i.e., alternative splicing) plays an important role in many diseases. Variants near 5' and 3' splice sites often affect splicing, but the effects of these variants on splicing and disease have not been fully characterized beyond the two "essential" splice nucleotides flanking each exon. Here we provide quantitative measurements of tolerance to mutational disruptions by position and reference allele-alternative allele combinations. We show that certain reference alleles are particularly sensitive to mutations, regardless of the alternative alleles into which they are mutated. Using public RNA-seq data, we demonstrate that individuals carrying such variants have significantly lower levels of the correctly spliced transcript, compared to individuals without them, and confirm that these specific substitutions are highly enriched for known Mendelian mutations. Our results propose a more refined definition of the "splice region" and offer a new way to prioritize and provide functional interpretation of variants identified in diagnostic sequencing and association studies. Peer reviewed.