
    Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework

    abstract: Whole genome sequencing (WGS) is a promising strategy for uncovering variants or genes responsible for human diseases and traits. However, robust platforms for comprehensive downstream analysis are lacking. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. These three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, which integrates downstream analysis functions for WGS data. KGGSeq is equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenicity prediction and statistical tests. In tests with WGS data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SnpEff). It took only around half an hour on a small server with 10 CPUs to access the genotypes of ∼60 million variants in 2504 subjects, whereas a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less of the storage space to flexibly represent phased or unphased genotypes with multiple alleles, and was over 1000 times faster at calculating genotypic correlation. The final version of this article, as published in Nucleic Acids Research, can be viewed online at: https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkx01
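    The speed-up from bit-encoded genotypes comes from replacing per-sample arithmetic with bitwise AND and population counts over packed bit vectors. The sketch below illustrates that general idea only; it is not KGGSeq's actual bit-block format. It assumes unphased, biallelic genotypes coded as 0/1/2 alternate-allele counts with no missing calls, and uses Python integers as arbitrary-length bit vectors instead of fixed-width machine words. The function names `encode`, `popcount` and `genotype_correlation` are hypothetical.

```python
from math import sqrt


def encode(genotypes):
    """Pack 0/1/2 alternate-allele counts into two bit vectors (hypothetical layout).

    Bit i of `carrier` is set if sample i carries at least one alternate allele;
    bit i of `hom_alt` is set if sample i carries two alternate alleles.
    Missing genotypes are not supported in this sketch.
    """
    carrier = 0
    hom_alt = 0
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2):
            raise ValueError(f"unsupported genotype {g!r} at sample {i}")
        if g >= 1:
            carrier |= 1 << i
        if g == 2:
            hom_alt |= 1 << i
    return carrier, hom_alt


def popcount(v):
    """Number of set bits in a non-negative int."""
    return bin(v).count("1")


def genotype_correlation(x_bits, y_bits, n):
    """Pearson r between two encoded variants over n samples, via AND + popcount only."""
    a1, a2 = x_bits
    b1, b2 = y_bits
    # Dosage x = a1 + a2 per sample, so sum(x) = popcount(a1) + popcount(a2).
    sx = popcount(a1) + popcount(a2)
    sy = popcount(b1) + popcount(b2)
    # x^2 takes values 0, 1, 4 = a1 + 3*a2, because a2 set implies a1 set.
    sxx = popcount(a1) + 3 * popcount(a2)
    syy = popcount(b1) + 3 * popcount(b2)
    # x*y = (a1 + a2)(b1 + b2) expands into four AND terms.
    sxy = (popcount(a1 & b1) + popcount(a1 & b2)
           + popcount(a2 & b1) + popcount(a2 & b2))
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x == 0 or var_y == 0:
        return float("nan")  # monomorphic variant: correlation undefined
    return cov / sqrt(var_x * var_y)
```

    For example, `genotype_correlation(encode([0, 1, 2, 1, 0]), encode([0, 1, 2, 2, 0]), 5)` evaluates to 15/sqrt(280), about 0.896. Because each sum reduces to a handful of popcounts over whole words, the cost scales with the number of machine words rather than the number of samples; handling phase, multiple alleles and missing calls would require additional bit planes.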

    cepip: context-dependent epigenomic weighting for prioritization of regulatory variants and disease-associated genes

    abstract: It remains challenging to predict regulatory variants in particular tissues or cell types because gene regulation is highly context-specific. By connecting large-scale epigenomic profiles to expression quantitative trait loci (eQTLs) in a wide range of human tissues/cell types, we identify critical chromatin features that predict a variant's regulatory potential. We present cepip, a joint likelihood framework for estimating a variant's regulatory probability in a context-dependent manner. Our method exhibits significant GWAS signal enrichment and is superior to existing cell type-specific methods. Furthermore, by using phenotypically relevant epigenomes to weight GWAS single-nucleotide polymorphisms, we improve the statistical power of the gene-based association test. The electronic version of this article is the complete one and can be found online at: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1177-

    Translational software infrastructure for medical genetics

    Deep inside the core of our cells resides the deoxyribonucleic acid (DNA) molecule known as the genome. DNA encodes the information that allows life to grow, survive, diversify and evolve. Unfortunately, the same mechanisms that let us adapt to a changing environment can also cause genetic disorders. While we are able to diagnose a number of these disorders using modern technological advancements, much remains to be discovered and understood. This thesis presents software infrastructure for investigating the molecular etiology of genetic disease using data from model organisms, demonstrates how to translate findings from fundamental research into new software tools for genome diagnostics, and introduces a downstream genome analysis framework that supports the automation and validation of the latest tools for applied patient care. We first develop data models and software to help determine which regions of the genome are responsible for diseases and other physical traits. We then extend these principles to model organisms. By exploiting molecular similarities, we discover new ways to use nematodes in research into human diseases. Additionally, we use our knowledge of the genome and evolution to predict how pathogenic new mutations are. The result is a public website where DNA can be scanned quickly and accurately for probable pathogenic mutations. Finally, we present a complete system for automated DNA analysis, including a protocol specific to genome diagnostics that produces clear patient reports for medical experts, making diagnosis faster and easier.