Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework
abstract: Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms (sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines) to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SnpEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles, and calculated genotypic correlation over 1000 times faster. The final version of this article, as published in Nucleic Acids Research, can be viewed online at: https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkx01
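The idea of bit-encoded genotypes can be illustrated with a minimal sketch. This is not KGGSeq's actual bit-block format, which the abstract does not specify. It assumes unphased biallelic genotypes stored as alternate-allele counts in 2 bits per subject, with the code 3 reserved for missing calls. The function names are hypothetical. For readability the correlation unpacks the codes first, so it shows the storage layout and not the bitwise speedups a production tool would use.

```python
from math import sqrt

MISSING = 3  # assumed 2-bit code for a missing genotype (not from the paper)


def pack_genotypes(genotypes):
    """Pack alt-allele counts (0, 1, 2 or MISSING) into 2 bits per subject."""
    packed = bytearray((len(genotypes) + 3) // 4)  # 4 genotypes per byte
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2, MISSING):
            raise ValueError(f"unsupported genotype code: {g}")
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n 2-bit genotype codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


def genotype_correlation(packed_a, packed_b, n):
    """Pearson r between two variants, skipping subjects missing in either."""
    pairs = [
        (a, b)
        for a, b in zip(unpack_genotypes(packed_a, n), unpack_genotypes(packed_b, n))
        if a != MISSING and b != MISSING
    ]
    m = len(pairs)
    if m == 0:
        return float("nan")
    mean_a = sum(a for a, _ in pairs) / m
    mean_b = sum(b for _, b in pairs) / m
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a monomorphic variant
    return cov / sqrt(var_a * var_b)


# Six subjects; the last call of variant_a is missing.
variant_a = pack_genotypes([0, 1, 2, 1, 0, MISSING])
variant_b = pack_genotypes([2, 1, 0, 1, 2, 0])
r = genotype_correlation(variant_a, variant_b, 6)
```

Under these assumptions six genotypes occupy two bytes instead of six or more characters of text, which is the kind of saving a packed format aims for.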
cepip: context-dependent epigenomic weighting for prioritization of regulatory variants and disease-associated genes
abstract: It remains challenging to predict regulatory variants in particular tissues or cell types due to highly context-specific gene regulation. By connecting large-scale epigenomic profiles to expression quantitative trait loci (eQTLs) in a wide range of human tissues/cell types, we identify critical chromatin features that predict variant regulatory potential. We present cepip, a joint likelihood framework, for estimating a variant’s regulatory probability in a context-dependent manner. Our method exhibits significant GWAS signal enrichment and is superior to existing cell type-specific methods. Furthermore, using phenotypically relevant epigenomes to weight the GWAS single-nucleotide polymorphisms, we improve the statistical power of the gene-based association test. The electronic version of this article is the complete one and can be found online at: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1177-
Translational software infrastructure for medical genetics
Deep inside the core of our cells resides the deoxyribonucleic acid (DNA) molecule known as the genome. DNA encodes the information that allows life to grow, survive, diversify and evolve. Unfortunately, the same mechanisms that let us adapt to a changing environment can also cause genetic disorders. While we are able to diagnose a number of these disorders using modern technological advancements, much remains to be discovered and understood. This thesis presents software infrastructure for investigating the molecular etiology of genetic disease using data from model organisms, demonstrates how to translate findings from fundamental research into new software tools for genome diagnostics, and introduces a downstream genome analysis framework that assists the automation and validation of the latest tools for applied patient care. We first develop data models and software to help determine which regions of the genome are responsible for diseases and other physical traits. We then extend these principles towards model organisms. By using molecular similarities, we discover new ways to use nematodes for research into human diseases. Additionally, we can use our knowledge of the genome and evolution to predict how pathogenic new mutations are. The result is a public website where DNA can be scanned quickly and accurately for probable pathogenic mutations. Finally, we present a complete system for automated DNA analysis, including a protocol specific to genome diagnostics that produces clear patient reports for medical experts, with which a diagnosis can be made faster and more easily.
Volatile organic compounds associated with Neonectria ditissima infection in apples (Malus pumila cv Gala)
Postharvest diseases in apples during long-term storage result in loss and waste, mainly caused by fungal pathogens. Fungal contamination and rot can change some of the volatile organic compounds (VOCs) emitted by apple fruits. In this study, disease-free Gala apples were inoculated with Neonectria ditissima. The aim was to identify VOCs associated with N. ditissima infection in Gala apples. The inoculated apples were placed in a 5 L glass flask, sealed, and incubated at 20 °C for one hour, after which a charcoal-filtered airflow of 1 L/min was maintained for one hour through the Volatile Capture Trap (VCT), with volatile emissions captured on a Porapak-Q adsorbent filter. Captured volatiles were eluted using 1 mL of dichloromethane (DCM) into a standard Agilent 1.5 mL HPLC vial. Eluted volatiles were analysed using gas chromatography coupled with mass spectrometry (GC/MS). Volatiles were captured in three replicates for both inoculated and healthy control groups at 2, 8, 14, 21, 28, 35, and 42 days post-inoculation. Volatiles discriminating N. ditissima infection were identified qualitatively, based on the unique volatile compounds detected, and quantitatively, based on variation in the peak areas of certain combinations of volatile compounds. Some of the discriminatory volatiles, such as dodecyl hexanoate, 9-decen-1-yl hexanoate, hexyl butanoate and pentyl acetate, were detected in the early stages of the infection. Styrene, terpinen-4-ol, ethyl hexanoate, ethyl butanoate, ethyl pentanoate and 2-methylpentyl formate constituted the main VOCs emitted during apple fruit decay. Other compounds, such as alpha-farnesene and hexyl acetate, were common to both healthy and inoculated apples, but their peak areas in the healthy apples were well above those in the inoculated apples. However, these compounds showed a decline in peak area over time.
Apples are stored commercially in sealed stores for months, making visual observation for early detection of disease almost impossible. Diseases of stored apples are usually detected only at advanced stages, when it has become nearly impossible to prevent losses. The discriminatory volatile metabolites detected at early stages of infection are important for early non-visual detection of N. ditissima in stored apples. Further research is recommended on the use of these compounds in early detection of the disease caused by N. ditissima.