
    Postgwas: advanced GWAS interpretation in R.

    We present a comprehensive toolkit for post-processing, visualization and advanced analysis of GWAS results. In the spirit of comparable tools for gene-expression analysis, we attempt to unify and simplify several procedures that are essential for the interpretation of GWAS results. This includes the generation of advanced Manhattan and regional association plots, including rare variant display, as well as novel interaction network analysis tools for the investigation of systems-biology aspects. Our package supports virtually all model organisms and represents the first cohesive implementation of such tools for the popular language R. Previous software of this scope is dispersed over a wide range of platforms and is mostly not adaptable to custom work pipelines. We demonstrate the utility of this package by providing an example workflow on a publicly available dataset.

    Diagram of functions, parameters and dependencies in the postgwas package.

    Individual functions are represented by white boxes divided into an upper part listing the function name and a lower part containing argument names and types. Arguments preceded by a ‘+’ sign are optional and contain default values. Dashed lines denote a ‘used by’ relation: for example, the superordinate function postgwas calls removeNeighborSnps, gwas2network, snp2gene, manhattanplot and regionalplot. Only functions that are exported from the package (documented and visible to the user) are shown. Non-segmented boxes denote variables from a special environment that are used by internal functions (indicated by solid connectors) and available to the user through publicly visible getter/setter functions.

    Manhattan and regional plots of random datasets.

    Part (A) shows a conventional Manhattan plot as produced with the default options on an artificial GWAS dataset. Peak SNPs that exceed genome-wide significance are colored in red and are annotated with the closest genes (covering genes are placed above for intragenic SNPs, upstream and downstream genes to the left and right). A second threshold is set by default for suggestive association at P < 1×10⁻⁵, with gene annotations in blue. Annotation text can be deactivated or replaced with the identifiers of peak SNPs. Parts (B) and (C) display regional plots with different unique capabilities. Both contain tracks showing the association p-value graph, genes with strand and exon information, and a triangle LD plot where the color intensity reflects the r² correlation between SNPs. Identifiers of queried SNPs are automatically annotated to the p-value graph but can take custom annotation text as well. The LD plot uses either custom genotype files or HapMap data and is available for arbitrarily large regions. In (B), r² values have additionally been annotated to the LD triangles and we compare p-value graphs of two distinct datasets (color code listed in the legend). In (C), rare variant information from a resequencing study is included in a track at the bottom, showing allele frequencies in a histogram at the very bottom and identifier, position (original and remapped) and calibration lines for selected variants above. Only de novo variants are displayed here, using filter settings on the histogram. A second filter has been set on the position information display to include only variants of certain predicted functional effects (determined by SnpEff). The color code for the variant effect is listed in the plot legend above.
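The two-tier highlighting described for part (A) can be sketched in Python as follows. This is a minimal illustration and not the postgwas implementation: the function names and SNP identifiers are hypothetical, and the genome-wide threshold of 5×10⁻⁸ is a commonly used convention assumed here, since the caption states only the suggestive threshold of 1×10⁻⁵.

```python
import math

# The caption gives 1e-5 for suggestive association.
# ASSUMPTION: 5e-8 is a conventional genome-wide threshold; the caption
# does not state which value the package uses.
GENOMEWIDE = 5e-8
SUGGESTIVE = 1e-5


def classify_snp(p_value, genomewide=GENOMEWIDE, suggestive=SUGGESTIVE):
    """Return the highlight tier for a single SNP p-value."""
    if p_value < genomewide:
        return "genomewide"   # drawn in red in the caption's plot
    if p_value < suggestive:
        return "suggestive"   # annotated in blue in the caption's plot
    return "background"


def manhattan_height(p_value):
    """Vertical coordinate on a Manhattan plot: -log10 of the p-value."""
    return -math.log10(p_value)


# Hypothetical SNP identifiers and p-values, for illustration only.
snps = {"rs_a": 3e-9, "rs_b": 4e-6, "rs_c": 0.02}
tiers = {snp: classify_snp(p) for snp, p in snps.items()}
```

Passing the thresholds as arguments keeps the sketch usable with whatever cutoffs a given analysis adopts.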

    Interaction network analysis in postgwas.

    Part (A) shows the complete network of genes derived from the human height GWAS dataset [38] (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0071775#pone.0071775-LangoAllen2) using a p-value cutoff of 1×10⁻⁶, generated by applying the postgwas() summary function to the dataset without further customization parameters. The appearance of the network can be modified by using a custom (drag and drop) vertex layout or by deactivating edge labels. The edges of this network are formed based on common REACTOME pathway membership, (optionally) labeled by the type of interaction (here: shared pathway name) and weighted by the combined association strength of participating genes. Vertex sizes (and optionally transparency) correspond to the GWAS association p-value. Taking these weights into account, a minimum cut-edge graph partitioning algorithm decomposes the global graph into functional subunits with preferential accumulation of well-associated genes within modules. Part (B) shows the first extracted module, exhibiting the strongest evidence for accumulation of low GWAS p-values. This accumulation is reflected by a module score listed in the legend (right box; a lower score corresponds to stronger evidence). The major biological functions are identified for each module by GO-term over-representation analysis. The top three over-represented terms are listed in a colorized legend together with the module score. Vertices within the module are colorized according to their membership in over-represented GO terms. Part (C) demonstrates a network analysis for multiple datasets. The module shown has been extracted from a network of GO term similarity between genes from two distinct synthetically generated GWAS datasets. Each dataset corresponds to a vertex shape (squares and circles). For genes occurring in both datasets, vertex shapes are plotted on top of each other in order of their p-value (e.g. SLC38A4), and their label is printed in bold italic. When a single SNP is annotated to multiple genes (e.g. residing in a larger LD block), these genes are labeled with a cross, as shown for three solute carrier genes residing in the same block and having similar molecular functions. Such modules need careful interpretation.
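The edge construction described for part (A), with edges drawn between genes that share a pathway, labeled by pathway name and weighted by combined association strength, can be sketched in Python as below. This is an illustration under stated assumptions and not the postgwas code: the caption does not define "combined association strength", so the sum of the two genes' −log10 p-values is assumed here, and the function name, gene names and pathway names are all hypothetical. The graph partitioning step is not shown.

```python
import math
from itertools import combinations


def build_edges(pathways, p_values):
    """Build weighted, labeled edges between genes sharing a pathway.

    pathways: dict mapping pathway name -> set of member genes
    p_values: dict mapping gene -> GWAS association p-value
    Returns a dict mapping (gene_a, gene_b) -> {"labels", "weight"}.
    """
    edges = {}
    for name, genes in pathways.items():
        # Keep only genes that have an association p-value.
        members = sorted(g for g in genes if g in p_values)
        for a, b in combinations(members, 2):
            # ASSUMPTION: combined strength = sum of -log10(p) of both genes.
            weight = -math.log10(p_values[a]) - math.log10(p_values[b])
            edge = edges.setdefault((a, b), {"labels": [], "weight": weight})
            edge["labels"].append(name)  # edge label: shared pathway name
    return edges


# Hypothetical pathways and p-values, for illustration only.
pathways = {
    "pathway_1": {"GENE_A", "GENE_B", "GENE_C"},
    "pathway_2": {"GENE_A", "GENE_B"},
}
p_values = {"GENE_A": 1e-8, "GENE_B": 1e-7, "GENE_C": 1e-6}
edges = build_edges(pathways, p_values)
```

Under this assumed weighting, pairs of strongly associated genes receive the heaviest edges, which matches the caption's description of well-associated genes accumulating within modules.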

    An Efficient and Comprehensive Strategy for Genetic Diagnostics of Polycystic Kidney Disease

    Renal cysts are clinically and genetically heterogeneous conditions. Autosomal dominant polycystic kidney disease (ADPKD) is the most frequent life-threatening genetic disease and is mainly caused by mutations in PKD1. The presence of six PKD1 pseudogenes and tremendous allelic heterogeneity make molecular genetic testing challenging, requiring laborious locus-specific amplification. Increasing evidence suggests a major role for PKD1 in early and severe cases of ADPKD and in some patients with a recessive form. Furthermore, it is becoming obvious that clinical manifestations can be mimicked by mutations in a number of other genes, necessitating broader genetic testing. We established and validated a sequence capture based NGS testing approach for all genes known for cystic and polycystic kidney disease, including PKD1. We demonstrate that the applied standard mapping algorithm specifically aligns reads to the PKD1 locus and overcomes the complication of unspecific capture of pseudogenes. With careful and experienced assessment of NGS data, the method is shown to be very specific and as sensitive as established methods. An additional advantage over conventional Sanger sequencing is the detection of copy number variations (CNVs). Sophisticated bioinformatic read simulation increased the high analytical depth of the validation study and further demonstrated the strength of the approach. We further raise awareness of limitations and pitfalls of common NGS workflows when applied to complex regions like PKD1, demonstrating that the quality of NGS requires more than high coverage of the target region. We thus propose a time- and cost-efficient diagnostic strategy for comprehensive molecular genetic testing of polycystic kidney disease, which is highly automatable and will be of particular value when therapeutic options for PKD emerge and genetic testing is needed for larger numbers of patients.

    Mutations and variants identified in other genes for cystic and polycystic kidney disease.

    NGS data that demonstrate the power of the setup for parallel analysis of all genes known to date for cystic and polycystic kidney disease and related disorders in a single step.
    * Classification from the PKD mutation database (ADPKD Mutation Database, http://pkdb.mayo.edu/).
    ** Classification taken from the ARPKD/PKHD1 database (http://www.humgen.rwth-aachen.de).
    het = heterozygous; LH = likely hypomorphic; LP = likely pathogenic; HLP = highly likely pathogenic; DP = definitely pathogenic; PP = probably pathogenic; P = pathogenic.