
    One for all and all for One: Improving replication of genetic studies through network diffusion

    <div><p>Improving accuracy in genetic studies would greatly accelerate understanding of the genetic basis of complex diseases. One way to improve the accuracy of risk variants identified by genome-wide association studies (GWAS) is to incorporate previously known biology when screening variants across the genome. We developed a simple approach for improving the prioritization of candidate disease genes that incorporates a network diffusion of scores from known disease genes using a protein network and a novel integration with GWAS risk scores, and tested this approach on a large Alzheimer disease (AD) GWAS dataset. Using a statistical bootstrap approach, we cross-validated the method and for the first time showed that a network approach improves the expected replication rates in GWAS. Several novel AD genes were predicted, including <i>CR2</i>, <i>SHARPIN</i>, and <i>PTPN2</i>. Our re-prioritized results are enriched for established AD-associated biological pathways including inflammation, immune response, and metabolism, whereas standard non-prioritized results were not. Our findings support a strategy of considering network information when investigating genetic risk factors.</p></div>

    Support vector machine training to predict GWAS and network Z-score weights.

    <p>Selection of genes with a high replication rate (> 0.7, blue points) and low replication rate (< 0.1, red points) yielded a balanced number of genes in each replication class (high/low). A linear SVM model was trained to predict replication class using the GWAS and network Z-scores of each gene. Genes represented as X's were used as support vectors for the training of the SVM, whereas genes represented as O's were not. Both network and GWAS Z-scores contributed to the decision boundary, as demonstrated by the significance of their predicted coefficients using logistic regression (GWAS: p < 2.0×10<sup>−16</sup>, Network: p = 0.0016).</p>

    Proximity between RAD genes in PPI network.

    <p>Each RAD gene was ranked (in comparison to the other 19,972 genes in the network) based upon its degree (number of interactions in network), its ASP distance to the RAD genes, and total diffusion distance from the RAD genes. The average ranking of the RAD genes was 7,949 using ASP (60th percentile, t-test p = 0.015) and 6,959 for diffusion (65th percentile, t-test p = 0.00054).</p>
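The percentiles quoted in this caption can be checked from the average ranks. The sketch below assumes that rank 1 is the gene closest to the RAD genes and that the percentile is the share of genes ranked below the given rank; the function name `percentile_from_rank` is hypothetical and not taken from the paper.

```python
def percentile_from_rank(mean_rank, n_genes):
    """Share of genes (in percent) that rank below the given mean rank.

    Assumption: rank 1 is the best (closest) gene, so a lower rank
    corresponds to a higher percentile.
    """
    return 100.0 * (1.0 - mean_rank / n_genes)


N_GENES = 19_972  # network size quoted in the caption

asp_percentile = percentile_from_rank(7_949, N_GENES)
diffusion_percentile = percentile_from_rank(6_959, N_GENES)

print(round(asp_percentile), round(diffusion_percentile))
```

Under these assumptions the two values round to the 60th and 65th percentiles stated in the caption.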

    Comparison of GWAS and network Z-scores.

    <p><b>A.</b> Transformed Z-scores are uncorrelated. <b>B.</b> Genes with high network scores had higher replication rates compared to those with low network scores, as further visualized and confirmed statistically as shown in <b><a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1007306#pgen.1007306.g004" target="_blank">Fig 4</a></b>. Reprate = replication rate.</p>

    Summary of analysis steps.

    <p>A set of AD genes that are reproducible (RAD genes) across different genetic studies was assembled through literature curation. The RAD genes were assigned a high initial risk score, and graph theoretical diffusion was employed to derive network diffusion scores for the rest of the genes in the network. Scores obtained from genetic screens and network diffusion were integrated to derive a new prioritization.</p>
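The caption does not say which diffusion scheme was used, so the following is only a minimal sketch of the general idea, using a random walk with restart as a stand-in. The toy graph, the gene labels (`SEED`, `G1`, `G2`, `G3`), the restart probability, and the function name `diffuse` are all assumptions made for illustration.

```python
def diffuse(adjacency, seeds, restart=0.5, iterations=100):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours
    seeds:     set of nodes that carry the initial score
    Returns a dict of diffusion scores that sum to 1.
    """
    nodes = list(adjacency)
    # Initial distribution: all mass spread evenly over the seed genes.
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # Each step returns a fraction `restart` of the mass to the seeds ...
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: the remaining mass stays where it is.
                updated[n] += (1.0 - restart) * scores[n]
                continue
            # ... and spreads the rest evenly over the neighbours.
            share = (1.0 - restart) * scores[n] / len(neighbours)
            for m in neighbours:
                updated[m] += share
        scores = updated
    return scores


# Hypothetical toy network: a chain SEED - G1 - G2 - G3.
toy_network = {
    "SEED": ["G1"],
    "G1": ["SEED", "G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2"],
}
scores = diffuse(toy_network, seeds={"SEED"})
for gene in toy_network:
    print(gene, round(scores[gene], 4))
```

On this chain, scores fall off with distance from the seed, which is the behaviour the caption relies on: genes closer to the RAD genes in the network receive higher diffusion scores.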

    Filtering on network score improves replication rate.

    <p>The replication rate was computed for all genes surpassing the significance threshold for each GWAS. This procedure was repeated in each bootstrapped dataset and the average replication rate was determined (purple). This process was repeated using increasingly strict filters on the network diffusion scores. The baseline replication rate without utilizing network scores (naïve method) is represented by the purple points. The strictest network filter (red) has a consistently higher replication rate than the naïve method.</p>
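The filtering procedure in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a gene is counted as replicated when it passes the same Z-score cutoff in a second dataset, each bootstrap replicate is represented as a (discovery, replication) pair, and all gene names and numbers are made-up toy values rather than results.

```python
def replication_rate(discovery, replication, network, z_cut, net_cut):
    """Fraction of discovery hits that also pass z_cut in the replication set.

    Only genes whose network score is at least net_cut are considered.
    Returns None when no gene passes both filters.
    """
    hits = [g for g, z in discovery.items()
            if z >= z_cut and network[g] >= net_cut]
    if not hits:
        return None
    replicated = sum(1 for g in hits if replication.get(g, 0.0) >= z_cut)
    return replicated / len(hits)


def mean_replication_rate(pairs, network, z_cut, net_cut):
    """Average the replication rate over bootstrap (discovery, replication) pairs."""
    rates = [replication_rate(d, r, network, z_cut, net_cut) for d, r in pairs]
    rates = [x for x in rates if x is not None]
    return sum(rates) / len(rates) if rates else None


# Made-up network Z-scores and two made-up bootstrap pairs of GWAS Z-scores.
network = {"g1": 2.0, "g2": 0.1, "g3": 1.5, "g4": -0.5}
pairs = [
    ({"g1": 3.0, "g2": 2.5, "g3": 0.5, "g4": 2.2},
     {"g1": 2.4, "g2": 0.3, "g3": 2.1, "g4": 0.1}),
    ({"g1": 2.1, "g2": 0.4, "g3": 2.6, "g4": 2.9},
     {"g1": 2.2, "g2": 2.5, "g3": 1.0, "g4": 0.2}),
]

# Naive method: no network filter at all.
naive = mean_replication_rate(pairs, network, z_cut=2.0, net_cut=float("-inf"))
# Filtered method: keep only genes with a network score of at least 1.0.
filtered = mean_replication_rate(pairs, network, z_cut=2.0, net_cut=1.0)
print(naive, filtered)
```

Raising `net_cut` step by step and recomputing the mean corresponds to the increasingly strict network filters described in the caption.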