VarWalker: Personalized Mutation Network Analysis of Putative Cancer Genes from Next-Generation Sequencing Data

Author: Peilin Jia (36888)
Zhongming Zhao (40519)
Publication venue
Publication date: 01/02/2014
Field of study

A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
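
The Random Walk with Restart step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the VarWalker implementation: the function names, the restart probability of 0.5, the convergence tolerance, and the toy four-node network are all assumptions chosen for illustration. The walker starts at a seed gene, moves to a random neighbor at each step, and jumps back to the seed with a fixed probability; the stationary scores rank the seed's closest interactors.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return stationary visiting probabilities for a walk restarting at `seed`.

    `adjacency` maps every node to a list of its neighbors (undirected graph,
    so each edge must be listed in both directions). All parameter defaults
    are illustrative assumptions, not values taken from the paper.
    """
    nodes = sorted(adjacency)
    start = {n: 1.0 if n == seed else 0.0 for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # Restart term: a fraction `restart` of the mass returns to the seed.
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                # Isolated node: send its remaining mass back to the seed.
                nxt[seed] += (1.0 - restart) * prob[u]
                continue
            share = (1.0 - restart) * prob[u] / len(neighbors)
            for v in neighbors:
                nxt[v] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


def top_interactors(adjacency, seed, k):
    """Return the k highest-scoring nodes other than the seed."""
    scores = random_walk_with_restart(adjacency, seed)
    ranked = sorted((n for n in scores if n != seed), key=lambda n: -scores[n])
    return ranked[:k]


# Toy path network A - B - C - D (hypothetical node labels).
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(top_interactors(toy_network, "A", 2))
```

In this toy example the scores decrease with distance from the seed, which is the property that makes the walk useful for selecting close interactors.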

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

FigShare

Pathway enrichment map for the glioblastoma samples.

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

For each sample, the top panel shows the pathway crosstalk map, and the bottom panel shows the genes contributing to the crosstalk. In the top panel, each node represents a pathway, with the node color proportional to the pathway enrichment P value. Each edge represents a crosstalk event between the connected nodes (pathways), with edge width proportional to the number of shared MutGenes and edge color proportional to the P value of the crosstalk event. In the bottom panel, a matrix shows the profile of genes in the significant pathways, with rows for MutGenes and columns for pathways. When a MutGene is observed in a pathway, the corresponding box is shown in red.

FigShare

Co-mutation pathway map for the lung adenocarcinomas samples.

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

Each node represents a pathway that has been identified as significant in at least one sample. An edge between pathways indicates a significant co-mutation event, with edge width proportional to the number of samples in which the co-mutation event occurs, and edge color representing the P value of the event. Darker edges indicate lower P values.

FigShare

Distribution of mutation genes (MutGenes) as a function of gene length (cDNA length).

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

(A) The proportion of MutGenes in lung adenocarcinoma (LUAD) samples versus gene length (cDNA length). The green line indicates all MutGenes in the 182 LUAD samples, and the red line indicates recurrent MutGenes, which occurred in ≥2 LUAD samples. (B) The proportion of MutGenes in melanoma samples versus gene length (cDNA length). The green line indicates all MutGenes in the 121 melanoma samples, and the red line indicates recurrent MutGenes, which occurred in ≥2 melanoma samples.

FigShare

Pathway enrichment test in the lung adenocarcinomas samples.

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

Pathways are represented as rectangles and organized by sample. For each sample, the sample ID is presented on the left, and the three rows on the right correspond to results from the weighted resampling method (top row), the regular resampling method (middle row), and the hypergeometric test (bottom row), respectively. For each method, the pathways are placed from left to right according to their P values, with lower P values on the left; when multiple pathways have the same P value, they are ordered by their KEGG ID. To visualize the comparison among methods, each pathway is assigned a single color proportional to its rank in the results from weighted resampling, with darker red indicating lower P values. Pathways that are identified by regular resampling or the hypergeometric test but not by weighted resampling are shown in white. Thus, the color of a pathway indicates its rank in the weighted resampling method, and any discordance in the other two rows for a sample shows how the ranking differs under the other two methods. Note that the two samples with the largest number of significantly enriched pathways are not presented in this figure due to space limitations: sample 16668, with 34 significant pathways, and sample 17210, with 22 significant pathways.
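
Of the three methods compared in this caption, the hypergeometric test is the simplest to write down. The sketch below is an illustration under stated assumptions, not the authors' code: the function names and argument order are hypothetical, and the Bonferroni step simply multiplies the P value by the number of pathways tested. The upper-tail P value is the probability of seeing at least `overlap` pathway genes among a sample's MutGenes when they are drawn at random from the background.

```python
from math import comb


def hypergeom_upper_tail(overlap, background, pathway_size, n_mutgenes):
    """P(X >= overlap) for a hypergeometric draw.

    background   : total number of genes considered (assumed gene universe)
    pathway_size : number of background genes annotated to the pathway
    n_mutgenes   : number of MutGenes in the sample
    overlap      : MutGenes that fall inside the pathway
    """
    total = comb(background, n_mutgenes)
    upper = min(pathway_size, n_mutgenes)
    tail = sum(
        comb(pathway_size, i) * comb(background - pathway_size, n_mutgenes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Toy numbers only: 5 of 5 MutGenes fall in a 5-gene pathway, 10 genes overall.
p = hypergeom_upper_tail(5, 10, 5, 5)
print(p, bonferroni(p, 20))
```

Unlike the weighted resampling method, this test treats every gene as equally likely to be mutated, which is one reason the rows in the figure can disagree.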

FigShare

Distribution of significant interaction frequency.

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

(A) Distribution of the number of edges (on a logarithmic scale) versus their occurrence in LUAD. The vertical line at 14 indicates the threshold at which the edges drift away from the linear distribution. The vertical line at 10 indicates the threshold used for edge selection after a manual adjustment based on known LUAD genes. (B) The x-axis shows the frequency of interactions that were identified in 182 LUAD samples. The y-axis shows the proportion of interactions with the corresponding frequency on the x-axis. The black bar indicates the frequency for all significant interactions in all samples, while the grey bar indicates the frequency for the significant interactions involving any of the 52 known LUAD genes in HPRD. (C) Distribution of the number of edges (on a logarithmic scale) versus their occurrence in melanoma. The vertical line at 10 indicates the threshold at which edges drift away from the linear distribution; this threshold is used as the cutoff to select edges in melanoma.

FigShare

Summary of the sample information and significant pathways (PBonferroni<0.05) for the lung adenocarcinomas samples.

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

MutGenes: mutation genes.

FigShare

Selected subgraphs in the lung adenocarcinoma consensus mutation network.

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

Node size is proportional to the number of samples harboring mutations in the corresponding gene (MutGene), as indicated in parentheses after the node name. The triangular nodes denote the proteins encoded by known LUAD genes (see Materials and Methods, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003460#s4). Edge width is proportional to the number of samples in which the interaction is detected, which is also indicated by the number on each edge. Panels (A), (B), and (C) show three selected subgraphs in LUAD.

FigShare

Flowchart of VarWalker.

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

The pipeline has four steps, with steps 1–3 implemented in each sample and step 4 implemented in the whole cohort. In step 1, the mutation genes for each sample (MutGenes, defined as those with ≥1 deleterious somatic mutation in coding regions) are first assessed to compute a probability weight vector (PWV) by fitting a generalized additive model. A weighted resampling test based on the PWV is then performed to build a null distribution in which genes occur at random. Genes with freq≥0.05 are filtered out, unless they are CGC genes, resulting in a set of significant MutGenes for each sample. In step 2, Random Walk with Restart (RWR) is initiated for each of the significant MutGenes, and their top interactors are collected. In step 3, these interactors are evaluated against 100 random networks that are generated with the same topological structure and analyzed using the same RWR algorithm. Interactors that are unlikely to be observed by chance, i.e., p_edge<0.05, are denoted as significant interactors and retained. In step 4, all significant interactors and interactions from each sample are pooled together, and a consensus mutation network is constructed.
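
The weighted resampling filter in step 1 can be sketched as follows. This is only one possible reading of the caption, not the published code: it assumes the PWV is already available as a dictionary of per-gene weights, draws random gene sets of the same size as the sample's MutGene list without replacement (using exponent-based random keys, a technique swapped in here for illustration), and records how often each gene appears. The function names, the 1000 resamples, and the toy weights are hypothetical.

```python
import random


def weighted_sample_without_replacement(weights, k, rng):
    """Draw k distinct genes with probability proportional to their weight.

    Each gene gets the key u ** (1 / w) with u uniform in [0, 1); the k
    largest keys form a weighted sample without replacement.
    """
    keys = [
        (rng.random() ** (1.0 / w), gene) for gene, w in weights.items() if w > 0
    ]
    keys.sort(reverse=True)
    return [gene for _, gene in keys[:k]]


def null_frequencies(weights, n_mutgenes, n_resamples=1000, seed=0):
    """Fraction of random gene sets in which each gene occurs."""
    rng = random.Random(seed)
    counts = dict.fromkeys(weights, 0)
    for _ in range(n_resamples):
        for gene in weighted_sample_without_replacement(weights, n_mutgenes, rng):
            counts[gene] += 1
    return {gene: c / n_resamples for gene, c in counts.items()}


def filter_mutgenes(mutgenes, freqs, cutoff=0.05, exempt=frozenset()):
    """Keep genes that rarely occur by chance, plus any exempt genes."""
    return [g for g in mutgenes if freqs[g] < cutoff or g in exempt]


# Toy PWV (hypothetical): one heavily weighted gene among 200 ordinary ones.
toy_weights = {"long_gene": 50.0, **{f"g{i}": 1.0 for i in range(200)}}
toy_freqs = null_frequencies(toy_weights, n_mutgenes=5)
print(filter_mutgenes(["long_gene", "g0"], toy_freqs))
```

The `exempt` argument stands in for the CGC exception described in the caption: a gene on that list is kept even when its null frequency reaches the cutoff.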

FigShare

Selected subgraphs in the melanoma consensus mutation network.

Author: Peilin Jia (36888)
Zhongming Zhao (40519)

Node size is proportional to the number of samples harboring mutations in the corresponding gene (MutGene), as indicated in parentheses after the node name. Edge width is proportional to the number of samples harboring the interaction, which is also indicated by the number on each edge. For example, NRAS was mutated in 29 samples, PIK3CA was mutated in 6 samples, and the interaction between their protein products was found by RWR in 29 samples. In contrast, RASGRP2 was mutated in one sample, and the interaction between its protein product and NRAS was found by RWR in 30 samples.
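
The edge numbers described in this caption are per-sample counts, and the pooling behind them can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the authors' procedure: the function name, the toy node labels, and the cutoff of 2 are hypothetical, and interactions are treated as undirected and counted at most once per sample.

```python
from collections import Counter


def consensus_edges(per_sample_edges, min_samples):
    """Count in how many samples each undirected interaction is found.

    per_sample_edges : one list of (gene_a, gene_b) pairs per sample
    min_samples      : keep only edges seen in at least this many samples
    """
    counts = Counter()
    for edges in per_sample_edges:
        # A set of frozensets counts each undirected edge once per sample.
        counts.update({frozenset(edge) for edge in edges})
    return {
        tuple(sorted(edge)): n for edge, n in counts.items() if n >= min_samples
    }


# Toy input: three samples with hypothetical node labels.
toy_samples = [
    [("A", "B"), ("B", "A"), ("B", "C")],
    [("A", "B")],
    [("B", "C"), ("C", "D")],
]
print(consensus_edges(toy_samples, min_samples=2))
```

The resulting counts correspond to the edge widths in the figure, and the `min_samples` cutoff plays the role of the edge-selection threshold.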

FigShare