7 research outputs found

    Identifying and ranking potential driver genes of Alzheimer's disease using multiview evidence aggregation.

    MOTIVATION: Late-onset Alzheimer's disease currently has no known effective treatment options. To better understand the disease, new multi-omic datasets have recently been generated with the goal of identifying its molecular causes. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here, we propose a data-driven approach that integrates multiple data types and analytic outcomes to aggregate evidence supporting the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework that learns the key characteristics of a few known driver genes from multiple feature sets and identifies other potential driver genes with similar feature representations, and (ii) a flexible ranking scheme with the ability to integrate external validation in the form of Genome Wide Association Study summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, this method is easily generalizable to other data modalities and analysis types. RESULTS: We demonstrate the utility of our machine learning algorithm on two benchmark multiview datasets, where it significantly outperforms the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer's. We show that our ranked genes are significantly enriched for single nucleotide polymorphisms associated with Alzheimer's and are enriched in pathways that have been previously associated with the disease. AVAILABILITY AND IMPLEMENTATION: Source code and links to all feature sets are available at https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking

    SCNrank: spectral clustering for network-based ranking to reveal potential drug targets and its application in pancreatic ductal adenocarcinoma

    Background: Pancreatic ductal adenocarcinoma (PDAC) is the most common pancreatic malignancy. Due to its wide heterogeneity, PDAC acts aggressively and responds poorly to most chemotherapies, creating an urgent need for new therapeutic strategies. Cell lines have been used as the foundation for drug development and disease modeling. CRISPR-Cas9 plays a key role in every step of drug discovery, from target identification and validation to preclinical cancer cell testing. Using cell-line models and CRISPR-Cas9 technology together makes drug target prediction feasible. However, there is still a large gap between predicted results and actionable targets in real tumors. Biological network models provide a good means of mimicking genetic interactions in real biological systems, which can benefit gene perturbation studies and potential target identification for treating PDAC. Nevertheless, building a network model that takes cell-line data and CRISPR-Cas9 data as input and accurately predicts potential targets that will respond well in real tissue remains an unsolved problem. Methods: We developed a novel algorithm, 'Spectral Clustering for Network-based target Ranking' (SCNrank), that systematically integrates three types of data to prioritize potential drug targets for PDAC: expression profiles from tumor tissue, normal tissue and cell-line PDAC; a protein-protein interaction (PPI) network; and CRISPR-Cas9 data. The algorithm proceeds in three steps: 1. Using the STRING PPI network skeleton, SCNrank constructs tissue-specific networks from the expression profiles of PDAC tumor and normal pancreas tissues; 2. With the same network skeleton, SCNrank constructs cell-line-specific networks using the cell-line PDAC expression profiles and CRISPR-Cas9 data from pancreatic cancer cell-lines; 3. SCNrank applies a novel spectral clustering approach to reduce data dimension and generate gene clusters that carry common features from both networks. Finally, SCNrank applies a scoring scheme called the 'Target Influence score' (TI), which estimates a given target's influence on the cluster it belongs to, to score and rank each drug target. Results: We applied SCNrank to analyze 263 expression profiles, CRISPR-Cas9 data from 22 different pancreatic cancer cell-lines and the STRING protein-protein interaction (PPI) network. With SCNrank, we successfully constructed an integrated tissue PDAC network and an integrated cell-line PDAC network, both of which contain 4414 selected genes that are overexpressed in tumor tissue samples. After clustering, the 4414 genes are distributed into 198 clusters, which include 367 targets of FDA-approved drugs. These drug targets are all scored and ranked by their TI scores, which we defined to measure their influence on the network. We validated the top-ranked targets in three ways. First, we mapped them onto the existing clinical drug targets of PDAC to measure the concordance. Second, we performed enrichment analysis on these drug targets and the clusters they belong to, to reveal functional associations between clusters and PDAC. Third, we performed survival analysis for the top-ranked targets to connect targets with clinical outcomes. Survival analysis reveals that overexpression of three top-ranked genes, PGK1, HMMR and POLE2, significantly increases the risk of death in PDAC patients. SCNrank is an unbiased algorithm that systematically integrates multiple types of omics data to select and rank potential drug targets. SCNrank shows great capability in predicting drug targets for PDAC. Pancreatic cancer-associated gene candidates predicted by our SCNrank approach have the potential to guide genetics-based anti-pancreatic drug discovery.
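
As a rough illustration of the spectral clustering idea mentioned in the abstract above, the following sketch splits a small toy graph into two clusters using the second-smallest eigenvector of the graph Laplacian (the Fiedler vector). This is textbook spectral bipartitioning, not the novel SCNrank procedure; the function name `spectral_bipartition` and the toy edge list are assumptions made purely for illustration.

```python
import numpy as np


def spectral_bipartition(adjacency):
    """Split a connected, undirected graph into two clusters.

    Hypothetical helper for illustration only: it thresholds the Fiedler
    vector of the unnormalized graph Laplacian at zero.
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    laplacian = np.diag(degree) - A
    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so column 1 holds the eigenvector of the second-smallest eigenvalue.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)


# Toy network (an assumption): two triangles, {0, 1, 2} and {3, 4, 5},
# joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels = spectral_bipartition(A)
print(labels)  # the two triangles receive different labels
```

Which triangle is labeled 0 and which is labeled 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful.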

    Identifying Network Perturbation in Cancer

    We present a computational framework, called DISCERN (DIfferential SparsE Regulatory Network), to identify informative topological changes in gene-regulator dependence networks inferred from mRNA expression datasets within distinct biological states. DISCERN takes two expression datasets as input: an expression dataset of diseased tissues from patients with a disease of interest and another expression dataset from matching normal tissues. DISCERN estimates the extent to which each gene is perturbed, that is, has distinct regulator connectivity in the inferred gene-regulator dependencies between the disease and normal conditions. This approach has distinct advantages over existing methods. First, DISCERN infers conditional dependencies between candidate regulators and genes, where conditional dependence relationships discriminate the evidence for direct interactions from indirect interactions more precisely than pairwise correlation. Second, DISCERN uses a new likelihood-based scoring function to alleviate concerns about the accuracy of the specific edges inferred in a particular network. DISCERN identifies perturbed genes in synthetic data more accurately than existing methods for identifying perturbed genes between distinct states. In expression datasets from patients with acute myeloid leukemia (AML), breast cancer and lung cancer, genes with high DISCERN scores in each cancer are enriched for known tumor drivers, genes associated with the biological processes known to be important in the disease, and genes associated with patient prognosis in the respective cancer. Finally, we show that DISCERN can uncover potential mechanisms underlying network perturbation by explaining observed epigenomic activity patterns in cancer and normal tissue types more accurately than alternative methods, based on the available epigenomic data from the ENCODE project.
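
The distinction the abstract above draws between conditional dependence and pairwise correlation can be illustrated with a small simulation. The sketch below is a generic example based on partial correlations from the inverse covariance matrix, not DISCERN's own estimator or scoring function; the chain x -> y -> z, the noise levels and the helper name `partial_correlations` are all assumptions.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix of the columns of `data`.

    Hypothetical helper for illustration: entry (i, j) is the correlation
    between variables i and j after conditioning on all other variables,
    computed from the inverse covariance (precision) matrix.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Simulated chain x -> y -> z: x influences z only indirectly, through y.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
data = np.column_stack([x, y, z])

pairwise = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)

print("pairwise corr(x, z):", round(float(pairwise[0, 2]), 3))
print("partial corr(x, z | y):", round(float(partial[0, 2]), 3))
```

In this toy setting the pairwise correlation between x and z is large even though they have no direct link, while their partial correlation given y is close to zero; the direct pair x and y keeps a clearly non-zero partial correlation.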

    Identifying Network Perturbation in Cancer - Fig 1

    (A) A simple hypothetical example that illustrates the perturbation of a network of 7 genes between disease and normal tissues. One possible cause of the perturbation is a cancer driver mutation on gene ‘1’ that alters the interactions between gene ‘1’ and genes ‘3’, ‘4’, ‘5’, and ‘6’. (B) One possible cause of network perturbation. Gene ‘1’ is regulated by different sets of genes between cancer and normal conditions. (C) The overview of our approach. DISCERN takes two expression datasets as input: an expression dataset from patients with a disease of interest and another expression dataset from normal tissues (top). DISCERN computes the network perturbation score for each gene that estimates the difference in connection between the gene and other genes between disease and normal conditions (middle). We perform various post-analyses to evaluate the DISCERN method by comparing with alternative methods, based on the importance of the high-scoring genes in the disease through a survival analysis and on how well the identified perturbed genes explain the observed epigenomic activity data (bottom).

    Identifying Network Perturbation in Cancer - Fig 2

    (A) Average receiver operating characteristic (ROC) curves from the experiments on synthetic data. We compare DISCERN with 7 alternative methods: 3 existing methods (LNS [35], D-score [36], and PLSNet [34]) and 4 methods we developed for comparison (pLNS, pD-score, D^0 and pD^0). (B) Comparison of the runtime (hours) between PLSNet and DISCERN for varying numbers of variables (p). The triangles mark the run times measured at specific values of p, and the lines connect these measurements. PLSNet uses the empirical p-values from permutation tests as scores, and DISCERN does not. For large values of p, DISCERN is two to three orders of magnitude faster than PLSNet.
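
For readers unfamiliar with how ROC comparisons like the one in panel (A) are summarized, the sketch below computes the area under the ROC curve (AUC) for a set of scores against known binary labels. It is a minimal illustration with made-up numbers, not the evaluation code or data from the paper; the function name `roc_auc` and the toy scores are assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Uses the rank interpretation of AUC: the probability that a randomly
    chosen positive is scored above a randomly chosen negative, with ties
    counted as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (assumed values): 1 marks a truly perturbed gene and the
# score is a hypothetical perturbation score assigned by some method.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(scores, labels)
print(round(auc, 3))  # 8 of the 9 positive-negative pairs are ordered correctly
```

An AUC of 1.0 corresponds to a perfect ranking and 0.5 to a random one, which is why averaged ROC curves give a threshold-free way to compare scoring methods.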