    Network-based method for inferring cancer progression at the pathway level from cross-sectional mutation data

    Large-scale cancer genomics projects are providing a wealth of somatic mutation data from a large number of cancer patients. However, it is difficult to obtain several temporally ordered samples from a single patient when evaluating cancer progression. Therefore, one of the most challenging problems arising from the data is to infer the temporal order of mutations across many patients. To solve the problem efficiently, we present a network-based method (NetInf) to infer cancer progression at the pathway level from cross-sectional data across many patients, leveraging the mutual exclusivity of driver mutations within a pathway and the linear progression between pathways. To assess the robustness of NetInf, we apply it to simulated data with different levels of added noise. To verify the performance of NetInf, we apply it to somatic mutation data from three real cancer studies with large numbers of samples. Experimental results reveal that the pathways detected by NetInf show significant enrichment. Our method reduces computational complexity by constructing gene networks without requiring the number of pathways to be specified in advance, and it provides new insights into the temporal order of somatic mutations at the pathway level rather than at the gene level.

    DriveWays: a method for identifying possibly overlapping driver pathways in cancer

    The majority of previous methods for identifying cancer driver modules output nonoverlapping modules. This assumption is biologically inaccurate, as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes, as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions that model this biological phenomenon and to suggest efficient algorithms for their solution. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms state-of-the-art methods in recovering well-known cancer driver genes on TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways.
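
    The two mutation-based quantities named in the abstract above, coverage and mutual exclusivity, can be illustrated with a small sketch. The formulas below are one common formulation from the driver-module literature and are an assumption here; the abstract does not state how DriveWays itself scores modules. The function names, gene names, and sample IDs are hypothetical.

```python
from typing import Dict, Iterable, Set


def coverage(module: Iterable[str], mutations: Dict[str, Set[str]], n_samples: int) -> float:
    """Fraction of all samples mutated in at least one gene of the module."""
    covered: Set[str] = set()
    for gene in module:
        covered |= mutations.get(gene, set())
    return len(covered) / n_samples


def mutual_exclusivity(module: Iterable[str], mutations: Dict[str, Set[str]]) -> float:
    """Covered samples divided by the total number of mutations in the module.

    Equals 1.0 when no sample carries mutations in two module genes,
    and drops toward 0 as mutations co-occur in the same samples.
    """
    covered: Set[str] = set()
    total = 0
    for gene in module:
        samples = mutations.get(gene, set())
        covered |= samples
        total += len(samples)
    return len(covered) / total if total else 0.0


# Toy data (hypothetical): gene -> set of samples in which it is mutated.
toy_mutations = {
    "TP53": {"s1", "s2"},
    "MDM2": {"s3"},
    "KRAS": {"s2", "s4"},
}
N_SAMPLES = 5

exclusive_module = ["TP53", "MDM2"]    # no sample is hit twice
overlapping_module = ["TP53", "KRAS"]  # sample s2 is hit by both genes

print(coverage(exclusive_module, toy_mutations, N_SAMPLES),
      mutual_exclusivity(exclusive_module, toy_mutations))
print(coverage(overlapping_module, toy_mutations, N_SAMPLES),
      mutual_exclusivity(overlapping_module, toy_mutations))
```

    In this toy case both modules cover three of the five samples, but only the first is perfectly exclusive; a heuristic that also uses network connectivity would additionally require the module genes to be close in the interaction graph.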

    Network-guided data integration and gene prioritization

    A Study Of Computational Problems In Computational Biology And Social Networks: Cancer Informatics And Cascade Modelling

    Everything in this world is undoubtedly related, and nothing exists independently. Entities interact to form groups, resulting in many complex networks. Examples include functional regulation models of proteins in biology and communities of people within social networks. Since complex networks are ubiquitous in daily life, network learning has been gaining momentum in a variety of disciplines such as computer science, economics and biology. This calls for new techniques for exploring the structure and the interactions of networks, since they provide insight into how a network works and deepen our knowledge of the subject at hand. For example, uncovering protein modules helps us understand what causes lead to a certain disease and how proteins co-regulate each other. Therefore, my dissertation takes on problems in computational biology and social networks: cancer informatics and cascade modelling. In cancer informatics, identifying the specific genes that cause cancer (driver genes) is crucial. The more drivers are identified, the more options there are to treat the cancer with a drug that acts on that gene. However, identifying driver genes is not easy. Cancer cells undergo rapid mutation and are compromised with regard to the body's normal DNA repair mechanisms. I employed Markov chains, Bayesian networks and graphical models to identify cancer drivers. I utilize heterogeneous sources of information to discover cancer drivers and to unlock the mechanism behind cancer. Above all, I encode various pieces of biological information to form a multi-graph, trigger various Markov chains in it, and rank the genes afterwards. We also leverage a probabilistic mixed graphical model to learn the complex and uncertain relationships among various biomedical data. On the other hand, the diffusion of information over networks has drawn great interest in the research community.
For example, epidemiologists observe that a person becomes ill, but they can determine neither who infected the patient nor the infection rate of each individual. Therefore, it is critical to decipher the mechanism underlying the process, since doing so supports efforts to prevent virus infections. We propose a new model for cascade data in three different scenarios

    Network enrichment analysis: extension of gene-set enrichment analysis to gene networks

    Background: Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in terms of gene interaction network is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis. Results: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study. Conclusions: The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity rather than by simple overlaps.
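
    The idea in the abstract above, replacing the gene-overlap statistic with a count of network links and comparing it against randomized networks, can be sketched as follows. This is a minimal illustration under stated assumptions: the null model is a plain degree-preserving double-edge swap, which stands in for the authors' fast randomization algorithm (not described in the abstract), and the function names and toy network are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_links(edges, set_a, set_b):
    """Number of edges with one endpoint in set_a and the other in set_b."""
    return sum(
        1 for u, v in edges
        if (u in set_a and v in set_b) or (u in set_b and v in set_a)
    )


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization of a simple undirected graph."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try either orientation of the second edge
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def nea_z_score(edges, gene_set, category, n_random=200, seed=0):
    """Observed link count and its z-score against rewired networks."""
    rng = random.Random(seed)
    observed = count_links(edges, gene_set, category)
    null = [
        count_links(rewire(edges, 10 * len(edges), rng), gene_set, category)
        for _ in range(n_random)
    ]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    return observed, z


# Toy network (hypothetical node labels).
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("A", "C")]
experimental_set = {"A", "B"}
functional_category = {"C", "D"}

observed, z = nea_z_score(toy_edges, experimental_set, functional_category)
print(observed, z)
```

    Note that the two sets in this toy example share no genes, so a pure overlap statistic would be zero, while the link count still registers the B-C and A-C connections. On a network this small the z-score is not meaningful; it is printed only to show the shape of the computation.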

    Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors

    Background: Assays of multiple tumor samples frequently reveal recurrent genomic aberrations, including point mutations and copy-number alterations, that affect individual genes. Analyses that extend beyond single genes are often restricted to examining pathways, interactions and functional modules that are already known. Methods: We present a method that identifies functional modules without any information other than patterns of recurrent and mutually exclusive aberrations (RME patterns) that arise due to positive selection for key cancer phenotypes. Our algorithm efficiently constructs and searches networks of potential interactions and identifies significant modules (RME modules) by using the algorithmic significance test. Results: We apply the method to the TCGA collection of 145 glioblastoma samples, resulting in extension of known pathways and discovery of new functional modules. The method predicts a role for EP300 that was previously unknown in glioblastoma. We demonstrate the clinical relevance of these results by validating that expression of EP300 is prognostic, predicting survival independent of age at diagnosis and tumor grade. Conclusions: We have developed a sensitive, simple, and fast method for automatically detecting functional modules in tumors based solely on patterns of recurrent genomic aberration. Due to its ability to analyze very large amounts of diverse data, we expect it to be increasingly useful when applied to the many tumor panels scheduled to be assayed in the near future.

    High Accordance in Prognosis Prediction of Colorectal Cancer across Independent Datasets by Multi-Gene Module Expression Profiles

    A considerable portion of patients with colorectal cancer have a high risk of disease recurrence after surgery. These patients can be identified by analyzing the expression profiles of signature genes in tumors. However, there is no consensus on which genes should be used, and the performance of a specific set of signature genes varies greatly across datasets, impeding their adoption in routine clinical practice. Instead of using individual genes, we identified functional multi-gene modules with significant expression changes between recurrent and recurrence-free tumors and used them as signatures for predicting colorectal cancer recurrence in multiple datasets that were collected independently and profiled on different microarray platforms. The multi-gene modules we identified are significantly enriched for known genes and biological processes relevant to cancer development, including genes from the chemokine pathway. Most strikingly, they are significantly enriched for somatic mutations found in colorectal cancer. These results confirm the functional relevance of these modules for colorectal cancer development. Further, the functional modules from different datasets overlapped significantly. Finally, we demonstrated that, by leveraging the above information about these modules, our module-based classifier avoided arbitrarily fitting the classifier function and screening the signatures using the training data, and achieved more consistent prognosis prediction across three independent datasets, which holds even when using very small training sets of tumors.