Identifying Patterns of Cancer Disease Mechanisms by Mining Alternative Representations of Genomic Alterations

Abstract

Cancer is a complex disease driven by somatic genomic alterations (SGAs) that perturb signaling pathways and, consequently, cellular function. Identifying combinatorial patterns of pathway perturbations would provide insight into disease mechanisms shared among tumors, which is important for guiding treatment and predicting outcome. However, identifying perturbed pathways is challenging because the same pathway can be perturbed by different SGAs in different tumors. We first designed a novel semantic representation that captures the functional similarity of distinct SGAs perturbing a common pathway in different tumors. We used this representation with the nested hierarchical Dirichlet process topic model to identify combinatorial patterns in altered signaling pathways. We found that the topic model captured the functional relationships between topics. It also identified cancer subtypes, composed of tumors from different tissues of origin, that exhibit different survival rates. These results led us to investigate the performance of the methodology on pan-cancer data, both alone and in conjunction with cancer driver data. The framework was still able to identify clinically relevant features in the pan-cancer setting. In addition, including driver data decreased the noise in the data and improved the separation of tumors in the clustering results, which supports the use of driver data in our methodology. To obtain gene representations that are independent of the literature, we developed a biological representation that identifies functionally related genes, and we tested its performance when used with topic modeling. We found that the resulting topic association patterns separated tumors by their tissue of origin. However, analyzing some of the cancer types individually still revealed significant differences in survival.
Our studies show the potential of using alternative representations in conjunction with topic modeling to investigate complex genomic diseases. With further research and refinement, this methodology has the potential to capture the relationships between pathways involved in cancer, which would contribute to a better understanding of cancer disease mechanisms and treatment.
