BeWith: A Between-Within Method to Discover Relationships between Cancer Modules via Integrated Analysis of Mutual Exclusivity, Co-occurrence and Functional Interactions
The analysis of the mutational landscape of cancer, including mutual
exclusivity and co-occurrence of mutations, has been instrumental in studying
the disease. We hypothesized that exploring the interplay between
co-occurrence, mutual exclusivity, and functional interactions between genes
will further improve our understanding of the disease and help to uncover new
relations between cancer driving genes and pathways. To this end, we designed a
general framework, BeWith, for identifying modules with different combinations
of mutation and interaction patterns. We focused on three different settings of
the BeWith schema: (i) BeME-WithFun in which the relations between modules are
enriched with mutual exclusivity while genes within each module are
functionally related; (ii) BeME-WithCo which combines mutual exclusivity
between modules with co-occurrence within modules; and (iii) BeCo-WithMEFun
which ensures co-occurrence between modules while the within-module relations
combine mutual exclusivity and functional interactions. We formulated the
BeWith framework using Integer Linear Programming (ILP), enabling us to find
optimally scoring sets of modules. Our results demonstrate the utility of
BeWith in providing novel information about mutational patterns, driver genes,
and pathways. In particular, BeME-WithFun helped identify functionally coherent
modules that might be relevant for cancer progression. In addition to finding
previously well-known drivers, the identified modules pointed to the importance
of the interaction between NCOR and NCOA3 in breast cancer. Additionally, an
application of the BeME-WithCo setting revealed that gene groups differ with
respect to their vulnerability to different mutagenic processes, and helped us
to uncover pairs of genes with potentially synergistic effects, including a
potential synergy between mutations in TP53 and the metastasis-related DCC gene.
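To make the BeME-WithCo idea concrete, here is a minimal sketch that scores a split of genes into modules by rewarding mutual exclusivity between modules and co-occurrence within them. The pairwise weight (`me_weight`), the scoring function, and the exhaustive two-module search are our own illustrative assumptions, not BeWith's published ILP objective; brute-force enumeration stands in for the ILP solver and only works on toy inputs.

```python
from itertools import combinations, product

def me_weight(a, b):
    """Hypothetical pairwise weight: positive when two genes are mutated in
    disjoint patients, negative when they are co-mutated."""
    return len(a ^ b) - len(a & b)

def score(modules, mutations):
    """BeME-WithCo-style score: exclusivity between modules plus
    co-occurrence (the negated weight) within modules."""
    total = 0
    for m in modules:
        for g, h in combinations(m, 2):
            total -= me_weight(mutations[g], mutations[h])  # within: reward co-occurrence
    for m1, m2 in combinations(modules, 2):
        for g, h in product(m1, m2):
            total += me_weight(mutations[g], mutations[h])  # between: reward exclusivity
    return total

def best_two_modules(mutations):
    """Try every split of the genes into two non-empty modules.
    Exhaustive search replaces the ILP solver here (toy inputs only)."""
    genes = sorted(mutations)
    best = None
    for labels in product((0, 1), repeat=len(genes)):
        if labels[0] == 1 or len(set(labels)) < 2:
            continue  # skip mirrored splits and splits with an empty module
        modules = [frozenset(g for g, l in zip(genes, labels) if l == k) for k in (0, 1)]
        s = score(modules, mutations)
        if best is None or s > best[0]:
            best = (s, modules)
    return best
```

With `mutations = {"A": {1, 2}, "B": {1, 2}, "C": {3, 4}, "D": {3, 4}}` (gene to set of mutated patient IDs), the best split groups the co-occurring pairs {A, B} and {C, D}, which are mutually exclusive with each other.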
QuaDMutEx: quadratic driver mutation explorer
Background
Somatic mutations accumulate in human cells throughout life. Some may have no adverse consequences, but some of them may lead to cancer. A cancer genome is typically unstable, and thus more mutations can accumulate in the DNA of cancer cells. An ongoing problem is to figure out which mutations are drivers (those that play a role in oncogenesis) and which are passengers (those that do not). One way of addressing this question is to inspect somatic mutations in the DNA of cancer samples from a cohort of patients and detect patterns that differentiate driver from passenger mutations.
Results
We propose QuaDMutEx, a method that incorporates three novel elements: a new gene set penalty that includes non-linear penalization of multiple mutations in putative sets of driver genes, an ability to adjust the method to handle slow- and fast-evolving tumors, and a computationally efficient method for finding gene sets that minimize the penalty, through a combination of heuristic Monte Carlo optimization and exact binary quadratic programming. Compared to existing methods, the proposed algorithm finds sets of putative driver genes that show higher coverage and lower excess coverage in eight sets of cancer samples from brain, ovarian, lung, and breast tumors.
Conclusions
Its superior performance on both coverage and excess coverage across different types of cancer shows that QuaDMutEx is a tool that should be part of a state-of-the-art toolbox in the driver gene discovery pipeline. It can detect genes harboring rare driver mutations that may be missed by existing methods. QuaDMutEx is available for download from https://github.com/bokhariy/QuaDMutEx under the GNU GPLv3 license.
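The following sketch illustrates coverage, excess coverage, and a non-linear penalty on multiple mutations. The penalty sum over patients of (c_p - 1)^2, where c_p is the number of selected genes mutated in patient p, is a hypothetical choice of ours, not QuaDMutEx's published formula; it charges 1 for an uncovered patient, 0 for exactly one hit, and grows quadratically with extra hits. Because c_p is linear in binary gene indicators, the penalty is a binary quadratic form. The definition of excess coverage used here (mutations beyond the first in each covered patient) is also an assumption, and exhaustive search replaces the paper's Monte Carlo plus binary quadratic programming.

```python
from itertools import combinations

def hits_per_patient(gene_set, mutations, patients):
    """c_p: number of genes in gene_set mutated in patient p."""
    return [sum(p in mutations[g] for g in gene_set) for p in patients]

def coverage(gene_set, mutations, patients):
    """Patients with at least one mutation in the set."""
    return sum(c >= 1 for c in hits_per_patient(gene_set, mutations, patients))

def excess_coverage(gene_set, mutations, patients):
    """Assumed definition: mutations beyond the first in each covered patient."""
    return sum(max(c - 1, 0) for c in hits_per_patient(gene_set, mutations, patients))

def quadratic_penalty(gene_set, mutations, patients):
    """Hypothetical penalty sum_p (c_p - 1)^2."""
    return sum((c - 1) ** 2 for c in hits_per_patient(gene_set, mutations, patients))

def best_gene_set(mutations, patients, k):
    """Exhaustive search over all k-gene sets (toy inputs only)."""
    return min(combinations(sorted(mutations), k),
               key=lambda s: quadratic_penalty(s, mutations, patients))
```

For example, with `mutations = {"A": {1, 2}, "B": {3, 4}, "C": {1, 2, 3}, "D": {1, 4}}` and patients `[1, 2, 3, 4]`, the pair ("A", "B") covers every patient exactly once and gets penalty 0.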
DISCOVERING DRIVER MUTATIONS IN BIOLOGICAL DATA
Background
Somatic mutations accumulate in human cells throughout life. Some may have no adverse consequences, but some of them may lead to cancer. A cancer genome is typically unstable, and thus more mutations can accumulate in the DNA of cancer cells. An ongoing problem is to figure out which mutations are drivers (those that play a role in oncogenesis) and which are passengers (those that do not). One way of addressing this question is to inspect somatic mutations in the DNA of cancer samples from a cohort of patients and detect patterns that differentiate driver from passenger mutations.
Results
We propose QuaDMutEx and QuaDMutNetEx. QuaDMutEx is a method that incorporates three novel elements: a new gene set penalty that includes non-linear penalization of multiple mutations in putative sets of driver genes, an ability to adjust the method to handle slow- and fast-evolving tumors, and a computationally efficient method for finding gene sets that minimize the penalty, through a combination of heuristic Monte Carlo optimization and exact binary quadratic programming.
QuaDMutNetEx is our proposed method that adds protein-protein interaction networks to the elements of QuaDMutEx. In particular, QuaDMutEx incorporates three novel elements: non-linear penalization of multiple mutations in putative sets of driver genes, an ability to adjust the method to handle slow- and fast-evolving tumors, and a computationally efficient method for finding gene sets that minimize the penalty. In the new method, we incorporated a new quadratic reward term that favors gene solution sets that are connected in the protein-protein interaction network. Compared to existing methods, the proposed algorithm finds sets of putative driver genes that show higher coverage and lower excess coverage in eight sets of cancer samples from brain, ovarian, lung, and breast tumors.
Conclusions
Their superior performance on both coverage and excess coverage across different types of cancer shows that QuaDMutEx and QuaDMutNetEx are tools that should be part of a state-of-the-art toolbox in the driver gene discovery pipeline. They can detect genes harboring rare driver mutations that may be missed by existing methods.
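A quadratic reward for network connectivity can be illustrated by subtracting lam times the number of PPI edges inside the selected set; with binary gene indicators x_i and adjacency A_ij, that count is the quadratic form sum over i < j of A_ij x_i x_j. This is a minimal sketch under our own assumptions: the coverage penalty is the hypothetical sum of (hits - 1)^2 per patient, the weight `lam` is an illustrative parameter, and the exhaustive search is not QuaDMutNetEx's optimizer.

```python
from itertools import combinations

def quadratic_penalty(gene_set, mutations, patients):
    """Hypothetical coverage penalty: sum over patients of (hits - 1)^2."""
    return sum((sum(p in mutations[g] for g in gene_set) - 1) ** 2 for p in patients)

def internal_edges(gene_set, edges):
    """Number of PPI edges (stored as frozensets) with both ends in the set."""
    return sum(frozenset(pair) in edges for pair in combinations(gene_set, 2))

def objective(gene_set, mutations, patients, edges, lam):
    """Penalty minus a quadratic reward lam * (edges inside the set)."""
    return quadratic_penalty(gene_set, mutations, patients) - lam * internal_edges(gene_set, edges)

def best_gene_set(mutations, patients, edges, k, lam):
    """Exhaustive search over k-gene sets (toy inputs only)."""
    return min(combinations(sorted(mutations), k),
               key=lambda s: objective(s, mutations, patients, edges, lam))
```

On a toy input where ("A", "B") has perfect coverage but only ("C", "D") is connected by a PPI edge, raising `lam` from 0 to 2 shifts the preferred set from ("A", "B") to ("C", "D"), showing how the reward trades coverage against connectivity.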
Network-based identification of driver pathways in clonal systems
Highly ethanol-tolerant bacteria for biofuel production, antibiotic-resistant bacterial pathogens, and cancer cells are examples of phenotypes that are important to society and are currently being studied. To better understand these phenotypes and their underlying genotype-phenotype relationships, it is now commonplace to investigate DNA and expression profiles using next-generation sequencing (NGS) and microarray techniques. These techniques generate large amounts of omics data that yield lists of genes whose mutations or expression profiles potentially contribute to the phenotype. These lists often include many genes and are difficult to verify manually, as literature studies and wet-lab experiments for a large number of genes are very time- and resource-consuming. Therefore, computational methods are required that can narrow these gene lists down by removing the generally abundant false positives and, ideally, provide additional information on the relationships between the selected genes.
Other high-throughput techniques such as yeast two-hybrid (Y2H), ChIP-seq, and ChIP-chip, but also a myriad of small-scale experiments and predictive computational methods, have generated a wealth of interactomics data over the last decade, most of which is now publicly available. These data can be combined into a biological interaction network, which contains all molecular pathways that an organism can use and is thus the equivalent of the organism's blueprint; omics data obtained from experiments can then be integrated with this network. Biological interaction networks are key to the computational methods presented in this thesis, as they enable the methods to account for important relations between genes (and gene products). In doing so, it is possible not only to identify interesting genes but also to uncover molecular processes important to the phenotype.
As the best way to analyze omics data from an interesting phenotype varies widely based on the experimental setup and the available data, multiple methods were developed and applied in the context of this thesis:
In a first approach, an existing method (PheNetic) was applied to a consortium of three bacterial species that together can efficiently degrade a herbicide, although none of the species can do so efficiently on its own. For each species, expression data (RNA-seq) were generated both in the consortium and in isolation. PheNetic identified molecular pathways that were differentially expressed and likely contribute to a cross-feeding mechanism between the species in the consortium.
Having obtained a proof of concept, PheNetic was adapted to cope with experimental evolution datasets in which genomics data were available in addition to expression data. Two publicly available datasets were analyzed: amikacin resistance in E. coli and coexisting ecotypes in E. coli. The results revealed both well-known and newly identified molecular pathways involved in these phenotypes.
Experimental evolution sometimes generates datasets containing mutator phenotypes, which have high mutation rates. These datasets are hard to analyze due to the large amount of noise (most mutations have no effect on the phenotype). To address this, IAMBEE was developed. IAMBEE can analyze genomic datasets from evolution experiments even if they contain mutator phenotypes. IAMBEE was tested on an E. coli evolution experiment in which cells were exposed to increasing concentrations of ethanol. The results were validated in the wet lab.
In addition to methods for the analysis of causal mutations and mechanisms in bacteria, a method for identifying causal molecular pathways in cancer was developed. As bacteria and cancer cells are both clonal, they can be treated similarly in this context. The main differences are the amount of data available (many more samples are available for cancer) and the fact that cancer is a complex and heterogeneous phenotype. Therefore, we developed SSA-ME, which exploits the concept that a causal molecular pathway carries at most one mutation in a cancer cell (mutual exclusivity). However, enforcing this criterion is computationally hard. SSA-ME is designed to cope with this problem and to search for mutually exclusive patterns in relatively large datasets. SSA-ME was tested on cancer data from the TCGA Pan-Cancer dataset. In addition to already known molecular pathways and mutated genes, the results allowed us to predict the involvement of a few rarely mutated genes.
Number of pages: 246. Status: published.
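The mutual exclusivity criterion described above can be quantified on a candidate pathway. This minimal sketch uses a simple score of our own choosing, the fraction of covered samples in which exactly one pathway gene is mutated, as a relaxation of the "at most one mutation" rule; it is not SSA-ME's scoring function, and it does not address the hard search over pathways that SSA-ME is built for.

```python
def mutual_exclusivity(pathway, mutations):
    """Fraction of covered samples in which exactly one pathway gene is mutated.
    1.0 means the pathway is perfectly mutually exclusive; 0.0 if nothing is covered."""
    hits = {}
    for gene in pathway:
        for sample in mutations[gene]:
            hits[sample] = hits.get(sample, 0) + 1  # mutations per sample within the pathway
    if not hits:
        return 0.0
    return sum(1 for c in hits.values() if c == 1) / len(hits)
```

With `mutations = {"A": {1, 2}, "B": {3}, "C": {3, 4}}`, the pathway ["A", "B"] scores 1.0, whereas adding "C" drops the score to 0.75 because sample 3 is then hit twice.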
DriveWays: a method for identifying possibly overlapping driver pathways in cancer
The majority of previous methods for identifying cancer driver modules output non-overlapping modules. This assumption is biologically inaccurate, as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes, as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions that model this biological phenomenon and to suggest efficient algorithms for solving them. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem and show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms state-of-the-art methods in recovering well-known cancer driver genes on TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways.
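A seed-and-extend heuristic that yields overlapping modules can be sketched by growing one module independently from every seed gene, so a hub gene may end up in several modules. This is an illustrative sketch, not DriveWays itself: it scores modules with the Dendrix-style weight W(M) = 2|covered samples| - sum of per-gene mutation counts (coverage minus coverage overlap), adds only network neighbours, and uses a hypothetical `max_size` cap; DriveWays' actual scoring and stopping rules may differ.

```python
def dendrix_weight(module, mutations):
    """W(M) = |covered| - overlap = 2|covered| - sum of per-gene mutated-sample counts."""
    covered = set().union(*(mutations[g] for g in module))
    return 2 * len(covered) - sum(len(mutations[g]) for g in module)

def seed_and_extend(graph, mutations, max_size):
    """Grow one module from every seed gene; modules may share genes."""
    modules = set()
    for seed in sorted(graph):
        module = {seed}
        while len(module) < max_size:
            # candidates are network neighbours of the current module
            frontier = {n for g in module for n in graph[g]} - module
            if not frontier:
                break
            best = max(sorted(frontier), key=lambda n: dendrix_weight(module | {n}, mutations))
            if dendrix_weight(module | {best}, mutations) <= dendrix_weight(module, mutations):
                break  # no neighbour improves the score
            module.add(best)
        modules.add(frozenset(module))
    return modules
```

On a star-shaped toy network with hub "H" linked to "X" and "Y", mutually exclusive mutations, and `max_size=2`, the result is the two overlapping modules {H, X} and {H, Y}, both sharing the hub.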
Finding Mutated Subnetworks Associated with Survival in Cancer
Next-generation sequencing technologies allow the measurement of somatic
mutations in a large number of patients from the same cancer type. One of the
main goals in analyzing these mutations is the identification of mutations
associated with clinical parameters, such as survival time. This goal is
hindered by the genetic heterogeneity of mutations in cancer, due to the fact
that genes and mutations act in the context of pathways. To identify mutations
associated with survival time it is therefore crucial to study mutations in the
context of interaction networks.
In this work we study the problem of identifying subnetworks of a large
gene-gene interaction network that have mutations associated with survival. We
formally define the associated computational problem by using a score for
subnetworks based on the test statistic of the log-rank test, a widely used
statistical test for comparing the survival of two populations. We show that
the computational problem is NP-hard and we propose a novel algorithm, called
Network of Mutations Associated with Survival (NoMAS), to solve it. NoMAS is
based on the color-coding technique, which has been previously used in other
applications to find the highest scoring subnetwork with high probability when
the subnetwork score is additive. In our case the score is not additive;
nonetheless, we prove that under a reasonable model for mutations in cancer
NoMAS does identify the optimal solution with high probability. We test NoMAS
on simulated and cancer data, comparing it to approaches based on single gene
tests and to various greedy approaches. We show that our method does indeed
find the optimal solution and performs better than the other approaches.
Moreover, on two cancer datasets our method identifies subnetworks with
significant association to survival when none of the genes has significant
association with survival when considered in isolation.
Comment: This paper was selected for oral presentation at RECOMB 2016, and an
abstract is published in the conference proceedings.
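The subnetwork score rests on the log-rank test statistic, with the two populations being patients who carry a mutation in at least one gene of the subnetwork and those who do not. The sketch below computes the standard standardized log-rank statistic (observed minus expected deaths in the mutated group, divided by the square root of the hypergeometric variance); the exact normalization NoMAS uses is an assumption here, and the color-coding search over subnetworks is not shown.

```python
import math

def logrank_z(times, events, in_group):
    """Standardized log-rank statistic for patients in_group vs the rest.
    times: survival or censoring times; events: 1 if death observed, 0 if censored."""
    obs_minus_exp = 0.0
    var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(in_group[i] for i in at_risk)  # group members still at risk
        d = sum(1 for i in at_risk if times[i] == t and events[i])  # deaths at t
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group[i])
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp / math.sqrt(var) if var > 0 else 0.0

def subnetwork_score(subnet, mutations, times, events):
    """Score a gene subnetwork: patients mutated in any of its genes form the group."""
    mutated = set().union(*(mutations[g] for g in subnet))
    in_group = [i in mutated for i in range(len(times))]
    return logrank_z(times, events, in_group)
```

For four patients who all die at times 1, 2, 3, and 4, with the first two mutated, the statistic is 7/sqrt(17), about 1.70; a positive value indicates that the mutated group dies earlier than expected.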
Modeling cancer metabolism on a genome scale
Cancer cells have fundamentally altered cellular metabolism that is associated with their tumorigenicity and malignancy. In addition to the widely studied Warburg effect, several new key metabolic alterations in cancer have been established over the last decade, leading to the recognition that altered tumor metabolism is one of the hallmarks of cancer. Deciphering the full scope and functional implications of the dysregulated metabolism in cancer requires both the advancement of a variety of omics measurements and the advancement of computational approaches for the analysis and contextualization of the accumulated data. Encouragingly, while the metabolic network is highly interconnected and complex, it is at the same time probably the best-characterized cellular network. Accordingly, this review discusses the challenges that genome-scale modeling of cancer metabolism has been facing. We survey several recent studies demonstrating the first strides that have been made, testifying to the value of this approach in portraying a network-level view of cancer metabolism and in identifying novel drug targets and biomarkers. Finally, we outline a few new steps that may further advance this field.
Integrative Data Analytic Framework to Enhance Cancer Precision Medicine
With the advancement of high-throughput biotechnologies, we increasingly
accumulate biomedical data about diseases, especially cancer. There is a need
for computational models and methods to sift through, integrate, and extract
new knowledge from the diverse available data to improve the mechanistic
understanding of diseases and patient care. To uncover molecular mechanisms and
drug indications for specific cancer types, we develop an integrative framework
able to harness a wide range of diverse molecular and pan-cancer data. We show
that our approach outperforms competing methods and can identify new
associations. Furthermore, through the joint integration of data sources, our
framework can also uncover links between cancer types and molecular entities
for which no prior knowledge is available. Our new framework is flexible and
can be easily reformulated to study any biomedical problem.
Comment: 18 pages
Identifying disease-associated genes based on artificial intelligence
Identifying disease-gene associations can help improve the understanding of disease mechanisms, which has a variety of applications, such as early diagnosis and drug development. Although experimental techniques such as linkage analysis and genome-wide association studies (GWAS) have identified a large number of associations, identifying disease genes is still challenging, since experimental methods are usually time-consuming and expensive. To address these issues, computational methods have been proposed to predict disease-gene associations.
Based on the characteristics of existing computational algorithms in the literature, we can roughly divide them into three categories: network-based methods, machine learning-based methods, and other methods. No matter what models are used to predict disease genes, the proper integration of multi-level biological data is the key to improving prediction accuracy. This thesis addresses some limitations of the existing computational algorithms, and integrates multi-level data via artificial intelligence techniques. The thesis starts with a comprehensive review of computational methods, databases, and evaluation methods used in predicting disease-gene associations, followed by one network-based method and four machine learning-based methods.
The first chapter introduces the background, the objectives of the studies, and the structure of the thesis. The second chapter then provides a comprehensive review of the existing algorithms, as well as the databases and evaluation methods used in existing studies. Having set out the objectives and future directions, the thesis then presents five computational methods for predicting disease-gene associations.
The first method, proposed in Chapter 3, considers the issue of non-disease gene selection. A shortest path-based strategy is used to select reliable non-disease genes from a disease gene network and a differential network. The selected genes are then used by a network-energy model to improve its performance. The second method, proposed in Chapter 4, constructs sample-based networks for case samples and uses them to predict disease genes. This strategy improves the quality of protein-protein interaction (PPI) networks, which further improves prediction accuracy. Chapter 5 presents a generic model that applies multimodal deep belief nets (DBN) to fuse different types of data. Network embeddings extracted from PPI networks and gene ontology (GO) data are fused with the multimodal DBN to obtain cross-modality representations. Chapter 6 presents another deep learning model, which uses a convolutional neural network (CNN) to integrate gene similarities with other types of data. Finally, the fifth method, proposed in Chapter 7, is a nonnegative matrix factorization (NMF)-based method. This method maps diseases and genes onto a lower-dimensional manifold, and the geodesic distances between diseases and genes are used to predict their associations. The method can predict disease genes even if the disease under consideration has no known associated genes.
In summary, this thesis has proposed several artificial intelligence-based computational algorithms to address typical limitations of existing computational algorithms. Experimental results have shown that the proposed methods can improve the accuracy of disease-gene prediction.
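The shortest path-based selection of non-disease genes in Chapter 3 can be illustrated with a multi-source breadth-first search: genes that are far from every known disease gene (or unreachable) are treated as more reliable negatives. This is a hypothetical sketch; the single network, the hop-count distance, and the `min_dist` threshold are our assumptions, and the thesis's strategy, which combines a disease gene network with a differential network, may select genes differently.

```python
from collections import deque

def bfs_distances(sources, graph):
    """Hop distance from the nearest source gene (multi-source BFS)."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def reliable_non_disease_genes(graph, disease_genes, min_dist):
    """Hypothetical rule: genes at least min_dist hops from every known
    disease gene, or unreachable from all of them."""
    dist = bfs_distances(disease_genes, graph)
    return sorted(g for g in graph if dist.get(g, float("inf")) >= min_dist)
```

On a chain D - a - b - c plus an isolated gene e, with D as the only known disease gene and `min_dist=2`, the selected non-disease genes are b, c, and e.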
Protein Structure-Guided Approaches to Identify Functional Mutations in Cancer
Distinguishing driver mutations from passenger mutations within tumor cells continues to be a major challenge in cancer genomics. Many computational tools have been developed to address this challenge; however, they rely heavily on primary protein sequence context and mutation frequency or rate. Rare driver mutations not found in many cancer patients may be missed by these traditional approaches. Additionally, these approaches do not take into account the structural context of mutations on tertiary/quaternary protein structures, which may play a more prominent role in determining phenotype and function. This dissertation first presents a novel computational tool called HotSpot3D, which identifies regions of protein structures that are enriched in proximal mutations from cancer patients, detecting clusters of mutations within a single protein as well as along the interfaces of protein-protein complexes. This tool gives insight into potential rare driver mutations that may cluster closely with known hotspot driver mutations, as well as into critical regions of proteins specific to certain cancer types. A small subset of predictions from this tool is validated using high-throughput phosphorylation data and an in vitro cell-based assay to support its biological utility. We then shift to studying the druggability of mutations and apply HotSpot3D to identify potentially druggable mutations that cluster with known sensitive actionable mutations. We also demonstrate how integrative omics approaches better enable precision oncology: combining multiple data types, such as genomic mutations or mRNA/protein expression outliers, as biomarkers of druggability can expand the druggable cohort, better inform treatment response, and nominate novel combinatorial therapies for clinical trials. Lastly, we improve the driver predictions of HotSpot3D by creating a supervised learning approach that integrates additional biological features related to structural context beyond positional clustering alone.
Overall, this dissertation provides a suite of computational methods to explore mutations in the context of protein structure and their potential implications in oncogenesis.
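The core idea of finding mutations that lie close together in 3D can be sketched as single-linkage clustering of mutated residues whose coordinates fall within a distance cutoff, implemented with union-find. This is a simplified illustration under our own assumptions (one coordinate per residue, a user-chosen `cutoff`, hypothetical residue names and coordinates), not HotSpot3D's actual clustering procedure, which also handles protein-protein interfaces.

```python
import math

def proximity_clusters(residues, cutoff):
    """Single-linkage clusters of mutated residues whose 3D coordinates lie
    within `cutoff` of each other; singletons are dropped."""
    parent = {r: r for r in residues}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    names = sorted(residues)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(residues[a], residues[b]) <= cutoff:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for r in names:
        clusters.setdefault(find(r), []).append(r)
    return sorted(c for c in clusters.values() if len(c) > 1)
```

With hypothetical residues M1 at (0, 0, 0), M2 at (3, 4, 0), M3 at (3, 4, 6), and M4 far away, a cutoff of 6.0 chains M1, M2, and M3 into one cluster even though M1 and M3 are about 7.8 apart, which is the expected behavior of single linkage.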