Utilizing Protein Structure to Identify Non-Random Somatic Mutations
Motivation: Human cancer is caused by the accumulation of somatic mutations
in tumor suppressors and oncogenes within the genome. In the case of oncogenes,
recent theory suggests that there are only a few key "driver" mutations
responsible for tumorigenesis. As there have been significant pharmacological
successes in developing drugs that treat cancers that carry these driver
mutations, several methods that rely on mutational clustering have been
developed to identify them. However, these methods consider proteins as a
single strand without taking their spatial structures into account. We propose
a new methodology that incorporates protein tertiary structure in order to
increase our power when identifying mutation clustering.
Results: We have developed a novel algorithm, iPAC: identification of Protein
Amino acid Clustering, for the identification of non-random somatic mutations
in proteins that takes into account the three dimensional protein structure. By
using the tertiary information, we are able to detect both novel clusters in
proteins that are known to exhibit mutation clustering as well as identify
clusters in proteins without evidence of clustering based on existing methods.
For example, by combining the data in the Protein Data Bank (PDB) and the
Catalogue of Somatic Mutations in Cancer, our algorithm identifies new
mutational clusters in well known cancer proteins such as KRAS and PI3KCa.
Further, by utilizing the tertiary structure, our algorithm also identifies
clusters in EGFR, EIF2AK2, and other proteins that are not identified by
current methodology.
A Graph Theoretic Approach to Utilizing Protein Structure to Identify Non-Random Somatic Mutations
Background: It is well known that the development of cancer is caused by the
accumulation of somatic mutations within the genome. For oncogenes
specifically, current research suggests that there is a small set of "driver"
mutations that are primarily responsible for tumorigenesis. Further, due to
some recent pharmacological successes in treating these driver mutations and
their resulting tumors, a variety of methods have been developed to identify
potential driver mutations using methods such as machine learning and
mutational clustering. We propose a novel methodology that increases our power
to identify mutational clusters by taking into account protein tertiary
structure via a graph theoretical approach.
Results: We have designed and implemented GraphPAC (Graph Protein Amino Acid
Clustering) to identify mutational clustering while considering protein spatial
structure. Using GraphPAC, we are able to detect novel clusters in proteins
that are known to exhibit mutation clustering as well as identify clusters in
proteins without evidence of prior clustering based on current methods.
Specifically, by utilizing the spatial information available in the Protein
Data Bank (PDB) along with the mutational data in the Catalogue of Somatic
Mutations in Cancer (COSMIC), GraphPAC identifies new mutational clusters in
well known oncogenes such as EGFR and KRAS. Further, by utilizing graph theory
to account for the tertiary structure, GraphPAC identifies clusters in DPP4,
NRP1 and other proteins not identified by existing methods. The R package is
available at: http://bioconductor.org/packages/release/bioc/html/GraphPAC.html
Conclusion: GraphPAC provides an alternative to iPAC and an extension to
current methodology when identifying potential activating driver mutations by
utilizing a graph theoretic approach when considering protein tertiary
structure.
Comment: 25 pages, 8 figures, 3 tables
A Spatial Simulation Approach to Account for Protein Structure When Identifying Non-Random Somatic Mutations
Background: Current research suggests that a small set of "driver" mutations
are responsible for tumorigenesis while a larger body of "passenger" mutations
occurs in the tumor but does not progress the disease. Due to recent
pharmacological successes in treating cancers caused by driver mutations, a
variety of methodologies that attempt to identify such mutations have been
developed. Based on the hypothesis that driver mutations tend to cluster in key
regions of the protein, the development of cluster identification algorithms
has become critical.
Results: We have developed a novel methodology, SpacePAC (Spatial Protein
Amino acid Clustering), that identifies mutational clustering by considering
the protein tertiary structure directly in 3D space. By combining the
mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) and
the spatial information in the Protein Data Bank (PDB), SpacePAC is able to
identify novel mutation clusters in many proteins such as FGFR3 and CHRM2. In
addition, SpacePAC is better able to localize the most significant mutational
hotspots as demonstrated in the cases of BRAF and ALK. The R package is
available on Bioconductor at:
http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html
Conclusion: SpacePAC adds a valuable tool to the identification of mutational
clusters while considering protein tertiary structure.
Comment: 16 pages, 8 figures, 4 tables
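The idea of looking for mutation clusters directly in 3D space and judging them by simulation can be illustrated with a minimal Python sketch. This is not the SpacePAC implementation or its R API: the function names, the single fixed radius, and the uniform null model are all assumptions made for illustration. Each residue is given a 3D coordinate (for example, an alpha-carbon position taken from a PDB structure) and a mutation count; the sketch finds the residue-centered sphere that captures the most mutations and estimates how often a uniformly random placement of the same number of mutations does at least as well.

```python
import math
import random


def max_sphere_count(coords, counts, radius):
    """Return (center_index, mutations_inside) for the best residue-centered sphere.

    coords: list of (x, y, z) residue positions (hypothetical input format).
    counts: list of mutation counts, one per residue.
    """
    best_idx, best = -1, -1
    for i, ci in enumerate(coords):
        # Sum the mutations on every residue within `radius` of residue i.
        total = sum(
            counts[j]
            for j, cj in enumerate(coords)
            if math.dist(ci, cj) <= radius
        )
        if total > best:
            best_idx, best = i, total
    return best_idx, best


def sphere_p_value(coords, counts, radius, n_sim=1000, seed=0):
    """Estimate a p-value for the best sphere under a uniform null (an assumption)."""
    rng = random.Random(seed)
    center, observed = max_sphere_count(coords, counts, radius)
    n_mut, n_res = sum(counts), len(coords)
    hits = 0
    for _ in range(n_sim):
        # Scatter the same number of mutations uniformly over all residues.
        sim = [0] * n_res
        for _ in range(n_mut):
            sim[rng.randrange(n_res)] += 1
        if max_sphere_count(coords, sim, radius)[1] >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return center, observed, (hits + 1) / (n_sim + 1)


# Toy protein: three residues close together, two far away.
toy_coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (50, 0, 0), (100, 0, 0)]
toy_counts = [5, 5, 5, 0, 0]
print(sphere_p_value(toy_coords, toy_counts, radius=2.5, n_sim=200))
```

A fuller treatment would scan several radii and several non-overlapping spheres and would correct for the resulting multiple comparisons; the sketch keeps one sphere and one radius so that the core idea stays visible.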
SomInaClust: detection of cancer genes based on somatic mutation patterns of inactivation and clustering
Background: With the advances in high throughput technologies, increasing amounts of cancer somatic mutation data are being generated and made available. Only a small number of (driver) mutations occur in driver genes and are responsible for carcinogenesis, while the majority of (passenger) mutations do not influence tumour biology. In this study, SomInaClust is introduced, a method that accurately identifies driver genes based on their mutation pattern across tumour samples and then classifies them into oncogenes or tumour suppressor genes respectively.
Results: SomInaClust starts from the observation that oncogenes mainly contain mutations that, due to positive selection, cluster at similar positions in a gene across patient samples, whereas tumour suppressor genes contain a high number of protein-truncating mutations throughout the entire gene length. The method was shown to prioritize driver genes in 9 different solid cancers. Furthermore it was found to be complementary to existing similar-purpose methods with the additional advantages that it has a higher sensitivity, also for rare mutations (occurring in less than 1% of all samples), and it accurately classifies candidate driver genes into putative oncogenes and tumour suppressor genes. Pathway enrichment analysis showed that the identified genes belong to known cancer signalling pathways, and that the distinction between oncogenes and tumour suppressor genes is biologically relevant.
Conclusions: SomInaClust was shown to detect candidate driver genes based on somatic mutation patterns of inactivation and clustering and to distinguish oncogenes from tumour suppressor genes. The method could be used for the identification of new cancer genes or to filter mutation data for further data-integration purposes.
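The observation underlying this abstract (recurrent positions suggest an oncogene, widespread truncating mutations suggest a tumour suppressor) can be shown with a small Python sketch. It is not SomInaClust's actual scoring scheme: the mutation-type labels, the `min_recurrence` parameter, and the 0.5 threshold are hypothetical values chosen only to make the example concrete.

```python
from collections import Counter

# Assumed labels for protein-truncating mutation types (hypothetical).
TRUNCATING = {"nonsense", "frameshift", "splice_site"}


def summarize_gene(mutations, min_recurrence=2):
    """Summarize one gene's mutations.

    mutations: list of (protein_position, mutation_type) tuples pooled
    across tumour samples (hypothetical input format).
    """
    n = len(mutations)
    if n == 0:
        return {"truncating_fraction": 0.0, "clustered_fraction": 0.0}
    truncating = sum(1 for _, kind in mutations if kind in TRUNCATING)
    # Recurrence is counted over non-truncating mutations only.
    position_counts = Counter(
        pos for pos, kind in mutations if kind not in TRUNCATING
    )
    clustered = sum(c for c in position_counts.values() if c >= min_recurrence)
    return {
        "truncating_fraction": truncating / n,
        "clustered_fraction": clustered / n,
    }


def classify_gene(summary, threshold=0.5):
    """Apply an illustrative threshold rule (an assumption, not the published method)."""
    if summary["clustered_fraction"] >= threshold:
        return "putative oncogene"
    if summary["truncating_fraction"] >= threshold:
        return "putative tumour suppressor"
    return "unclassified"


# Toy data: one gene with a hotspot, one with scattered truncating mutations.
hotspot_gene = [(12, "missense")] * 8 + [(13, "missense")] * 2 + [(61, "missense")]
truncated_gene = [(10, "nonsense"), (55, "frameshift"), (120, "nonsense"), (200, "missense")]
print(classify_gene(summarize_gene(hotspot_gene)))
print(classify_gene(summarize_gene(truncated_gene)))
```

A real analysis would also need a background mutation model to separate drivers from passengers; the sketch only illustrates the two mutation patterns the abstract contrasts.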
mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome
A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrate the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. On a large scale, we show that clustering with mutation3D is able to separate functional from nonfunctional mutations by analyzing a combination of 8,869 known inherited disease mutations and 2,004 SNPs overlaid together upon the same sets of crystal structures and homology models. Further, we present a systematic analysis of whole-genome and whole-exome cancer datasets to demonstrate that mutation3D identifies many known cancer genes as well as previously underexplored target genes. The mutation3D Web interface allows users to analyze their own mutation data in a variety of popular formats and provides seamless access to explore mutation clusters derived from over 975,000 somatic mutations reported by 6,811 cancer sequencing studies. The mutation3D Web interface is freely available with all major browsers supported.
Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine
High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer.