
    Systematic analysis of somatic mutations driving cancer: Uncovering functional protein regions in disease development

    Background: Recent advances in sequencing technologies enable the large-scale identification of genes that are affected by various genetic alterations in cancer. However, understanding tumor development requires insight into how these changes alter protein function and impair network regulation, in general and/or in specific cancer types. Results: In this work we present a novel method called iSiMPRe that identifies protein regions significantly enriched in somatic mutations and short in-frame insertions or deletions (indels). Applying this unbiased method to the complete human proteome, using data aggregated from various cancer genome projects, we identified around 500 protein regions that could be linked to one or more of 27 distinct cancer types. These regions covered the majority of known cancer genes, including, surprisingly, tumor suppressors. Additionally, iSiMPRe identified novel genes and regions that have not yet been associated with cancer. Conclusions: While local somatic mutations correspond to only a subset of the genetic variations that can lead to cancer, our systematic analyses revealed that they are an accompanying feature of most cancer driver genes, regardless of the primary mechanism by which those genes are perturbed during tumorigenesis. These results indicate that the accumulation of local somatic mutations can be used to pinpoint genes responsible for cancer formation and can also help to explain the effect of cancer mutations at the level of functional modules across a broad range of cancer driver genes. Reviewers: This article was reviewed by Sándor Pongor, Michael Gromiha and Zoltán Gáspári. © 2016 Mészáros et al.
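The abstract does not spell out how iSiMPRe scores a region. As a rough illustration only, one common way to flag a protein segment that carries more mutations than expected is a binomial test: if a protein has n mutations placed uniformly at random, a window covering a fraction p of its residues should receive about n·p of them. The Python sketch below scans fixed-size windows and reports those whose upper-tail probability survives a Bonferroni correction. It is not iSiMPRe's actual algorithm; the function names, the uniform null model, the window size and the correction are all assumptions chosen for illustration.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scan_windows(positions, protein_len, window, alpha=0.05):
    """Hypothetical helper: slide a fixed-size window over a protein
    (1-based residue numbers) and report windows holding more mutations
    than expected if the protein's mutations were placed uniformly at
    random. Bonferroni-corrects over the number of windows tested."""
    n = len(positions)
    p = window / protein_len  # chance that one mutation lands in a given window
    n_windows = protein_len - window + 1
    hits = []
    for start in range(1, n_windows + 1):
        end = start + window - 1
        k = sum(start <= pos <= end for pos in positions)
        pval = binomial_upper_tail(k, n, p)
        if pval * n_windows < alpha:
            hits.append((start, end, k, pval))
    return hits
```

For example, with eight hypothetical mutations at residue 50 and single mutations at residues 10 and 200 of a 300-residue protein, `scan_windows([50] * 8 + [10, 200], protein_len=300, window=10)` flags only the ten windows that cover residue 50. A real method would also need to model sequence-dependent mutation rates and handle indels, which this sketch ignores.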

    Integrated Machine Learning and Bioinformatics Approaches for Prediction of Cancer-Driving Gene Mutations

    Cancer arises from the accumulation of somatic mutations and genetic alterations affecting cell division checkpoints and apoptosis, which often leads to abnormal tumor cell proliferation. Proper classification of cancer-linked driver mutations will considerably improve our understanding of the molecular dynamics of cancer. In this study, we compared several cancer-specific predictive models for the prediction of driver mutations in cancer-linked genes; the models were validated on canonical data sets of functionally validated mutations and applied to raw cancer genomics data. By analyzing pathogenicity prediction and conservation scores, we showed that evolutionary conservation scores play a pivotal role in the classification of cancer drivers and were the most informative features for driver mutation classification. Through extensive comparative analysis with structure-functional experiments and multicenter mutation calling data from PanCancer Atlas studies, we demonstrated the robustness of our models and assessed the validity of computational predictions. We evaluated the performance of our models using standard diagnostic metrics such as sensitivity, specificity, area under the curve and F-measure. To address the interpretability of cancer-specific classification models and obtain novel insights into the molecular signatures of driver mutations, we complemented machine learning predictions with structure-functional analysis of cancer driver mutations in several key tumor suppressor genes and oncogenes. Through the experiments carried out in this study, we found that evolution-based features carry the strongest signal in the machine learning classification of driver mutations and provide information orthogonal to the ensemble-based scores that rank highly in feature importance.
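The evaluation metrics named above have standard definitions. As a minimal, dependency-free sketch (not the study's evaluation code), they can be computed from true labels (1 = driver, 0 = passenger) and classifier scores as follows. The function name `evaluate` and the 0.5 decision threshold are assumptions; AUC is computed through its rank interpretation, the probability that a randomly chosen driver scores higher than a randomly chosen passenger.

```python
def evaluate(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, F-measure and ROC AUC for a binary
    driver (1) vs passenger (0) classifier. Hypothetical helper."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on drivers
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on passengers
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)

    # AUC = P(random driver outscores random passenger); ties count as 0.5
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    pairs = [1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg]
    auc = sum(pairs) / len(pairs) if pairs else float("nan")

    return {"sensitivity": sensitivity, "specificity": specificity,
            "f_measure": f_measure, "auc": auc}
```

The pairwise AUC loop is quadratic in the number of mutations, which is fine for a sketch; for large data sets a sorted-rank implementation such as scikit-learn's `roc_auc_score` is the usual choice.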

    Sequence analysis methods for the design of cancer vaccines that target tumor-specific mutant antigens (neoantigens)

    The human adaptive immune system is programmed to distinguish between self and non-self proteins, and if it is trained to recognize markers unique to a cancer, it may be possible to stimulate the selective destruction of cancer cells. Therapeutic cancer vaccines aim to boost the immune system by selectively increasing the population of T cells targeted to tumor-unique antigens, thereby initiating cancer cell death. In the past, this approach primarily focused on the targeted selection of ‘shared’ tumor antigens found across many patients. The advent of massively parallel sequencing and specialized analytical approaches has enabled more efficient characterization of tumor-specific mutant antigens, or neoantigens. Specifically, methods that predict which tumor-specific mutant peptides (neoantigens) can elicit anti-tumor T cell recognition improve predictions of response to immune checkpoint therapy and identify one or more neoantigens as targets for personalized vaccines. Selecting the most immunogenic neoantigens from a large number of mutations is an important challenge, particularly in cancers with a high mutational load, such as melanomas and smoker-associated lung cancers. To address this task, Chapter 1 of this thesis describes a genome-guided in silico approach to identifying tumor neoantigens that integrates tumor mutation and expression data (DNA- and RNA-Seq). The cancer vaccine design process, from read alignment to variant calling and neoantigen prediction, typically assumes that the genotype of the Human Reference Genome sequence surrounding each somatic variant is representative of the patient’s genome sequence and does not account for the effect of nearby variants (somatic or germline) on the neoantigenic peptide sequence.
Because the accuracy of neoantigen identification has important implications for many clinical trials and for studies of basic cancer immunology, Chapter 2 describes and makes the case for the patient-specific inclusion of proximal variants, correcting this previously oversimplified assumption in neoantigen identification. The neoantigen identification method described in Chapter 1 was subsequently extended and improved (Chapter 3) through a modular workflow that supports each component of the neoantigen prediction process, from neoantigen identification and prioritization to data visualization and DNA vaccine design. Together, these chapters describe massively parallel sequence analysis methods that will help in the identification and subsequent refinement of patient-specific antigens for use in personalized immunotherapy.
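The proximal-variant issue raised in Chapter 2 can be illustrated with a toy example. The Python sketch below builds every 8- to 11-residue peptide that overlaps a somatic missense mutation, once using the reference sequence around it and once after also applying a nearby germline variant. The protein fragment, both variants, the peptide lengths and the helper names are hypothetical, and the sketch is not the workflow described in the thesis; it only shows why peptides predicted from the reference context can differ from those encoded by the patient's genome.

```python
def apply_substitutions(protein: str, subs: dict) -> str:
    """Return `protein` with amino-acid substitutions {0-based index: residue}."""
    residues = list(protein)
    for pos, aa in subs.items():
        residues[pos] = aa
    return "".join(residues)


def peptides_spanning(protein: str, pos: int, lengths=range(8, 12)) -> set:
    """All peptides of the given lengths that include 0-based position `pos`."""
    peptides = set()
    for k in lengths:
        first = max(0, pos - k + 1)         # leftmost start that still covers pos
        last = min(pos, len(protein) - k)   # rightmost start that fits in the protein
        for start in range(first, last + 1):
            peptides.add(protein[start:start + k])
    return peptides


reference = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical protein fragment
somatic = {10: "L"}                    # hypothetical somatic missense (Q11L)
germline = {12: "N"}                   # hypothetical proximal germline variant (S13N)

# Reference-context prediction: only the somatic change is applied
naive = peptides_spanning(apply_substitutions(reference, somatic), 10)
# Patient-context prediction: the nearby germline change is applied too
patient = peptides_spanning(apply_substitutions(reference, {**somatic, **germline}), 10)
# Peptides that a reference-context approach would never consider
only_in_patient = patient - naive
```

Here the germline change sits two residues downstream of the somatic one, so every candidate peptide that covers both positions carries `LIN` in the patient context but `LIS` in the reference-based prediction.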

    Protein Structure-Guided Approaches to Identify Functional Mutations in Cancer

    Distinguishing driver mutations from passenger mutations within tumor cells continues to be a major challenge in cancer genomics. Many computational tools have been developed to address this challenge; however, they rely heavily on primary protein sequence context and mutation frequency or rate. Rare driver mutations that are not found in many cancer patients may be missed by these traditional approaches. Additionally, these approaches do not take into account the structural context of mutations in tertiary and quaternary protein structures, which may play a more prominent role in determining phenotype and function. This dissertation first presents a novel computational tool called HotSpot3D, which identifies regions of protein structures that are enriched in spatially proximal mutations from cancer patients, finding clusters of mutations both within a single protein and along the interfaces of protein-protein complexes. This tool gives insight into potential rare driver mutations that cluster closely with known hotspot driver mutations, as well as into critical protein regions specific to certain cancer types. A small subset of its predictions is validated using high-throughput phosphorylation data and in vitro cell-based assays, supporting its biological utility. We then shift to studying the druggability of mutations and apply HotSpot3D to identify potentially druggable mutations that cluster with known sensitive actionable mutations. We also demonstrate how integrative omics approaches better enable precision oncology: combining multiple data types, such as genomic mutations or mRNA/protein expression outliers, as biomarkers of druggability can expand the druggable cohort, better inform treatment response, and nominate novel combination therapies for clinical trials. Lastly, we improve the driver predictions of HotSpot3D with a supervised learning approach that integrates additional biological features related to structural context beyond positional clustering alone.
Overall, this dissertation provides a suite of computational methods to explore mutations in the context of protein structure and their potential implications in oncogenesis.
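HotSpot3D's own clustering procedure is not described in this abstract. As a simplified stand-in, the Python sketch below groups mutated residues by single-linkage clustering on 3D distance: residues whose coordinates (for example, their Cα atoms) lie within a cutoff join the same cluster, either directly or through a chain of such neighbours. Keys are (chain, residue number) tuples so that residues from two chains of a complex can cluster across an interface. The 10 Å cutoff, the coordinates and the function name are assumptions for illustration.

```python
import math
from itertools import combinations


def cluster_mutations(coords: dict, cutoff: float = 10.0) -> list:
    """Hypothetical helper: single-linkage clustering of mutated residues.
    `coords` maps a (chain, residue number) key to an (x, y, z) position in
    angstroms; residues within `cutoff` of each other share a cluster."""
    parent = {r: r for r in coords}

    def find(r):
        # Union-find root lookup with path halving
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in combinations(coords, 2):
        if math.dist(coords[a], coords[b]) <= cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for r in coords:
        clusters.setdefault(find(r), []).append(r)
    # Largest clusters first; residues sorted within each cluster
    return sorted((sorted(c) for c in clusters.values()), key=len, reverse=True)


# Hypothetical coordinates for four mutated residues on two chains
coords = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("A", 20): (6.0, 0.0, 0.0),
    ("B", 5): (12.0, 0.0, 0.0),
    ("A", 80): (40.0, 0.0, 0.0),
}
clusters = cluster_mutations(coords)
```

In this toy example, residues 10 and 20 of chain A and residue 5 of chain B form one cross-chain cluster (A10 and B5 are 12 Å apart but are linked through A20), while residue 80 of chain A stays on its own.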

    miRNA:mRNA interplay in the malignant evolution of miniGIST to overt GIST

    Gastrointestinal Stromal Tumor (GIST) is the most common mesenchymal tumor of the digestive tract and is thought to arise from the gastrointestinal (GI) pacemaker cells, the Interstitial Cells of Cajal (ICC). Unlike most sarcomas, for which no premalignant lesions are known, premalignant GIST counterparts have been identified. These entities, named miniGIST, share histological and molecular features with overt GIST, namely the presence of oncogenic mutations affecting the tyrosine kinases KIT or PDGFRA. MiniGISTs are remarkably common (about 1/3 of unselected elderly subjects carry miniGISTs in their GI tract), whereas GISTs are quite rare, indicating that only a minute fraction of miniGISTs actually progress to clinically relevant tumors. This indicates that KIT/PDGFRA oncogenic mutations alone are insufficient to confer malignancy. The aim of this work was to address the molecular mechanisms that drive the malignant evolution of miniGIST to overt GIST, focusing in particular on the role of miRNAs. By performing combined miRNA and mRNA NGS profiling of a large set of miniGISTs and overt GISTs, we identified a set of miRNAs potentially involved in the transcriptional perturbation during GIST progression. We then validated in vitro the role of hsa-miR-485-5p loss in driving BIRC5 gene upregulation in overt GIST. Overall, our work lays the groundwork for elucidating the role of miRNA:mRNA interactions in the malignant evolution of GIST.
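The abstract does not say how the miRNA and mRNA profiles were integrated. A common first-pass approach, shown here only as an assumption-laden sketch, is to look for candidate miRNA:mRNA pairs whose expression is anticorrelated across samples, since a miRNA that represses its target should tend to be low where the target is high (as with hsa-miR-485-5p loss and BIRC5 upregulation). The candidate pairs, the expression values, the log2(x + 1) transform, the -0.5 cutoff and the function names are all hypothetical, and correlation alone does not establish targeting, which is why in vitro validation is needed.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (nan if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def anticorrelated_pairs(mirna_expr, mrna_expr, candidate_pairs, max_r=-0.5):
    """Hypothetical helper: return (miRNA, gene, r) for candidate pairs whose
    log2(x + 1) expression across the same samples has Pearson r <= max_r,
    most negative first."""
    hits = []
    for mir, gene in candidate_pairs:
        x = [math.log2(v + 1) for v in mirna_expr[mir]]
        y = [math.log2(v + 1) for v in mrna_expr[gene]]
        r = pearson(x, y)
        if r <= max_r:  # nan compares False, so constant profiles are skipped
            hits.append((mir, gene, r))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical normalized counts for four samples, same sample order in both tables
mirna = {"miR-X": [100, 50, 10, 5]}
mrna = {"GENE1": [5, 10, 50, 100], "GENE2": [100, 50, 10, 5]}
hits = anticorrelated_pairs(mirna, mrna, [("miR-X", "GENE1"), ("miR-X", "GENE2")])
```

Only the pair whose target rises as the miRNA falls (`GENE1`) is reported; `GENE2`, which tracks the miRNA, is not.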

    CHARACTERIZATION OF SINGLE RESIDUE VARIATIONS IN THE HUMAN POPULATION AND IN DISEASE: FUNCTIONAL IMPACT, STRUCTURAL IMPACT, AND DISTRIBUTION PATTERN

    We have investigated the properties of three sets of human missense genetic variations (cancer somatic mutations, monogenic disease-causing mutations, and population SNPs) from the point of view of their impact on molecular function, their distribution propensity in different protein structure environments, and their disease mechanism. Cancer genome sequencing projects have identified a large number of somatic missense mutations in cancers. We used two analysis methods in the SNPs3D software package to assess the impact of these variants on protein function in vivo. One method identifies mutations that significantly destabilize three-dimensional protein structure, and the other detects all types of effects on protein function, using sequence conservation. Data from a set of breast and colorectal tumors were analyzed. In known cancer genes, close to 100% of missense mutations are found to impact protein function, supporting the view that these methods are appropriate for identifying driver mutations. Overall, we estimate that 50% to 60% of all somatic missense mutations have a high impact on structural stability or more generally affect the function of the corresponding proteins. This fraction is similar to the fraction of all possible missense mutations that have high impact, and much higher than the corresponding fraction for human population SNPs (about 30%). We found that the majority of mutations in tumor suppressors destabilize protein structure, while mutations in oncogenes operate in more varied ways, including destabilization of the less active conformational states. A set of possible high-impact drivers is suggested. We also studied a set of germline missense variants in phenylalanine hydroxylase (PAH) found in phenylketonuria (PKU) patients. With the aid of SNPs3D, we reinforced the previous finding that a high proportion of disease missense mutations affect protein stability rather than other aspects of protein structure and function.
We then focused on the relationship between the presence of these stability-damaging missense mutations and the corresponding experimental data on the level and activity of the PAH protein product under 'in vivo'-like conditions. We found that, overall, destabilizing mutations result in substantially lower protein levels but maintain wild-type-like specific activity. The overall agreement between predicted stability impact and experimental evidence for lower protein levels is high and consistent with previous estimates of the methods' error rates. We next investigated the involvement of missense single-base variants at the interface between two interacting proteins and their role in disease. This work consisted of three steps: first, mapping variants onto the protein structure and identifying those in interaction interfaces; second, analyzing distribution enrichment in three structural locations (protein interior, surface, and interface); and third, analyzing impact with SNPs3D. Nearly a quarter of disease-causing mutations map onto protein interfaces, with a strong propensity for heteromeric interfaces, indicating that the disruption of functional contacts between proteins is a significant disease mechanism. We found that the enrichment propensity of interfaces is intermediate between that of the protein surface and interior for all three types of variants considered, namely SNPs, inter-species variants, and disease mutations. We also found that missense SNPs and inter-species variants share the same enrichment pattern, with a relatively high density on the protein surface and depletion in the interior. In contrast, disease mutations display the reverse pattern, with the interior and interfaces being the most susceptible locations.
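One common way to express the location preferences described above is a propensity ratio: the fraction of variants that fall in a structural location divided by the fraction of all residues in that location, so values above 1 indicate enrichment and values below 1 indicate depletion. The abstract does not give the exact formula used, so this Python sketch, its function name and its counts are hypothetical; the example counts merely mimic the SNP pattern described above (enriched on the surface, depleted in the interior, intermediate at interfaces).

```python
def enrichment_propensity(variant_counts: dict, residue_counts: dict) -> dict:
    """Hypothetical helper: propensity = (share of variants in a location)
    / (share of residues in that location), for each location."""
    total_variants = sum(variant_counts.values())
    total_residues = sum(residue_counts.values())
    return {
        # Cross-multiplied so integer counts give an exact ratio
        loc: (variant_counts.get(loc, 0) * total_residues)
        / (residue_counts[loc] * total_variants)
        for loc in residue_counts
    }


# Hypothetical counts mimicking the SNP pattern
residues = {"interior": 400, "surface": 500, "interface": 100}
snps = {"interior": 20, "surface": 70, "interface": 10}
propensity = enrichment_propensity(snps, residues)
```

With these counts the interior scores 0.5 (depleted), the surface 1.4 (enriched) and the interface 1.0, between the two. Swapping in counts with more variants in the interior and at interfaces would reproduce the reverse pattern reported for disease mutations.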