17 research outputs found

    Active machine learning for transmembrane helix prediction

    Background: About 30% of genes code for membrane proteins, which are involved in a wide variety of crucial biological functions. Despite their importance, experimentally determined structures correspond to only about 1.7% of protein structures deposited in the Protein Data Bank, owing to the difficulty of crystallizing membrane proteins. Algorithms that can identify proteins whose high-resolution structure can aid in predicting the structure of many previously unresolved proteins are therefore of potentially high value. Active machine learning is a supervised machine learning approach suited to this domain, where there are a large number of sequences but only very few have known corresponding structures. In essence, active learning seeks to identify proteins whose structure, if revealed experimentally, is maximally predictive of others. Results: An active learning approach is presented for selecting a minimal set of proteins whose structures can aid in the determination of transmembrane (TM) helices for the remaining proteins. TMpro, an algorithm for high-accuracy TM helix prediction that we previously developed, is coupled with active learning. We show that with a well-designed selection procedure, high accuracy can be achieved with only a few proteins. TMpro trained with a single protein achieved an F-score of 94% on benchmark evaluation and 91% on the MPtopo dataset, which correspond to the state-of-the-art accuracies on TM helix prediction that are usually achieved by training with over 100 proteins. Conclusion: Active learning is suitable for bioinformatics applications, where manually characterized data are not a comprehensive representation of all possible data and can in fact be a very sparse subset thereof. It aids in selecting data instances which, when characterized experimentally, can improve the accuracy of computational characterization of the remaining raw data. The results presented here also demonstrate that the feature extraction method of TMpro is well designed, achieving a very good separation between TM and non-TM segments.

    A Pan-cancer analysis reveals high-frequency genetic alterations in mediators of signaling by the TGF-β superfamily

    We present an integromic analysis of gene alterations that modulate transforming growth factor β (TGF-β)-Smad-mediated signaling in 9,125 tumor samples across 33 cancer types in The Cancer Genome Atlas (TCGA). Focusing on genes that encode mediators and regulators of TGF-β signaling, we found at least one genomic alteration (mutation, homozygous deletion, or amplification) in 39% of samples, with the highest frequencies in gastrointestinal cancers. We identified mutation hotspots in genes that encode TGF-β ligands (BMP5), receptors (TGFBR2, ACVR2A, and BMPR2), and Smads (SMAD2 and SMAD4). Alterations in the TGF-β superfamily correlated positively with expression of metastasis-associated genes and with decreased survival. Correlation analyses showed the contributions of mutation, amplification, deletion, DNA methylation, and miRNA expression to transcriptional activity of TGF-β signaling in each cancer type. This study provides a broad molecular perspective relevant for future functional and therapeutic studies of the diverse cancer pathways mediated by the TGF-β superfamily.

    SWI/SNF tumor suppressor gene PBRM1/BAF180 in human clear cell kidney cancer

    Mutations within chromatin-modulating protein complexes have dominated the novel cancer gene landscape. However, little is known about how individual aberrations contribute to cancer formation. A novel kidney cancer mouse model examining the role of Pbrm1 provides a much-needed clue as to how SWI/SNF complexes might function as tumor suppressors.

    Active Learning for Membrane Protein Structure Prediction

    Background: About 30% of genes code for membrane proteins, which are involved in a wide variety of crucial biological functions. Despite their importance, experimentally determined structures correspond to only about 1.7% of protein structures deposited in the Protein Data Bank, owing to the difficulty of crystallizing membrane proteins. Algorithms that can identify proteins whose high-resolution structure can aid in predicting the structure of many previously unresolved proteins are therefore of potentially high value. Active machine learning is a supervised machine learning approach suited to this domain, where there are a large number of sequences but only very few have known corresponding structures. In essence, active learning seeks to identify proteins whose structure, if revealed experimentally, is maximally predictive of others. Results: An active learning approach is presented for selecting a minimal set of proteins whose structures can aid in the determination of transmembrane (TM) helices for the remaining proteins. TMpro, an algorithm for high-accuracy TM helix prediction that we previously developed, is coupled with active learning. We show that with a well-designed selection procedure, high accuracy can be achieved with only a few proteins. TMpro trained with a single protein achieved an F-score of 94% on benchmark evaluation and 91% on the MPtopo dataset, which correspond to the state-of-the-art accuracies on TM helix prediction that are usually achieved by training with over 100 proteins. Conclusion: Active learning is suitable for bioinformatics applications, where manually characterized data are not a comprehensive representation of all possible data and can in fact be a very sparse subset thereof. It aids in selecting data instances which, when characterized experimentally, can improve the accuracy of computational characterization of the remaining raw data. The results presented here also demonstrate that the feature extraction method of TMpro is well designed, achieving a very good separation between TM and non-TM segments.

    A Pan-Cancer Analysis Reveals High-Frequency Genetic Alterations in Mediators of Signaling by the TGF-β Superfamily

    © 2018 Elsevier Inc. We present an integromic analysis of gene alterations that modulate transforming growth factor β (TGF-β)-Smad-mediated signaling in 9,125 tumor samples across 33 cancer types in The Cancer Genome Atlas (TCGA). Focusing on genes that encode mediators and regulators of TGF-β signaling, we found at least one genomic alteration (mutation, homozygous deletion, or amplification) in 39% of samples, with the highest frequencies in gastrointestinal cancers. We identified mutation hotspots in genes that encode TGF-β ligands (BMP5), receptors (TGFBR2, ACVR2A, and BMPR2), and Smads (SMAD2 and SMAD4). Alterations in the TGF-β superfamily correlated positively with expression of metastasis-associated genes and with decreased survival. Correlation analyses showed the contributions of mutation, amplification, deletion, DNA methylation, and miRNA expression to transcriptional activity of TGF-β signaling in each cancer type. This study provides a broad molecular perspective relevant for future functional and therapeutic studies of the diverse cancer pathways mediated by the TGF-β superfamily. To date, there are no studies of the TGF-β superfamily of signaling pathways across multiple cancers. This study represents a key starting point for unraveling the role of this complex superfamily in 33 divergent cancer types from over 9,000 patients.