936 research outputs found

    Predicting Positive p53 Cancer Rescue Regions Using Most Informative Positive (MIP) Active Learning

    Many protein engineering problems involve finding mutations that produce proteins with a particular function. Computational active learning is an attractive approach to discover desired biological activities. Traditional active learning techniques have been optimized to iteratively improve classifier accuracy, not to quickly discover biologically significant results. We report here a novel active learning technique, Most Informative Positive (MIP), which is tailored to biological problems because it seeks novel and informative positive results. MIP active learning differs from traditional active learning methods in two ways: (1) it preferentially seeks Positive (functionally active) examples; and (2) it may be effectively extended to select gene regions suitable for high throughput combinatorial mutagenesis. We applied MIP to discover mutations in the tumor suppressor protein p53 that reactivate mutated p53 found in human cancers. This is an important biomedical goal because p53 mutants have been implicated in half of all human cancers, and restoring active p53 in tumors leads to tumor regression. MIP found Positive (cancer rescue) p53 mutants in silico using 33% fewer experiments than traditional non-MIP active learning, with only a minor decrease in classifier accuracy. Applying MIP to in vivo experimentation yielded immediate Positive results. Ten different p53 mutations found in human cancers were paired in silico with all possible single amino acid rescue mutations, from which MIP was used to select a Positive Region predicted to be enriched for p53 cancer rescue mutants. 
In vivo assays showed that the predicted Positive Region: (1) had significantly more (p<0.01) new strong cancer rescue mutants than control regions (Negative, and non-MIP active learning); (2) had slightly more new strong cancer rescue mutants than an Expert region selected for purely biological considerations; and (3) rescued for the first time the previously unrescuable p53 cancer mutant P152L.

    A comparative analysis to predict p53 activity using classification models

    Mutation studies of TP53, the gene encoding the tumor protein p53, have become increasingly common in cancer research as a way to understand the protein’s structural changes and their implications for tumor suppression. The protein is built from four identical chains of 393 amino acids each. This homo-tetrameric configuration of p53 plays an important role in suppressing tumors, and it is important to understand the structure-function dynamics and their role in cancer development. A p53 mutant dataset was obtained from the University of California at Irvine (UCI) Machine Learning Repository to infer the p53 protein’s ability to suppress tumors based on its two-dimensional (2D) and three-dimensional (3D) structural features. The dataset consisted of 31,283 instances (observations) and 5,408 numerical features. The first 4,826 were 2D structural features based on electrostatic and surface properties. The remaining 582 were 3D features, the distance maps between mutant and wild-type p53. After selecting a subset of the features that were statistically relevant in predicting the outcome (n=100), three classification algorithms, Logistic Regression (LR), Support Vector Machine (SVM) and Random Forest (RF), were fit to the data and trained using a cross-validation scheme to obtain good parameters for distinguishing active p53 mutants from inactive ones. Accuracy and area under the curve (AUC) were used to evaluate each classification model. Among the three algorithms, LR seemed to outperform SVM and RF, with an accuracy ranging from 0.75 to 0.81 and AUC ranging from 0.75 to 0.88. The LR model identified 2D feature numbers 60, 74, 49, 40, and 73 as features of high importance in predicting the activity of p53. 
The public health significance of this study is that it advances the understanding of p53, which is critical to tumor suppression, by helping to predict p53 activation from a set of structural features using simple classification models.
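
The two evaluation metrics named in this abstract, accuracy and AUC, can be sketched in a few lines. This is a minimal illustration, not the study's code: the labels and scores below are made up, the 0.5 decision threshold is an assumption, and AUC is computed with the standard pairwise (rank) definition.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of instances whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = active mutant) and classifier scores, illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

print("accuracy:", accuracy(labels, scores))
print("AUC:", auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two ranges reported above need not coincide.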

    Predicting Transcriptional Activity of Multiple Site p53 Mutants Based on Hybrid Properties

    The tumor suppressor protein p53 is mutated in many kinds of human cancers, and restoring active p53 leads to tumor regression. In this work, we developed a new computational method to predict the transcriptional activity of one-, two-, three- and four-site p53 mutants. Following the general form of pseudo amino acid composition, we used eight types of features to represent each mutation and then selected the optimal prediction features using the maximum relevance minimum redundancy and incremental feature selection methods. The Matthews correlation coefficients (MCC) obtained with the nearest neighbor algorithm and jackknife cross-validation for one-, two-, three- and four-site p53 mutants were 0.678, 0.314, 0.705, and 0.907, respectively. Further analysis of the optimal feature set revealed that the 2D (two-dimensional) structure features made up the largest part of the optimal feature set and may have played the most important roles in predicting the active status of all four types of p53 mutant. The optimal feature sets, especially their top-ranked features, also showed that the 3D structure features, conservation, and the physicochemical and biochemical properties of amino acids near the mutation site played important roles in p53 mutant active status prediction. Our study provides a new and promising approach for finding functionally important sites and the relevant features for in-depth study of the p53 protein and its mechanism of action.
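
The evaluation described here, a nearest neighbor classifier scored by MCC under jackknife (leave-one-out) cross-validation, can be sketched as follows. This is an illustrative sketch only: the feature vectors and labels are invented, and Euclidean distance is an assumption, since the abstract does not state the distance measure.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def jackknife_1nn(features, labels):
    """Leave-one-out: predict each sample from its nearest neighbor
    among all the other samples (Euclidean distance, an assumption)."""
    preds = []
    for i, x in enumerate(features):
        nearest = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(x, features[j]),
        )
        preds.append(labels[nearest])
    return preds


# Invented 2-D feature vectors; 1 = active mutant, 0 = inactive.
features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (1.0, 1.0), (0.9, 1.1), (1.1, 0.9), (0.45, 0.45)]
labels = [0, 0, 0, 1, 1, 1, 1]

preds = jackknife_1nn(features, labels)
print("predictions:", preds)
print("MCC:", mcc(labels, preds))
```

MCC ranges from -1 to 1 and stays informative when the classes are imbalanced, which makes it a common choice for activity prediction where inactive mutants dominate.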

    All-codon scanning identifies p53 cancer rescue mutations

    In vitro scanning mutagenesis strategies are valuable tools to identify critical residues in proteins and to generate proteins with modified properties. We describe the fast and simple All-Codon Scanning (ACS) strategy that creates a defined gene library wherein each individual codon within a specific target region is changed into all possible codons with only a single codon change per mutagenesis product. ACS is based on a multiplexed overlapping mutagenesis primer design that saturates only the targeted gene region with single codon changes. We have used ACS to produce single amino-acid changes in small and large regions of the human tumor suppressor protein p53 to identify single amino-acid substitutions that can restore activity to inactive p53 found in human cancers. Single-tube reactions were used to saturate defined 30-nt regions with all possible codon changes. The same technique was used in 20 parallel reactions to scan the 600-bp fragment encoding the entire p53 core domain. Identification of several novel p53 cancer rescue mutations demonstrated the utility of the ACS approach. ACS is a fast, simple and versatile method, which is useful for protein structure–function analyses and protein design or evolution problems.
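
The library that ACS defines, every sequence differing from the target region at exactly one codon, can be enumerated in silico. The sketch below only illustrates the combinatorics; it says nothing about the primer design, and the 30-nt region used is a hypothetical sequence, not taken from p53.

```python
from itertools import product

# All 64 codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def all_codon_scan(region):
    """Return every variant of `region` that differs from it at exactly
    one codon. Stop codons and synonymous codons are included, since
    each codon is changed into all possible codons."""
    if len(region) % 3 != 0:
        raise ValueError("region length must be a multiple of 3")
    variants = []
    for start in range(0, len(region), 3):
        original = region[start:start + 3]
        for codon in CODONS:
            if codon != original:
                variants.append(region[:start] + codon + region[start + 3:])
    return variants


# Hypothetical 30-nt target region (10 codons); not a real p53 sequence.
region = "ATGGCCAAGCTGGAATTCGGTACCTCATGG"
library = all_codon_scan(region)

print(len(library))  # 10 codons x 63 alternative codons each
print(600 // 30)     # number of 30-nt windows in a 600-bp fragment
```

A 30-nt region therefore yields 630 single-codon variants, and tiling a 600-bp fragment with 30-nt windows gives 20 windows, which is consistent with the 20 parallel reactions mentioned in the abstract.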

    Ensemble-Based Computational Approach Discriminates Functional Activity of p53 Cancer and Rescue Mutants

    The tumor suppressor protein p53 can lose its function upon single-point missense mutations in the core DNA-binding domain (“cancer mutants”). Activity can be restored by second-site suppressor mutations (“rescue mutants”). This paper relates the functional activity of p53 cancer and rescue mutants to their overall molecular dynamics (MD), without focusing on local structural details. A novel global measure of protein flexibility for the p53 core DNA-binding domain, the number of clusters at a certain RMSD cutoff, was computed by clustering over 0.7 µs of explicitly solvated all-atom MD simulations. For wild-type p53 and a sample of p53 cancer or rescue mutants, the number of clusters was a good predictor of in vivo p53 functional activity in cell-based assays. This number-of-clusters (NOC) metric was strongly correlated (r² = 0.77) with reported values of experimentally measured ΔΔG protein thermodynamic stability. Interpreting the number of clusters as a measure of protein flexibility: (i) p53 cancer mutants were more flexible than wild-type protein, (ii) second-site rescue mutations decreased the flexibility of cancer mutants, and (iii) negative controls of non-rescue second-site mutants did not. This new method reflects the overall stability of the p53 core domain and can discriminate which second-site mutations restore activity to p53 cancer mutants.
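
The idea of counting clusters at a fixed RMSD cutoff can be illustrated with a toy example. The abstract does not say which clustering algorithm was used, so the greedy leader clustering below is an assumption chosen for brevity; the frames are invented two-atom conformations assumed to be already superimposed.

```python
import math


def rmsd(a, b):
    """RMSD between two conformations given as equal-length lists of
    (x, y, z) atom coordinates. Assumes the frames are pre-aligned."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def number_of_clusters(frames, cutoff):
    """Greedy leader clustering (an assumed algorithm): a frame joins the
    first cluster whose leader is within `cutoff`, else it starts a new one."""
    leaders = []
    for frame in frames:
        if not any(rmsd(frame, leader) <= cutoff for leader in leaders):
            leaders.append(frame)
    return len(leaders)


# Invented trajectories: one stays near its start, one wanders.
rigid = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
]
flexible = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
    [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],
]

print(number_of_clusters(rigid, cutoff=0.5))
print(number_of_clusters(flexible, cutoff=0.5))
```

At the same cutoff, the wandering trajectory produces more clusters than the one that stays close to its starting conformation, which is the sense in which the cluster count acts as a flexibility measure.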

    Improving SNR and reducing training time of classifiers in large datasets via kernel averaging

    Kernel methods are of growing importance in neuroscience research. As an elegant extension of linear methods, they are able to model complex non-linear relationships. However, since the kernel matrix grows with data size, the training of classifiers is computationally demanding in large datasets. Here, a technique developed for linear classifiers is extended to kernel methods: In linearly separable data, replacing sets of instances by their averages improves signal-to-noise ratio (SNR) and reduces data size. In kernel methods, data is linearly non-separable in input space, but linearly separable in the high-dimensional feature space that kernel methods implicitly operate in. It is shown that a classifier can be efficiently trained on instances averaged in feature space by averaging entries in the kernel matrix. Using artificial and publicly available data, it is shown that kernel averaging improves classification performance substantially and reduces training time, even in non-linearly separable data.
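
The identity behind averaging in feature space can be checked numerically: the inner product of two feature-space averages equals the mean of the corresponding block of the kernel matrix. The sketch below uses a degree-2 polynomial kernel in two dimensions because its feature map can be written out explicitly; the kernel choice and the data points are assumptions for illustration.

```python
import math


def kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def phi(x):
    """Explicit feature map for that kernel on 2-D inputs."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def averaged_kernel(group_a, group_b):
    """Kernel value between two feature-space averages, obtained by
    averaging the kernel-matrix entries over all pairs (a, b)."""
    total = sum(kernel(a, b) for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


def mean_phi(group):
    """Average of the explicitly mapped instances in feature space."""
    mapped = [phi(x) for x in group]
    return tuple(sum(col) / len(group) for col in zip(*mapped))


# Invented instances, grouped into two sets to be averaged.
group_a = [(1.0, 2.0), (2.0, 0.5)]
group_b = [(0.5, 1.0), (1.5, -1.0), (0.0, 2.0)]

via_kernel = averaged_kernel(group_a, group_b)
via_features = sum(u * v for u, v in zip(mean_phi(group_a), mean_phi(group_b)))

print(via_kernel, via_features)
```

The two values agree up to floating-point error. For kernels whose feature space cannot be written out, only the kernel-matrix route is available, which is what makes the averaging trick practical.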