Accurate and efficient gp120 V3 loop structure based models for the determination of HIV-1 co-receptor usage
Background: HIV-1 targets human cells expressing both the CD4 receptor, which binds the viral envelope glycoprotein gp120, and either the CCR5 (R5) or CXCR4 (X4) co-receptor, which interacts primarily with the third hypervariable loop (V3 loop) of gp120. Determining HIV-1 affinity for the R5 or X4 co-receptor on host cells facilitates the inclusion of co-receptor antagonists in patient treatment strategies. A dataset of 1193 distinct gp120 V3 loop peptide sequences (989 R5-utilizing, 204 X4-capable) is used to train predictive classifiers based on implementations of random forest, support vector machine, boosted decision tree, and neural network machine learning algorithms. An in silico mutagenesis procedure, employing multibody statistical potentials, computational geometry, and threading of variant V3 sequences onto an experimental structure, generates a feature vector for each variant whose components measure environmental perturbations at corresponding structural positions.
Results: Classifier performance is evaluated using stratified 10-fold cross-validation, stratified dataset splits (2/3 training, 1/3 validation), and leave-one-out cross-validation. Best reported values of sensitivity (85%), specificity (100%), and precision (98%) for predicting X4-capable HIV-1 virus, overall accuracy (97%), Matthews correlation coefficient (89%), balanced error rate (0.08), and ROC area (0.97) all reach critical thresholds, suggesting that the models outperform six other state-of-the-art methods and come closer to competing with phenotype assays.
Conclusions: The trained classifiers provide instantaneous and reliable predictions of HIV-1 co-receptor usage, requiring only translated V3 loop genotypes as input. Furthermore, the novelty of these computational mutagenesis-based predictor attributes distinguishes the models as orthogonal and complementary to previous methods that use sequence, structure, and/or evolutionary information. The classifiers are available online at http://proteins.gmu.edu/automute.
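The evaluation measures named in the Results (sensitivity, specificity, precision, accuracy, Matthews correlation coefficient, and balanced error rate) can all be derived from a binary confusion matrix. The following is a minimal sketch, assuming X4-capable is treated as the positive class and that both classes and both predicted labels occur at least once. The function name `x4_metrics` and the counts passed to it are hypothetical and are not taken from the paper.

```python
import math


def x4_metrics(tp, fn, fp, tn):
    """Compute binary classification metrics from confusion-matrix counts.

    Assumption: X4-capable is the positive class, so tp counts X4-capable
    sequences predicted as X4-capable and tn counts R5 sequences predicted as R5.
    """
    sensitivity = tp / (tp + fn)   # recall on the X4-capable class
    specificity = tn / (tn + fp)   # recall on the R5 class
    precision = tp / (tp + fp)     # fraction of X4 predictions that are correct
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # Balanced error rate: mean of the two per-class error rates.
    ber = 0.5 * ((1.0 - sensitivity) + (1.0 - specificity))
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "balanced_error_rate": ber,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical toy counts, for illustration only.
    for name, value in x4_metrics(tp=8, fn=2, fp=1, tn=9).items():
        print(f"{name}: {value:.3f}")
```

With a class imbalance like the one in this dataset (989 R5 versus 204 X4-capable), accuracy alone can look high for a classifier that rarely predicts X4, which is why measures such as the Matthews correlation coefficient and balanced error rate are informative alongside it.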
Modeling functional changes to Escherichia coli thymidylate synthase upon single residue replacements: a structure-based approach
Escherichia coli thymidylate synthase (TS) is an enzyme that is indispensable to DNA synthesis and cell division, as it provides the only de novo source of dTMP by catalyzing the reductive methylation of dUMP, thus making it a key target for chemotherapeutic agents. High resolution X-ray crystallographic structures are available for TS and, owing to its relatively small size, successful experimental mutagenesis studies have been conducted on the enzyme. In this study, an in silico mutagenesis technique that employs the TS protein structure and a knowledge-based, four-body statistical potential is used to investigate the effects of single amino acid substitutions in TS on enzymatic activity. For every single residue TS variant, this approach yields both a global structural perturbation score and a set of local environmental perturbation scores that characterize the mutated position as well as all structurally neighboring residues. Global scores for the TS variants are capable of uniquely characterizing groups of residue positions in the enzyme according to their physicochemical, functional, or structural properties. Additionally, these global scores elucidate a statistically significant structure-function relationship among a collection of 372 single residue TS variants whose activity levels have been experimentally determined. Predictive models of TS variant activity are subsequently trained on this dataset of experimental mutants, whose respective feature vectors encode information regarding the mutated position as well as its six nearest residue neighbors in the TS structure, including their environmental perturbation scores.
All-Atom Four-Body Knowledge-Based Statistical Potentials to Distinguish Native Protein Structures from Nonnative Folds
Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds or decoys which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-'R'-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking 3rd, our best potential tied CHARMM19 and surpassed AMBER force field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment.
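The inverted Boltzmann step described above can be illustrated with a small sketch. It assumes the quadruplets of interacting atoms have already been identified (the Delaunay tessellation itself is omitted) and scores each unordered quadruplet type as the logarithm of its observed relative frequency divided by the frequency expected from a multinomial reference distribution over individual atom-type frequencies. The reference distribution, the base-10 logarithm, the sign convention, the function name `four_body_potential`, and the toy atom types are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def four_body_potential(quadruplets):
    """Return a log-likelihood score for each unordered quadruplet type.

    quadruplets: iterable of 4-tuples of type labels, one per tetrahedron
    (assumed to come from a prior Delaunay tessellation, not performed here).
    """
    quads = [tuple(sorted(q)) for q in quadruplets]  # vertex order is irrelevant
    quad_counts = Counter(quads)
    n_quads = len(quads)
    # Relative frequency of each individual type over all tetrahedron vertices.
    type_counts = Counter(t for q in quads for t in q)
    n_vertices = sum(type_counts.values())

    potential = {}
    for quad, count in quad_counts.items():
        f_observed = count / n_quads
        # Multinomial expectation: 4! / prod(t_n!) * prod(a_n ** t_n),
        # where t_n is how often type n occurs within the quadruplet.
        multiplicity = math.factorial(4)
        p_expected = 1.0
        for t, k in Counter(quad).items():
            multiplicity //= math.factorial(k)
            p_expected *= (type_counts[t] / n_vertices) ** k
        p_expected *= multiplicity
        # Positive scores mark quadruplets seen more often than expected.
        potential[quad] = math.log10(f_observed / p_expected)
    return potential


if __name__ == "__main__":
    # Hypothetical toy quadruplets of atom types, for illustration only.
    toy = [("C", "C", "N", "O"), ("O", "C", "N", "C"),
           ("C", "C", "C", "C"), ("C", "N", "N", "O")]
    for quad, score in sorted(four_body_potential(toy).items()):
        print(quad, round(score, 3))
```

Under this convention, a structure could be scored by summing the values of all quadruplets found in its tessellation; quadruplet types absent from the training data would need separate handling that this sketch does not address.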
Modeling transcriptional activation changes to Gal4 variants via structure-based computational mutagenesis
As a DNA binding transcriptional activator, Gal4 promotes the expression of genes responsible for galactose metabolism. The Gal4 protein from Saccharomyces cerevisiae (baker's yeast) has become a model for studying eukaryotic transcriptional activation in general because its regulatory properties mirror those of several eukaryotic organisms, including mammals. Given the availability of a crystallographic structure for Gal4, here we implement an in silico mutagenesis technique that makes use of a four-body knowledge-based energy function, in order to empirically quantify the structural impacts associated with single residue substitutions on the Gal4 protein. These results were used to examine the structure-function relationship in Gal4 based on a recently published experimental mutagenesis study, whereby functional changes to a uniformly distributed set of 1,068 single residue Gal4 variants were obtained by measuring their transcriptional activation levels relative to wild-type. A significant correlation was observed between computed (scalar) structural effect data and measured activity values for this collection of single residue Gal4 variants. Additionally, attribute vectors quantifying position-specific environmental impacts were generated for each of the Gal4 variants via computational mutagenesis, and we implemented supervised classification and regression statistical machine learning algorithms to train predictive models of variant Gal4 activity based on these structural changes. All models performed well under cross-validation testing, with balanced accuracy reaching 91% among the classification models, and with the actual and predicted activity values displaying a correlation as high as r = 0.80 for the regression models. Reliable predictions of transcriptional activation levels for Gal4 variants that have yet to be studied can be instantly generated by submitting their respective structure-based feature vectors to the trained models for testing. 
Such computational pre-screening of Gal4 variants may reduce the costs associated with running large-scale mutagenesis experiments.
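The two summary statistics reported for the Gal4 models, balanced accuracy for classification and the correlation r between actual and predicted activity for regression, can be computed as sketched below. This sketch assumes that r denotes the Pearson correlation coefficient and that balanced accuracy is the mean of per-class recall; the function names, class labels, and values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def balanced_accuracy(actual, predicted):
    """Mean of per-class recall, taken over the classes present in `actual`."""
    recalls = []
    for label in set(actual):
        indices = [i for i, a in enumerate(actual) if a == label]
        correct = sum(1 for i in indices if predicted[i] == label)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical activity values (relative to wild-type) and class labels.
    measured = [0.10, 0.45, 0.80, 1.00, 0.30]
    predicted_activity = [0.20, 0.40, 0.70, 0.90, 0.50]
    print("r =", round(pearson_r(measured, predicted_activity), 3))

    actual_class = ["active", "active", "active", "inactive"]
    predicted_class = ["active", "active", "inactive", "inactive"]
    print("balanced accuracy =",
          round(balanced_accuracy(actual_class, predicted_class), 3))
```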