8 research outputs found
An integrative and predictive model for the influence of protein sequence, structure and excipients on aggregation propensity
Lysine and arginine content of proteins: Computational analysis suggests a new tool for solubility design
Prediction and engineering of protein solubility is an important but imprecise area. While some features are routinely used, such as the avoidance of extensive non-polar surface area, scope remains for benchmarking of sequence and structural features with experimental data. We study properties in the context of experimental solubilities, protein gene expression levels, and families of abundant proteins (serum albumin and myoglobin) and their less abundant paralogues. A common feature that emerges for proteins with elevated solubility and at higher expression and abundance levels is an increased ratio of lysine content to arginine content. We suggest that the same properties of arginine that give rise to its recorded propensity for specific interaction surfaces also lead to favorable interactions at nonspecific contacts, and thus lysine is favored for proteins at relatively high concentration. A survey of protein therapeutics shows that a significant subset possesses a relatively low lysine to arginine ratio, and therefore may not be favored for high protein concentration. We conclude that modulation of lysine and arginine content could prove a useful and relatively simple addition to the toolkit available for engineering protein solubility in biotechnological applications.
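The lysine to arginine ratio discussed in this abstract can be computed directly from a one-letter amino acid sequence. The following is a minimal sketch, not the authors' code: the function name `lys_arg_ratio` is hypothetical, and returning `None` when a sequence contains no arginine is an assumption made here to avoid division by zero.

```python
def lys_arg_ratio(sequence):
    """Return (number of Lys) / (number of Arg) for a one-letter sequence.

    Hypothetical helper for illustration. Returns None when the sequence
    has no arginine (an assumption made here, not taken from the paper).
    """
    seq = sequence.upper()
    lys = seq.count("K")  # lysine residues
    arg = seq.count("R")  # arginine residues
    if arg == 0:
        return None
    return lys / arg


if __name__ == "__main__":
    # Toy sequence, not a real protein: 3 lysines and 1 arginine.
    print(lys_arg_ratio("MKKRAK"))
```

A ratio above 1 means the sequence has more lysine than arginine; the abstract associates a higher ratio with elevated solubility but does not state a threshold, so none is applied here.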
Schizophrenia, Human Leukocyte Antigen (HLA), and Herpes Viruses: Immunogenetic Associations at the Population Level
Several factors have been implicated in schizophrenia (SZ), including human herpes viruses (HHV) and the adaptive immunity Human Leukocyte Antigen (HLA) genes. Here we investigated these issues in 2 complementary ways. In one analysis, we evaluated SZ-HLA and HHV-HLA associations at the level of a single allele by (a) computing a SZ-HLA protection/susceptibility (P/S) score based on the covariance between SZ and 127 HLA allele prevalences in 14 European countries, (b) estimating in silico HHV-HLA best binding affinities for the 9 HHV strains, and (c) evaluating the dependence of P/S score on HHV-HLA binding affinities. These analyses yielded (a) a set of 127 SZ-HLA P/S scores, varying by >200× (maximum/minimum), which could not be accounted for by chance, (b) a set of 127 alleles × 9 HHV best-estimated affinities, varying by >600×, and (c) a set of correlations between SZ-HLA P/S scores and HHV-HLA binding which indicated a prominent role of HHV1. In a subsequent analysis, we extended these findings to the individual person by taking into account the fact that every individual carries 12 HLA alleles and computed (a) the average SZ-HLA P/S scores of 12 randomly chosen alleles (2 per gene), an indicator of HLA-based SZ P/S for an individual, and (b) the average of the corresponding HHV estimated affinities for those alleles, an indicator of overall effectiveness of HHV-HLA binding. We found (a) that HLA protection for SZ was significantly more prominent than susceptibility, and (b) that protective SZ-HLA scores were associated with higher HHV-HLA binding affinities, indicating that HLA binding and subsequent elimination of several HHV strains may confer protection against schizophrenia.
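The individual-level step described in this abstract (draw 2 alleles per gene for 12 alleles in total, then average their P/S scores and their HHV affinities) can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the six gene names, the allele labels, and every number are invented placeholders, and sampling with replacement within a gene (so that homozygous draws are possible) is an assumption, since the abstract does not specify it.

```python
import random
from statistics import mean

# Hypothetical placeholder data: gene names, allele labels, and all numbers
# below are invented for illustration and are not taken from the study.
TOY_ALLELES = {
    gene: [f"{gene}*{i:02d}" for i in range(1, 4)]
    for gene in ("A", "B", "C", "DRB1", "DQB1", "DPB1")
}
_toy_rng = random.Random(0)
TOY_PS_SCORE = {
    allele: _toy_rng.uniform(-1.0, 1.0)
    for alleles in TOY_ALLELES.values()
    for allele in alleles
}
TOY_AFFINITY = {
    allele: _toy_rng.uniform(0.0, 1.0)
    for alleles in TOY_ALLELES.values()
    for allele in alleles
}


def sample_genotype(alleles_by_gene, rng):
    """Draw 2 alleles per gene (with replacement, an assumption)."""
    chosen = []
    for alleles in alleles_by_gene.values():
        chosen.extend(rng.choices(alleles, k=2))
    return chosen


def mean_over_alleles(chosen, values):
    """Average a per-allele quantity over the sampled alleles."""
    return mean(values[allele] for allele in chosen)


if __name__ == "__main__":
    rng = random.Random(1)
    genotype = sample_genotype(TOY_ALLELES, rng)
    print(len(genotype), "alleles sampled")
    print("mean P/S score:", mean_over_alleles(genotype, TOY_PS_SCORE))
    print("mean affinity:", mean_over_alleles(genotype, TOY_AFFINITY))
```

Repeating the draw many times would give paired distributions of average P/S score and average affinity, which is the kind of individual-level comparison the abstract reports.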
Protein-Sol: a web tool for predicting protein solubility from sequence
Motivation: Protein solubility is an important property in industrial and therapeutic applications. Prediction is a challenge, despite a growing understanding of the relevant physicochemical properties. Results: Protein-Sol is a web server for predicting protein solubility. Using available data for Escherichia coli protein solubility in a cell-free expression system, 35 sequence-based properties are calculated. Feature weights are determined from separation of low and high solubility subsets. The model returns a predicted solubility and an indication of the features which deviate most from average values. Two other properties are profiled in windowed calculation along the sequence: fold propensity, and net segment charge. The utility of these additional features is demonstrated with the example of thioredoxin.
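A windowed net segment charge profile of the kind mentioned in this abstract can be sketched with a sliding sum over per-residue charges. This is an illustrative sketch, not Protein-Sol's implementation: the charge assignment (K and R as +1, D and E as -1, all other residues 0), the default window length of 21, and the function name `windowed_net_charge` are assumptions made here.

```python
# Assumed charge model for illustration: Lys/Arg +1, Asp/Glu -1, others 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def windowed_net_charge(sequence, window=21):
    """Return the net charge of every length-`window` segment of `sequence`.

    The default window of 21 is a hypothetical choice, not a documented
    Protein-Sol parameter.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    charges = [CHARGE.get(residue, 0) for residue in seq]
    total = sum(charges[:window])  # net charge of the first segment
    profile = [total]
    for i in range(window, len(charges)):
        # Slide the window by one residue: add the new one, drop the oldest.
        total += charges[i] - charges[i - window]
        profile.append(total)
    return profile


if __name__ == "__main__":
    # Toy sequence: the profile moves from net positive to net negative.
    print(windowed_net_charge("KKDDE", window=3))
```

The profile has one value per window position, so a sequence of length n yields n - window + 1 values.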