259 research outputs found
Metazoans evolved by taking domains from soluble proteins to expand intercellular communication network
A central question in animal evolution is how multicellular animals evolved from unicellular ancestors. We hypothesize that membrane proteins must be key players in the development of multicellularity because they are well positioned to form the cell-cell contacts and to provide the intercellular communication required for the creation of complex organisms. Here we find that a major mechanism for the necessary increase in membrane protein complexity in the transition from non-metazoan to metazoan life was the new incorporation of domains from soluble proteins. The membrane proteins that have incorporated soluble domains in metazoans are enriched in many of the functions unique to multicellular organisms such as cell-cell adhesion, signaling, immune defense and developmental processes. They also show enhanced protein-protein interaction (PPI) network complexity and centrality, suggesting an important role in the cellular diversification found in complex organisms. Our results expose an evolutionary mechanism that contributed to the development of higher life forms.
Structural Insights into the Evolution of a Non-Biological Protein: Importance of Surface Residues in Protein Fold Optimization
Phylogenetic profiling of amino acid substitution patterns in proteins has led many to conclude that most structural information is carried by interior core residues that are solvent inaccessible. This conclusion is based on the observation that buried residues generally tolerate only conserved sequence changes, while surface residues allow more diverse chemical substitutions. This notion is now changing as it has become apparent that both core and surface residues play important roles in protein folding and stability. Unfortunately, the ability to identify specific mutations that will lead to enhanced stability remains a challenging problem. Here we discuss two mutations that emerged from an in vitro selection experiment designed to improve the folding stability of a non-biological ATP binding protein. These mutations alter two solvent accessible residues, and dramatically enhance the expression, solubility, thermal stability, and ligand binding affinity of the protein. The significance of both mutations was investigated individually and together, and the X-ray crystal structures of the parent sequence and double mutant protein were solved to a resolution limit of 2.8 and 1.65 Å, respectively. Comparative structural analysis of the evolved protein to proteins found in nature reveals that our non-biological protein evolved certain structural features shared by many thermophilic proteins. This experimental result suggests that protein fold optimization by in vitro selection offers a viable approach to generating stable variants of many naturally occurring proteins whose structures and functions are otherwise difficult to study.
Somatic Mutations Reveal Lineage Relationships and Age-Related Mutagenesis in Human Hematopoiesis
Mutation accumulation during life can contribute to hematopoietic dysfunction; however, the underlying dynamics are unknown. Somatic mutations in blood progenitors can provide insight into the rate and processes underlying this accumulation, as well as the developmental lineage tree and stem cell division numbers. Here, we catalog mutations in the genomes of human-bone-marrow-derived and umbilical-cord-blood-derived hematopoietic stem and progenitor cells (HSPCs). We find that mutations accumulate gradually during life with approximately 14 base substitutions per year. The majority of mutations were acquired after birth and could be explained by the constant activity of various endogenous mutagenic processes, which also explains the mutation load in acute myeloid leukemia (AML). Using these mutations, we construct a developmental lineage tree of human hematopoiesis, revealing a polyclonal architecture and providing evidence that developmental clones exhibit multipotency. Our approach highlights features of human native hematopoiesis and its implications for leukemogenesis.
The authors would like to thank the Hartwig Medical Foundation (Amsterdam, the Netherlands) for facilitating low-input whole-genome sequencing, P.J. Coffer for providing umbilical cord blood samples, and P.J. Campbell and D.C. Wedge for sharing scripts. This study was financially supported by an EMBO long-term fellowship to F.G.O. (ALTF 655-2016), an ERC starting grant (ERC2014-STG637904) to I.V., a VIDI grant of the Netherlands Organisation for Scientific Research (NWO) (no. 016.Vidi.171.023) to R.v.B., funding from Worldwide Cancer Research (WCR) (no. 16-0193) to R.v.B., and NIH grants HL128850-01A1 and P01HL13147 to F.D.C. F.D.C. is a scholar of the Howard Hughes Medical Institute and the Leukemia and Lymphoma Society.
Potential for early warning of viral influenza activity in the community by monitoring clinical diagnoses of influenza in hospital emergency departments
Background: Although syndromic surveillance systems are gaining acceptance as useful tools in public health, doubts remain about whether the anticipated early warning benefits exist. Many assessments of this question do not adequately account for the confounding effects of autocorrelation and trend when comparing surveillance time series and few compare the syndromic data stream against a continuous laboratory-based standard. We used time series methods to assess whether monitoring of daily counts of Emergency Department (ED) visits assigned a clinical diagnosis of influenza could offer earlier warning of increased incidence of viral influenza in the population compared with surveillance of daily counts of positive influenza test results from laboratories. Methods: For the five-year period 2001 to 2005, time series were assembled of ED visits assigned a provisional ED diagnosis of influenza and of laboratory-confirmed influenza cases in New South Wales (NSW), Australia. Poisson regression models were fitted to both time series to minimise the confounding effects of trend and autocorrelation and to control for other calendar influences. To assess the relative timeliness of the two series, cross-correlation analysis was performed on the model residuals. Modelling and cross-correlation analysis were repeated for each individual year. Results: Using the full five-year time series, short-term changes in the ED time series were estimated to precede changes in the laboratory series by three days. For individual years, the estimate was between three and 18 days. The time advantage estimated for the individual years 2003–2005 was consistently between three and four days. Conclusion: Monitoring time series of ED visits clinically diagnosed with influenza could potentially provide three days early warning compared with surveillance of laboratory-confirmed influenza. When current laboratory processing and reporting delays are taken into account this time advantage is even greater.
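The cross-correlation step described in the Methods can be illustrated with a minimal sketch. It assumes the Poisson model residuals are already available as two equal-length lists; the function names `pearson` and `best_lead` are hypothetical, the series are synthetic (the second is the first delayed by three days), and each lag is scored with a plain Pearson correlation over the overlapping segment, which is a simplification of a standard cross-correlation function.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lead(ed_resid, lab_resid, max_lag):
    """Find the lag (in days) at which ed_resid[t] best matches lab_resid[t + lag]."""
    best_lag, best_r = 0, -2.0
    for lag in range(max_lag + 1):
        x = ed_resid[:len(ed_resid) - lag]  # ED residuals, trimmed at the end
        y = lab_resid[lag:]                 # laboratory residuals, shifted by `lag`
        r = pearson(x, y)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic residuals (not data from the study): lab series lags ED by 3 days.
rng = random.Random(0)
ed = [rng.gauss(0, 1) for _ in range(200)]
lab = [rng.gauss(0, 1) for _ in range(3)] + ed[:-3]

lag, r = best_lead(ed, lab, max_lag=14)
print(f"ED series leads by {lag} days (r = {r:.2f})")
```

Working on residuals rather than raw counts matters here: shared trend and seasonality would otherwise inflate the correlation at every lag and blur the estimate of the lead.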
Nature of protein family signatures: Insights from singular value analysis of position-specific scoring matrices
Position-specific scoring matrices (PSSMs) are useful for detecting weak
homology in protein sequence analysis, and they are thought to contain some
essential signatures of the protein families. In order to elucidate what kind
of ingredients constitute such family-specific signatures, we apply singular
value decomposition to a set of PSSMs and examine the properties of dominant
right and left singular vectors. The first right singular vectors were
correlated with various amino acid indices including relative mutability, amino
acid composition in protein interior, hydropathy, or turn propensity, depending
on proteins. A significant correlation between the first left singular vector
and a measure of site conservation was observed. It is shown that the
contribution of the first singular component to the PSSMs acts to disfavor
potentially but falsely functionally important residues at conserved sites. The
second right singular vectors were highly correlated with hydrophobicity
scales, and the corresponding left singular vectors with contact numbers of
protein structures. It is suggested that sequence alignment with a PSSM is
essentially equivalent to threading supplemented with functional information.
The presented method may be used to separate functionally important sites from
structurally important ones, and thus it may be a useful tool for predicting
protein functions.
Comment: 22 pages, 7 figures, 4 tables
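As a rough illustration of the decomposition this abstract refers to, the sketch below extracts the first singular component of a small matrix by power iteration. It assumes a PSSM stored with one row per sequence position and one column per amino acid, so that the left singular vector has one entry per site and the right singular vector one entry per amino acid. The function name `first_singular_triplet` and the toy matrix are hypothetical; a real analysis would more likely call a library routine such as `numpy.linalg.svd` on the full 20-column matrix.

```python
import math


def first_singular_triplet(matrix, iterations=200):
    """Dominant singular value and vectors of a row-major matrix (power iteration).

    Assumes the matrix is non-zero and that the uniform starting vector is not
    orthogonal to the dominant right singular vector.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # starting guess for the right vector
    for _ in range(iterations):
        # u = M v
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        # w = M^T u, i.e. one step of power iteration on M^T M
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u))  # singular value = |M v|
    u = [x / sigma for x in u]
    return u, sigma, v


# Toy "PSSM": 4 sites x 3 amino-acid columns (made-up scores, not real data).
toy_pssm = [
    [2.0, -1.0, 0.5],
    [4.0, -2.0, 1.0],
    [1.0, 0.0, -1.0],
    [-3.0, 1.5, -0.5],
]
left, sigma, right = first_singular_triplet(toy_pssm)
print("per-site (left) vector:", [round(x, 3) for x in left])
print("per-amino-acid (right) vector:", [round(x, 3) for x in right])
```

The overall sign of a singular vector pair is arbitrary, so any correlation with an amino acid index or a conservation measure should be interpreted up to sign.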
Molecular Basis of NDM-1, a New Antibiotic Resistance Determinant
The New Delhi Metallo-β-lactamase (NDM-1) was first reported in 2009 in a Swedish patient. A recent study reported that an NDM-1-positive Klebsiella pneumoniae strain or an NDM-1-positive Escherichia coli strain was highly resistant to all antibiotics tested except tigecycline and colistin. These can no longer be relied on to treat infections, and therefore NDM-1 is now potentially a major global health threat.
Structure-based statistical analysis of transmembrane helices
Recent advances in determination of the high-resolution structure of membrane proteins now enable analysis of the main features of amino acids in transmembrane (TM) segments in comparison with amino acids in water-soluble helices. In this work, we conducted a large-scale analysis of the prevalent locations of amino acids by using a data set of 170 structures of integral membrane proteins obtained from the MPtopo database and 930 structures of water-soluble helical proteins obtained from the Protein Data Bank. Large hydrophobic amino acids (Leu, Val, Ile, and Phe) plus Gly were clearly prevalent in TM helices whereas polar amino acids (Glu, Lys, Asp, Arg, and Gln) were less frequent in this type of helix. The distribution of amino acids along TM helices was also examined. As expected, hydrophobic and slightly polar amino acids are commonly found in the hydrophobic core of the membrane whereas aromatic (Trp and Tyr), Pro, and the hydrophilic amino acids (Asn, His, and Gln) occur more frequently in the interface regions. Charged amino acids are also statistically prevalent outside the hydrophobic core of the membrane, and whereas acidic amino acids are frequently found at both cytoplasmic and extra-cytoplasmic interfaces, basic amino acids cluster at the cytoplasmic interface. These results strongly support, with structural data, the experimentally demonstrated biased distribution of positively charged amino acids (that is, the so-called positive-inside rule).
Accurate and efficient gp120 V3 loop structure based models for the determination of HIV-1 co-receptor usage
Background: HIV-1 targets human cells expressing both the CD4 receptor, which binds the viral envelope glycoprotein gp120, as well as either the CCR5 (R5) or CXCR4 (X4) co-receptors, which interact primarily with the third hypervariable loop (V3 loop) of gp120. Determination of HIV-1 affinity for either the R5 or X4 co-receptor on host cells facilitates the inclusion of co-receptor antagonists as a part of patient treatment strategies. A dataset of 1193 distinct gp120 V3 loop peptide sequences (989 R5-utilizing, 204 X4-capable) is utilized to train predictive classifiers based on implementations of random forest, support vector machine, boosted decision tree, and neural network machine learning algorithms. An in silico mutagenesis procedure employing multibody statistical potentials, computational geometry, and threading of variant V3 sequences onto an experimental structure is used to generate a feature vector representation for each variant whose components measure environmental perturbations at corresponding structural positions. Results: Classifier performance is evaluated based on stratified 10-fold cross-validation, stratified dataset splits (2/3 training, 1/3 validation), and leave-one-out cross-validation. Best reported values of sensitivity (85%), specificity (100%), and precision (98%) for predicting X4-capable HIV-1 virus, overall accuracy (97%), Matthews correlation coefficient (89%), balanced error rate (0.08), and ROC area (0.97) all reach critical thresholds, suggesting that the models outperform six other state-of-the-art methods and come closer to competing with phenotype assays. Conclusions: The trained classifiers provide instantaneous and reliable predictions regarding HIV-1 co-receptor usage, requiring only translated V3 loop genotypes as input. Furthermore, the novelty of these computational mutagenesis based predictor attributes distinguishes the models as orthogonal and complementary to previous methods that utilize sequence, structure, and/or evolutionary information. The classifiers are available online at http://proteins.gmu.edu/automute.
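Most of the performance measures named in the Results follow directly from a binary confusion matrix. The sketch below uses their standard definitions, treating X4-capable as the positive class; the function name `binary_metrics` and the counts passed to it are invented for illustration and are not the study's results. ROC area is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient, in [-1, 1]
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Balanced error rate: mean of the two per-class error rates
    ber = 0.5 * (fn / (tp + fn) + fp / (tn + fp))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
        "balanced_error_rate": ber,
    }


# Hypothetical counts for an imbalanced two-class problem (illustration only).
for name, value in binary_metrics(tp=8, fp=2, tn=88, fn=2).items():
    print(f"{name}: {value:.3f}")
```

With classes as unbalanced as the 989 to 204 split described above, accuracy alone can look strong even when the minority class is poorly predicted, which is why the Matthews correlation coefficient and balanced error rate are reported alongside it.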
Fully automated high-quality NMR structure determination of small 2H-enriched proteins
Determination of high-quality small protein structures by nuclear magnetic resonance (NMR) methods generally requires acquisition and analysis of an extensive set of structural constraints. The process generally demands extensive backbone and sidechain resonance assignments, and weeks or even months of data collection and interpretation. Here we demonstrate rapid and high-quality protein NMR structure generation using CS-Rosetta with a perdeuterated protein sample made at a significantly reduced cost using new bacterial culture condensation methods. Our strategy provides the basis for a high-throughput approach for routine, rapid, high-quality structure determination of small proteins. As an example, we demonstrate the determination of a high-quality 3D structure of a small 8 kDa protein, E. coli cold shock protein A (CspA), using <4 days of data collection and fully automated data analysis methods together with CS-Rosetta. The resulting CspA structure is highly converged and in excellent agreement with the published crystal structure, with a backbone RMSD value of 0.5 Å, an all-atom RMSD value of 1.2 Å to the crystal structure for well-defined regions, and an RMSD value of 1.1 Å to the crystal structure for core, non-solvent-exposed sidechain atoms. Cross validation of the structure with 15N- and 13C-edited NOESY data obtained with a perdeuterated 15N, 13C-enriched 13CH3 methyl protonated CspA sample confirms that essentially all of these independently interpreted NOE-based constraints are already satisfied in each of the 10 CS-Rosetta structures. By these criteria, the CS-Rosetta structure generated by fully automated analysis of data for a perdeuterated sample provides an accurate structure of CspA. This represents a general approach for rapid, automated structure determination of small proteins by NMR.
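For readers unfamiliar with the RMSD figures quoted above, the following sketch computes the root-mean-square deviation between two matched sets of atomic coordinates. It assumes the two structures have already been superposed and that atoms are listed in the same order; the optimal superposition step that normally precedes such a comparison is not shown, and the function name `rmsd` and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes both structures are already superposed in the same frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates in Å (illustration only).
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0)]
reference = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (3.0, 0.0, 0.3)]
print(f"RMSD = {rmsd(model, reference):.2f} Å")
```

Restricting the atom lists to backbone atoms, all atoms, or core sidechain atoms gives the three kinds of RMSD the abstract distinguishes.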
- …