    Coevolved mutations reveal distinct architectures for two core proteins in the bacterial flagellar motor

    Switching of bacterial flagellar rotation is caused by large domain movements of the FliG protein triggered by binding of the signal protein CheY to FliM. FliG and FliM form adjacent multi-subunit arrays within the basal body C-ring. The movements alter the interaction of the FliG C-terminal (FliGC) "torque" helix with the stator complexes. Atomic models based on the Salmonella entrovar C-ring electron microscopy reconstruction have implications for switching, but lack consensus on the relative locations of the FliG armadillo (ARM) domains (amino-terminal (FliGN), middle (FliGM) and FliGC) as well as changes during chemotaxis. The generality of the Salmonella model is challenged by the variation in motor morphology and response between species. We studied coevolved residue mutations to determine the unifying elements of switch architecture. Residue interactions, measured by their coevolution, were formalized as a network, guided by structural data. Our measurements reveal a common design with dedicated switch and motor modules. The FliM middle domain (FliMM) has extensive connectivity most simply explained by conserved intra- and inter-subunit contacts. In contrast, FliG has a patchy, complex architecture. Conserved structural motifs form interacting nodes in the coevolution network that wire FliMM to the FliGC C-terminal, four-helix motor module (C3-6). FliG C3-6 coevolution is organized around the torque helix, differently from other ARM domains. The nodes form separated, surface-proximal patches that are targeted by deleterious mutations as in other allosteric systems. The dominant node is formed by the EHPQ motif at the FliMM-FliGM contact interface and adjacent helix residues at a central location within FliGM. The node interacts with nodes in the N-terminal FliGC α-helix triad (ARM-C) and FliGN. ARM-C, separated from C3-6 by the MFVF motif, has poor intra-network connectivity consistent with its variable orientation revealed by structural data. ARM-C could be the convertor element that provides mechanistic and species diversity. JK was supported by Medical Research Council grant U117581331. SK was supported by seed funds from Lahore University of Management Sciences (LUMS) and the Molecular Biology Consortium.
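
    The idea of scoring residue pairs by coevolution and treating the strong pairs as network edges can be illustrated with a minimal sketch. The abstract does not say which coevolution measure was used, so plain mutual information between alignment columns is swapped in here as a stand-in. The function names, the toy alignment and the threshold are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_i[a] * count_j[b]))
    return mi


def coevolution_edges(alignment, threshold):
    """Return (i, j, score) for every column pair scoring at or above threshold."""
    length = len(alignment[0])
    edges = []
    for i in range(length):
        for j in range(i + 1, length):
            score = mutual_information(alignment, i, j)
            if score >= threshold:
                edges.append((i, j, score))
    return edges


# Toy alignment (assumption): columns 0 and 1 covary, column 2 is invariant.
toy = ["AKL", "AKL", "GEL", "GEL"]
edges = coevolution_edges(toy, threshold=0.5)
print(edges)
```

    In this toy case only the covarying pair of columns becomes an edge, because the invariant column carries no information about the others. Real analyses typically correct raw mutual information for phylogenetic bias and indirect couplings, which this sketch does not attempt.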

    Alignment-free phylogenetic reconstruction: Sample complexity via a branching process analysis

    We present an efficient phylogenetic reconstruction algorithm allowing insertions and deletions which provably achieves a sequence-length requirement (or sample complexity) growing polynomially in the number of taxa. Our algorithm is distance-based, that is, it relies on pairwise sequence comparisons. More importantly, our approach largely bypasses the difficult problem of multiple sequence alignment. Comment: Published at http://dx.doi.org/10.1214/12-AAP852 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Identification of functionally related enzymes by learning-to-rank methods

    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

    Protein sectors: statistical coupling analysis versus conservation

    Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. Comment: 36 pages, 17 figures

    Robust computation of linear models by convex relaxation

    Consider a dataset of vector-valued observations that consists of noisy inliers, which are explained well by a low-dimensional subspace, along with some number of outliers. This work describes a convex optimization problem, called REAPER, that can reliably fit a low-dimensional model to this type of data. This approach parameterizes linear subspaces using orthogonal projectors, and it uses a relaxation of the set of orthogonal projectors to reach the convex formulation. The paper provides an efficient algorithm for solving the REAPER problem, and it documents numerical experiments which confirm that REAPER can dependably find linear structure in synthetic and natural data. In addition, when the inliers lie near a low-dimensional subspace, there is a rigorous theory that describes when REAPER can approximate this subspace. Comment: Formerly titled "Robust computation of linear models, or How to find a needle in a haystack"
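
    The relaxation described in this abstract can be written out as a short optimization problem. The notation is an assumption, since the abstract introduces no symbols: x_1, ..., x_N are the observations, d is the target subspace dimension, and P is the relaxed projector. The rank-d orthogonal projectors are the symmetric matrices with P^2 = P and tr P = d, and replacing that nonconvex set by its convex hull gives a convex program of the following form.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{i=1}^{N} \lVert x_i - P x_i \rVert_2 \\
\text{subject to}\quad & 0 \preceq P \preceq I, \qquad \operatorname{tr} P = d .
\end{aligned}
```

    The objective sums unsquared residual norms, which is what makes the fit less sensitive to outliers than a least-squares (PCA) fit. A subspace estimate can then be read off from the dominant d eigenvectors of the optimal P.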

    Quantification of the variation in percentage identity for protein sequence alignments

    BACKGROUND: Percentage Identity (PID) is frequently quoted in discussion of sequence alignments since it appears simple and easy to understand. However, although there are several different ways to calculate percentage identity and each may yield a different result for the same alignment, the method of calculation is rarely reported. Accordingly, quantification of the variation in PID caused by the different calculations would help in interpreting PID values in the literature. In this study, the variation in PID was quantified systematically on a reference set of 1028 alignments generated by comparison of the protein three-dimensional structures. Since the alignment algorithm may also affect the range of PID, this study also considered the effect of algorithm, and the combination of algorithm and PID method. RESULTS: The maximum variation in PID due to the calculation method was 11.5% while the effect of alignment algorithm on PID was up to 14.6% across three popular alignment methods. The combined effect of alignment algorithm and PID calculation gave a variation of up to 22% on the test data, with an average of 5.3% ± 2.8% for sequence pairs with < 30% identity. In order to see which PID method was most highly correlated with structural similarity, four different PID calculations were compared to similarity scores (Sc) from the comparison of the corresponding protein three-dimensional structures. The highest correlation coefficient for a PID calculation was 0.80. In contrast, the more sophisticated Z-score calculated by reference to randomized sequences gave a correlation coefficient of 0.84. CONCLUSION: Although it is well known amongst expert sequence analysts that PID is a poor score for discriminating between protein sequences, the apparent simplicity of the percentage identity score encourages its widespread use in establishing cutoffs for structural similarity. This paper illustrates that not only is PID a poor measure of sequence similarity when compared to the Z-score, but that there is also a large uncertainty in reported PID values. Since better alternatives to PID exist to quantify sequence similarity, these should be quoted where possible in preference to PID. The findings presented here should prove helpful to those new to sequence analysis, and in warning those who seek to interpret the value of a PID reported in the literature.
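
    How the same alignment can yield different PID values is easy to show with a small sketch. The abstract does not list the four calculations that were compared, so the four denominators below are assumed, commonly used choices rather than the paper's definitions. The function name and the example sequences are hypothetical.

```python
def pid_variants(seq_a: str, seq_b: str, gap: str = "-") -> dict:
    """Percentage identity of one pairwise alignment under four denominators."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    # Ignore columns in which both sequences carry a gap.
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if not (x == gap and y == gap)]

    identical = sum(1 for x, y in columns if x == y)        # matching residue pairs
    paired = sum(1 for x, y in columns if gap not in (x, y))  # residue-residue columns
    len_a = sum(1 for x, _ in columns if x != gap)           # residues in sequence A
    len_b = sum(1 for _, y in columns if y != gap)           # residues in sequence B

    denominators = {
        "alignment_columns": len(columns),    # all columns, gaps included
        "aligned_pairs": paired,              # gap-free columns only
        "shorter_sequence": min(len_a, len_b),
        "mean_length": (len_a + len_b) / 2,
    }
    return {name: 100.0 * identical / d if d else 0.0
            for name, d in denominators.items()}


# Hypothetical 7-column alignment with one gap in each sequence.
result = pid_variants("ACDE-GH", "ACD-FGK")
for name, value in result.items():
    print(f"{name}: {value:.1f}%")
```

    For this toy pair the four values range from roughly 57% to 80%, which is why a PID quoted without its method of calculation is hard to interpret.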

    Protein-protein interactions: impact of solvent and effects of fluorination

    Proteins have an indispensable role in the cell. They carry out a wide variety of structural, catalytic and signaling functions in all known biological systems. To perform their biological functions, proteins establish interactions with other bioorganic molecules including other proteins. Therefore, protein-protein interactions are one of the central topics in molecular biology. My thesis is devoted to three different topics in the field of protein-protein interactions. The first one focuses on the solvent contribution to protein interfaces, as solvent is an important component of protein complexes. The second topic explores the structural and functional potential of fluorine's unique properties, which are attractive for protein design and engineering not feasible within the scope of canonical amino acids. The last part of this thesis is a study of the impact of charged amino acid residues within the hydrophobic interface of a coiled-coil system, which is one of the well-established model systems for studies of protein-protein interactions. I. The majority of proteins interact in vivo in solution, thus studies of solvent impact on protein-protein interactions could be crucial for understanding many processes in the cell. However, though solvent is known to be very important for protein-protein interactions in terms of structure, dynamics and energetics, its effects are often disregarded in computational studies because a detailed solvent description requires complex and computationally demanding approaches. As a consequence, many protein residues that establish water-mediated interactions are not considered in the interface definition. In previous work carried out in our group, the protein interface database SCOWLP was developed.
This database takes interfacial solvent into account and, on that basis, classifies all interfacial protein residues of the PDB into three classes according to their interacting properties: dry (direct interaction), dual (direct and water-mediated interactions), and wet spots (residues interacting only through one water molecule). To define an interaction SCOWLP considers a donor–acceptor distance for hydrogen bonds of 3.2 Å, for salt bridges of 4 Å, and for van der Waals contacts the sum of the van der Waals radii of the interacting atoms. In previous studies of the group, statistical analysis of a non-redundant protein structure dataset showed that 40.1% of the interfacial residues participate in water-mediated interactions, and that 14.5% of the total residues in interfaces are wet spots. Moreover, wet spots have been shown to display similar characteristics to residues contacting water molecules in cores or cavities of proteins. The goals of this part of the thesis were: 1. to characterize the impact of solvent in protein-protein interactions; 2. to elucidate possible effects of including solvent in the correlated mutations approach for protein contact prediction. To study solvent impact on protein interfaces, a molecular dynamics (MD) approach has been used. This part of the work is elaborated in section 2.1 of this thesis. We have characterized properties of water-mediated protein interactions at residue and solvent level. For this purpose, an MD analysis of 17 representative complexes from SH3 and immunoglobulin protein families has been performed. We have shown that the interfacial residues interacting through a single water molecule (wet spots) are energetically and dynamically very similar to other interfacial residues. At the same time, water molecules mediating protein interactions have been found to be significantly less mobile than surface solvent in terms of residence time.
Calculated free energies indicate that these water molecules should significantly affect formation and stability of a protein-protein complex. The results obtained in this part of the work also suggest that water molecules in protein interfaces contribute to the conservation of protein interactions by allowing more sequence variability in the interacting partners, which has important implications for the use of the correlated mutations concept in studies of protein interactions. This concept is based on the assumption that interacting protein residues co-evolve, so that a mutation in one of the interacting counterparts is compensated by a mutation in the other. The study presented in section 2.2 has been carried out to prove that an explicit introduction of solvent into the correlated mutations concept indeed yields qualitative improvement of existing approaches. For this, we have used the data on interfacial solvent obtained from the SCOWLP database (the whole PDB) to construct a “wet” similarity matrix. This matrix has been used for prediction of protein contacts together with a well-established “dry” matrix. We have analyzed two datasets containing 50 domains and 10 domain pairs, and have compared the results obtained by using several combinations of both “dry” and “wet” matrices. We have found that, for both intra- and interdomain contacts, a combination of a “dry” and a “wet” similarity matrix improves the predictions in comparison to the “dry” one alone. Our analysis suggests that taking water into account may improve the contact predictions obtained by correlated mutations approaches.
There are two principally novel aspects of this study in the context of the correlated mutations methodology used: i) the first explicit introduction of solvent into the correlated mutations approach; ii) the use of a definition of protein-protein interfaces that differs essentially from many other works in the field, because it takes into account physico-chemical properties of amino acids and is not exclusively based on distance cut-offs. II. The second part of the thesis is focused on properties of fluorinated amino acids in protein environments. In general, non-canonical amino acids with newly designed side-chain functionalities are powerful tools that can be used to improve structural, catalytic, kinetic and thermodynamic properties of peptides and proteins in ways that are not feasible with canonical amino acids. In this context fluorinated amino acids have increasingly gained in importance in protein chemistry because of fluorine's unique properties: high electronegativity and a small atomic size. Despite the wide use of fluorine in drug design, properties of fluorine in protein environments have not yet been extensively studied. The aims of this part of the dissertation were: 1. to analyze the basic properties of fluorinated amino acids such as electrostatic and geometric characteristics, hydrogen bonding abilities, hydration properties and conformational preferences (section 3.1); 2. to describe the behavior of fluorinated amino acids in systems emulating protein environments (section 3.2, section 3.3). First, to characterize fluorinated amino acid side chains we have used fluorinated ethane derivatives as their simplified models and applied a quantum mechanics approach. Properties such as charge distribution, dipole moments, volumes and size of the fluoromethylated groups within the model have been characterized.
Hydrogen bonding properties of these groups have been compared with the groups typically present in natural protein environments. We have shown that hydrogen and fluorine atoms within these fluoromethylated groups are weak hydrogen bond donors and acceptors. Nevertheless they should not be disregarded for applications in protein engineering. Then, we have implemented four fluorinated L-amino acids for the AMBER force field and characterized their conformational and hydration properties at the MD level. We have found that hydrophobicity of fluorinated side chains grows with the number of fluorine atoms and could be explained in terms of high electronegativity of fluorine atoms and spatial demand of fluorinated side chains. These data on hydration agree with the results obtained in the experimental work performed by our collaborators. We have rationally engineered systems that allow us to study fluorine properties and extract results that could be extrapolated to proteins. For this, we have emulated protein environments by introducing fluorinated amino acids into a parallel coiled-coil system and an enzyme-ligand chymotrypsin system. The results on the effect of fluorination on coiled-coil dimerization and on substrate affinities in the chymotrypsin active site, obtained by MD, molecular docking and free energy calculations, are in strong agreement with experimental data obtained by our collaborators. In particular, we have shown that fluorine content and position of fluorination can considerably change the polarity and steric properties of an amino acid side chain and, thus, can influence the properties that a fluorinated amino acid reveals within a native protein environment. III. Coiled-coils typically consist of two to five right-handed α-helices that wrap around each other to form a left-handed superhelix. The interface of two α-helices is usually represented by hydrophobic residues.
However, the analysis of protein databases revealed that in naturally occurring proteins up to 20% of these positions are populated by polar and charged residues. The impact of these residues on the stability of the coiled-coil system is not clear. MD simulations together with free energy calculations have been utilized to estimate favourable interaction partners for uncommon amino acids within the hydrophobic core of coiled-coils (Chapter 4). Based on these data, the best hits among binding partners for one strand of a coiled-coil bearing a charged amino acid in a central hydrophobic core position have been selected. Computational data have been in agreement with the results obtained by our collaborators, who applied phage display technology and CD spectroscopy. This combination of theoretical and experimental approaches allowed us to gain a deeper insight into the stability of the coiled-coil system. To conclude, this thesis widens existing concepts of protein structural biology in three areas of current importance. We expand on the role of solvent in protein interfaces, which contributes to the knowledge of physico-chemical properties underlying protein-protein interactions. We develop a deeper understanding of fluorine's impact upon its introduction into protein environments, which may assist in exploiting the full potential of fluorine's unique properties for applications in the field of protein engineering and drug design. Finally we investigate the mechanisms underlying coiled-coil system folding. The results presented in the thesis are of definite importance for possible applications (e.g. introduction of solvent explicitly into the scoring function) in protein folding, docking and rational design methods. The dissertation consists of four chapters: ● Chapter 1 contains an introduction to the topic of protein-protein interactions including basic concepts and an overview of the present state of research in the field.
● Chapter 2 focuses on the studies of the role of solvent in protein interfaces. ● Chapter 3 is devoted to the work on fluorinated amino acids in protein environments. ● Chapter 4 describes the study of coiled-coil folding properties. The experimental parts presented in Chapters 3 and 4 of this thesis have been performed by our collaborators at FU Berlin. Sections 2.1, 2.2, 3.1, 3.2 and Chapter 4 have been submitted/published in peer-reviewed international journals. Their organization follows a standard research article structure: Abstract, Introduction, Methodology, Results and discussion, and Conclusions. Section 3.3, though not published yet, is also organized in the same way. The literature references are collected at the end of the thesis to avoid redundancy between chapters.
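
    The SCOWLP interface criteria quoted in this abstract (3.2 Å for hydrogen bonds, 4 Å for salt bridges, the sum of van der Waals radii for van der Waals contacts) and its dry/dual/wet classification can be sketched in a few lines. This is an illustration under stated assumptions, not the SCOWLP implementation: the function names are hypothetical, the radii are Bondi values chosen for the example, and real use would also need to identify donors, acceptors and charged groups.

```python
from math import dist

# Distance cut-offs in angstroms, as quoted in the abstract.
HBOND_MAX = 3.2
SALT_BRIDGE_MAX = 4.0

# Bondi van der Waals radii in angstroms (assumption: the abstract does not
# say which radii set SCOWLP uses).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def is_contact(kind, elem_a, xyz_a, elem_b, xyz_b):
    """Check one atom pair against the cut-off for the given interaction kind."""
    d = dist(xyz_a, xyz_b)
    if kind == "hbond":
        return d <= HBOND_MAX
    if kind == "salt_bridge":
        return d <= SALT_BRIDGE_MAX
    if kind == "vdw":
        return d <= VDW_RADII[elem_a] + VDW_RADII[elem_b]
    raise ValueError(f"unknown interaction kind: {kind}")


def classify_residue(has_direct, has_water_mediated):
    """Map a residue's interaction types onto the dry/dual/wet classes."""
    if has_direct and has_water_mediated:
        return "dual"
    if has_direct:
        return "dry"
    if has_water_mediated:
        return "wet"
    return "non-interfacial"


# Hypothetical atom pairs placed along the x axis.
print(is_contact("hbond", "N", (0.0, 0.0, 0.0), "O", (3.0, 0.0, 0.0)))
print(is_contact("vdw", "C", (0.0, 0.0, 0.0), "C", (3.6, 0.0, 0.0)))
print(classify_residue(has_direct=False, has_water_mediated=True))
```

    A purely distance-based interface definition would miss the "wet" class entirely, which is the point the abstract makes about residues that interact only through a water molecule.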