
    A simplified approach to disulfide connectivity prediction from protein sequences

    Background: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to this problem, and public domain prediction services exist. These methods, however, still leave room for significant improvement in both prediction accuracy and overall architectural complexity. Results: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. Conclusion: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with all but the most complex competing approaches. Moreover, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and to overestimation of prediction accuracy.
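The two-pass cascade described in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: a toy window-overlap similarity stands in for the decomposition kernels, 1-nearest neighbor is used for both passes (the abstract names it only for the second), and all function names and sequences are hypothetical.

```python
from collections import Counter

def cysteine_windows(seq, k=1):
    """Return the residue window (k on each side) around every cysteine."""
    padded = "-" * k + seq + "-" * k
    return [padded[i:i + 2 * k + 1] for i, aa in enumerate(seq) if aa == "C"]

def similarity(seq_a, seq_b, k=1):
    """Toy stand-in for a kernel (assumption): count shared cysteine windows."""
    a = Counter(cysteine_windows(seq_a, k))
    b = Counter(cysteine_windows(seq_b, k))
    return sum((a & b).values())

def nearest_label(seq, examples, k=1):
    """1-nearest neighbor: label of the most similar training example."""
    return max(examples, key=lambda ex: similarity(seq, ex[0], k))[1]

def predict_connectivity(seq, bonded_examples, pattern_examples):
    # Pass 1: does the chain have at least one intra-chain disulfide bridge?
    if not nearest_label(seq, bonded_examples):
        return None
    # Pass 2: pick the connectivity pattern of the nearest bonded chain.
    return nearest_label(seq, pattern_examples)

# Hypothetical toy training data (not real proteins).
# Patterns pair cysteines by their ordinal position in the chain.
bonded_examples = [("ACDEFCGH", True), ("MKCLVAAGT", False)]
pattern_examples = [
    ("ACDEFCGH", ((1, 2),)),
    ("GWCPYCMHCLQCE", ((1, 3), (2, 4))),
]

print(predict_connectivity("TTACDTTFCGTT", bonded_examples, pattern_examples))
print(predict_connectivity("GGKCLVTT", bonded_examples, pattern_examples))
```

Because the second pass runs only when the first predicts a bridge, the cascade needs nothing but the sequence as input, which is the property the abstract highlights.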

    Solution structure of the factor VIII binding region on von Willebrand factor

    The preservation of haemostatic integrity is secured by the activities of von Willebrand factor (VWF). Upon vascular damage, VWF acts as a molecular bridge facilitating the initial adhesion and aggregation of platelets at the site of vessel injury, and thrombus formation ensues. Furthermore, VWF is the carrier of procoagulant factor VIII (FVIII) in plasma, thereby prolonging its half-life and efficiently localising FVIII to the incipient platelet plug. The arrest of bleeding and the maintenance of blood volume constancy are critically dependent on VWF, as is exemplified by von Willebrand disease (VWD), the most common inherited bleeding disorder in man, which results from defective or deficient VWF protein. Much of the function of VWF has been revealed; however, detailed insight into the molecular structure that enables VWF to orchestrate haemostatic processes, in particular FVIII stabilisation in plasma, is lacking. The high-resolution NMR structure of the major FVIII binding region (D') on VWF, and the dynamics and flexibility of its substructure, are presented in this thesis. The complex disulphide-bonded D' region has a two-subdomain architecture, TIL'E'. Domain TIL' lacks extensive secondary structure and is strikingly dynamic on at least two timescales measured by NMR relaxation experiments, and this region coincides with the clustering of pathological mutations leading to decreased FVIII binding affinity (type 2N VWD). This indicates that the conformational fluctuations and backbone malleability of domain TIL' collocate with biological activity. In contrast, the structured domain E' is rigid and contains the most commonly occurring and clinically mildest type 2N VWD mutation. These findings provide important insights into VWF:FVIII complex formation and represent a first step towards revealing the molecular basis of the bleeding diathesis type 2N von Willebrand disease.

    Biophysical studies of TIMP-1

    This study had two aspects. The first was the production and purification of TIMP-1. The second was a series of biophysical studies of TIMP-1 and a TIMP-1-derived peptide. A monoclonal antibody affinity column was developed and used to purify large quantities of human TIMP-1 for further experiments. Two E. coli expression systems were studied to determine whether they would be suitable for large-scale production of recombinant protein. In the first system, TIMP-1 was to be secreted as a fusion protein which could be cleaved, leaving a free N-terminus. It was discovered that it was not possible to cleave off the fusion protein. In the second system, the protein was secreted, without additions, to the periplasm. Although active protein with the correct N-terminus was obtained, the yields were too low to be of use for large-scale expression. Secondary structure analysis by CD and FTIR showed TIMP-1 to be a mostly β-sheet protein (approaching 50%) with around 20% α-helix. A temperature study using these techniques found that little change occurs until temperatures above 60°C, where the protein aggregates. The small changes appear to be a general loosening of the structure. In analyses of the surface of TIMP-1, additional carbohydrate was identified (other than the two N-linked chains) using Con-A probing of Western blots. TIMP-1 purified from WI-38 foetal lung fibroblast cells can be separated into two pools by Concanavalin A-Sepharose chromatography. These two pools were found to have different sets of pIs and different monosaccharide compositions. The use of NMR paramagnetic probes identified a hydrophobic region exposed on the surface of TIMP-1. This region probably includes a tyrosine residue and either a tryptophan or a phenylalanine. The presence of an exposed hydrophobic region was also shown in binding studies using the fluorescent probe ANS. These studies identified a single, low-affinity binding site.
    An additional study with the N-terminal fragment of type-I collagenase found no binding sites on the enzyme, but a change in fluorescence occurred when TIMP-1 was present. A peptide was designed based on the N-terminal sequence of TIMP-1. High homology, susceptibility to mutation and an interesting resemblance to the Bowman-Birk family of inhibitors suggested that this peptide might be inhibitory. It was found to have only weak inhibitory activity against gelatinase. NMR studies of this peptide in water showed a large number of conformers as a result of stabilisation of the cis isomer of its proline residues. This preference for the cis form was retained for one proline in the solvent TFE. Preliminary NMR studies were also carried out, which concluded that TIMP-1 should be suitable for further structural studies using isotopic labelling.

    Predicting zinc binding at the proteome level

    BACKGROUND: Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, for regulation of their activities or for structural purposes. Metal-binding properties remain difficult to predict, as well as to investigate experimentally, at the whole-proteome level. Consequently, current knowledge about metalloproteins is only partial. RESULTS: The present work reports on the development of a machine learning method for the prediction of the zinc-binding state of pairs of nearby amino acids, using predictors based on support vector machines. The predictor was trained using chains containing zinc-binding sites and non-metalloproteins in order to provide positive and negative examples. Results based on strong non-redundancy tests prove that (1) zinc-binding residues can be predicted and (2) modelling the correlation between the binding states of nearby residues significantly improves performance. The trained predictor was then applied to the human proteome. The results were in good agreement with the outcomes of previous, highly manually curated efforts to identify human zinc-binding proteins. Some unprecedented zinc-binding sites could be identified, and were further validated through structural modelling. The software implementing the predictor is freely available at: CONCLUSION: The proposed approach constitutes a highly automated tool for the identification of metalloproteins, which provides results of comparable quality to highly manually refined predictions. The ability to model correlations between pairs of residues allows it to obtain a significant improvement over standard 1D-based approaches. In addition, the method permits the identification of unprecedented metal sites, providing important hints for the work of experimentalists.

    Structure and dynamics of Pseudomonas aeruginosa ICP

    Pseudomonas aeruginosa inhibitor of cysteine peptidases (PA-ICP) is a potent protein inhibitor of papain-like cysteine peptidases (CPs) identified in Pseudomonas aeruginosa, an opportunistic pathogenic bacterium that can cause severe infections in humans. It belongs to the newly characterized natural CP inhibitors of the I42 family, designated the ICP family. The members of this family are present in some protozoan and bacterial pathogens. They can inhibit both parasite and mammalian CPs with high affinity and specificity. Whether the main biological function of these proteins in the pathogens is to regulate the hydrolytic activity of the organisms’ endogenous CPs, or of exogenous CPs so as to facilitate the pathogens’ invasion or survival, is still under investigation. Although Pseudomonas aeruginosa contains a CP inhibitor, no CP genes are found in its genome, suggesting that the targets of PA-ICP may be exogenous. This hypothesis is supported by the presence of a putative secretion signal peptide at the N-terminus of PA-ICP, which may be involved in exporting the protein to target exogenous CPs. In order to shed light on the biological function and inhibitory specificity of PA-ICP, the structure and backbone dynamics of this protein were characterised using NMR spectroscopy. In this project, the inhibitory activity of PA-ICP against a range of mammalian model CPs was also studied. Like its previously studied homologs, PA-ICP adopts an immunoglobulin fold comprising seven β-strands. Three highly conserved sequence motifs located in mobile loop regions form the CP binding site. The inhibitor exhibits higher affinity toward the mammalian CP cathepsin L than toward cathepsins H and B. Homology modelling of the PA-ICP-cathepsin L interaction based on the crystal structure of the chagasin-cathepsin L complex shows that PA-ICP may inhibit the peptidases by blocking the enzyme’s active site and that the interactions between chagasin and CPs may be conserved in PA-ICP-peptidase complexes.
    The specificity of the inhibitors may be determined by the relative flexibility of the loops bearing the binding site motifs and the electrostatic properties of certain residues near the binding sites.