121 research outputs found

    DiANNA: a web server for disulfide connectivity prediction

    Correctly predicting the disulfide bond topology of a protein is crucial for understanding protein function and can greatly help tertiary structure prediction methods. Given a protein sequence as input, the web server outputs a disulfide connectivity prediction. The procedure is as follows. First, PSIPRED is run to predict the protein's secondary structure, then PSIBLAST is run against the non-redundant SwissProt to obtain a multiple alignment of the input sequence. The predicted secondary structure and the profile arising from this alignment are used in the training phase of our neural network. Next, cysteine oxidation state is predicted, then each pair of cysteines in the protein sequence is assigned a likelihood of forming a disulfide bond; this is performed by means of a novel architecture (diresidue neural network). Finally, Rothberg's implementation of Gabow's maximum weighted matching algorithm is applied to the diresidue neural network scores to produce the final connectivity prediction. Our novel neural network-based approach achieves results that are comparable to, and in some cases better than, the current state-of-the-art methods.
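
    The final matching step described in this abstract can be illustrated with a minimal sketch. It is not Rothberg's implementation of Gabow's algorithm: for clarity it swaps in an exhaustive search over all pairings, which is only practical for a handful of cysteines. The function name `best_connectivity`, the residue positions, and the pair scores are hypothetical, and the sketch assumes every cysteine is bonded (an even count).

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_connectivity(cysteines: List[int],
                      score: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return (total score, pairs) of the highest-scoring perfect pairing.

    Exhaustive search stand-in for maximum weighted matching; `score`
    maps (i, j) with i < j to a pairing likelihood (hypothetical values).
    """
    cysteines = sorted(cysteines)
    if len(cysteines) % 2 != 0:
        raise ValueError("expected an even number of bonded cysteines")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_total, best_pairs = float("-inf"), []
    # Pair the first cysteine with each candidate partner, then solve
    # the remaining cysteines recursively.
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        sub_total, sub_pairs = best_connectivity(remaining, score)
        total = score[(first, partner)] + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Hypothetical pair scores for four cysteines at positions 3, 12, 20, 31.
example_scores = {
    (3, 12): 0.2, (3, 20): 0.9, (3, 31): 0.1,
    (12, 20): 0.3, (12, 31): 0.8, (20, 31): 0.4,
}
example_total, example_pairs = best_connectivity([3, 12, 20, 31], example_scores)
```

    The number of pairings grows very quickly with the number of cysteines, which is why a polynomial-time matching algorithm is the better choice at scale.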

    DISULFIND: a disulfide bonding state and cysteine connectivity prediction server

    DISULFIND is a server for predicting the disulfide bonding state of cysteines and their disulfide connectivity starting from sequence alone. Optionally, disulfide connectivity can be predicted from sequence and a bonding state assignment given as input. The output is a simple visualization of the assigned bonding state (with confidence degrees) and the most likely connectivity patterns. The server is available at

    Identifying Calcium-Binding Sites and Predicting Disulfide Connectivity

    Most questions in proteomics require complex answers. Yet graph theory, supervised learning, and statistical models have decomposed complex questions into simple questions with simple answers. Experts in the field of protein study often address tasks that demand answers as complex as the questions. Such complex answers may consist of multiple factors that must be weighed against each other to arrive at a globally satisfactory and consistent solution to the question. In the prediction of calcium binding in proteins, we construct a global oxygen contact graph of a protein, then apply a graph algorithm to find oxygen clusters of fixed size four, and finally employ a geometric algorithm to judge whether the oxygen clusters are calcium-binding sites. Additionally, we can predict the locations of those sites. Furthermore, we construct a global oxygen contact graph of a protein that includes oxygen-bonded carbon atoms, then apply a graph algorithm to find the locally largest oxygen clusters, and finally design another geometric filter to exclude the non-calcium-binding oxygen clusters. In addition, we apply observed chemical properties as a chemical filter to recognize some non-calcium-binding oxygen clusters. To explore the characteristics of calcium-binding sites in proteins, we conduct a statistical survey of the geometric parameters and chemical properties of calcium-binding sites on four datasets derived from 1994 to 2005. In the prediction of disulfide bond connectivity, we analyze protein sequences to predict the folding of proteins relative to the cystines using nearest-neighbor methods. We extend a new pattern-wise method to all available template proteins, and find the global pattern of pairing cysteines with a new descriptor, the cysteine separation profile on protein secondary structure.

    A Protocol to Detect Local Affinities Involved in Proteins Distant Interactions

    The three-dimensional structure of a protein is constrained or stabilized by local interactions between distant residues of the protein, such as disulfide bonds, electrostatic interactions, hydrogen bonds, van der Waals forces, etc. The correct prediction of such contacts should be an important step towards the overall challenge of three-dimensional structure prediction. The in silico prediction of disulfide connectivity has been widely studied: most results were based on a few amino acids around bonded and non-bonded cysteines, which we call the local environments of bonded residues. To evaluate the impact of such local information on residue pairing, we propose a machine-learning-based protocol, independent of the type of contact, to detect affinities between local environments that would contribute to residue pairing. This protocol requires learning methods that are able to learn from examples corrupted by class-conditional classification noise. To this end, we propose an adapted version of the perceptron algorithm. Finally, we test our protocol with this algorithm on proteins that feature disulfide or salt bridges. The results show that local environments contribute to the formation of salt bridges. As a by-product, these results demonstrate the relevance of our protocol. However, results on disulfide bridges are not significantly positive. There are two possible explanations: the class of linear functions used by the perceptron algorithm is not expressive enough to detect this information, or the local environments of cysteines do not contribute significantly to residue pairing.

    A simplified approach to disulfide connectivity prediction from protein sequences

    Background: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem, and public domain prediction services exist. These methods are, however, still potentially subject to significant improvements both in terms of prediction accuracy and overall architectural complexity. Results: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. Conclusion: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with all but the most complex other approaches. Moreover, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and to overestimation of prediction accuracy.

    Multidimensional Feature Engineering for Post-Translational Modification Prediction Problems

    Protein sequence data has been produced at an astounding speed. This creates an opportunity to characterize these proteins for the treatment of illness. A crucial characterization of proteins is their post-translational modifications (PTMs). DNA codes for 20 amino acids; after translation, nearly every protein is modified at the amino acid level. We focus on three specific PTMs. First is the bond formed between two cysteine amino acids, which introduces a loop into the straight chain of a protein. Second, we predict which cysteines can generally be modified (oxidized). Finally, we predict which lysine amino acids are modified by the active form of Vitamin B6 (PLP/pyridoxal-5-phosphate). Our work aims to predict PTMs from protein sequencing data. When available, we integrate other data sources to improve prediction. Data mining finds patterns in data and uses these patterns to give a confidence score to unknown PTMs. There are many steps to data mining; however, our focus is on the feature engineering step, i.e. transforming raw data into an intelligible form for a prediction algorithm. Our primary innovations are as follows. First, we created the Local Similarity Matrix (LSM), a description of the evolutionary relatedness of a cysteine and its neighboring amino acids. This feature is taken two at a time and template matched to other cysteine pairs. If they are similar, then we assign a high probability that they share the same bonding state. LSM is a three-step algorithm: 1) a matrix of amino acid probabilities is created for each cysteine and its neighbors from an alignment; 2) we multiply the iv square of the BLOSUM62 matrix diagonal to each of the corresponding amino acids; 3) we z-score normalize the matrix by row. Next, we introduced the Residue Adjacency Matrix (RAM) for sequential and 3-D space (integration of protein coordinate data). This matrix describes a cysteine's neighbors, but at much greater distances than most algorithms. It is particularly effective at finding conserved residues that are further away while still remaining a compact description, since more data than necessary incurs the curse of dimensionality. RAM runs in O(n) time, making it very useful for large datasets. Finally, we produced the Windowed Alignment Scoring algorithm (WAS). This is a vector of protein window alignment bit scores. The alignments are one to all. We then apply dimensionality reduction for gains in speed and performance. WAS uses the BLAST algorithm to align sequences within a window surrounding potential PTMs, in this case PLP attached to lysine. For WAS, we tried many alignment algorithms and used the approximation that BLAST provides to reduce computational time from months to days. The performance of the different alignment algorithms did not vary significantly. The applications of this work are many. It has been shown that cysteine bonding configurations play a critical role in the folding of proteins. Solving the protein folding problem will help us find a solution to Alzheimer's disease, which is due to a misfolding of the amyloid-beta protein. Cysteine oxidation has been shown to play a role in oxidative stress, a situation in which free radicals become too abundant in the body. Oxidative stress leads to chronic illnesses such as diabetes, cancer, heart disease and Parkinson's. Lysine in concert with PLP catalyzes the aminotransferase reaction. Research suggests that anti-cancer drugs will potentially selectively inhibit this reaction. Others have targeted this reaction for the treatment of epilepsy and addictions.
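
    As an illustration of the third LSM step only (row-wise z-score normalization), a minimal sketch follows. Steps 1 and 2 are not reproduced because the abstract does not specify them in enough detail. The function name `zscore_rows`, the use of the population standard deviation, and the handling of constant rows are assumptions, not details taken from the work.

```python
import math
from typing import List


def zscore_rows(matrix: List[List[float]]) -> List[List[float]]:
    """Normalize each row to zero mean and unit (population) variance.

    Assumption: a row with zero variance is mapped to all zeros rather
    than raising an error.
    """
    normalized = []
    for row in matrix:
        mean = sum(row) / len(row)
        variance = sum((x - mean) ** 2 for x in row) / len(row)
        std = math.sqrt(variance)
        if std == 0:
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(x - mean) / std for x in row])
    return normalized


# Hypothetical 2 x 3 matrix standing in for a small probability matrix.
example_matrix = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
example_normalized = zscore_rows(example_matrix)
```

    Normalizing by row puts every row on a common scale, so rows with very different raw magnitudes can be compared during template matching.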

    Hyphenating Ion Mobility With Mass Spectrometry to Increase the Information Content of Top-Down Analyses

    Mass spectrometry (MS) has been established as an important analytical tool in the characterization of an array of analyte classes, including biological samples. However, without hyphenation with other techniques, the approach is limited in the information that can be elucidated and the samples that can be analyzed. To overcome these limitations, separation is performed prior to MS analysis to alleviate sample complexity, while dissociation is incorporated to increase the information content. Here, we employ ion mobility (IM), a gas-phase separation technique, to disperse product ions resulting from collision-induced dissociation (CID), denoted MS-CID-IM-MS, in top-down analysis for a variety of applications: primary structure elucidation, disulfide bond identification, secondary structure characterization, and polymer characterization. First, the fundamental attributes of this approach and the resulting information are investigated. With this approach, CID product ions are dispersed in two dimensions, size-to-charge (IM) and mass-to-charge (MS), and the resulting 2-D data display greatly enhances the top-down information content through (i) charge-state-specific trend lines, (ii) increased dynamic range, and (iii) separation of overlapping ion signals. The increase in peak capacity allows detection of low-abundance fragment ions, increasing the primary sequence coverage and the confidence of ion assignments, as demonstrated with melittin and ubiquitin. Second, this general approach is applied to top-down analysis in a variety of applications. MS-CID-IM-MS is used for the structural characterization of disulfide-linked protein ions by monitoring the arrival time distribution (ATD) of the ion before and after collisional activation. Similarly, this approach can be used to distinguish product ion type as well as, in some cases, specific secondary structural elements, viz. extended coils or helices, providing rapid identification of the onset and termination of extended coil structure in peptides, as demonstrated with insulin B-chain. Detection of low-abundance ion signals associated with cross-ring cleavages allows this approach to be extended to determining the regiochemistry of glucose-derived polymers. As demonstrated, the MS-CID-IM-MS approach is highly versatile owing to the information content gained upon dispersion of ions in two dimensions, providing an effective increase in experimental dynamic range as well as conformational information.

    Influence of the Conformation of Conotoxins on their Bioactivity

    In the last decade, peptides have filled the gap between small-molecule drugs and biologics as therapeutics. A rich source of such peptides is the venom of different animals, which has evolved over millions of years. Conotoxins are small, disulfide-rich neuropeptides isolated from the venom of cone snails of the genus Conus, which act on different biological targets such as ion channels or receptors. The venom consists of a complex cocktail of bioactive substances and is used by the snail for self-defence or hunting prey. With over 700 species known, these marine invertebrates offer a large potential for novel pharmacologically active molecules, e.g. for the treatment of severe and chronic pain. Although peptide synthesis is well established nowadays, the preparation and characterization of disulfide-rich peptides is still a challenge. The formation of different disulfide isomers is one main issue complicating the elucidation of such disulfide-rich peptides and proteins. The aim of this work was to investigate the folding of small, multiply disulfide-bridged peptides under different conditions and to perform structure-activity relationship studies of the resulting products. Within the scope of this project, different conotoxins were synthesized by solid-phase peptide synthesis. Disulfide formation was achieved by different methods, i.e. oxidative self-folding, oxidation in ionic liquids, and successive oxidation in combination with an orthogonal protecting group strategy. The oxidized conotoxins were systematically investigated and characterized with different analytical methods. Elucidation of the final three-dimensional structure was performed by NMR spectroscopy. Furthermore, the disulfide connectivity was determined by different methods of mass spectrometric analysis (LC-ESI and MALDI) after partial reduction and derivatization. The results of all applied analytical methods revealed their advantages and limitations for the discrimination of different disulfide isomers within conotoxins, depending on the primary amino acid sequence. Biological testing of the synthesized peptides on ion channels was performed in parallel with computational studies to elucidate the structure-activity relationships. The work presented in this thesis significantly extends the knowledge about conotoxins and disulfide formation, their impact on biological activity, and the challenging characterization and structure elucidation procedures.

    The analysis of onion and garlic

    Onion (Allium cepa L.) and garlic (Allium sativum L.), among the oldest cultivated plants, are used both as food and for medicinal applications. These common food plants are a rich source of several phytonutrients recognized as important elements of the Mediterranean diet, and are also used in the treatment and prevention of a number of diseases, including cancer, coronary heart disease, obesity, hypercholesterolemia, type 2 diabetes, hypertension, cataract and disturbances of the gastrointestinal tract (e.g. colic pain, flatulent colic and dyspepsia). These activities are related to the thiosulfinates, volatile sulfur compounds that are also responsible for the pungency of these vegetables. Besides these low-molecular-weight compounds, onion and garlic are characterized by more polar compounds of phenolic and steroidal origin, often glycosylated, which show interesting pharmacological properties. Compared to the more studied thiosulfinates, these latter compounds have the advantages of being non-pungent and more stable to cooking. Recently, such compounds have attracted increasing scientific attention. In this paper, the literature about the major volatile and non-volatile phytoconstituents of onion and garlic is reviewed. Particular attention is given to the different methodologies developed to perform chemical analysis, including separation and structural elucidation. (c) 2005 Elsevier B.V. All rights reserved

    Protein structure prediction and modelling

    The prediction of protein structures from their amino acid sequence alone is a very challenging problem. Using the variety of methods available, it is often possible to achieve good models, or at least to gain some more information, to aid scientists in their research. This thesis uses many of the widely available methods for the prediction and modelling of protein structures and proposes some new ideas for aiding the process. A new method for measuring the buriedness (or exposure) of residues is discussed, which may lead to a way of assessing the placement of a protein's individual amino acids and whether they have a standard profile. This may become useful in assessing predicted models. Threading analysis and modelling of structures for the Critical Assessment of Techniques for Protein Structure Prediction (CASP2) highlights inaccuracies in the current state of protein prediction, particularly in the alignment predictions of sequence on structure. An in-depth analysis of the placement of gaps within a multiple sequence threading method is presented, with ideas for improving threading predictions by constructing an improved gap penalty. A threading-based homology model was constructed with an RMSD of 6.2 Å, showing how combinations of methods can give usable results. Using a distance geometry method, DRAGON, the ab initio prediction of a protein (NK Lysin) for the CASP2 assessment was achieved with an accuracy of 4.6 Å. This highlighted several ideas in disulphide prediction and a novel method for predicting which cysteine residues might form disulphide bonds in proteins. Using a combination of all the methods, with some, such as threading and homology modelling, proving inadequate, an ab initio model of the N-terminal domain of a GPCR was built based on secondary structure and predictions of disulphide bonds. Use of multiple sequences in comparing sequences to structures in threading should give enough information to enable the improvements required before threading can become a major way of building homology models. Furthermore, with the ability to predict disulphide bonds, restraints can be placed when building models, ab initio or otherwise.