
    Pairwise covariance adds little to secondary structure prediction but improves the prediction of non-canonical local structure

    BACKGROUND: Amino acid sequence probability distributions, or profiles, have been used successfully to predict secondary structure and local structure in proteins. Profile models assume the statistical independence of each position in the sequence, but the energetics of protein folding is better captured in a scoring function that is based on pairwise interactions, like a force field. RESULTS: I-sites motifs are short sequence/structure motifs that populate the protein structure database due to energy-driven convergent evolution. Here we show that a pairwise covariant sequence model does not predict alpha helix or beta strand significantly better overall than a profile-based model, but it does improve the prediction of certain loop motifs. The finding is best explained by considering secondary structure profiles as multivariant, all-or-none models, which subsume covariant models. Pairwise covariance is nonetheless present and energetically rational. Examples of negative design are present, where the covariances disfavor non-native structures. CONCLUSION: Measured pairwise covariances are shown to be statistically robust in cross-validation tests, as long as the amino acid alphabet is reduced to nine classes. An updated I-sites local structure motif library that provides sequence covariance information for all types of local structure in globular proteins and a web server for local structure prediction are available at http://www.bioinfo.rpi.edu/bystrc/hmmstr/server.php.
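
The contrast drawn in this abstract, between a profile that scores each position independently and a model that also scores pairs of positions, can be illustrated with a minimal sketch. It is not the I-sites implementation: the function names, the pseudocount, and the two-letter toy alphabet are assumptions made for illustration, and the nine-class reduced alphabet mentioned in the abstract is not reproduced here.

```python
import math
from collections import Counter


def column_frequencies(seqs, i, alphabet, pseudocount=1.0):
    """Smoothed frequency of each symbol in column i of equal-length sequences."""
    counts = Counter(s[i] for s in seqs)
    total = len(seqs) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


def profile_log_score(seq, seqs, alphabet, pseudocount=1.0):
    """Profile score: sum of per-position log-frequencies (positions independent)."""
    return sum(
        math.log(column_frequencies(seqs, i, alphabet, pseudocount)[a])
        for i, a in enumerate(seq)
    )


def pair_log_odds(seqs, i, j, alphabet, pseudocount=1.0):
    """Log-odds of the joint frequency of (column i, column j) symbol pairs
    against the product of the two single-column frequencies."""
    fi = column_frequencies(seqs, i, alphabet, pseudocount)
    fj = column_frequencies(seqs, j, alphabet, pseudocount)
    pair_counts = Counter((s[i], s[j]) for s in seqs)
    total = len(seqs) + pseudocount * len(alphabet) ** 2
    return {
        (a, b): math.log(((pair_counts[(a, b)] + pseudocount) / total) / (fi[a] * fj[b]))
        for a in alphabet
        for b in alphabet
    }


# Toy data (hypothetical): two columns that always differ.
toy = ["HP", "HP", "PH", "PH"]
log_odds = pair_log_odds(toy, 0, 1, alphabet="HP")
```

In the toy data the single-column frequencies are uniform, so the profile score cannot distinguish "HP" from "HH", while the pairwise log-odds are positive for the observed pairs and negative for the unobserved ones.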

    pkaPS: prediction of protein kinase A phosphorylation sites with the simplified kinase-substrate binding model

    BACKGROUND: Protein kinase A (cAMP-dependent kinase, PKA) is a serine/threonine kinase, for which ca. 150 substrate proteins are known. Based on a refinement of the recognition motif using the available experimental data, we wished to apply the simplified substrate protein binding model for accurate prediction of PKA phosphorylation sites, an approach that was previously successful for the prediction of lipid posttranslational modifications and of the PTS1 peroxisomal translocation signal. RESULTS: Approximately 20 sequence positions flanking the phosphorylated residue on both sides have been found to be restricted in their sequence variability (region -18...+23 with the site at position 0). The conserved physical pattern can be rationalized in terms of a qualitative model of binding to the catalytic cleft of protein kinase A. Positions -6...+4 surrounding the phosphorylation site are influenced to a varying degree by direct interaction with the kinase. This sequence stretch is embedded in an intrinsically disordered region composed preferentially of hydrophilic residues with flexible backbones and small side chains. This knowledge has been incorporated into a simplified analytical model of productive binding of substrate proteins to PKA. CONCLUSION: The scoring function of the pkaPS predictor can confidently discriminate PKA phosphorylation sites from serines/threonines with non-permissive sequence environments (sensitivity of ~96% at a specificity of ~94%). The tool "pkaPS" has been applied to the whole human proteome. Among the newly predicted PKA targets, there are entirely uncharacterized protein groups as well as apparently well-known families such as those of the ribosomal proteins L21e, L22 and L6. AVAILABILITY: The supplementary data as well as the prediction tool as WWW server are available at . 
REVIEWERS: Erik van Nimwegen (Biozentrum, University of Basel, Switzerland), Sandor Pongor (International Centre for Genetic Engineering and Biotechnology, Trieste, Italy), Igor Zhulin (University of Tennessee, Oak Ridge National Laboratory, USA)

    Statistical Methods for Conservation and Alignment Quality in Proteins

    Construction of multiple sequence alignments is a fundamental task in Bioinformatics. Multiple sequence alignments are used as a prerequisite in many Bioinformatics methods, and consequently the quality of such methods can be critically dependent on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always provide biologically relevant alignments. Therefore, there is a need for an objective approach for evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In the profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of the parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered: those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The predicted alignment quality scores correlated highly with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. The comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model was dramatically decreased, while the same level of accuracy was maintained. 
To conclude, we have shown that conservation can be successfully used in the statistical model for alignment quality assessment and in the estimation of emission probabilities in the profile hidden Markov models.
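
As a rough illustration of what a positional conservation score can look like, the sketch below computes a normalized Shannon-entropy score for each alignment column. This is a common generic measure, not necessarily the one developed in the thesis; the function names, the gap handling, and the 20-letter normalization are assumptions.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Return 1 - H/log(alphabet_size) for one column; gaps ('-') are ignored.

    1.0 means a single residue type occupies the column; lower values mean
    more residue diversity. An all-gap column scores 0.0 by convention here.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)


def alignment_conservation(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [column_conservation(col) for col in zip(*alignment)]


# Hypothetical three-sequence alignment.
scores = alignment_conservation(["ACD", "ACE", "AC-"])
```

Here the first two columns are fully conserved and score 1.0, while the third column, with two different residues and a gap, scores lower.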

    Pattern discovery in sequence databases : algorithms and applications to DNA/protein classification

    Sequence databases comprise sequence data, which are linear structural descriptions of many natural entities. Approximate pattern discovery in a sequence database can lead to important conclusions or the prediction of new phenomena. Traditional database technology is not suitable for accomplishing the task, and new techniques need to be developed. In this dissertation, we propose several new techniques for discovering patterns in sequence databases. Our techniques incorporate pattern matching algorithms and novel heuristics for discovery and optimization. Experimental results of applying the techniques to both generated data and DNA/proteins show the effectiveness of the proposed techniques. We then develop several classifiers using our pattern discovery algorithms and a previously published fingerprint technique. When we apply the classifiers to classify DNA and protein sequences, they give information that is complementary to the best classifiers available today.

    Structure-Function Relationship of the Na+-Glucose Cotransporter SGLT1: Interaction of Loop 13 with Various Inhibitors and Lipid Membranes

    SGLT1, a plasma membrane sodium/glucose cotransporter, is strongly inhibited by aromatic and aliphatic glucosides such as phlorizin and n-hexyl glucoside. We expressed a domain (loop 13), postulated to be involved in inhibitor binding, and labeled it with tryptophan reporter groups. Wild type and six mutants (Q581W, E591W, R601W, D611W, E621W and L630W) of loop 13 (amino acid residues 564-638) were expressed in Escherichia coli and purified to homogeneity. Changes in Trp fluorescence intensity and in the positions of the emission maxima were measured to follow the conformational changes occurring during inhibitor binding. After addition of phlorizin, D611W displayed the largest quenching (80%), followed by R601W (67%). D611W also exhibited the maximum red shift in Trp fluorescence (~14 nm), indicating exposure of this region to a more hydrophilic environment. Titration experiments showed a similar phlorizin affinity for all mutants. The maximum change in the collisional quenching constant for acrylamide was also noted for D611W (KSV = 11 M⁻¹ in the absence of phlorizin and 55 M⁻¹ in its presence). Similar results were obtained with phloretin. CD measurements and computer modeling revealed that D611W is positioned in a random coil situated between two α-helical segments. Taken together, these data indicate that phlorizin binding elicits changes in conformation leading to an opening of loop 13. Modeling suggests an interaction of the 4- and 6-OH groups of the aromatic ring A of phlorizin with the region between amino acids 606 and 611. Phloretin seems to interact with the same region of the peptide. No quenching or protection was found for D-glucose. Alkyl glucosides are also competitive inhibitors of the transport. Therefore, we searched for potential binding sites for alkyl glucosides in loop 13. 
To this end, we used the photoaffinity label (2′-Azi-n-octyl)-β-D-glucoside and analysed the region of attachment by MALDI mass spectrometry on recombinantly produced wild-type loop 13. Furthermore, experiments were conducted on single Trp mutants of loop 13 (Q581W, E591W, R601W, D611W, E621W and L630W) to determine the changes in fluorescence in the presence of alkyl glucosides, and the accessibility of each Trp mutant to acrylamide. Photolabeling of loop 13 with (2′-Azi-n-octyl)-β-D-glucoside revealed an attachment of the C2 group of the alkyl chain to Gly-Phe-Phe-Arg (amino acid residues 598-601). In the presence of n-hexyl-β-D-glucoside, the mutants R601W, D611W, E621W and L630W showed a significant decrease in Trp fluorescence with an apparent binding affinity of 8-14 µM. Only L630W showed a significant blue shift, and only in R601W was a change in acrylamide quenching (protection) observed. 1-Hexanol produced the same results as n-hexyl-β-D-glucoside. The interaction shows stereo-selectivity for n-hexyl-β-D-glucoside binding; the β-configuration of the sugar moiety at C1, the cis-conformation of the unsaturated alkenyl side chain at C3-C4, and an alkyl chain length of six to eight carbon atoms lead to an optimum interaction. A schematic two-dimensional model was derived in which C2 of the alkyl chain interacts with the region around residue 601, C3 and C4 interact with the region between residues 614 and 619, and C6-C8 interact with the region between residues 621 and 630. The data demonstrate that loop 13 provides binding sites for alkyl glucosides as well as for phlorizin, but the regions of the peptide affected and the conformation assumed differ markedly. Thus, loop 13 of SGLT1 seems to be a major binding domain for the aglucone residues of competitive D-glucose transport inhibitors. Furthermore, the interaction of loop 13 with lipid vesicles was monitored by reconstitution experiments with all Trp mutants of loop 13. 
An increase in Trp fluorescence, blue shifts in the maxima and protection against acrylamide were observed to a certain extent for each mutant, except for R601W, which exhibited no change in these parameters. Mutants Q581W, E621W, and L630W exhibited a strong blue shift, suggesting a position in the lipid bilayer. Quenching of these tryptophans by membrane-embedded doxyl spin labels at the 5- and 12-positions of the acyl side chain confirmed this location. The protection of Q581W, E621W, and L630W against acrylamide supported the assumption that these parts of loop 13 are inserted into lipid vesicles. The other parts of the peptide appear to remain outside the lipid bilayer; the four lysine residues from position 594 to 597 form an attachment site to negatively charged phospholipids. This attachment is a prerequisite for the insertion of parts of the peptide into the membranes, since fluorescence changes were only observed in lipid vesicles containing phosphatidylglycerol (PG). These studies thus reveal within one domain two different binding sites for inhibitors of the sodium-D-glucose cotransporter (SGLT1), which might act as targets for the development of potential therapeutics to control sugar uptake in the intestine and kidney. The topology studies confirm that the phlorizin binding domain between amino acid residues 601 and 611 and the alkyl glucoside binding domain between amino acid residues 601 and 619 reside outside the lipid bilayer and, in vesicles, are freely accessible to the inhibitor. For the alkyl glucosides, additional interactions with the lipid bilayers appear to occur.

    Statistical methods for conservation and alignment quality in proteins

    Diss.: Turun yliopisto, 200

    Evolution-aware Protein Structure Comparison and Applications in Protein-Protein Interaction Prediction

    Comparison of protein structures provides insights into the function and interactions of proteins and enhances our understanding of the biomolecular mechanisms driving life and disease. Available protein structure comparison methods are based solely on 3D geometric similarity, limiting their ability to detect functionally relevant correspondences between the residues of the proteins, especially for distantly related homologous proteins. However, non-geometric features contained in the primary sequence and evolutionary history of proteins carry valuable information that can enhance detection of such similarities. In this study, we introduced a new method to incorporate additional biochemical and evolutionary features of the proteins being compared. We proposed UniScore as a new structure similarity score, which integrates geometric similarity, sequence similarity, and evolutionary profiles of the proteins. We further developed a corresponding UniAlign algorithm for finding structural alignments of proteins with near-optimal UniScore. We evaluated UniAlign in terms of the consistency between the alignments it produces and human-curated alignments, calculated as the fraction of correctly aligned residues. Experimental results show that UniAlign outperforms other structural alignment programs in aligning proteins from the NCBI's human-curated Conserved Domain Database. UniAlign's ability to detect functionally important structural similarities is utilized in an application to discover interactions between the HIV-1 ENV protein (gp41 and gp120) and human proteins. Structural compatibility of HIV-human interaction pairs is evaluated via geometric, biochemical, and evolutionary features, and a prediction model is developed using a Support Vector Machine. This provides the first model for prediction of interactions that can also generate a protein-protein 3D complex. 
The HIV-human interaction study discovered novel virus-host interactions as well as potential clinical targets for therapeutic intervention.
Ph.D., Biomedical Engineering -- Drexel University, 201
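
The evaluation metric named in this abstract, the fraction of correctly aligned residues relative to a human-curated alignment, might be computed along the following lines. This is only a sketch: representing an alignment as a set of (i, j) residue index pairs and using the reference alignment as the denominator are assumptions, and the function name is hypothetical.

```python
def fraction_correctly_aligned(predicted, reference):
    """Fraction of reference residue pairs that the predicted alignment reproduces.

    Both arguments are iterables of (i, j) pairs, meaning residue i of the first
    protein is aligned to residue j of the second protein.
    """
    ref = set(reference)
    if not ref:
        raise ValueError("reference alignment is empty")
    return len(ref & set(predicted)) / len(ref)


# Hypothetical alignments: three of the four curated pairs are recovered.
curated = {(0, 0), (1, 1), (2, 2), (3, 4)}
computed = {(0, 0), (1, 1), (2, 3), (3, 4)}
agreement = fraction_correctly_aligned(computed, curated)
```

With these toy inputs, three of the four curated pairs are reproduced, so the agreement is 0.75.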