    Protein binding hot spots and the residue-residue pairing preference: a water exclusion perspective

    BACKGROUND: A protein binding hot spot is a small cluster of residues tightly packed at the center of the interface between two interacting proteins. Though a hot spot constitutes a small fraction of the interface, it is vital to the stability of protein complexes. A series of hypotheses has recently been proposed to characterize binding hot spots, including the pioneering O-ring theory, the insightful 'coupling' and 'hot region' principle, and our 'double water exclusion' (DWE) hypothesis. As the perspective changes from the O-ring theory to the DWE hypothesis, we examine the physicochemical properties of binding hot spots under the new hypothesis and compare them with those under the O-ring theory. RESULTS: The requirements for a cluster of residues to form a hot spot under the DWE hypothesis can be mathematically satisfied by a biclique subgraph if a vertex is used to represent a residue, an edge to indicate a close distance between two residues, and a bipartite graph to represent a pair of interacting proteins. We term these hot spots DWE bicliques. We identified DWE bicliques from crystal packing contacts and from obligate and non-obligate interactions. Our comparative study revealed that abundant bicliques are unique to the biological interactions, indicating specific biological binding behaviors in contrast to crystal packing. The two sub-types of biological interactions also have their own signature bicliques. In our analysis of residue compositions and residue pairing preferences in DWE bicliques, the focus was on interaction-preferred residues (ipRs) and interaction-preferred residue pairs (ipRPs). We observed that hydrophobic residues are heavily involved in the ipRs and ipRPs of the obligate interactions, and that aromatic residues are favored in the ipRs and ipRPs of the biological interactions, especially in those of the non-obligate interactions. In contrast, the ipRs and ipRPs in crystal packing are dominated by hydrophilic residues, and most of the anti-ipRs of crystal packing are ipRs of the obligate or non-obligate interactions. CONCLUSIONS: These ipRs and ipRPs in our DWE bicliques describe diverse binding features among the three types of interactions. They also highlight the specific binding behaviors of the biological interactions, which differ sharply from the artifact interfaces in crystal packing. DWE bicliques, especially the unique bicliques, can capture deep insights into the binding characteristics of protein interfaces.
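The graph formulation in the abstract above can be illustrated with a minimal Python sketch. It only checks the biclique condition itself (every residue on one side is in contact with every residue on the other); it does not enumerate bicliques or apply any further DWE criteria. The residue labels and the contact set are hypothetical, and how "close distance" is thresholded is not stated in the abstract.

```python
from itertools import product


def is_biclique(contacts, side_a, side_b):
    """Return True if side_a and side_b form a biclique in the contact graph.

    contacts: set of (residue_in_protein_A, residue_in_protein_B) pairs, one
              per edge of the bipartite graph (an edge means "close distance").
    side_a, side_b: non-empty collections of residues, one from each protein.
    """
    if not side_a or not side_b:
        return False
    # Biclique condition: every cross pair must be an edge.
    return all((a, b) in contacts for a, b in product(side_a, side_b))


# Hypothetical contacts between residues of chain A and chain B.
contacts = {
    ("A:TRP43", "B:TYR12"), ("A:TRP43", "B:LEU15"),
    ("A:PHE47", "B:TYR12"), ("A:PHE47", "B:LEU15"),
    ("A:LYS50", "B:TYR12"),
}

print(is_biclique(contacts, {"A:TRP43", "A:PHE47"}, {"B:TYR12", "B:LEU15"}))
print(is_biclique(contacts, {"A:TRP43", "A:LYS50"}, {"B:TYR12", "B:LEU15"}))
```

In this toy graph the first pair of residue sets is fully connected, while the second fails because A:LYS50 has no contact with B:LEU15.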

    Joint Evolutionary Trees: A Large-Scale Method To Predict Protein Interfaces Based on Sequence Sampling

    The Joint Evolutionary Trees (JET) method detects protein interfaces, the core residues involved in the folding process, and residues susceptible to site-directed mutagenesis and relevant to molecular recognition. The approach, based on the Evolutionary Trace (ET) method, introduces a novel way to treat evolutionary information. Families of homologous sequences are analyzed through a Gibbs-like sampling of distance trees to reduce effects of erroneous multiple alignment and impacts of weakly homologous sequences on distance tree construction. The sampling method makes sequence analysis more sensitive to functional and structural importance of individual residues by avoiding effects of the overrepresentation of highly homologous sequences and improves computational efficiency. A carefully designed clustering method is parametrized on the target structure to detect and extend patches on protein surfaces into predicted interaction sites. Clustering takes into account residues' physical-chemical properties as well as conservation. Large-scale application of JET requires the system to be adjustable for different datasets and to guarantee predictions even if the signal is low. Flexibility was achieved by a careful treatment of the number of retrieved sequences, the amino acid distance between sequences, and the selective thresholds for cluster identification. An iterative version of JET (iJET) that guarantees finding the most likely interface residues is proposed as the appropriate tool for large-scale predictions. Tests are carried out on the Huang database of 62 heterodimer, homodimer, and transient complexes and on 265 interfaces belonging to signal transduction proteins, enzymes, inhibitors, antibodies, antigens, and others. A specific set of proteins chosen for their special functional and structural properties illustrate JET behavior on a large variety of interactions covering proteins, ligands, DNA, and RNA. 
JET is compared at a large scale to ET and to Consurf, Rate4Site, siteFiNDER|3D, and SCORECONS on specific structures. A significant improvement in performance and computational efficiency is shown.

    Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties

    BACKGROUND: The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties. RESULTS: To determine the best machine learning algorithm, 26 classifiers in the WEKA software package were compared using a benchmarking dataset of 79 enzymes with 254 catalytic residues in a 10-fold cross-validation analysis. Each residue of the dataset was represented by a set of 24 residue properties previously shown to be of functional relevance, as well as a label {+1/-1} to indicate catalytic/non-catalytic residue. The best-performing algorithm was the Sequential Minimal Optimization (SMO) algorithm, which is a Support Vector Machine (SVM). The Wrapper Subset Selection algorithm further selected seven of the 24 attributes as an optimal subset of residue properties, with sequence conservation, catalytic propensities of amino acids, and relative position on protein surface being the most important features. CONCLUSION: The SMO algorithm with 7 selected attributes correctly predicted 228 of the 254 catalytic residues, with an overall predictive accuracy of more than 86%. 
Missing only 10.2% of the catalytic residues, the method captures the fundamental features of catalytic residues and can be used as a "catalytic residue filter" to facilitate experimental identification of catalytic residues for proteins with known structure but unknown function.
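The two figures quoted in the conclusion (228 of 254 residues found, 10.2% missed) are consistent with each other, as the short Python check below shows. The variable names are illustrative. The overall accuracy of "more than 86%" cannot be re-derived this way, because the abstract does not give the number of non-catalytic residues or false positives.

```python
# Counts taken from the abstract above.
catalytic_total = 254      # catalytic residues in the benchmark
correctly_predicted = 228  # catalytic residues recovered by the classifier

# Fraction of catalytic residues recovered (recall / sensitivity).
recall = correctly_predicted / catalytic_total

# Fraction of catalytic residues missed.
missed = 1 - recall

print(f"recall: {recall:.1%}")
print(f"missed: {missed:.1%}")
```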

    Associations between diet and disease activity in ulcerative colitis patients using a novel method of data analysis

    BACKGROUND: The relapsing nature and varying geographical prevalence of ulcerative colitis (UC) implicate environmental factors such as diet in its aetiology. METHODS: In order to determine which foods might be related to disease activity in UC, a new method of dietary analysis was developed and applied. Eighty-one UC patients were recruited at all stages of the disease process. Following completion of a 7 d diet diary, clinical assessment including a sigmoidoscopic examination (scale 0 (normal mucosa) to 6 (very active disease)) was conducted. Food weights for each person were adjusted (divided) by the person's calorific intake for the week. Each food consumed was given a food sigmoidoscopy score (FSS) calculated by summing the products of the (adjusted) weight of food consumed and sigmoidoscopy score for each patient and occurrence of food and dividing by the total (adjusted) weight of the food consumed by all 81 patients. Thus, foods eaten in large quantities by patients with very active disease have high FSSs and vice versa. Foods consumed by fewer than 10 people or weighing less than 1 kg for the whole group were excluded, leaving 75 foods. RESULTS: High FSS foods were characterized by high levels of the anti-thiamin additive sulfite (Mann-Whitney, p < 0.001), i.e. bitter, white wine, burgers, soft drinks from concentrates, sausages, lager and red wine. Caffeine also has anti-thiamin properties and decaffeinated coffee was associated with a better clinical state than the caffeine-containing version. Beneficial foods (average intake per week) included pork (210 g), breakfast cereals (200 g), lettuce (110 g), apples and pears (390 g), milk (1250 ml), melon (350 g), bananas (350 g), bacon (120 g), beef and beef products (500 g), tomatoes (240 g), soup (700 g), citrus fruits (300 g), fish (290 g), yogurt (410 g), cheese (110 g), potatoes (710 g) and legumes (120 g). 
CONCLUSIONS: The dietary analysis method described provides a new tool for establishing relationships between diet and disease and indicates a potentially therapeutic diet for UC.
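The FSS described in the METHODS is a weighted mean of sigmoidoscopy scores, with calorie-adjusted food weights as the weights. The Python sketch below is one reading of that description, not the authors' implementation. The function name, the data layout, and the toy values are assumptions, and the exclusion rule for rarely eaten foods is omitted.

```python
from collections import defaultdict


def food_sigmoidoscopy_scores(diary, weekly_calories, sigmoid_scores):
    """Compute a food sigmoidoscopy score (FSS) for each food.

    diary: iterable of (patient, food, weight_g), one entry per occurrence.
    weekly_calories: patient -> calorific intake for the week.
    sigmoid_scores: patient -> sigmoidoscopy score (0 = normal, 6 = very active).
    """
    weighted_sum = defaultdict(float)  # sum of adjusted weight * score
    total_weight = defaultdict(float)  # sum of adjusted weight
    for patient, food, weight in diary:
        # Adjust (divide) the food weight by the patient's weekly intake.
        adjusted = weight / weekly_calories[patient]
        weighted_sum[food] += adjusted * sigmoid_scores[patient]
        total_weight[food] += adjusted
    return {food: weighted_sum[food] / total_weight[food] for food in total_weight}


# Hypothetical two-patient example.
diary = [("p1", "white wine", 100.0), ("p2", "white wine", 300.0)]
weekly_calories = {"p1": 14000.0, "p2": 7000.0}
sigmoid_scores = {"p1": 0, "p2": 6}

fss = food_sigmoidoscopy_scores(diary, weekly_calories, sigmoid_scores)
print(fss)
```

Because patient p2 has active disease and consumes most of the calorie-adjusted weight, the score for this food lands close to 6 (here 36/7, about 5.14).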

    Of embodied emissions and inequality: rethinking energy consumption

    This paper situates concepts of energy consumption within the context of growing research on embodied emissions. Using the UK as a case study, I unpack the global socio-economic and ecological inequalities inherent in the measurement of greenhouse gas emissions on a territorial basis under the international climate change framework. In so doing, I problematise questions of distribution, allocation and responsibility with regard to the pressing need to reduce global GHG emissions and the consumption that generates them. I challenge the disproportionate emphasis that energy policy places on supply as opposed to demand, as well as its overriding focus on the national scale. Consequently, I argue that any low carbon transition, in addition to a technological process, is also a geographical one that will involve the reconfiguration of "current spatial patterns of economic and social activity" (Bridge et al., 2013:331), as well as relationships both within countries and regions and between them.

    Combining specificity determining and conserved residues improves functional site prediction

    BACKGROUND: Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities. RESULTS: Here we present a novel method for functional site prediction based on identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs) as those occupied by residues that are conserved within sub-groups of proteins in a family having a common specificity but that differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins containing known three-dimensional structures, but lacking clear functional annotations, and discuss several illustrative examples. CONCLUSION: The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
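The definition of an SDP given above (conserved within each specificity group, different between groups) can be made concrete with a small Python sketch. This strict all-or-nothing rule is an illustrative simplification and is not the SDPsite algorithm, whose scoring is not described in the abstract. The function name, group names, and toy alignment are assumptions.

```python
def strict_sdp_columns(groups):
    """Return alignment columns conserved within each group but not across groups.

    groups: dict mapping a specificity-group name to a list of aligned
            sequences (all sequences must have the same length).
    """
    length = len(next(iter(groups.values()))[0])
    columns = []
    for i in range(length):
        group_residues = []
        for sequences in groups.values():
            column = {seq[i] for seq in sequences}
            if len(column) != 1:
                break  # not conserved within this group
            group_residues.append(column.pop())
        else:
            # Conserved in every group; keep it only if the groups disagree.
            if len(set(group_residues)) > 1:
                columns.append(i)
    return columns


# Hypothetical alignment with two specificity groups.
groups = {
    "substrate_X": ["ACDK", "ACDR"],
    "substrate_Y": ["ACEK", "ACEK"],
}
print(strict_sdp_columns(groups))
```

In the toy alignment, columns 0 and 1 are conserved across the whole family (candidates for general function), column 3 varies within a group, and only column 2 separates the two groups.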

    The Korea discount and chaebols

    Finance practitioners frequently claim that stocks of Korean firms are undervalued and trade at a discount relative to foreign firms. This phenomenon is commonly called «the Korea discount». It is based on anecdotal evidence comparing either the price-earnings ratios of different market indexes or those of different individual stocks. This paper provides empirical evidence on the existence of such a discount using a large sample of stocks from 28 countries over the period 2002-2016. We find that Korean stocks have significantly lower price-earnings ratios than their global peers. We also investigate the role of large business groups called chaebols, which are often considered to be the main cause of the discount because of their poor corporate governance. Our findings show that this is not the case.

    Assessment of protein-protein interfaces in cryo-EM derived assemblies

    Structures of macromolecular assemblies derived from cryo-EM maps often contain errors that become more abundant with decreasing resolution. Despite efforts in the cryo-EM community to develop metrics for map and atomistic model validation, thus far, no specific scoring metrics have been applied systematically to assess the interface between the assembly subunits. Here, we comprehensively assessed protein–protein interfaces in macromolecular assemblies derived by cryo-EM. To this end, we developed Protein Interface-score (PI-score), a density-independent machine learning-based metric, trained using the features of protein–protein interfaces in crystal structures. We evaluated 5873 interfaces in 1053 PDB-deposited cryo-EM models (including SARS-CoV-2 complexes), as well as the models submitted to CASP13 cryo-EM targets and the EM model challenge. We further inspected the interfaces associated with low scores and found that some of those, especially in intermediate-to-low resolution (worse than 4 Å) structures, were not captured by density-based assessment scores. A combined score incorporating PI-score and fit-to-density score showed discriminatory power, allowing our method to provide a powerful complementary assessment tool for the ever-increasing number of complexes solved by cryo-EM.