18 research outputs found

    Predicting mostly disordered proteins by using structure-unknown protein data

    BACKGROUND: Predicting intrinsically disordered proteins is important in structural biology because they are thought to carry out various cellular functions even though they have no stable three-dimensional structure. Far more structures are known for ordered proteins than for disordered ones, so the structural distribution of proteins in nature can be inferred to differ from that of proteins whose structures have been determined experimentally. Many more protein sequences are known than protein structures, and many of the known sequences can be expected to be those of disordered proteins. Using the information in structure-unknown proteins should therefore help to avoid training data sparseness. We propose a novel method for predicting which proteins are mostly disordered by using a spectral graph transducer (SGT) trained on a large number of structure-unknown sequences as well as structure-known sequences.
    RESULTS: When the proposed method was evaluated on data that included 82 disordered proteins and 526 ordered proteins, its sensitivity was 0.723 and its specificity was 0.977. Its Matthews correlation coefficient was 0.202 higher than that obtained using FoldIndex, 0.221 higher than that obtained using the method based on plotting hydrophobicity against the number of contacts, and 0.07 higher than that obtained using support vector machines (SVMs). To examine robustness against training data sparseness, we investigated the correlation between two results obtained when the method was trained on different datasets and tested on the same dataset. The correlation coefficient for the proposed method was 0.14 higher than that for the method using SVMs. When the proposed SGT-based method was compared with four per-residue predictors (VL3, GlobPlot, DISOPRED2 and IUPred (long)), its sensitivity was 0.834 for disordered proteins, which is 0.052–0.523 higher than that of the per-residue predictors, and its specificity was 0.991 for ordered proteins, which is 0.036–0.153 higher than that of the per-residue predictors. The proposed method was also evaluated on data that included 417 partially disordered proteins. It predicted the frequency of disordered proteins to be 1.95% for proteins with 5%–10% disordered sequences, 1.46% for proteins with 10%–20% disordered sequences and 16.57% for proteins with 20%–40% disordered sequences.
    CONCLUSION: The proposed method, which utilizes the information of structure-unknown data, predicts disordered proteins more accurately than other methods and is less affected by training data sparseness.

    A framework for evolutionary systems biology

    BACKGROUND: Many difficult problems in evolutionary genomics are related to mutations that have weak effects on fitness, as the consequences of mutations with large effects are often simple to predict. Current systems biology has accumulated much data on mutations with large effects and can predict the properties of knockout mutants in some systems. However, experimental methods are too insensitive to observe small effects.
    RESULTS: Here I propose a novel framework that brings together evolutionary theory and current systems biology approaches in order to quantify small effects of mutations and their epistatic interactions in silico. Central to this approach is the definition of fitness correlates that can be computed in some current systems biology models employing the rigorous algorithms that are at the core of much work in computational systems biology. The framework exploits synergies between the realism of such models and the need to understand real systems in evolutionary theory. This framework can address many longstanding topics in evolutionary biology by defining various 'levels' of the adaptive landscape. Addressed topics include the distribution of mutational effects on fitness, as well as the nature of advantageous mutations, epistasis and robustness. Combining corresponding parameter estimates with population genetics models raises the possibility of testing evolutionary hypotheses at a new level of realism.
    CONCLUSION: EvoSysBio is expected to lead to a more detailed understanding of the fundamental principles of life by combining knowledge about well-known biological systems from several disciplines. This will benefit both evolutionary theory and current systems biology. Understanding robustness by analysing distributions of mutational effects and epistasis is pivotal for drug design, cancer research, responsible genetic engineering in synthetic biology and many other practical applications.

    A global experiment on motivating social distancing during the COVID-19 pandemic

    SIGNIFICANCE: Communicating in ways that motivate engagement in social distancing remains a critical global public health priority during the COVID-19 pandemic. This study tested motivational qualities of messages about social distancing (those that promoted choice and agency vs. those that were forceful and shaming) in 25,718 people in 89 countries. The autonomy-supportive message decreased feelings of defying social distancing recommendations relative to the controlling message, and the controlling message increased controlled motivation, a less effective form of motivation, relative to no message. Message type did not impact intentions to socially distance, but people’s existing motivations were related to intentions. Findings were generalizable across a geographically diverse sample and may inform public health communication strategies in this and future global health emergencies.
    ABSTRACT: Finding communication strategies that effectively motivate social distancing continues to be a global public health priority during the COVID-19 pandemic. This cross-country, preregistered experiment (n = 25,718 from 89 countries) tested hypotheses concerning generalizable positive and negative outcomes of social distancing messages that promoted personal agency and reflective choices (i.e., an autonomy-supportive message) or were restrictive and shaming (i.e., a controlling message) compared with no message at all. Results partially supported experimental hypotheses in that the controlling message increased controlled motivation (a poorly internalized form of motivation relying on shame, guilt, and fear of social consequences) relative to no message. On the other hand, the autonomy-supportive message lowered feelings of defiance compared with the controlling message, but the controlling message did not differ from receiving no message at all. Unexpectedly, messages did not influence autonomous motivation (a highly internalized form of motivation relying on one’s core values) or behavioral intentions. Results supported hypothesized associations between people’s existing autonomous and controlled motivations and self-reported behavioral intentions to engage in social distancing. Controlled motivation was associated with more defiance and less long-term behavioral intention to engage in social distancing, whereas autonomous motivation was associated with less defiance and more short- and long-term intentions to social distance. Overall, this work highlights the potential harm of using shaming and pressuring language in public health communication, with implications for the current and future global health challenges.

    Predictors of natively unfolded proteins: unanimous consensus score to detect a twilight zone between order and disorder in generic datasets

    BACKGROUND: Natively unfolded proteins lack a well-defined three-dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. Establishing that a given protein is natively unfolded requires laborious experimental investigation, so reliable sequence-only methods for predicting whether a sequence corresponds to a folded or an unfolded protein are of interest in both fundamental and applied studies. Many proteins have amino acid compositions compatible with both the folded and the unfolded state, and belong to a twilight zone between order and disorder. This makes a dichotomous classification of protein sequences into folded and natively unfolded ones difficult. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining well-performing single predictors of folding into a consensus score.
    RESULTS: In this methodological paper, dichotomous folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, here called gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed of 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2 and mean pairwise energy show good performance and stability on all the datasets considered and are combined into a strictly unanimous combination score S_SU, which leaves proteins unclassified when the combined indexes do not all agree. The unclassified proteins: i) belong to an overlap region in the vector space of amino acid compositions occupied by both folded and unfolded proteins; ii) are composed of approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins.
    CONCLUSIONS: Our results show that proteins unclassified by S_SU belong to a twilight zone. Proteins left unclassified by the consensus score S_SU have physical properties intermediate between those of folded and those of natively unfolded proteins, and their structural properties and evolutionary history are worth investigating.
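The strictly unanimous rule described in the abstract above can be illustrated with a short sketch: a protein receives a class only when every combined index gives the same call, and is otherwise left unclassified. The function name `unanimous_consensus`, the string labels and the example calls are assumptions made for illustration. The snippet does not compute Poodle-W, gVSL2 or mean pairwise energy; it only combines binary calls that are assumed to have been obtained already.

```python
from typing import Mapping

FOLDED = "folded"
UNFOLDED = "unfolded"
UNCLASSIFIED = "unclassified"  # the "twilight zone" outcome


def unanimous_consensus(calls: Mapping[str, str]) -> str:
    """Combine per-index calls into one label, requiring full agreement.

    `calls` maps an index name to its binary call (FOLDED or UNFOLDED).
    """
    if not calls:
        raise ValueError("at least one index call is required")
    labels = set(calls.values())
    if not labels <= {FOLDED, UNFOLDED}:
        raise ValueError(f"unexpected labels: {sorted(labels)}")
    # Exactly one distinct label means all indexes agree.
    return labels.pop() if len(labels) == 1 else UNCLASSIFIED


# Hypothetical calls for two proteins (not data from the paper).
agreeing = {"Poodle-W": UNFOLDED, "gVSL2": UNFOLDED, "mean pairwise energy": UNFOLDED}
split = {"Poodle-W": FOLDED, "gVSL2": UNFOLDED, "mean pairwise energy": FOLDED}

print(unanimous_consensus(agreeing))
print(unanimous_consensus(split))
```

Compared with majority voting, this rule trades coverage for confidence: a single dissenting index is enough to withhold a classification.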