76 research outputs found

    Protein-segment universe exhibiting transitions at intermediate segment length in conformational subspaces

    Background: Many studies have examined the rules governing two levels of protein structure: short segments and structural domains. Nevertheless, the organization and nature of the conformational space of segments of intermediate length, between short segments and domains, remain unclear. The conformational spaces of intermediate-length segments probably differ from those of short segments. We sought to identify and characterize the boundary or boundaries between peptide-like (short-segment) and protein-like (long-segment) distributions. We generated ensembles of segments, 10–50 residues long, embedded in globular proteins. Using principal component analysis based on the intra-segment Cα-Cα atomic distances, we explored how the conformational distribution of segments relates to their length and to protein structural class.
    Results: Our statistical analyses of segment conformation and length revealed critical dual transitions in the conformational distribution, with segments derived from all four structural classes. The dual transitions delimit an intermediate phase between short segments and domains. Consequently, protein segment universes were categorized as follows. (i) Short segments (10–22 residues) showed a distribution with a high frequency of secondary structure clusters. (ii) Medium segments (23–26 residues) showed a distribution corresponding to an intermediate state of the transitions. (iii) Long segments (27–50 residues) showed a distribution converging on one huge cluster containing compact conformations with a smaller radius of gyration. This distribution reflects the organization of protein structures and the origin of protein domains. Three major conformational components (radius of gyration, structural symmetry with respect to the N-terminal and C-terminal halves, and single-turn/two-turn structure) define most of the segment universes well. Furthermore, we identified several conformational components that were unique to each structural class. These characteristics suggest that protein segment conformation is described by a combination of the three common structural variables, which make large contributions, and class-specific structural variables, which make small contributions.
    Conclusion: The present analyses of four protein structural classes show the universal role of three major components as descriptors of segment conformation. The changes in distribution with segment length, viewed through the three key components, suggest both the adequacy of the prediction strategies used in recent de novo structure-prediction methods and the possibility of further progress on them.
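The analysis described in this abstract (principal component analysis on intra-segment Cα-Cα distances, with radius of gyration as a key descriptor) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the toy `extended` and `helix` builders, the idealized geometry values, and the use of power iteration to find only the first principal axis.

```python
import math
from itertools import combinations

def distance_features(coords):
    """Flatten a segment's C-alpha coordinates into its pairwise-distance vector."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def radius_of_gyration(coords):
    """Root-mean-square distance of the C-alpha atoms from their centroid."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)

def first_principal_component(rows, iterations=200):
    """Leading PCA axis and per-row scores, by power iteration on centred rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]

    def project(v):
        return [sum(x * w for x, w in zip(row, v)) for row in centred]

    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit start vector
    for _ in range(iterations):
        # Apply X^T X to v without building the d x d covariance matrix.
        scores = project(v)
        w = [sum(s * row[j] for s, row in zip(scores, centred)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all rows identical: no variance to explain
            break
        v = [x / norm for x in w]
    return v, project(v)

# Hypothetical toy segments (idealized geometry, not real protein data).
def extended(n, spacing=3.8):
    return [(spacing * i, 0.0, 0.0) for i in range(n)]

def helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    t = math.radians(turn_deg)
    return [(radius * math.cos(t * i), radius * math.sin(t * i), rise * i)
            for i in range(n)]

segments = [extended(10, 3.8), extended(10, 3.7),
            helix(10, rise=1.5), helix(10, rise=1.6)]
features = [distance_features(s) for s in segments]
axis, scores = first_principal_component(features)
```

In this toy set the first component separates the extended segments from the helical ones, and the sign of the axis is arbitrary. A real analysis would use many segments of equal length and more than one component.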

    Universal partitioning of the hierarchical fold network of 50-residue segments in proteins

    Background: Several studies have demonstrated that protein fold space is structured hierarchically and that power-law statistics hold for the relation between the numbers of protein families and protein folds (or superfamilies). We examined the internal structure and statistics of the fold space of 50-amino-acid-residue segments taken from various protein folds. We used inter-residue contact patterns to measure the tertiary structural similarity among segments. Using this similarity measure, the segments were classified into a number (K_c) of clusters. We examined various K_c values for the clustering. The resolution with which segment tertiary structures are differentiated increases with increasing K_c. Furthermore, we constructed networks by linking structurally similar clusters.
    Results: The network was partitioned persistently into four regions for K_c ≥ 1000. This main partitioning is consistent with the results of earlier studies, in which similar partitioning was reported for the classification of protein domain structures. Furthermore, the network was partitioned naturally into several dozen sub-networks (i.e., communities): clusters within a sub-network were mutually connected by numerous links, whereas clusters in different sub-networks were connected by only a few. For K_c ≥ 1000, there were about 40 major sub-networks, and their contents were conserved. This sub-partitioning is a novel finding, suggesting that the network is structured hierarchically: segments construct a cluster, clusters form a sub-network, and sub-networks constitute a region. Additionally, the network was characterized by non-power-law statistics, which is also a novel finding.
    Conclusion: The main findings are: (1) The universe of 50-residue segments found here was characterized by non-power-law statistics. Therefore, the universe differs from those previously reported for protein domains. (2) The 50-residue segments were partitioned persistently and universally into some dozens (ca. 40) of major sub-networks, irrespective of the number of clusters. (3) These major sub-networks encompassed 90% of all segments. Consequently, protein tertiary structure is constructed from these dozens of elements (sub-networks).

    Identification of hepta-histidine as a candidate drug for Huntington's disease by in silico-in vitro- in vivo-integrated screens of chemical libraries.

    We identified drug seeds for treating Huntington's disease (HD) by combining in vitro single molecule fluorescence spectroscopy, in silico molecular docking simulations, and in vivo fly and mouse HD models to screen for inhibitors of abnormal interactions between mutant Htt and physiological Ku70, an essential DNA damage repair protein in neurons whose function is known to be impaired by mutant Htt. From 19,468 and 3,010,321 chemicals in actual and virtual libraries, fifty-six chemicals were selected from combined in vitro-in silico screens; six of these were further confirmed to have an in vivo effect on lifespan in a fly HD model, and two chemicals exerted an in vivo effect on the lifespan, body weight and motor function in a mouse HD model. Two oligopeptides, hepta-histidine (7H) and Angiotensin III, rescued the morphological abnormalities of primary neurons differentiated from iPS cells of human HD patients. For these selected drug seeds, we proposed a possible common structure. Unexpectedly, the selected chemicals enhanced rather than inhibited Htt aggregation, as indicated by dynamic light scattering analysis. Taken together, these integrated screens revealed a new pathway for the molecular targeted therapy of HD

    SAHG, a comprehensive database of predicted structures of all human proteins

    Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith–Waterman profile–profile alignment), global–local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships

    Predicting global distributions of eukaryotic plankton communities from satellite data

    Observing plankton from space: development of a model that predicts marine eukaryotic microbial communities from satellite input data. Kyoto University press release. 2023-10-19. Satellite remote sensing is a powerful tool to monitor the global dynamics of marine plankton. Previous research has focused on developing models to predict the size or taxonomic groups of phytoplankton. Here, we present an approach to identify community types from a global plankton network that includes phytoplankton and heterotrophic protists and to predict their biogeography using global satellite observations. Six plankton community types were identified from a co-occurrence network inferred using a novel planetary-scale 18S rDNA V4 eukaryotic metabarcoding dataset. Machine learning techniques were then applied to construct a model that predicted these community types from satellite data. The model showed an overall 67% accuracy in the prediction of the community types. The prediction using 17 satellite-derived parameters showed better performance than that using only temperature and/or the concentration of chlorophyll a. The constructed model predicted the global spatiotemporal distribution of community types over 19 years. The predicted distributions exhibited strong seasonal changes in community types in the subarctic–subtropical boundary regions, which were consistent with previous field observations. The model also identified long-term trends in the distribution of community types, which suggested responses to ocean warming.

    Predicting mostly disordered proteins by using structure-unknown protein data

    BACKGROUND: Predicting intrinsically disordered proteins is important in structural biology because they are thought to carry out various cellular functions even though they have no stable three-dimensional structure. We know the structures of far more ordered proteins than disordered proteins. The structural distribution of proteins in nature can therefore be inferred to differ from that of proteins whose structures have been determined experimentally. We know many more protein sequences than protein structures, and many of the known sequences can be expected to be those of disordered proteins. It would therefore be efficient to use the information in structure-unknown proteins to avoid training data sparseness. We propose a novel method for predicting which proteins are mostly disordered by using a spectral graph transducer (SGT) and training with a huge number of structure-unknown sequences as well as structure-known sequences. RESULTS: When the proposed method was evaluated on data that included 82 disordered proteins and 526 ordered proteins, its sensitivity was 0.723 and its specificity was 0.977. It resulted in a Matthews correlation coefficient 0.202 points higher than that obtained using FoldIndex, 0.221 points higher than that obtained using the method based on plotting hydrophobicity against the number of contacts and 0.07 points higher than that obtained using support vector machines (SVMs). To examine robustness against training data sparseness, we investigated the correlation between two results obtained when the method was trained on different datasets and tested on the same dataset. The correlation coefficient for the proposed method is 0.14 higher than that for the method using SVMs. 
When the proposed SGT-based method was compared with four per-residue predictors (VL3, GlobPlot, DISOPRED2 and IUPred (long)), its sensitivity was 0.834 for disordered proteins, which is 0.052–0.523 higher than that of the per-residue predictors, and its specificity was 0.991 for ordered proteins, which is 0.036–0.153 higher than that of the per-residue predictors. The proposed method was also evaluated on data that included 417 partially disordered proteins. It predicted the frequency of disordered proteins to be 1.95% for the proteins with 5%–10% disordered sequences, 1.46% for the proteins with 10%–20% disordered sequences and 16.57% for proteins with 20%–40% disordered sequences. CONCLUSION: The proposed method, which utilizes the information of structure-unknown data, predicts disordered proteins more accurately than other methods and is less affected by training data sparseness
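The evaluation measures named in this abstract (sensitivity, specificity and the Matthews correlation coefficient) all follow from a binary confusion matrix. The sketch below shows the standard definitions. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper, whose per-class counts are not given here.

```python
import math

def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from binary confusion counts.

    Here "positive" stands for disordered proteins and "negative" for ordered ones.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the evaluation data")
    sensitivity = tp / (tp + fn)  # fraction of positives recovered
    specificity = tn / (tn + fp)  # fraction of negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when a predicted class is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical counts, for illustration only.
sens, spec, mcc = confusion_metrics(tp=8, fn=2, tn=85, fp=5)
```

Because the coefficient uses all four cells of the confusion matrix, it remains informative when the classes are as unbalanced as the 82 disordered versus 526 ordered proteins mentioned above.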