108 research outputs found

    LEAP: highly accurate prediction of protein loop conformations by integrating coarse-grained sampling and optimized energy scores with all-atom refinement of backbone and side chains

    Prediction of protein loop conformations without any prior knowledge (ab initio prediction) is an unsolved problem. Its solution will significantly impact protein homology and template-based modeling as well as ab initio protein-structure prediction. Here, we developed a coarse-grained, optimized scoring function for initial sampling and ranking of loop decoys. The resulting decoys are then further optimized in backbone and side-chain conformations and ranked by all-atom energy scoring functions. The final integrated technique, called loop prediction by energy-assisted protocol (LEAP), achieved a median root mean square deviation (RMSD) of 2.1 Å for 325 12-residue test loops and 2.0 Å for 45 12-residue loops from critical assessment of structure-prediction techniques (CASP) 10 target proteins with native core structures (backbone and side chains). When all side-chain conformations in protein cores were predicted in the absence of the target loop, loop-prediction accuracy decreased only slightly (0.2 Å difference in RMSD for 12-residue loops in the CASP target proteins). The accuracy obtained is an improvement of about 1 Å RMSD or more over the other methods we tested. The executable file for a Linux system is freely available for academic users at http://sparks-lab.org.

    SVMTriP: A Method to Predict Antigenic Epitopes Using Support Vector Machine to Integrate Tri-Peptide Similarity and Propensity

    Identifying protein surface regions preferentially recognizable by antibodies (antigenic epitopes) is at the heart of new immuno-diagnostic reagent discovery and vaccine design, and computational methods for antigenic epitope prediction provide crucial means to serve this purpose. Many linear B-cell epitope prediction methods were developed, such as BepiPred, ABCPred, AAP, BCPred, BayesB, BEOracle/BROracle, and BEST, towards this goal. However, effective immunological research demands more robust performance of the prediction method than what the current algorithms could provide. In this work, a new method to predict linear antigenic epitopes is developed; Support Vector Machine has been utilized by combining the Tri-peptide similarity and Propensity scores (SVMTriP). Applied to non-redundant B-cell linear epitopes extracted from IEDB, SVMTriP achieves a sensitivity of 80.1% and a precision of 55.2% with a five-fold cross-validation. The AUC value is 0.702. The combination of similarity and propensity of tri-peptide subsequences can improve the prediction performance for linear B-cell epitopes. Moreover, SVMTriP is capable of recognizing viral peptides from a human protein sequence background. A web server based on our method is constructed for public use. The server and all datasets used in the current study are available at http://sysbio.unl.edu/SVMTriP

    Protein binding site prediction using an empirical scoring function

    Most biological processes are mediated by interactions between proteins and their interacting partners including proteins, nucleic acids and small molecules. This work establishes a method called PINUP for binding site prediction of monomeric proteins. With only two weight parameters to optimize, PINUP produces not only 42.2% coverage of actual interfaces (percentage of correctly predicted interface residues in actual interface residues) but also 44.5% accuracy in predicted interfaces (percentage of correctly predicted interface residues in the predicted interface residues) in a cross validation using a 57-protein dataset. By comparison, the expected accuracy via random prediction (percentage of actual interface residues in surface residues) is only 15%. The binding sites of the 57-protein set are found to be easier to predict than those of an independent test set of 68 proteins. The average coverage and accuracy for this independent test set are 30.5% and 29.4%, respectively. The significant gain of PINUP over expected random prediction is attributed to (i) effective residue-energy score and accessible-surface-area-dependent interface-propensity, (ii) isolation of functional constraints contained in the conservation score from the structural constraints through the combination of residue-energy score (for structural constraints) and conservation score and (iii) a consensus region built on top-ranked initial patches.
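
The coverage, accuracy, and random-baseline percentages defined in the PINUP abstract above can be illustrated with a minimal sketch. It only restates the three definitions given in the abstract; the function names and the toy residue numbers are hypothetical and are not taken from PINUP itself.

```python
def interface_metrics(predicted, actual):
    """Return (coverage, accuracy) in percent for two collections of residue ids.

    coverage: correctly predicted interface residues / actual interface residues
    accuracy: correctly predicted interface residues / predicted interface residues
    """
    predicted, actual = set(predicted), set(actual)
    correct = predicted & actual  # residues that are both predicted and actual
    coverage = 100.0 * len(correct) / len(actual) if actual else 0.0
    accuracy = 100.0 * len(correct) / len(predicted) if predicted else 0.0
    return coverage, accuracy


def random_baseline(actual, surface):
    """Expected accuracy of random prediction: actual interface / surface residues."""
    actual, surface = set(actual), set(surface)
    return 100.0 * len(actual) / len(surface) if surface else 0.0


# Toy example with made-up residue numbers (illustration only).
predicted_residues = {1, 2, 3, 4}
actual_residues = {3, 4, 5, 6, 7}
surface_residues = set(range(1, 21))

coverage, accuracy = interface_metrics(predicted_residues, actual_residues)
baseline = random_baseline(actual_residues, surface_residues)
print(coverage, accuracy, baseline)
```

Both metrics share the same numerator and differ only in the denominator, which is why a method can score well on one and poorly on the other.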

    Conformational B-Cell Epitope Prediction on Antigen Protein Structures: A Review of Current Algorithms and Comparison with Common Binding Site Prediction Methods

    Accurate prediction of B-cell antigenic epitopes is important for immunologic research and medical applications, but compared with other bioinformatic problems, antigenic epitope prediction is more challenging because of the extreme variability of antigenic epitopes, where the paratope on the antibody binds specifically to a given epitope with high precision. In spite of the continuing efforts in the past decade, the problem remains unsolved and therefore still attracts a lot of attention from bioinformaticists. Recently, several discontinuous epitope prediction servers became available, and it is intriguing to review all existing methods and evaluate their performances on the same benchmark. In addition, these methods are also compared against common binding site prediction algorithms, since they have been frequently used as substitutes in the absence of good epitope prediction methods

    Identification of a nuclear localization motif in the serine/arginine protein kinase PSRPK of Physarum polycephalum

    Background: Serine/arginine (SR) protein-specific kinases (SRPKs) are conserved in a wide range of organisms, from humans to yeast. Studies showed that SRPKs can regulate the nuclear import of SR proteins in cytoplasm, and regulate the sub-localization of SR proteins in the nucleus. But no nuclear localization signal (NLS) of SRPKs was found. We isolated an SRPK-like protein PSRPK (GenBank accession No. DQ140379) from Physarum polycephalum previously, and identified an NLS of PSRPK in this study. Results: We carried out a thorough molecular dissection of the different domains of the PSRPK protein involved in its nuclear localization. By truncation of PSRPK protein, deletion of and single amino acid substitution in a putative NLS and transfection of mammalian cells, we observed the distribution of PSRPK fluorescent fusion protein in mammalian cells using confocal microscopy and found that the protein was mainly accumulated in the nucleus; this indicated that the motif contained a nuclear localization signal (NLS). Further investigation with truncated PSRPK peptides showed that the NLS (³¹⁸PKKGDKYDKTD³²⁸) was localized in the alkaline Ω-loop of a helix-loop-helix motif (HLHM) of the C-terminal conserved domain. If the ³¹⁸PKKGDK³²² sequence was deleted from the loop or K³²⁰ was mutated to T³²⁰, the PSRPK fluorescent fusion protein could not enter and accumulate in the nucleus. Conclusion: This study demonstrated that the ³¹⁸PKKGDKYDKTD³²⁸ peptides localized in the C-terminal conserved domain of PSRPK with the Ω-loop structure could play a crucial role in the NLS function of PSRPK.

    Simulating soil salinity dynamics, cotton yield and evapotranspiration under drip irrigation by ensemble machine learning

    We thank the China Scholarship Council (CSC) for providing a scholarship (202206710073) to Zewei Jiang. This work was supported by the Fundamental Research Funds for the Central Universities (B220203009), the Postgraduate Research & Practice Program of Jiangsu Province (KYCX22_0669), the Water Conservancy Science and Technology Project of Jiangxi Province (201921ZDKT06, 202124ZDKT09), the National Natural Science Foundation of China (51879076), the Fundamental Research Funds for the Central Universities (B210204016), Science & Technology Specific Projects in Agricultural High-tech Industrial Demonstration Area of the Yellow River Delta, Grant No: 2022SZX01. Peer reviewed.

    Simulating soil salinity dynamics, cotton yield and evapotranspiration under drip irrigation by ensemble machine learning

    Cotton is widely used in textile, decoration, and industry, but it is also threatened by soil salinization. Drip irrigation plays an important role in improving water and fertilization utilization efficiency and ensuring crop production in arid areas. Accurate prediction of soil salinity and crop evapotranspiration under drip irrigation is essential to guide water management practices in arid and saline areas. However, traditional hydrological models such as Hydrus require a wider variety of input parameters and more user expertise, which limits their application in practice, and machine learning (ML) provides a potential alternative. Based on a global dataset collected from 134 pieces of literature, we proposed a method to comprehensively simulate soil salinity, evapotranspiration (ET) and cotton yield. Results showed that soil salinity, crop evapotranspiration and cotton yield are best predicted from soil data (bulk density), meteorological factors, irrigation data and other data. Meteorological factors include annual average temperature, total precipitation and year. Irrigation data include salinity in irrigation water, soil matric potential and irrigation water volume, while other data include soil depth, distance from dripper, days after sowing (for EC and soil salinity) and fertilization rate (for yield and ET). The accuracy of the model reached a satisfactory level, with R2 of 0.78-0.99. The stacking ensemble ML performed better than any single model, i.e., gradient boosting decision tree (GBDT), random forest (RF) or extreme gradient boosting regression (XGBR), with R2 increased by 0.02%-19.31%. Across all input combinations, other data have the greatest impact on model accuracy, while the RMSE of the S1 scenario (input without meteorological factors) differs little from that of the full input, being -34.22% to 19.20% higher. Given the wide application of drip irrigation in cotton, we recommend the application of ensemble ML to predict soil salinity and crop evapotranspiration, thus serving as the basis for adjusting the irrigation schedule.

    Fast and accurate prediction of protein side-chain conformations

    Summary: We developed a fast and accurate side-chain modeling program [Optimized Side Chain Atomic eneRgy (OSCAR)-star] based on orientation-dependent energy functions and a rigid rotamer model. The average computing time was 18 s per protein for 218 test proteins with higher prediction accuracy (1.1% increase for χ1 and 0.8% increase for χ1+2) than the best performing program developed by other groups. We show that the energy functions, which were calibrated to tolerate the discrete errors of rigid rotamers, are appropriate for protein loop selection, especially for decoys without extensive structural refinement

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed.