2,740 research outputs found

    StaPep: an open-source tool for the structure prediction and feature extraction of hydrocarbon-stapled peptides

    Many tools exist for extracting structural and physicochemical descriptors from linear peptides to predict their properties, but similar tools for hydrocarbon-stapled peptides are lacking. Here, we present StaPep, a Python-based toolkit designed for generating 2D/3D structures and calculating 21 distinct features for hydrocarbon-stapled peptides. The current version supports hydrocarbon-stapled peptides containing 2 non-standard amino acids (norleucine and 2-aminoisobutyric acid) and 6 non-natural anchoring residues (S3, S5, S8, R3, R5 and R8). We then established a hand-curated dataset of 201 hydrocarbon-stapled peptides and 384 linear peptides with sequence information and experimental membrane permeability, to showcase StaPep's application in artificial intelligence projects. A machine learning-based predictor using the features calculated above was developed for identifying cell-penetrating hydrocarbon-stapled peptides, with an AUC of 0.85. StaPep's pipeline spans data retrieval, cleaning, structure generation, molecular feature calculation, and machine learning model construction for hydrocarbon-stapled peptides. The source code and dataset are freely available on GitHub: https://github.com/dahuilangda/stapep_package. Comment: 26 pages, 6 figures.
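
As a reading aid for the AUC figure reported in this abstract: the ROC AUC can be read as the probability that a randomly chosen positive example (here, a cell-penetrating peptide) receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition in plain Python. The function name `roc_auc` and the toy labels and scores are illustrative assumptions; they do not come from StaPep or its dataset.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: fraction of (positive, negative) pairs in which
    the positive scores higher; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = permeable, 0 = not permeable).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

The quadratic loop is chosen for readability; for large datasets a rank-based computation gives the same value more efficiently.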

    SCMTHP: A New Approach for Identifying and Characterizing of Tumor-Homing Peptides Using Estimated Propensity Scores of Amino Acids.

    Tumor-homing peptides (THPs) are small peptides that can recognize and bind cancer cells specifically. To gain a better understanding of THPs' functional mechanisms, accurate identification and characterization of THPs is required. Although some computational methods for in silico THP identification have been proposed, a major drawback is their lack of model interpretability. In this study, we propose a new, simple and easily interpretable computational approach (called SCMTHP) for identifying and analyzing the tumor-homing activities of peptides using a scoring card method (SCM). To improve the predictability and interpretability of our predictor, we generated propensity scores of the 20 amino acids as THPs. Finally, informative physicochemical properties were analyzed using the SCMTHP-derived propensity scores to provide insights into the characteristics giving rise to the bioactivity of THPs. Benchmarking on independent tests indicated that SCMTHP could achieve performance comparable to the state-of-the-art method, with accuracies of 0.827 and 0.798 on the Main and Small benchmark datasets, respectively. Furthermore, SCMTHP was found to outperform several well-known machine learning-based classifiers (e.g., decision tree, k-nearest neighbor, multi-layer perceptron, naive Bayes and partial least squares regression), as indicated by both 10-fold cross-validation and independent tests. Finally, the SCMTHP web server was established and made freely available online. SCMTHP is expected to be a useful tool for rapid and accurate identification of THPs and for providing a better understanding of THP biophysical and biochemical properties.
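
The scoring idea behind a propensity-based predictor can be sketched as follows: each residue carries a propensity score, a peptide is scored by averaging the scores of its residues, and the peptide is called positive if the average clears a threshold. This is a simplified sketch under stated assumptions, not the SCMTHP implementation. The table `PROPENSITY`, its values, the threshold of 500, and the function names are all invented for illustration, and the sketch scores single residues only; the abstract does not say how SCMTHP derives or optimizes its scores.

```python
# Hypothetical propensity scores on a 0-1000 scale; values are invented
# for illustration and are NOT the SCMTHP-derived scores.
PROPENSITY = {"C": 800, "R": 700, "G": 600, "A": 400, "D": 200, "E": 100}


def scm_score(peptide):
    """Average per-residue propensity of a peptide (toy scoring-card score)."""
    if not peptide:
        raise ValueError("empty peptide")
    return sum(PROPENSITY[aa] for aa in peptide) / len(peptide)


def predict_thp(peptide, threshold=500):
    """Call a peptide positive when its score reaches the (assumed) threshold."""
    return scm_score(peptide) >= threshold


high = scm_score("CRGD")  # (800 + 700 + 600 + 200) / 4
low = scm_score("AEDA")   # (400 + 100 + 200 + 400) / 4
```

Because the decision depends only on a lookup table and one threshold, the contribution of each residue to a prediction can be read directly from the table, which is the interpretability argument the abstract makes.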

    Probing protein sequences as sources for encrypted antimicrobial peptides

    Starting from the premise that a wealth of potentially biologically active peptides may lurk within proteins, we describe here a methodology to identify putative antimicrobial peptides encrypted in protein sequences. Candidate peptides were identified using a new screening procedure based on physicochemical criteria to reveal matching peptides within protein databases. Fifteen such peptides, along with a range of natural antimicrobial peptides, were examined using DSC and CD to characterize their interaction with phospholipid membranes. Principal component analysis of DSC data shows that the investigated peptides group according to their effects on the main phase transition of phospholipid vesicles, and that these effects correlate both with antimicrobial activity and with the changes in peptide secondary structure. Consequently, we have been able to identify novel antimicrobial peptides from larger proteins not hitherto associated with such activity, mimicking endogenous and/or exogenous microorganism enzymatic processing of parent proteins to smaller bioactive molecules. A biotechnological application for this methodology is explored. Soybean (Glycine max) plants, transformed to include a putative antimicrobial protein fragment encoded in their own genome, were tested for tolerance against Phakopsora pachyrhizi, the causative agent of Asian soybean rust. This procedure may represent an inventive alternative to transgenic technology, since the genetic material to be used belongs to the host organism and not to exogenous sources.
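
A screening procedure of this kind can be sketched as a sliding window over a protein sequence that keeps every window meeting simple physicochemical criteria. The abstract does not state the actual criteria, so the ones used here (a crude net charge and mean Kyte-Doolittle hydropathy), the thresholds, the window length, and the function names are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(peptide):
    """Crude net charge: +1 for K/R, -1 for D/E (His and termini ignored)."""
    return sum((aa in "KR") - (aa in "DE") for aa in peptide)


def mean_hydropathy(peptide):
    """Mean Kyte-Doolittle hydropathy over all residues."""
    return sum(KD[aa] for aa in peptide) / len(peptide)


def scan(protein, window, min_charge, min_hydropathy):
    """Return (start, fragment) for every window passing both assumed criteria."""
    hits = []
    for start in range(len(protein) - window + 1):
        fragment = protein[start:start + window]
        if (net_charge(fragment) >= min_charge
                and mean_hydropathy(fragment) >= min_hydropathy):
            hits.append((start, fragment))
    return hits


# Toy sequence, invented for illustration: an acidic stretch followed by a
# cationic, partly hydrophobic stretch.
hits = scan("DDDDKKLLKK", window=6, min_charge=4, min_hydropathy=-1.5)
```

In this toy run only the final window, `KKLLKK` at offset 4, passes both filters; every earlier window contains at least one aspartate and falls below the charge threshold.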

    Machine Learning Approaches for Solving Classification Problems of Four Types of Immune Peptides

    Peptides play an important role in all aspects of the immunological reactions to invading cancer and pathogen cells. It has been known for over 40 years that peptides are critical in mobilizing the immune system against foreign invaders. Since then, new knowledge about the generation and function of peptides in immunology has supported efforts to harness the immune system to treat disease. Yet, with little immunological insight, most of the highly productive treatments, including vaccines, have been developed empirically. Nonetheless, increased knowledge of the biology of antigen processing, as well as of the chemistry and pharmacological properties of antigenic and antimicrobial peptides, has now permitted the development of drugs and vaccines. Given advances in technology, it is vitally important to develop automatic computational methods for rapidly and accurately predicting immune peptides. In this thesis, the author focuses on machine learning approaches for addressing classification problems of four types of immune peptides (anti-inflammatory, proinflammatory, anti-tuberculosis, and linear B-cell peptides). The treatment of numerous inflammatory diseases and autoimmune disorders with therapeutic peptides has received substantial consideration; however, the exploration of anti-inflammatory peptides via biological experiments is often a time-consuming and expensive task. Novel in silico predictors are needed to classify potential anti-inflammatory peptides prior to in vitro investigation. Herein, an accurate predictor called PreAIP (Predictor of Anti-Inflammatory Peptides) was developed by integrating multiple complementary features. We systematically investigated different types of features, including primary sequence, evolutionary and structural information, through a random forest classifier. The final PreAIP model achieved an AUC value of 0.833 on the training dataset via a 10-fold cross-validation test, which was better than that of existing models.
Moreover, PreAIP achieved an AUC value of 0.840 on a test dataset, demonstrating that the proposed method outperformed the two existing methods. These results indicate that PreAIP is an accurate predictor for identifying anti-inflammatory peptides and contributes to the development of anti-inflammatory peptide therapeutics and biomedical research. The curated datasets and PreAIP are freely available at http://kurata14.bio.kyutech.ac.jp/PreAIP/. A proinflammatory peptide (PIP) is a type of signaling molecule secreted from immune cells that contributes to the first line of defense against invading pathogens. Numerous experiments have shown that PIPs play an important role in applications such as vaccines and immunotherapeutic drugs. Because high-throughput laboratory methods are time-consuming and costly, effective computational methods are in great demand for identifying PIPs promptly and accurately. Thus, in this study, we proposed a computational model combining multiple feature representations, called ProIn-Fuse, to improve the performance of PIP identification. Specifically, a feature representation learning model was used to generate a set of informative probabilistic features from random forest models with eight sequence encoding schemes. Finally, ProIn-Fuse was constructed as a linear combination of models built on the informative probabilistic features. Evaluated on an independent test, ProIn-Fuse yielded an accuracy of 0.746, which was over 10% higher than those obtained by the state-of-the-art PIP predictors. Cross-validation and independent test results consistently demonstrated that ProIn-Fuse is more precise and promising in the identification of PIPs than existing PIP predictors. The web server, datasets and online instructions are freely accessible at http://kurata14.bio.kyutech.ac.jp/ProIn-Fuse/.
We believe that the proposed ProIn-Fuse can facilitate faster and broader applications of PIPs in drug design and development. Tuberculosis (TB) is a leading killer caused by Mycobacterium tuberculosis. Recently, anti-TB peptides have provided an alternative approach to combat antibiotic tolerance. Herein, we have developed an effective computational predictor, iAntiTB (identification of anti-tubercular peptides), that integrates multiple feature vectors derived from the amino acid sequences via Random Forest (RF) and Support Vector Machine (SVM) classifiers. iAntiTB combines the RF and SVM scores via linear regression to enhance the prediction accuracy. To make a robust and accurate predictor, we prepared two datasets with different types of negative samples. iAntiTB achieved AUC values of 0.896 and 0.946 on the training sets of the first and second datasets, respectively, and outperformed the other existing predictors. Thus, iAntiTB is a robust and accurate predictor that is helpful for researchers working on peptide therapeutics and immunotherapy. All the employed datasets and the software application are accessible at http://kurata14.bio.kyutech.ac.jp/iAntiTB/. Linear B-cell peptides are critically important for immunological applications such as vaccine design, immunodiagnostic tests, antibody production, and disease diagnosis and therapy. The accurate identification of linear B-cell peptides remains challenging despite several decades of research. In this work, we have developed a novel predictor, iLBE (Identification of B-Cell Epitope), by integrating evolutionary and sequence-based features. The successive feature vectors were optimized by a Wilcoxon rank-sum test. The random forest (RF) algorithm then used the optimal consecutive feature vectors to predict linear B-cell epitopes. We combined the RF scores by logistic regression to enhance the prediction accuracy.
The final iLBE achieved an AUC of 0.809 on the training dataset and outperformed other existing prediction models on a comprehensive independent dataset. iLBE is suggested to be a powerful computational tool for identifying linear B-cell peptides and for developing penetrating diagnostic tests. A web application for iLBE with curated datasets is freely accessible at http://kurata14.bio.kyutech.ac.jp/iLBE/. Taken together, the above results suggest that our proposed predictors (PreAIP, ProIn-Fuse, iAntiTB, and iLBE) would be helpful computational resources for the prediction of anti-inflammatory, proinflammatory, anti-tuberculosis, and linear B-cell peptides.
Kyushu Institute of Technology doctoral dissertation. Degree number: 情工博甲第358号. Date of conferral: March 25, 2021. Contents: 1 Introduction | 2 Prediction of Anti-Inflammatory Peptides by Integrating Multiple Complementary Features | 3 Prediction of Proinflammatory Peptides by Fusing of Multiple Feature Representations | 4 Prediction of Anti-Tubercular Peptides by Exploiting Amino Acid Pattern and Properties | 5 Prediction of Linear B-Cell Epitopes by Integrating Sequence and Evolutionary Features | 6 Conclusions and Perspectives. Kyushu Institute of Technology, academic year 2020.
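
Several of the predictors in this thesis abstract fuse the outputs of base classifiers: iAntiTB combines RF and SVM scores via linear regression, and ProIn-Fuse linearly combines models built on probabilistic features. The fusion step itself can be sketched as a weighted sum of base scores followed by a threshold. This is a minimal sketch under stated assumptions: the weights, bias, threshold, and function names are hypothetical placeholders, whereas in the thesis the combination is fitted to training data.

```python
def combine_scores(rf_score, svm_score, w_rf=0.6, w_svm=0.4, bias=0.0):
    """Linear fusion of two base-classifier scores.

    The weights and bias are hypothetical placeholders; a fitted model
    would estimate them from training data (e.g. by regression).
    """
    return bias + w_rf * rf_score + w_svm * svm_score


def classify(rf_score, svm_score, threshold=0.5):
    """Call a peptide positive when the fused score reaches the threshold."""
    return combine_scores(rf_score, svm_score) >= threshold


# Toy base-classifier scores, invented for illustration.
fused_a = combine_scores(0.8, 0.3)  # RF confident, SVM doubtful
fused_b = combine_scores(0.2, 0.7)  # RF doubtful, SVM fairly confident
```

With these placeholder weights the RF score dominates, so the first peptide is called positive and the second is not, although the SVM prefers the second. Fitted weights would instead reflect how reliable each base classifier proved on the training data.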

    Application of the SwissDrugDesign Online Resources in Virtual Screening.

    SwissDrugDesign is an important initiative led by the Molecular Modeling Group of the SIB Swiss Institute of Bioinformatics. This project provides a collection of freely available online tools for computer-aided drug design. Some of these web-based methods, i.e., SwissSimilarity and SwissTargetPrediction, were especially developed to perform virtual screening, while others such as SwissADME, SwissDock, SwissParam and SwissBioisostere can find applications in related activities. The present review aims at providing a short description of these methods together with examples of their application in virtual screening, where SwissDrugDesign tools successfully supported the discovery of bioactive small molecules

    Discovery of Molecules that Modulate Protein-Protein Interactions in the Context of Human Proliferating Cell Nuclear Antigen-Associated Processes of DNA Replication and Damage Repair

    Integral to cell viability is the homotrimeric protein complex Proliferating Cell Nuclear Antigen (PCNA), which encircles chromatin-bound DNA and acts as a DNA clamp that provides topological sites for the recruitment of proteins necessary for DNA replication and damage repair. PCNA has critical roles in the survival and proliferation of cells, as disease-associated dysregulation of associated functions can have dire effects on genome stability, leading to the formation of various malignancies ranging from non-Hodgkin’s lymphoma to skin, laryngeal, ocular, prostate and breast cancers. Here, a strategy was explored with PCNA as a drug target that may have wider implications for targeting protein-protein interactions (PPIs) as well as for fragment-based drug design. A design platform using peptidomimetic small molecules was developed that maps ideal surface binding interaction sites at a PPI interface before considering detailed conformations of an optimal ligand. A novel in silico multi-fragment, combinatorial screening approach was used to guide the selection and subsequent synthesis of tripeptoid ligands, which were evaluated in a PCNA-based competitive displacement assay. Some of the synthesized peptoid-based compounds displayed the ability to disrupt the interaction between PCNA and a PIP box-containing peptide. The IC50 values of these compounds indicated affinity similar to or better than that of T2AA, an established inhibitor of PCNA-PIP box interactions. The information gained here could be useful for subsequent identification of drug lead candidates.

    3pHLA-score improves structure-based peptide-HLA binding affinity prediction

    Binding of peptides to Human Leukocyte Antigen (HLA) receptors is a prerequisite for triggering an immune response. Estimating peptide-HLA (pHLA) binding is crucial for peptide vaccine target identification and epitope discovery pipelines. Computational methods for binding affinity prediction can accelerate these pipelines. Currently, most of those computational methods rely exclusively on sequence-based data, which leads to inherent limitations. Recent studies have shown that structure-based data can address some of these limitations. In this work we propose a novel machine learning (ML) structure-based protocol to predict the binding affinity of peptides to HLA receptors. To do so, we engineer the input features for ML models by decoupling energy contributions at different residue positions in peptides, which leads to our novel per-peptide-position protocol. Using Rosetta’s ref2015 scoring function as a baseline, we use this protocol to develop 3pHLA-score. Our per-peptide-position protocol outperforms the standard training protocol and increases the area under the precision-recall curve from 0.82 to 0.99. 3pHLA-score outperforms widely used scoring functions (AutoDock4, Vina, Dope, Vinardo, FoldX, GradDock) in a structural virtual screening task. Overall, this work brings structure-based methods one step closer to epitope discovery pipelines and could help advance the development of cancer and viral vaccines.
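
The difference between pooling energy terms over the whole peptide and decoupling them by residue position can be sketched with plain lists. This is an illustrative reading of the abstract, not the 3pHLA-score code: the function names and the toy energy values are assumptions, the toy peptide has three positions for brevity, and `fa_atr` and `fa_rep` are used only as example names of ref2015 energy terms.

```python
def pooled_features(per_residue, terms):
    """One feature per energy term, summed over all peptide positions."""
    return [sum(res[t] for res in per_residue) for t in terms]


def per_position_features(per_residue, terms):
    """One feature per (position, term) pair, so a model can weight
    each peptide position separately."""
    return [res[t] for res in per_residue for t in terms]


TERMS = ["fa_atr", "fa_rep"]

# Toy per-residue energies for a 3-residue peptide; values are invented.
per_residue = [
    {"fa_atr": -2.0, "fa_rep": 0.5},
    {"fa_atr": -1.0, "fa_rep": 0.25},
    {"fa_atr": -3.0, "fa_rep": 1.0},
]

pooled = pooled_features(per_residue, TERMS)
decoupled = per_position_features(per_residue, TERMS)
```

Pooling yields two features regardless of peptide length, while the per-position layout yields one feature per position and term (six here), which lets a downstream model learn that some positions matter more than others.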

    Modelling the structure of full-length Epstein-Barr virus nuclear antigen 1

    Epstein-Barr virus (EBV) is a clinically important human virus associated with several cancers and is the etiologic agent of infectious mononucleosis. The viral nuclear antigen-1 (EBNA1) is central to the replication and propagation of the viral genome and likely contributes to tumourigenesis. We have compared EBNA1 homologues from other primate lymphocryptoviruses (LCV) and found that the central glycine/alanine repeat (GAr) domain, as well as predicted cellular protein (USP7 and CK2) binding sites, are present in homologues in the Old World primates but not the marmoset, suggesting that these motifs may have co-evolved. Using the resolved structure of the C-terminal one third of EBNA1 (homodimerisation and DNA binding domain), we have gone on to develop monomeric and dimeric models in silico of the full-length protein. The C-terminal domain is predicted to be structurally highly similar between homologues, indicating conserved function. Zinc could be stably incorporated into the model, bonding with two N-terminal cysteines predicted to facilitate multimerisation. The GAr contains secondary structural elements in the models, while the protein binding regions are unstructured, irrespective of the prediction approach used and sequence origin. These intrinsically disordered regions may facilitate the diversity observed in partner interactions. We hypothesise that the structured GAr could mask the disordered regions, thereby protecting the protein from default degradation. In the dimer conformation, the C-terminal tails of each monomer wrap around a proline-rich protruding loop of the partner monomer, providing dimer stability, a feature which could be exploited in therapeutic design.

    Collagen-derived cryptides: machine-learning prediction and molecular dynamic interaction against Klebsiella pneumoniae biofilm synthesis precursor

    Collagen-derived cryptic peptides (cryptides) are biologically active peptides derived from the proteolytic digestion of collagen protein. These cryptides possess a multitude of activities, including antihypertensive, antiproliferative, and antibacterial. The latter, however, has not been extensively studied. The cryptides are mainly obtained from the protein hydrolysate and then characterized to elucidate their function, which limits the number of cryptides that can be investigated within a short period. The recent threat of antimicrobial-resistant (AMR) microorganisms to global health requires the rapid development of new therapeutic drugs. The current study aims to predict antimicrobial peptides (AMP) from collagen-derived cryptides and then to elucidate their potential to inhibit biofilm-related precursors in Klebsiella pneumoniae using an in silico approach. To this end, cryptides derived from collagen amino acid sequences of various types and species were subjected to online machine-learning platforms (i.e., CAMPr3, DBAASP, dPABBs, Hemopred, and ToxinPred). The peptide-protein interaction was elucidated using molecular docking, molecular dynamics, and MM-PBSA analysis against MrkH, a K. pneumoniae transcriptional regulator of type 3 fimbriae that promote biofilm formation. As a result, six potential antibiofilm inhibitory cryptides were screened and docked against MrkH. All six peptides bind more strongly than the MrkH ligand (c-di-GMP; C2E).