29 research outputs found

    Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models

    BACKGROUND: The binding between peptide epitopes and major histocompatibility complex proteins (MHCs) is an important event in the cellular immune response. Accurate prediction of the binding between short peptides and the MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders" or as "strong binders" and "weak binders", recent methods seek to make predictions about precise binding affinities. RESULTS: We developed a quantitative support vector machine regression (SVR) approach, called SVRMHC, to model peptide-MHC binding affinities. As a non-linear method, SVRMHC was able to generate models that out-performed existing linear models, such as the "additive method". By adopting a new "11-factor encoding" scheme, SVRMHC takes into account similarities in the physicochemical properties of the amino acids constituting the input peptides. When applied to MHC-peptide binding data for three mouse class I MHC alleles, the SVRMHC models produced more accurate predictions than those produced previously. Furthermore, comparisons based on Receiver Operating Characteristic (ROC) analysis indicated that SVRMHC was able to out-perform several prominent methods in identifying strongly binding peptides. CONCLUSION: As a method with demonstrated performance in the quantitative modeling of MHC-peptide binding and in identifying strong binders, SVRMHC is a promising immunoinformatics tool with considerable future potential.

    Exploring QSARs for Inhibitory Activity of Non-peptide HIV-1 Protease Inhibitors by GA-PLS and GA-SVM

    The support vector machine (SVM) and partial least squares (PLS) methods were used to develop quantitative structure-activity relationship (QSAR) models to predict the inhibitory activity of non-peptide HIV-1 protease inhibitors. A genetic algorithm (GA) was employed to select the variables that lead to the best-fitted models. A comparison of the SVM and PLS results showed that the SVM model performs much better than the PLS model. For the SVM model, the root mean square errors of the training set and the test set were 0.2027 and 0.2751, and the coefficients of determination (R²) were 0.9800 and 0.9355, respectively. Furthermore, the leave-one-out cross-validation statistic (Q²) of the SVM model was 0.9672, which supports the reliability of this model. The results suggest that TE2, Ui, GATS5e, Mor13e, ATS7m, Ss, Mor27e, and RDF035e are the main independent factors contributing to the inhibitory activities of the studied compounds. The authors would like to acknowledge the computational chemistry laboratory at Al-Quds University for providing Matlab software and for the time dedicated to performing the calculations of the study.
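The statistics quoted in this abstract (RMSE, R² and leave-one-out Q²) can be illustrated with a minimal sketch. It assumes a one-descriptor least-squares line as a stand-in model, not the GA-SVM or GA-PLS models of the study; the function names and the toy data are hypothetical. Q² is computed here as 1 - PRESS / SS_tot, which is one common convention and may differ from the definition used by the authors.

```python
import math
from typing import Callable, List


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x: List[float], y: List[float]) -> Callable[[float], float]:
    """Ordinary least squares for y = a + b*x (stand-in for the real model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda v: a + b * v


def q2_loo(x: List[float], y: List[float]) -> float:
    """Leave-one-out Q^2: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(x)):
        model = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(model(x[i]))
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    mean = sum(y) / len(y)
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical toy data: one descriptor and a noisy activity value.
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0]
y_toy = [2.1, 3.9, 6.2, 7.8, 10.1]
model = fit_line(x_toy, y_toy)
y_fit = [model(v) for v in x_toy]
print(rmse(y_toy, y_fit), r_squared(y_toy, y_fit), q2_loo(x_toy, y_toy))
```

Because each left-out sample is predicted by a model that never saw it, Q² does not exceed the fitted R² for a least-squares model, which is why the two are usually reported together.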

    PhageDPO: phage depolymerase finder

    Master's dissertation in Bioinformatics. Antibiotic resistance is a severe public health problem. New resistance mechanisms are rapidly emerging and spreading globally, threatening our ability to treat infections. Bacteriophages (phages) arise as a possible solution through their capability of infecting and killing bacteria. Phages are natural bacterial predators: they encode an arsenal of specialized proteins to target their bacterial hosts. One emerging class of proteins is the phage depolymerases (DPOs), responsible for the selective recognition and degradation of the polysaccharides that decorate the bacterial cell surface, rendering the bacteria susceptible to external agents. Due to the difficulty in locating these enzymes in the phage genome, we developed PhageDPO, a DPO prediction tool, through machine learning methods. Several classifiers were created, using different datasets and algorithms, and tested through cross-validation. The datasets were composed of protein sequences retrieved from the NCBI protein database and of different numbers of negative cases. Two models were selected for integration in the tool: the Support Vector Machine (SVM) model created with a dataset containing data of 4311 sequences and the Artificial Neural Network (ANN) model created with a dataset containing data of 7185 sequences. On an independent validation dataset, the SVM model presented 95% accuracy, 98% precision and 91% recall, and the ANN model presented 98% accuracy, 99% precision and 96% recall. While the high precision and PECC of the SVM focus on predicting true DPO sequences and avoiding false positives, the ANN ensures that all DPOs are identified due to its high recall. PhageDPO was successfully tested in predicting DPOs of previously characterized phages.
PhageDPO was integrated into the Galaxy framework (https://bit.ly/3dOam2u), providing a user-friendly graphical interface for wet-lab researchers without computational skills. This study was supported by the Portuguese Foundation for Science and Technology (FCT) within the scope of the project PhageSTEC PTDC/CVT-CVT/29628/2017 [POCI-01-0145-FEDER-029628]
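The accuracy, precision and recall figures quoted for the two PhageDPO models follow from confusion-matrix counts. The sketch below shows the standard definitions; the counts are hypothetical, chosen only so that the rounded values resemble those reported for the SVM model, and are not taken from the dissertation.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # Fraction of all sequences classified correctly.
        "accuracy": (tp + tn) / total,
        # Of the sequences predicted as DPO, the fraction that really are.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the true DPO sequences, the fraction that were found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical validation set: 100 DPOs and 100 non-DPOs.
metrics = classification_metrics(tp=91, fp=2, tn=98, fn=9)
print({name: round(value, 3) for name, value in metrics.items()})
```

High precision with lower recall, as in this example, means few false positives at the cost of missing some true DPOs; a model with higher recall makes the opposite trade.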

    HLA class I supertype and supermotif definition by chemometric approaches.

    Activation of cytotoxic T cells in humans requires specific binding of antigenic peptides to human leukocyte antigen (HLA) molecules. HLA is the most polymorphic protein in the human body, with 1814 different alleles currently collected in the HLA sequence database at the European Bioinformatics Institute. Most HLA molecules recognise different peptides, and some peptides can be recognised by several HLA molecules. In the present project, all available class I HLA alleles are classified into supertypes. Super-binding motifs for peptides binding to some supertypes are defined where binding data are available. A variety of chemometric techniques are used in the project, including 2D and 3D QSAR techniques and different variable selection methods such as SIMCA, GOLPE and genetic algorithms. Principal component analysis combined with molecular interaction field calculation by the program GRID is used in the class I HLA classification. This thesis defines an HLA-A3 supermotif using two QSAR methods: the 3D-QSAR method CoMSIA and a recently developed 2D-QSAR method named the additive method. Four alleles with high phenotype frequency were included in the study: HLA-A*0301, HLA-A*1101, HLA-A*3101 and HLA-A*6801. An A*0201 binding motif is also defined using amino acid descriptors and variable selection methods. Novel peptides have been designed according to the motifs, and their binding affinity has been tested experimentally. The results of the additive method are used in the online server MHCPred to predict the binding affinity of unknown peptides. In the HLA classification, the HLA-A, B and C molecules are classified into supertypes separately. A total of eight supertypes are observed for class I HLA: the A2, A3, A24, B7, B27, B44, C1 and C4 supertypes. Using the HLA classification, any newly discovered class I HLA molecule can be grouped into a supertype easily, thus simplifying the experimental characterisation of its function.

    Computational Modeling of the Vacuolar pH-Homeostasis in Arabidopsis thaliana

    The aim of this work is the analysis of the vacuolar pH homeostasis in Arabidopsis thaliana root cells by means of computational modeling. The pH is an important parameter for a range of cellular processes such as the control of enzyme activity and the maintenance of osmotic pressure, acting through the establishment of a proton motive force across the vacuolar membrane that in turn is used in the homeostasis of other ions on both sides of the membrane. Although many processes are known to be important for the establishment and maintenance of an acidic vacuolar lumen, recent experimental results have shown that our current understanding of those processes is not complete. To study the vacuolar pH homeostasis in an integrative manner, this work focuses on three different aspects. In the first part, an overview of computational systems biology approaches in Arabidopsis thaliana is given to demonstrate the state of the art and put the rest of the work in a broader context. The second part then focuses on transmembrane transport reactions and the importance of the correct scaling of the kinetic rate laws of those reactions in mathematical models employing sets of ordinary differential equations, which matters for any multi-compartment model such as the one presented in part three of this thesis. In the third part, a mathematical modeling approach is used to explain experimental data concerning the vacuolar pH homeostasis. To do so, three hypotheses on the mechanisms contributing to vacuolar acidification are developed: an as-yet-unknown direct proton import, protons released by protein degradation, and the reversal of a proton-calcium antiporter. Each of those hypotheses is implemented in an ordinary differential equation model and tested for feasibility against the experimental data.

    An investigation into the crystallisation behaviour of glycine homopeptides

    The combinations of amino acids into peptides and proteins, through peptide bonds, are the building blocks of life on earth. Their natural therapeutic properties have led to a significant increase in the application of these materials in the treatment of chronic diseases. The purest and most stable crystalline form provides structural information at the atomic level and is desirable for formulation into efficacious pharmaceutical products. Peptide crystallisation, as a good alternative to chromatographic purification, can also overcome the shortcomings of traditional purification methods, such as high cost, proteolytic degradation and physicochemical instability. However, peptide crystallisation still remains a major challenge due to highly flexible conformations, especially where water plays an integral role in the crystal structure. Glycine is the simplest amino acid and is known to play an important role in new biomimetic functional materials and biopharmaceutical research. Its hydrogen side chain makes the molecule an ideal candidate to study the effect of chain length on peptide solubility and crystallisation, without the effect of a side chain. The glycine homopeptide crystallisation research in this thesis includes three parts: thermodynamic properties, kinetic properties, and the relationship between peptide conformation and crystallisation. Firstly, the solubilities of glycine homopeptides (glycine, diglycine, triglycine, tetraglycine, pentaglycine, and hexaglycine), amino acids with different side chains (aspartic acid, phenylalanine, histidine, and tyrosine) and their dipeptides (asp-phe, gly-asp, gly-phe, phe-phe, gly-gly, tyr-phe, gly-tyr, gly-his) in water from 278.15 K to 313.15 K were measured using the UV-Vis spectroscopy method and the dynamic method. The modified Apelblat equation was used to correlate solubility in water with temperature.
Molecular dynamics (MD) simulation was further employed to investigate the solute-solvent interactions behind the dissolution behaviour. Moreover, the group-group interaction matrix of the SAFT-γ Mie approach was extended for the prediction of the solubility of amino acids and peptides, exploring the application of SAFT-γ Mie to biomolecular thermodynamic properties. Secondly, the classical nucleation theory was applied to short-chain glycine homopeptide crystallisation to explore the nucleation of macromolecules. The nucleation parameters (nucleation rate, growth rate, interfacial surface energy, and activation Gibbs energy) were calculated based on the classical nucleation theory to explore the effect of chain length on the classical nucleation mechanism of peptides, providing kinetic data for crystallisation conditions designed for industry and for modeling tools such as gPROMS. Evidence of a non-classical nucleation phenomenon was also observed and discussed. Finally, the interaction between water and peptide molecules, which can stabilize the unfolded structure of peptides and proteins, was revealed, and the effect of temperature and salts on the transition between unfolded and folded structure was explored, offering insight into the relationship between conformation and peptide crystallisation. The research presented in this thesis investigates the thermodynamic and kinetic properties of glycine homopeptides, as well as the flexible conformation of peptides during crystallisation, thereby providing a comprehensive strategy for designing and optimising the crystallisation process. Additionally, the research establishes a fundamental understanding of peptide crystallisation, which is extremely beneficial for future macromolecular crystallisation research. Open Access
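For reference, the modified Apelblat equation mentioned in this abstract is commonly written in the form below. This is the standard textbook form, given here as an assumption; the abstract does not show the exact parameterisation used in the thesis.

```latex
\ln x = A + \frac{B}{T} + C \ln T
```

Here x is the mole-fraction solubility of the solute, T is the absolute temperature in kelvin, and A, B and C are empirical parameters obtained by fitting the measured solubility data.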

    Mechanisms Regulating HIV-1 Protease Activity

    The Human Immunodeficiency Virus Type 1 (HIV-1) Protease (PR) has no direct involvement in the early steps of HIV-1 replication. Nonetheless, it is the timely and ordered processing of the viral structural proteins by the HIV-1 PR during virion maturation that facilitates the successful completion of virus entry, reverse transcription, and integration. Though a considerable amount of research has been devoted to deciphering how the enzyme prepares a virus particle for infection, the mechanisms regulating its activities continue to remain incompletely defined. RNA serves as one putative regulatory factor, since efficient processing of the maturation intermediate p15NC requires RNA in vitro. Though previously believed relevant to only p15NC cleavage, I demonstrate that RNA enhances HIV-1 proteolysis reactions in a substrate-independent manner. The increased catalytic activity of the HIV-1 PR results from a direct interaction between RNA and the enzyme, with the magnitude of the effect dependent upon the size of the RNA molecule. Large (>400 base) RNAs accelerated proteolytic processing by over 100-fold under near-physiological conditions. This considerable change stemmed from both improved substrate recognition (Km) and turnover rate (kcat). Variability in amino acid sequence also guides HIV-1 PR activity. However, the absence of any overt patterns across HIV-1 cleavage sites has complicated the delineation of why these differences result in diverse processing efficiencies. To address this question, I generated the largest-to-date dataset of globular proteins cleaved by the HIV-1 PR in near-physiological conditions. From these data, I unravel a number of site-specific processing requirements, and identify potentially important relationships shared between multiple cleavage sites. These results additionally enabled the formation of a preliminary conceptual model for explaining processing site amino acid composition. Doctor of Philosophy
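The two kinetic parameters named in this abstract, Km and kcat, are usually read through the Michaelis-Menten rate law. The abstract does not state which kinetic model was fitted, so the form below is an assumption given only for orientation.

```latex
v = \frac{k_{\mathrm{cat}}\,[E]_0\,[S]}{K_m + [S]},
\qquad
\text{catalytic efficiency} = \frac{k_{\mathrm{cat}}}{K_m}
```

Here v is the initial reaction rate, [E]_0 the total enzyme concentration and [S] the substrate concentration. Under this model, a lower Km (better substrate recognition) and a higher kcat (faster turnover) both raise the catalytic efficiency, so simultaneous changes in the two can multiply into a large overall rate enhancement.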

    Advances in neuroproteomics for neurotrauma: unraveling insights for personalized medicine and future prospects

    Neuroproteomics, an emerging field at the intersection of neuroscience and proteomics, has garnered significant attention in the context of neurotrauma research. Neuroproteomics involves the quantitative and qualitative analysis of nervous system components, essential for understanding the dynamic events involved in the vast areas of neuroscience, including, but not limited to, neuropsychiatric disorders, neurodegenerative disorders, mental illness, traumatic brain injury, chronic traumatic encephalopathy, and other neurodegenerative diseases. With advancements in mass spectrometry coupled with bioinformatics and systems biology, neuroproteomics has led to the development of innovative techniques such as microproteomics, single-cell proteomics, and imaging mass spectrometry, which have significantly impacted neuronal biomarker research. By analyzing the complex protein interactions and alterations that occur in the injured brain, neuroproteomics provides valuable insights into the pathophysiological mechanisms underlying neurotrauma. This review explores how such insights can be harnessed to advance personalized medicine (PM) approaches, tailoring treatments based on individual patient profiles. Additionally, we highlight the potential future prospects of neuroproteomics, such as identifying novel biomarkers and developing targeted therapies by employing artificial intelligence (AI) and machine learning (ML). By shedding light on neurotrauma’s current state and future directions, this review aims to stimulate further research and collaboration in this promising and transformative field

    Graph kernel extensions and experiments with application to molecule classification, lead hopping and multiple targets

    The discovery of drugs that can effectively treat disease and alleviate pain is one of the core challenges facing modern medicine. The tools and techniques of machine learning have perhaps the greatest potential to provide a fast and efficient route toward the fabrication of novel and effective drugs. In particular, modern structured kernel methods have been successfully applied to a range of problem domains and have recently been adapted for graph structures, making them directly applicable to pharmaceutical drug discovery. Specifically, graph structures have a natural fit with molecular data, in that a graph consists of a set of nodes that represent atoms that are connected by bonds. In this thesis we use graph kernels that utilize three different graph representations: molecular, topological pharmacophore and reduced graphs. We introduce a set of novel graph kernels which are based on a measure of the number of finite walks within a graph. To calculate this measure we employ a dynamic programming framework which allows us to extend graph kernels so that they can handle non-tottering walks and soft matching and allow the inclusion of gaps. In addition we review several graph colouring methods and subsequently incorporate colour into our graph kernel models. These kernels are designed for molecule classification in general, although we show how they can be adapted to other areas in drug discovery. We conduct three sets of experiments and discuss how our augmented graph kernels are designed and adapted for these areas. First, we classify molecules based on their activity in comparison to a biological target. Second, we explore the related problem of lead hopping. Here one set of chemicals is used to predict another that is structurally dissimilar. We discuss the problems that arise due to the fact that some patterns are filtered from the dataset.
By analyzing lead hopping we are able to go beyond the typical cross-validation approach and construct a dataset that more accurately reflects real-world tasks. Lastly, we explore methods of integrating information from multiple targets. We test our models as a multi-response problem and later introduce a new approach that employs Kernel Canonical Correlation Analysis (KCCA) to predict the best molecules for an unseen target. Overall, we show that graph kernels achieve good results in classification, lead hopping and multiple target experiments.
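The idea of a kernel based on counting finite walks with dynamic programming can be sketched as follows. This is a minimal illustration of a plain label-matching walk kernel, not the thesis's kernels: it still counts tottering walks and has no soft matching, gaps or colouring. The graph representation (a list of atom labels plus adjacency lists) and the function name `walk_kernel` are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# A molecule as (atom labels, adjacency lists keyed by node index).
Graph = Tuple[List[str], Dict[int, List[int]]]


def walk_kernel(g1: Graph, g2: Graph, max_len: int) -> int:
    """Count pairs of walks (one per graph) with identical label sequences,
    for all walk lengths from 0 to max_len edges."""
    labels1, adj1 = g1
    labels2, adj2 = g2
    # counts[(u, v)]: number of matching walk pairs of the current length
    # that end at node u in g1 and node v in g2.
    counts = {
        (u, v): 1
        for u in range(len(labels1))
        for v in range(len(labels2))
        if labels1[u] == labels2[v]
    }
    total = sum(counts.values())  # length-0 walks: single matching atoms
    for _ in range(max_len):
        extended: Dict[Tuple[int, int], int] = {}
        for (u, v), c in counts.items():
            # Extend both walks by one bond, keeping only matching labels.
            for u2 in adj1.get(u, []):
                for v2 in adj2.get(v, []):
                    if labels1[u2] == labels2[v2]:
                        extended[(u2, v2)] = extended.get((u2, v2), 0) + c
        counts = extended
        total += sum(counts.values())
    return total


# Two toy two-atom fragments: C-O and C-N.
c_o: Graph = (["C", "O"], {0: [1], 1: [0]})
c_n: Graph = (["C", "N"], {0: [1], 1: [0]})
print(walk_kernel(c_o, c_o, 2), walk_kernel(c_o, c_n, 2))
```

Each step reuses the counts from the previous walk length instead of enumerating walks explicitly, so the cost grows linearly with `max_len` rather than exponentially. Walks such as C-O-C on the two-atom fragment, which step back along the bond just traversed, are the tottering walks that a non-tottering kernel is designed to exclude.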