1,887 research outputs found

    Development of a suite of bioinformatics tools for the analysis and prediction of membrane protein structure

    This thesis describes the development of a novel approach for predicting the three-dimensional structure of transmembrane regions of membrane proteins directly from amino acid sequence and basic transmembrane region topology. The development rationale was knowledge-based. Based on determined membrane protein structures, 20x20 association matrices were generated to summarise the distance associations between amino acid side chains on different alpha-helical transmembrane regions of membrane proteins. Using these association matrices, combined with a knowledge-based scale for propensity for residue orientation in transmembrane segments (kPROT) (Pilpel et al., 1999), the software predicts the optimal orientations and associations of transmembrane regions and generates a 3D structural model of a given membrane protein, based on the amino acid sequence composition of its transmembrane regions. During the development, several structural and biostatistical analyses of determined membrane protein structures were undertaken with the aim of ensuring a consistent and reliable association matrix upon which to base the predictions. Evaluation of the model structures obtained for the protein sequences of a dataset of 17 membrane proteins of determined structure, based on cross-validated leave-one-out testing, revealed generally high accuracy of prediction, with over 80% of associations between transmembrane regions being correctly predicted. These results provide a promising basis for future development and refinement of the algorithm, and to this end, work is underway using evolutionary computing approaches. As it stands, the approach offers significant immediate benefit to researchers as a valuable starting point in the prediction of structure for membrane proteins of hitherto unknown structure. Thesis (Doctor of Philosophy) - University of Bedfordshire
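
The 20x20 association matrix described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `association_matrix`, the 6.0 Å contact cutoff, the use of one representative point per side chain, and the choice of raw contact counts (rather than whatever distance statistics the thesis uses) are all assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes; index order is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def association_matrix(helices, cutoff=6.0):
    """Count residue pairs on DIFFERENT helices that lie within `cutoff`.

    `helices` is a list of helices; each helix is a list of
    (one_letter_code, (x, y, z)) tuples, where the coordinate is a single
    representative side-chain point (an assumption of this sketch).
    Returns a symmetric 20x20 list-of-lists of contact counts.
    """
    counts = [[0] * 20 for _ in range(20)]
    for a in range(len(helices)):
        for b in range(a + 1, len(helices)):  # inter-helix pairs only
            for (res_i, pos_i), (res_j, pos_j) in product(helices[a], helices[b]):
                dist = sum((p - q) ** 2 for p, q in zip(pos_i, pos_j)) ** 0.5
                if dist <= cutoff:
                    i, j = INDEX[res_i], INDEX[res_j]
                    counts[i][j] += 1
                    if i != j:  # keep the matrix symmetric
                        counts[j][i] += 1
    return counts


# Hypothetical toy input: two short "helices" with made-up coordinates.
toy = [
    [("L", (0.0, 0.0, 0.0)), ("A", (0.0, 0.0, 10.0))],
    [("F", (3.0, 0.0, 0.0)), ("L", (0.0, 4.0, 0.0))],
]
matrix = association_matrix(toy)
```

In a real analysis the counts would normally be normalised against background amino acid frequencies before being used as a scoring scale; that step is omitted here.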

    Improving protein succinylation sites prediction using embeddings from protein language model

    Protein succinylation is an important post-translational modification (PTM) responsible for many vital metabolic activities in cells, including cellular respiration, regulation, and repair. Here, we present a novel approach that combines features from supervised word embedding with embeddings from a protein language model called ProtT5-XL-UniRef50 (hereafter termed ProtT5) in a deep learning framework to predict protein succinylation sites. To our knowledge, this is one of the first attempts to employ embeddings from a pre-trained protein language model to predict protein succinylation sites. The proposed model, dubbed LMSuccSite, achieves state-of-the-art results compared to existing methods, with performance scores of 0.36, 0.79, and 0.79 for MCC, sensitivity, and specificity, respectively. LMSuccSite is likely to serve as a valuable resource for exploration of succinylation and its role in cellular physiology and disease.

    Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems

    A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and the larger computational capabilities, but the interpretation of such genetic data at phenotypic level remains elusive. One of the reasons is that proteins are not acting alone, but are specifically interacting with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design, to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which there is available structural data is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions. Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology. Peer Reviewed. Postprint (author's final draft)

    Development of language modelling techniques for protein sequence analysis

    Master's dissertation in Bioinformatics. The ability to predict protein functions directly from amino acid sequences alone remains a major biological challenge. The understanding of protein properties and functions is extremely important and can have a wide range of biotechnological and medical applications. Technological advances have led to an exponential growth of biological data, challenging conventional analysis strategies. High-level representations from the field of deep learning can provide new alternatives to address these problems. In particular, NLP methods such as word embeddings (WE) have shown particular success when applied to protein sequence analysis. Here, a module that eases the implementation of word embedding models for protein representation and classification is presented. Furthermore, this module was integrated into the ProPythia framework, allowing WE representations to be combined directly with the training and testing of ML and DL models. The module was validated using two protein classification problems, namely identification of plant ubiquitylation sites and lysine crotonylation site prediction. It was further used to explore enzyme functional annotation. Several WE were tested and fed to different ML and DL networks. Overall, WE achieved good results, being competitive even with state-of-the-art models, reinforcing the idea that language-based methods can be applied with success to a wide range of protein classification problems. This work presents a freely available tool to perform word embedding techniques for protein classification. The case studies presented reinforce the usability and importance of using NLP and ML in protein classification problems.
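
Word embedding models treat a protein sequence as a sentence, which first requires splitting it into "words". A common choice is overlapping k-mers; the sketch below shows that tokenization step only. The function name `kmer_tokens` and the default `k=3` are assumptions for illustration and are not the ProPythia module's API.

```python
def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words").

    Returns an empty list when the sequence is shorter than k.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Hypothetical five-residue peptide used only to show the output shape.
tokens = kmer_tokens("MKTAY", k=3)  # ["MKT", "KTA", "TAY"]
```

The resulting token lists can then be passed to any word embedding trainer as a corpus of sentences, one sentence per protein.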

    PBEQ-Solver for online visualization of electrostatic potential of biomolecules

    PBEQ-Solver provides a web-based graphical user interface to read biomolecular structures, solve the Poisson-Boltzmann (PB) equations and interactively visualize the electrostatic potential. PBEQ-Solver calculates (i) electrostatic potential and solvation free energy, (ii) protein–protein (DNA or RNA) electrostatic interaction energy and (iii) pKa of a selected titratable residue. All the calculations can be performed in both aqueous solvent and membrane environments (with a cylindrical pore in the case of membrane). PBEQ-Solver uses the PBEQ module in the biomolecular simulation program CHARMM to solve the finite-difference PB equation of molecules specified by users. Users can interactively inspect the calculated electrostatic potential on the solvent-accessible surface as well as iso-electrostatic potential contours using a novel online visualization tool based on MarvinSpace molecular visualization software, a Java applet integrated within CHARMM-GUI (http://www.charmm-gui.org). To reduce the computational time on the server, and to increase the efficiency in visualization, all the PB calculations are performed with coarse grid spacing (1.5 Å before and 1 Å after focusing). PBEQ-Solver suggests various physical parameters for PB calculations and users can modify them if necessary. PBEQ-Solver is available at http://www.charmm-gui.org/input/pbeqsolver

    Native structure-based modeling and simulation of biomolecular systems per mouse click

    Background: Molecular dynamics (MD) simulations provide valuable insight into biomolecular systems at the atomic level. Notwithstanding the ever-increasing power of high-performance computers, current MD simulations face several challenges: the fastest atomic movements require time steps of a few femtoseconds, which are small compared to biomolecularly relevant timescales of milliseconds or even seconds for large conformational motions. At the same time, scalability to a large number of cores is limited, mostly due to long-range interactions. An appealing alternative to atomic-level simulations is coarse-graining the resolution of the system or reducing the complexity of the Hamiltonian to improve sampling while decreasing computational costs. Native structure-based models, also called Gō-type models, are based on energy landscape theory and the principle of minimal frustration. They have been tremendously successful in explaining fundamental questions of, e.g., protein folding, RNA folding or protein function. At the same time, they are computationally inexpensive enough to run complex simulations on smaller computing systems or even commodity hardware. Still, their setup and evaluation are quite complex, even though sophisticated software packages support their realization. Results: Here, we establish an efficient infrastructure for native structure-based models to support the community and enable high-throughput simulations on remote computing resources via GridBeans and UNICORE middleware. This infrastructure organizes the setup of such simulations, resulting in increased comparability of simulation results. At the same time, complete workflows for advanced simulation protocols can be established and managed on remote resources through a graphical interface, which increases reusability of protocols and additionally lowers the entry barrier into such simulations for, e.g., experimental scientists who want to compare their results against simulations. We demonstrate the power of this approach by illustrating it for protein folding simulations for a range of proteins. Conclusions: We present software enhancing the entire workflow for native structure-based simulations, including exception handling and evaluations. By extending the capability and improving the accessibility of existing simulation packages, the software goes beyond the state of the art in the domain of biomolecular simulations. Thus we expect that it will encourage more members of the community to employ modeling more confidently in their research.

    pyDock scoring for the new modeling challenges in docking: protein-peptide, homo-multimers and domain-domain interactions

    The 6th CAPRI edition included new modelling challenges, such as the prediction of protein-peptide complexes, and the modelling of homo-oligomers and domain-domain interactions as part of the first joint CASP-CAPRI experiment. Other non-standard targets included the prediction of interfacial water positions and the modelling of the interactions between proteins and nucleic acids. We have participated in all proposed targets of this CAPRI edition both as predictors and as scorers, with new protocols to efficiently use our docking and scoring scheme pyDock in a large variety of scenarios. In addition, we have participated for the first time in the server section, with our recently developed webserver, pyDockWeb. Excluding the CASP-CAPRI cases, we submitted acceptable models (or better) for 7 out of the 18 evaluated targets as predictors, 4 out of the 11 targets as scorers, and 6 out of the 18 targets as servers. The overall success rates were below those in past CAPRI editions. This shows the challenging nature of this last edition, with many difficult targets for which no participant submitted a single acceptable model. Interestingly, we submitted acceptable models for 83% of the evaluated protein-peptide targets. As for the 25 cases of the CASP-CAPRI experiment, in which we used a larger variety of modelling techniques (template-based, symmetry restraints, literature information, etc.), we submitted acceptable models for 56% of the targets. In summary, this CAPRI edition showed that the pyDock scheme can be efficiently adapted to the increasing variety of problems that the protein interactions field is currently facing. This article is protected by copyright. All rights reserved. Peer Reviewed. Postprint (author's final draft)