
    Harnessing the evolutionary information on oxygen binding proteins through Support Vector Machines based modules

    © The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
    Abstract. Objectives: With the arrival of free oxygen on the globe, aerobic life became possible. It has since become clear that oxygen-binding proteins are widespread in the biosphere and are found in all groups of organisms, including prokaryotes and eukaryotes (fungi, plants, and animals). The exponential growth and availability of newly annotated protein sequences in the databases motivated us to develop an improved version of “Oxypred” for identifying oxygen-binding proteins. Results: In this study, we propose a method for identifying oxy-proteins at two sequence similarity cutoffs, 50% and 90%. Several amino acid composition based Support Vector Machine models were developed, including models that use evolutionary profiles in the form of a position-specific scoring matrix (PSSM). Fivefold cross-validation was applied to evaluate prediction performance. We also compared our models with existing methods, which show nearly 97% recognition, whereas our newly developed models recognized almost 99.99% and 100% in the oxy-50 and 90% similarity models, respectively. Our results show that our approaches are faster and achieve better prediction performance than the existing methods.
The web server Oxypred2 was developed as an alternative method for identifying oxy-proteins, with additional modules including PSSM, and is available at http://bioinfo.imtech.res.in/servers/muthu/oxypred2/home.html
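The amino acid composition features and the fivefold cross-validation mentioned in this abstract can be illustrated with a minimal sketch. This is not the Oxypred2 code: the function names `amino_acid_composition` and `fivefold_splits` are hypothetical, the SVM training step and the PSSM profiles are omitted, and the round-robin fold assignment is an assumption made for simplicity.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector with the fraction of each standard residue."""
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def fivefold_splits(n_items, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Items are assigned to folds round-robin (an assumption of this sketch);
    each fold serves as the test set exactly once.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, test
```

Each composition vector would then be passed, together with its class label, to an SVM implementation of the reader's choice, with one model trained and evaluated per fold.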

    Mutations in the protein kinase superfamily

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Defense date: 25-11-201

    Nuevas aproximaciones computacionales para el estudio y la predicción funcional de dominios de proteínas [New computational approaches for the study and functional prediction of protein domains]

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Defense date: 23-09-2013.
    Obtaining experimental information on the structure, function and important residues of the proteins of a given organism is very time-consuming and expensive. For that reason, developing computational techniques for assigning functional features to protein sequences is an active area of research. Almost all resources for predicting protein function assign functional terms to whole chains and do not distinguish which particular domain is responsible for the assigned function. This is because the functional annotation databases these methods rely on are annotated on a whole-chain basis. Nevertheless, domains are the basic evolutionary, and often functional, units of proteins. Moreover, in many cases the domains of a protein chain have distinct molecular functions, independent of each other. For this reason, resources with functional annotations at the domain level are required, as well as methodologies adapted to these resources for predicting the function of individual domains. The main aim of this thesis was to generate these two resources. We generated the first large-scale functional annotation at the domain level by annotating the SCOP structural domains with gene ontology terms. Additionally, we performed a large-scale comparison of these annotations with the ones implicit in the functional annotations of InterPro signatures, showing that this method performs better overall. Based on this database of functional annotations at the domain level, we developed a methodology for predicting the molecular function of individual domains and showed that this approach outperforms a standard method based on sequence searches in assigning functions.
Additionally, we implemented this methodology on a web server for the concomitant prediction of fold, molecular function and functional sites at the domain level. Although it is clear that the amino acid types are by far the main determinants of the functional features of proteins, several studies have suggested that translational speed may also play a role in some cases. However, a large-scale comparative study of its relationship with a comprehensive and diverse set of annotated functional features was missing. For that reason, we performed the first large-scale analysis of the relationship between three experimental proxies of mRNA translation speed and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are related to these mRNA properties. These results support the idea that the genome codes the functional features of proteins not only as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on translation speed, have some consequence for the final polypeptide. Although the patterns found so far are in general very subtle, in particular cases with very clear patterns they could be used for predicting protein functional sites from single gene sequences. These results might also have implications for the heterologous expression of proteins.

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a long period of stagnation in drug approvals. Due to the extreme costs associated with introducing a drug to the market, locating and understanding the reasons for clinical failure is key to future productivity. As part of this PhD, three main contributions were made in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly, two deep-learning-based binding site comparison tools were developed, competing with the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships and has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together with existing tools, these contributions will aid the understanding of drug-protein relationships, particularly in the fields of off-target prediction and drug repurposing, helping to design better drugs faster.

    Scalable quantitative interaction proteomics of regulatory DNA elements


    Fragment Based Protein Active Site Analysis Using Markov Random Field Combinations of Stereochemical Feature-Based Classifications

    Recent improvements in structural genomics efforts have greatly increased the number of hypothetical proteins in the Protein Data Bank. Several computational methodologies have been developed to determine the function of these proteins, but none of them has been able to account successfully for the diversity in sequence and structural conformation observed in proteins that have the same function. An additional complication is the flexibility of both the protein active site and the ligand. In this dissertation, novel approaches to deal with both the ligand flexibility and the diversity in stereochemistry are proposed. The active site analysis problem is formalized as a classification problem in which, for a given test protein, the goal is to predict the class of ligand most likely to bind the active site based on its stereochemical nature, and thereby define its function. Traditional methods that have adopted a similar methodology have struggled to account for the flexibility observed in large ligands. Therefore, I propose a novel fragment-based approach to dealing with larger ligands. The advantage of the fragment-based methodology is that considering the protein-ligand interactions in a piecewise manner does not affect the active site patterns, and it also provides a way to account for the problems associated with flexible ligands. I also propose two feature-based methodologies to account for the diversity observed in sequences and structural conformations among proteins with the same function. The feature-based methodologies provide detailed descriptions of the active site stereochemistry and are capable of identifying stereochemical patterns within the active site despite the diversity. Finally, I propose a Markov Random Field (MRF) approach to combine the individual ligand fragment classifications (based on the stereochemical descriptors) into a single multi-fragment ligand class.
This probabilistic framework combines the information provided by stereochemical features with the information regarding geometric constraints between ligand fragments to make a final ligand class prediction. The feature-based fragment identification methodology had an accuracy of 84% across a diverse set of ligand fragments, and the MRF analysis was able to successfully combine the various ligand fragments (identified by feature-based analysis) into one final ligand based on statistical models of ligand fragment distances. This novel approach to protein active site analysis was additionally tested on three proteins with very low sequence and structural similarity to other proteins in the PDB (a challenge for traditional methods), and in each of these cases it successfully identified the cognate ligand. This approach addresses the two main issues that affect the accuracy of current automated methodologies in protein function assignment.
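The idea of combining per-fragment classifier outputs with distance constraints can be sketched as a tiny pairwise Markov Random Field solved by exhaustive search. This is an illustrative sketch only, not the dissertation's implementation: the Gaussian distance model, the function names, the fragment labels and all numbers below are assumptions, and the classifier probabilities are assumed to be strictly positive.

```python
import itertools
import math


def pairwise_log_potential(observed, mean, sd):
    """Gaussian log-score for an observed inter-fragment distance (assumed model)."""
    z = (observed - mean) / sd
    return -0.5 * z * z - math.log(sd)


def map_assignment(unary, distances, distance_model):
    """Return the highest-scoring joint labelling and its log-score.

    unary: list of dicts {label: probability}, one per fragment site.
    distances: dict {(i, j): observed distance between sites i and j}.
    distance_model: dict {(label_a, label_b): (mean, sd)} of expected distances.
    Label pairs missing from distance_model are treated as impossible.
    """
    best, best_score = None, -math.inf
    for labels in itertools.product(*(u.keys() for u in unary)):
        # Unary terms: log-probabilities from the per-fragment classifiers.
        score = sum(math.log(u[label]) for u, label in zip(unary, labels))
        # Pairwise terms: agreement between observed and expected distances.
        for (i, j), observed in distances.items():
            key = (labels[i], labels[j])
            if key not in distance_model:
                key = key[::-1]
            if key not in distance_model:
                score = -math.inf
                break
            mean, sd = distance_model[key]
            score += pairwise_log_potential(observed, mean, sd)
        if score > best_score:
            best, best_score = labels, score
    return best, best_score


# Hypothetical example: two fragment sites observed 3.0 units apart.
unary = [
    {"adenine": 0.4, "nicotinamide": 0.6},
    {"ribose": 0.5, "phosphate": 0.5},
]
distances = {(0, 1): 3.0}
distance_model = {
    ("adenine", "ribose"): (3.0, 0.5),
    ("adenine", "phosphate"): (7.0, 0.5),
    ("nicotinamide", "ribose"): (4.0, 0.5),
    ("nicotinamide", "phosphate"): (8.0, 0.5),
}
best_labels, best_log_score = map_assignment(unary, distances, distance_model)
```

In this toy example the first classifier slightly prefers one label, but the distance term can override it, which is the kind of correction the geometric constraints are meant to provide. Exhaustive enumeration is only practical for a handful of fragments; larger problems would need a proper inference algorithm.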