15 research outputs found

    DockAnalyse : an application for the analysis of protein-protein interactions

    Background: Is it possible to identify the best solution of a docking program? The usual answer is the highest-scoring solution, but interactions between proteins are dynamic processes, and the interaction regions are often wide enough to permit protein-protein interactions with different orientations and/or interaction energies. In some cases, as in a multimeric protein complex, several interaction regions are possible among the monomers. These dynamic processes involve interactions with surface displacements between the proteins to finally achieve the functional configuration of the protein complex. Consequently, there is no single static solution for the interaction between proteins; several important configurations also have to be analyzed. Results: To extract those representative solutions from the docking output data file, we have developed an unsupervised and automatic clustering application named DockAnalyse. This application is based on the existing DBSCAN clustering method, which searches for continuities among the clusters generated by the docking output data representation. DBSCAN is very robust and, moreover, solves some of the inconsistency problems of classical clustering methods, such as the treatment of outliers and the dependence on a previously defined number of clusters. Conclusions: DockAnalyse makes the docking solutions easier to interpret through graphical and visual representations, guiding the user to the representative solutions. We have applied our new approach to analyze several protein interactions and to model the dynamic interaction behavior of a protein complex. DockAnalyse might also be used to describe interaction regions between proteins and, therefore, guide future flexible dockings. The application (implemented in R) is accessible
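As a rough illustration of the density-based clustering idea the abstract refers to, the following is a minimal pure-Python sketch of DBSCAN. It is not the DockAnalyse code (which the abstract says is written in R); the function name `dbscan`, the parameters `eps` and `min_pts`, and the toy two-dimensional "pose" coordinates are all assumptions made for this example. It shows the two properties the abstract highlights: the number of clusters is not fixed in advance, and isolated solutions are labelled as outliers.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: keep expanding
    return labels

# Hypothetical docking poses reduced to 2-D coordinates (made-up values).
poses = [
    (0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (0.1, 0.4),   # first dense region
    (5.0, 5.0), (5.4, 5.1), (5.2, 4.7), (4.9, 5.3),   # second dense region
    (10.0, 0.0),                                      # isolated pose
]
labels = dbscan(poses, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the two dense groups become clusters 0 and 1 without the cluster count being specified, and the isolated pose is reported as noise rather than being forced into a cluster.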

    Can bioinformatics help in the identification of moonlighting proteins?

    Protein multitasking or moonlighting is the capability of certain proteins to execute two or more unique biological functions. This ability helps us to understand one of the ways cells perform many complex functions with a limited number of genes. Usually, moonlighting proteins are revealed experimentally by serendipity, and the proteins described probably represent just the tip of the iceberg. It would be helpful if bioinformatics could predict protein multifunctionality, especially given the large number of sequences coming from genome projects. In the present article, we describe several approaches that use sequences, structures, interactomics and current bioinformatics algorithms and programs to try to overcome this problem. The sequence analysis has been performed: (i) by remote homology searches using PSI-BLAST, (ii) by the detection of functional motifs, and (iii) by the co-evolutionary relationship between amino acids. Programs designed to identify functional motifs/domains are basically oriented to detect the main function, but usually fail to detect secondary ones. Remote homology searches such as PSI-BLAST seem to be more versatile in this task, and they are a good complement to the information obtained from protein-protein interaction (PPI) databases. Structural information and mutation correlation analysis can help us to map the functional sites. Mutation correlation analysis can be used only in very restricted situations, but can suggest how the evolutionary process of the acquisition of the second function took place

    Multifunctional Proteins : Involvement in Human Diseases and Targets of Current Drugs

    Multifunctionality or multitasking is the capability of some proteins to execute two or more biochemical functions. The objective of this work is to explore the relationship between multifunctional proteins, human diseases and drug targeting. The analysis of the proportion of multitasking proteins from the MultitaskProtDB-II database shows that 78% of the proteins analyzed are involved in human diseases. This percentage is much higher than the 17.9% found in human proteins in general. A similar analysis using drug target databases shows that 48% of these analyzed human multitasking proteins are targets of current drugs, while only 9.8% of the human proteins present in UniProt are specified as drug targets. In almost 50% of these proteins, both the canonical and moonlighting functions are related to the molecular basis of the disease. A procedure to identify multifunctional proteins from disease databases and a method to structurally map the canonical and moonlighting functions of the protein have also been proposed here. Both of the previous percentages suggest that multitasking is not a rare phenomenon in proteins causing human diseases, and that their detailed study might explain some collateral drug effects
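To make the comparison in the abstract above concrete, the short sketch below divides each percentage reported for multitasking proteins by the corresponding background percentage. The four input numbers are taken from the abstract; the function name `fold_enrichment` and the choice of a simple ratio (rather than, say, an odds ratio or a statistical test) are assumptions made for illustration only.

```python
# Percentages quoted in the abstract.
disease_multitasking = 78.0  # % of analysed multitasking proteins involved in human diseases
disease_background = 17.9    # % of human proteins in general
drug_multitasking = 48.0     # % of analysed human multitasking proteins that are drug targets
drug_background = 9.8        # % of human proteins in UniProt specified as drug targets

def fold_enrichment(observed_pct, background_pct):
    """Ratio of an observed percentage to its background percentage."""
    return observed_pct / background_pct

disease_fold = fold_enrichment(disease_multitasking, disease_background)
drug_fold = fold_enrichment(drug_multitasking, drug_background)

print(f"disease involvement: {disease_fold:.1f}x the background")  # 4.4x
print(f"drug targeting:      {drug_fold:.1f}x the background")     # 4.9x
```

Under this simple ratio, both disease involvement and drug targeting are roughly four to five times more frequent among the analysed multitasking proteins than in the respective background sets.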

    Role of Moonlighting Proteins in Disease : Analyzing the Contribution of Canonical and Moonlighting Functions in Disease Progression

    The term moonlighting proteins refers to those proteins that present alternative functions performed by a single polypeptide chain acquired throughout evolution (called canonical and moonlighting, respectively). Over 78% of moonlighting proteins are involved in human diseases, 48% are targeted by current drugs, and over 25% of them are involved in the virulence of pathogenic microorganisms. These facts encouraged us to study the link between the functions of moonlighting proteins and disease. We found a large number of moonlighting functions activated by pathological conditions that are highly involved in disease development and progression. The factors that activate some moonlighting functions take place only in pathological conditions, such as specific cellular translocations or changes in protein structure. Some moonlighting functions are involved in disease promotion while others are involved in curbing it. The disease-impairing moonlighting functions attempt to restore the homeostasis, or to reduce the damage linked to the imbalance caused by the disease. The disease-promoting moonlighting functions primarily involve the immune system, mesenchyme cross-talk, or excessive tissue proliferation. We often find moonlighting functions linked to the canonical function in a pathological context. Moonlighting functions are especially coordinated in inflammation and cancer. Wound healing and epithelial to mesenchymal transition are very representative. They involve multiple moonlighting proteins with a different role in each phase of the process, contributing to the current-phase phenotype or promoting a phase switch, mitigating the damage or intensifying the remodeling. All of this implies a new level of complexity in the study of pathology genesis, progression, and treatment. The specific protein function involved in a patient's progress or that is affected by a drug must be elucidated for the correct treatment of diseases

    A hypothesis explaining why so many pathogen virulence proteins are moonlighting proteins

    Moonlighting or multitasking proteins are those proteins with two or more functions performed by a single polypeptide chain. Proteins that belong to key ancestral functions and metabolic pathways, such as primary metabolism, typically exhibit the moonlighting phenomenon. We have collected 698 moonlighting proteins in the MultitaskProtDB-II database. A survey shows that 25% of the proteins in the database correspond to moonlighting functions related to pathogen virulence activity. Why is the canonical function of these virulence proteins mainly drawn from ancestral key biological functions (especially primary metabolism)? Our hypothesis is that these proteins are highly conserved between the pathogen protein and its host counterparts. Therefore, the host immune system will not elicit protective antibodies against pathogen proteins. Sharing epitopes with host proteins (known as epitope mimicry) might be the cause of autoimmune diseases. Although many pathogen proteins can be antigenic, only a few of them would elicit a protective immune response. This would also explain the lack of successful vaccines based on these conserved moonlighting proteins. This review looks at why so many pathogen virulence proteins are from the primary metabolism and are conserved between pathogen and host

    MultitaskProtDB-II : an update of a database of multitasking/moonlighting proteins

    Multitasking, or moonlighting, is the capability of some proteins to execute two or more biological functions. MultitaskProtDB-II is a database of multifunctional proteins that has been updated. In the previous version, the information contained was: NCBI and UniProt accession numbers, canonical and additional biological functions, organism, monomeric/oligomeric states, PDB codes and bibliographic references. In the present update, the number of entries has been increased from 288 to 694 moonlighting proteins. MultitaskProtDB-II is continually being curated and updated. The new database also contains the following information: GO descriptors for the canonical and moonlighting functions, three-dimensional structure (for those proteins lacking PDB structure, a model was made using Itasser and Phyre), the involvement of the proteins in human diseases (78% of human moonlighting proteins) and whether the protein is a target of a current drug (48% of human moonlighting proteins). These numbers highlight the importance of these proteins for the analysis and explanation of human diseases and target-directed drug design. Moreover, 25% of the proteins of the database are involved in virulence of pathogenic microorganisms, largely in the mechanism of adhesion to the host. This highlights their importance for the mechanism of microorganism infection and vaccine design. MultitaskProtDB-II is available at http://wallace.uab.es/multitaskII
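The kinds of information listed in the abstract above can be pictured as one record per protein. The sketch below is a hypothetical schema, not the actual MultitaskProtDB-II data model: the class name `MoonlightingEntry`, all field names, the helper `percent`, and the four placeholder entries are assumptions made for illustration and contain no real database content.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MoonlightingEntry:
    # Fields mirror the kinds of information named in the abstract;
    # the names themselves are hypothetical.
    uniprot_acc: str
    ncbi_acc: str
    organism: str
    canonical_function: str
    moonlighting_function: str
    oligomeric_state: str = "unknown"
    pdb_codes: List[str] = field(default_factory=list)
    go_canonical: List[str] = field(default_factory=list)
    go_moonlighting: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    disease_related: bool = False
    drug_target: bool = False

def percent(entries, flag):
    """Percentage of entries whose boolean attribute `flag` is True."""
    if not entries:
        return 0.0
    return 100.0 * sum(getattr(e, flag) for e in entries) / len(entries)

# Placeholder records only; none of these values come from the database.
entries = [
    MoonlightingEntry("ACC_A", "NCBI_A", "Homo sapiens", "function A1", "function A2",
                      disease_related=True, drug_target=True),
    MoonlightingEntry("ACC_B", "NCBI_B", "Homo sapiens", "function B1", "function B2",
                      disease_related=True),
    MoonlightingEntry("ACC_C", "NCBI_C", "Homo sapiens", "function C1", "function C2"),
    MoonlightingEntry("ACC_D", "NCBI_D", "Homo sapiens", "function D1", "function D2"),
]

disease_pct = percent(entries, "disease_related")
drug_pct = percent(entries, "drug_target")
print(disease_pct, drug_pct)  # 50.0 25.0
```

Summary figures such as the disease-involvement and drug-target percentages quoted in the abstract would then be simple aggregates over such records, computed here with `percent` on the placeholder list.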

    Bioinformatics Approaches to Protein Interaction and Complexes: Application to Pathogen-Host Epitope Mimicry and to Fe-S Cluster Biogenesis Model

    Antigen/antibody interactions are one of the most interesting kinds of protein interactions. The best way to prevent diseases caused by pathogens is the use of vaccines. The advent of genomics enables genome-wide searches for new vaccine candidates, a technique called reverse vaccinology. The most common application of reverse vaccinology is the design of recombinant subunit vaccines, which usually generate a humoral immune response due to B-cell epitopes in the pathogen proteins. A major problem for this strategy is the identification of protective immunogenic proteins from the surfome of the pathogen. Epitope mimicry may lead to autoimmune conditions related to several human diseases. Chapter I of this thesis describes a sequence-based computational analysis in which, applying the BLASTP algorithm, databases containing the known linear B-cell epitopes and the surface-protein sequences of the main human respiratory bacterial pathogens were compared to the human proteome. We found that none of the 7353 linear B-cell epitopes analyzed share any sequence identity region with human proteins capable of generating antibodies, and that only 1% of the 2175 exposed proteins analyzed contain a stretch of sequence shared with the human proteome. These findings suggest the existence of a mechanism to avoid autoimmunity. We also propose a strategy for corroborating, or warning about, the viability of a protein containing a given linear B-cell epitope as a putative vaccine candidate in reverse vaccinology studies. In summary, epitopes without any sequence identity with human proteins should be good vaccine candidates, and vice versa.
    Protein docking is a computational method to predict the best way in which proteins interact, but is it possible to identify the best solution of a docking program? The usual answer is the highest-scoring solution in the docking output, but interactions between proteins are dynamic processes, and the interaction regions are often wide enough to permit protein-protein interactions with different orientations and/or interaction energies. In some cases, as in a multimeric protein complex, several interaction regions are possible among the monomers. These dynamic processes involve interactions with surface displacements between the proteins to finally achieve the functional configuration of the protein complex. Consequently, in many cases there is no single static solution for the interaction between proteins; several configurations may be important and also have to be analyzed. To extract those representative solutions from the docking output data file, Chapter II of this thesis details the development of an unsupervised and automatic clustering application named DockAnalyse. This application is based on the existing DBSCAN clustering method, which searches for continuities among the clusters generated by the docking output data representation. DBSCAN is very robust and, moreover, solves some of the inconsistency problems of classical clustering methods, such as the treatment of outliers and the dependence on a previously defined number of clusters. DockAnalyse makes the docking solutions easier to interpret through graphical and visual representations, guiding the user to the representative solutions. We have applied this new approach to analyze several protein interactions and to model the dynamic interaction behavior of a protein complex. DockAnalyse might also be used to describe interaction regions between proteins and, therefore, guide future flexible dockings. The application (implemented in R) is open and accessible.
    The assembly of Iron-Sulfur Clusters (ISCs) in eukaryotes involves interactions between different proteins, among which is the protein Frataxin. Deficits in this protein have been associated with excess iron inside the mitochondria and impaired ISC biogenesis, as Frataxin is postulated to act as the iron donor for ISC assembly in this organelle. A pronounced lack of Frataxin causes Friedreich's Ataxia, a hereditary human neurodegenerative disease mainly affecting equilibrium, coordination, muscles and the heart. It is also the most common autosomal recessive ataxia. Many similarities have been found between the human and yeast molecular mechanisms that involve Frataxin, making yeast a good model to study this process. In yeast, the protein complex that forms the central assembly platform for the initial steps of ISC biogenesis is composed of the yeast Frataxin homolog, the Nfs1-Isd11 dimer and the protein Isu. It is commonly accepted that protein function involves interaction with other protein partners, but in this case not enough is known about the structure of the protein complex and, therefore, about how exactly it functions. In Chapter III of this thesis a model of the ISC biogenesis protein complex is proposed in order to gain insight into structural details that could explain its biological function. To achieve this goal, several bioinformatics tools, modeling techniques and protein docking programs were used. As a result, the structure of the protein complex was modeled, and the dynamic behavior of its components, along with that of the iron and sulfur atoms required for ISC assembly, was suggested. These hypotheses might help to better understand the function and molecular properties of Frataxin as well as those of its ISC assembly protein partners
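The Chapter I screen described above, which flags epitopes that share a stretch of sequence with the human proteome, can be sketched in a few lines. This example deliberately swaps BLASTP for a much cruder exact k-mer match, so it is only a stand-in for the idea and not the thesis method. The function names, the window length `k=5`, and the toy sequences are all assumptions; the sequences are made up and are not real epitopes or human proteins.

```python
def kmers(seq, k):
    """All contiguous substrings of length k in seq (empty set if seq is shorter)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shares_stretch(epitope, proteome, k=5):
    """True if any k-residue window of the epitope occurs exactly in any proteome sequence."""
    windows = kmers(epitope, k)
    return any(windows & kmers(protein, k) for protein in proteome)

# Toy data (made-up sequences, not real proteins or epitopes).
host_proteome = ["MKTAYIAKQRQISFVK", "GSHMLEDPVRWQ"]
epitopes = {"ep1": "AYIAKQ", "ep2": "WWCCHH"}

# Keep only epitopes with no exact shared stretch with the host sequences.
candidates = [name for name, seq in epitopes.items()
              if not shares_stretch(seq, host_proteome)]
print(candidates)  # ['ep2']
```

Here `ep1` is rejected because it shares a five-residue window with the first toy host sequence, while `ep2` passes the filter. A real analysis would rely on an alignment tool such as BLASTP, which also detects inexact similarity that this exact-match sketch misses.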

    Bioinformatics approaches to protein interaction and complexes : application to pathogen-host epitope mimicry and to Fe-S cluster biogenesis model

    Get PDF
    Les interaccions antigen/anticòs són un dels tipus més interessants d'interaccions proteiques. La millor manera de prevenir les malalties causades per patògens és mitjançant l'ús de vacunes. L'aparició de la genòmica permet fer cerques a tot el genoma de nous candidats vacunals, tècnica anomenada vaccinologia inversa. L'estratègia més comuna on s'aplica la vaccinologia inversa és al disseny de vacunes de subunitats recombinants, que en general generen resposta immune humoral a causa de la presència d'epítops B en les proteïnes del patogen. Un problema important d'aquesta estratègia és la identificació de les proteïnes immunogèniques protectives del surfoma del patogen. El mimetisme epitòpic pot donar lloc a fenòmens autoimmunes relacionats amb diverses malalties humanes. El Capítol I d'aquesta tesi descriu una anàlisi computacional basat en la seqüència on, mitjançant l'aplicació de l'algorisme BLASTP, es van comparar bases de dades d'epítops B lineals coneguts i també de seqüències de proteïnes de superfície dels principals patògens bacterians respiratoris humans amb el proteoma humà. Es va trobar que cap dels 7353 epítops B lineals analitzats tenien regions d'identitat de seqüència amb proteïnes humanes capaces de generar anticossos i alhora que només l'1% de les 2175 proteïnes analitzades contenien alguna zona de seqüència compartida amb el proteoma humà. Aquestes troballes suggereixen l'existència d'un mecanisme per evitar l'autoimmunitat. També proposem una estratègia per corroborar o advertint sobre la viabilitat d'una proteïna que contingui un cert epítop B lineal de ser un bon candidat vacunal mitjançant estudis de vaccinologia inversa. En resum, els epítops sense cap tipus d'identitat de seqüència amb proteïnes humanes han de ser bons candidats vacunals, i al l'inrevés. 
El docking proteic és un mètode computacional per predir la millor manera en què interactuen les proteïnes, però, és possible identificar quina és la millor solució d'un programa de docking? La resposta habitual a aquesta pregunta és la solució que tingui més alta puntuació als outputs dels programes de docking, però les interaccions entre proteïnes són processos dinàmics, i moltes vegades la regió d'interacció és prou àmplia com per permetre diferents orientacions i/o energies d'interacció entre elles. En alguns casos, com en un multímer, es poden donar diverses regions d'interacció entre els monòmers. Aquests processos dinàmics impliquen interaccions, amb desplaçaments de superfície entre proteïnes, que porten a assolir la configuració funcional del complex proteic. Així doncs, en molts casos no hi ha una solució estàtica i única per a la interacció entre proteïnes, sinó que es donen diverses configuracions que també haurien de ser analitzades perquè podrien ser importants. Per extreure el conjunt de solucions més representatives dels outputs dels programes de docking, al Capítol II d'aquesta tesi es detalla el desenvolupament d'una aplicació de clústering no supervisada i automàtica, anomenada DockAnalyse. Aquesta aplicació es basa en el mètode ja existent de clústering DBscan, mitjançant el qual es busquen continuïtats entre els clústers generats per la representació de les dades dels outputs de docking. El mètode de clústering DBSCAN és molt robust i resol alguns dels problemes d'inconsistència dels mètodes clàssics de clústering com el tractament dels valors atípics i la dependència alhora de definir prèviament el nombre de clústers. Mitjançant representacions gràfiques i molt visuals, DockAnalyse fa que la interpretació de les solucions de docking sigui més fàcil permetent-nos trobar les més representatives. 
S'ha utilitzat aquesta nova aplicació per analitzar diverses interaccions proteiques i així poder modelar el comportament dinàmic de la interacció entre les proteïnes d'un complex. DockAnalyse també pot fer-se servir per a descriure regions d'interacció entre proteïnes i, per tant, orientar en futurs assajos de docking flexibles. L'aplicació (feta amb el paquet R) és oberta i accessible. La construcció dels Clústers Ferro-Sofre (ISC) en eucariotes implica interaccions entre diferents proteïnes, entre els quals es troba la proteïna Frataxina. Dèficits d'aquesta proteïna s'han associat amb excés de ferro dins del mitocondri i alteracions en la biogènesi dels ISC ja que es proposa que Frataxina actua com a donadora de ferro per a la construcció d'aquests ISC en aquest orgànul. Una reducció dràstica de Frataxina causa l'Atàxia de Friedreich, una malaltia neurodegenerativa hereditària humana que afecta principalment l'equilibri, la coordinació, els músculs i el cor. Aquest síndrome és l'atàxia autosòmica recessiva més comuna. Entre els mecanismes moleculars d' humans i de llevat que involucren Frataxina s'han trobat moltes similituds així que els llevats representen un bon model per a estudiar aquest procés. En llevat, el complex proteic que forma la plataforma central de muntatge dels passos inicials de la biogènesi dels ISC està composta per la Frataxina homòloga de llevat, el dímer Nfs1-Isd11 i la proteïna Isu. En general, està acceptat que la funció de les proteïnes implica interaccions amb altres proteïnes associades, però en aquest cas no se sap prou sobre l'estructura del complex de proteïnes i, per tant, com funciona exactament. En el Capítol III d'aquesta tesi es proposa un model del complex proteic necessari per a la biogènesi dels ISC amb el que es pretén aprofundir en detalls estructurals que expliquin la funció biològica. 
Per aconseguir aquest objectiu s'han utilitzat diverses eines de la bioinformàtiques, així com tècniques de modelització i programes de docking de proteïnes. Com a resultat, s'ha modelat l'estructura d'aquest complex proteic i també s'ha suggerit el comportament dinàmic dels seus components, juntament amb la dels àtoms de ferro i sofre necessaris per a la formació dels ISC. Aquestes hipòtesis podrien ajudar a comprendre millor la funció i les propietats moleculars de la proteïna Frataxina, així com els de les seves companyes presents al complex proteic.Antigen/antibody interactions are one of the most interesting kinds of protein interactions. The best way to prevent diseases caused by pathogens is by the use of vaccines. The advent of genomics enables genome-wide searches of new vaccine candidates, called reverse vaccinology. The most common strategy to apply reverse vaccinology is by designing subunit recombinant vaccines, which usually generate humoral immune response due to B-cell epitopes in proteins. A major problem for this strategy is the identification of protective immunogenic proteins from the surfome of the pathogen. Epitope mimicry may lead to auto-immune condition related to several human diseases. Chapter I of this thesis describes a sequence-based computational analysis that was carried out applying the BLASTP algorithm where databases containing the known linear B-cell epitopes and the surface-protein sequences of the main human respiratory bacterial pathogens were compared to the human proteome. We found that none of the 7353 linear B-cell epitopes analyzed share any sequence identity region with human proteins capable of generating antibodies, and that only 1% of the 2175 exposed proteins analyzed contain a stretch of shared sequence with the human proteome. These findings suggest the existence of a mechanism to avoid autoimmunity. 
We also propose a strategy for corroborating or warning about the viability of a protein linear B-cell epitope to be a putative vaccine candidate in reverse vaccinology studies. Therefore, epitopes without any sequence identity with human proteins should be good vaccine candidates, and the other way around. Protein docking is a computational method to predict the best way by which proteins interact, but, is it possible to identify what the best solution of a docking program is? The usual answer to this question is the highest score solution, but interactions between proteins are dynamic processes, and many times the interaction regions are wide enough to permit protein-protein interactions with different orientations and/or interaction energies. In some cases, as in a multimeric protein complex, several interaction regions are possible among the monomers. These dynamic processes involve interactions with surface displacements between the proteins to finally achieve the functional configuration of the protein complex. Consequently, there is not a static and single solution for the interaction between proteins, but there are several important configurations that also have to be analyzed. To extract those representative solutions from the docking output datafile, Chapter II of this thesis details the development of an unsupervised and automatic clustering application, named DockAnalyse. This application is based on the already existing DBscan clustering method, which searches for continuities among the clusters generated by the docking output data representation. The DBscan clustering method is very robust and, moreover, solves some of the inconsistency problems of the classical clustering methods like, for example, the treatment of outliers and the dependence of the previously defined number of clusters. 
DockAnalyse makes the interpretation of the docking solutions through graphical and visual representations easier by guiding the user to find the representative solutions. We have applied our new approach to analyze several protein interactions and model the dynamic protein interaction behavior of a protein complex. DockAnalyse might also be used to describe interaction regions between proteins and, therefore, guide future flexible dockings. The application (implemented in the R package) is accessible. The assembly of Iron-Sulfur Clusters (ISCs) in eukaryotes involves interactions between different proteins, among which is important the protein Frataxin. Deficits in this protein have been associated with iron inside the mitochondria and impaired ISC biogenesis as it is postulated to act as the iron donor for ISCs assembly in this organelle. A pronounced lack of Frataxin causes Friedreich's Ataxia, which is a human neurodegenerative and hereditary disease mainly affecting the equilibrium, coordination, muscles and heart. Moreover, it is the most common autosomal recessive ataxia. High similarities between the human and yeast molecular mechanisms that involve Frataxin have been suggested making yeast a good model to study that process. In yeast, the protein complex that forms the central assembly platform for the initial step of ISC biogenesis is composed by yeast Frataxin homolog, Nfs1-Isd11 and Isu. In general, it is commonly accepted that protein function involves interaction with other protein partners, but in this case not enough is known about the structure of the protein complex and, therefore, how it exactly functions. In Chapter III of this thesis a model of the ISC biogenesis protein complex was proposed in order to gain insight into structural details that could end up with its biological function. To achieve this goal several bioinformatics tools, modeling techniques and protein docking programs were used. 
As a result, the structure of the protein complex and the dynamic behavior of its components, along with that of the iron and sulfur atoms required for ISC assembly, were modeled. This hypothesis might help to better understand the function and molecular properties of Frataxin, as well as those of its ISC assembly protein partners.
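
The density-based clustering idea that the abstract attributes to DBscan can be illustrated with a minimal sketch. This is not the DockAnalyse implementation (which the abstract says is written in R); it is a plain Python illustration under stated assumptions: each docking solution is reduced to a hypothetical 2D coordinate, and the values of `eps` and `min_pts` are arbitrary choices for the toy data. It shows the two properties mentioned above: the number of clusters is not fixed in advance, and isolated solutions are labeled as outliers rather than forced into a cluster.

```python
from math import dist

NOISE = -1  # label for outliers


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Hypothetical docking solutions projected onto two coordinates.
solutions = [
    (0, 0), (0, 1), (1, 0), (1, 1),          # first dense region
    (10, 10), (10, 11), (11, 10), (11, 11),  # second dense region
    (50, 50),                                # isolated solution
]
labels = dbscan(solutions, eps=1.5, min_pts=3)
print(labels)
```

In this toy input the two dense groups receive separate cluster ids and the isolated point is labeled as noise, without the number of clusters being specified beforehand.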

    Gene ontology function prediction in Mollicutes using protein-protein association networks

    No full text
    Many complex systems can be represented and analysed as networks. The recent availability of large-scale datasets has made it possible to elucidate some of the organisational principles and rules that govern their function, robustness and evolution. However, one of the main limitations in using protein-protein interactions for function prediction is the availability of interaction data, especially for Mollicutes. If we could harness predicted interactions, such as those from a Protein-Protein Association Network (PPAN), combining several protein-protein network function-inference methods with semantic similarity calculations, the use of protein-protein interactions for functional inference in these species would potentially become more useful. In this work we show that using PPAN data combined with other approaches, such as functional module detection, orthology exploitation methods and Gene Ontology (GO)-based information measures, helps to predict protein function in Mycoplasma genitalium. To our knowledge, the proposed method is the first that combines functional module detection among species, exploiting an orthology procedure and using information theory-based GO semantic similarity in PPANs of Mycoplasma species. The results of an evaluation show a higher recall than previously reported methods that focused on only one organism network.