11 research outputs found

    Prediction of protein-protein interactions between viruses and human by an SVM model

    Background: Several computational methods have been developed to predict protein-protein interactions from amino acid sequences, but most of those methods are intended for the interactions within a species rather than for interactions across different species. Methods for predicting interactions between homogeneous proteins are not appropriate for finding those between heterogeneous proteins since they do not distinguish the interactions between proteins of the same species from those of different species. Results: We developed a new method for representing a protein sequence of variable length in a frequency vector of fixed length, which encodes the relative frequency of three consecutive amino acids of a sequence. We built a support vector machine (SVM) model to predict human proteins that interact with virus proteins. In two types of viruses, human papillomaviruses (HPV) and hepatitis C virus (HCV), our SVM model achieved an average accuracy above 80%, which is higher than that of another SVM model with a different representation scheme. Using the SVM model and Gene Ontology (GO) annotations of proteins, we predicted new interactions between virus proteins and human proteins. Conclusions: Encoding the relative frequency of amino acid triplets of a protein sequence is a simple yet powerful representation method for predicting protein-protein interactions across different species. The representation method has several advantages: (1) it enables a prediction model to achieve a better performance than other representations, (2) it generates feature vectors of fixed length regardless of the sequence length, and (3) the same representation is applicable to different types of proteins.
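
The triplet-frequency representation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the 20 standard amino acids, overlapping triplets, and that triplets containing any other character are skipped. The names `AMINO_ACIDS`, `TRIPLET_INDEX`, and `triplet_frequency_vector` are hypothetical.

```python
from itertools import product

# Assumption: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every possible triplet gets a fixed position, giving 20**3 = 8000 features.
TRIPLETS = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]
TRIPLET_INDEX = {triplet: i for i, triplet in enumerate(TRIPLETS)}


def triplet_frequency_vector(sequence):
    """Map a protein sequence of any length to a fixed-length vector of
    relative frequencies of overlapping amino acid triplets."""
    counts = [0] * len(TRIPLETS)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - 2):
        idx = TRIPLET_INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip triplets with non-standard residues
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid triplet): return an all-zero vector.
        return [0.0] * len(TRIPLETS)
    return [c / total for c in counts]
```

Because the vector length depends only on the alphabet, sequences of different lengths from different species end up in the same feature space, which is the property the abstract highlights.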

    Issues in performance evaluation for host–pathogen protein interaction prediction

    The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein–protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine learning based methods for predicting host–pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose
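
The difference between K-fold and leave one pathogen protein out (LOPO) cross-validation comes down to how the interaction pairs are split. The sketch below is one possible reading of LOPO, not the paper's code: each fold holds out every pair that involves one pathogen protein, so the model is tested on a pathogen protein it never saw during training. The function name `lopo_splits` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict


def lopo_splits(pairs):
    """Yield (held_out_pathogen, train_indices, test_indices).

    `pairs` is a list of (pathogen_protein, host_protein, label) tuples.
    All pairs sharing a pathogen protein land in the same test fold.
    """
    groups = defaultdict(list)
    for idx, (pathogen, _host, _label) in enumerate(pairs):
        groups[pathogen].append(idx)

    for pathogen in sorted(groups):
        test_idx = groups[pathogen]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(pairs)) if i not in held_out]
        yield pathogen, train_idx, test_idx
```

With plain K-fold splitting, pairs of the same pathogen protein can appear in both the training and the test folds, which is the leakage the abstract argues against.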

    Training host-pathogen protein–protein interaction predictors

    Detection of protein–protein interactions (PPIs) plays a vital role in molecular biology. In particular, pathogenic infections are caused by interactions of host and pathogen proteins. It is important to identify host–pathogen interactions (HPIs) to discover new drugs to counter infectious diseases. Conventional wet lab PPI detection techniques have limitations in terms of cost and large-scale application. Hence, computational approaches are developed to predict PPIs. This study aims to develop machine learning models to predict inter-species PPIs with a special interest in HPIs. Specifically, we focus on seeking answers to three questions that arise while developing an HPI predictor: (1) How should negative training examples be selected? (2) Does assigning sample weights to individual negative examples based on their similarity to positive examples improve generalization performance? and (3) What should be the size of negative samples as compared to the positive samples during training and evaluation? We compare two available methods for negative sampling, random and DeNovo sampling, and our experiments show that DeNovo sampling offers better accuracy. However, our experiments also show that generalization performance can be improved further by using a soft DeNovo approach that assigns sample weights to negative examples inversely proportional to their similarity to known positive examples during training. Based on our findings, we have also developed an HPI predictor called HOPITOR (Host-Pathogen Interaction Predictor) that can predict interactions between human and viral proteins. The HOPITOR web server can be accessed at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#HoPItor
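
The soft DeNovo idea of weighting negative examples inversely to their similarity to known positives could look roughly like the sketch below. It is an illustration under stated assumptions, not the HOPITOR implementation: the similarity scores are assumed to be precomputed and scaled to [0, 1], the smoothing constant `eps` is an arbitrary choice that avoids division by zero, and the function name `soft_negative_weights` is hypothetical.

```python
def soft_negative_weights(similarities, eps=0.1):
    """Return one training weight per negative example.

    `similarities[i]` is the (assumed precomputed) similarity in [0, 1]
    between negative example i and the known positive examples.
    Weights are proportional to 1 / (eps + similarity) and rescaled so
    that the largest weight equals 1.0.
    """
    if not similarities:
        return []
    for s in similarities:
        if not 0.0 <= s <= 1.0:
            raise ValueError("similarity values must lie in [0, 1]")
    raw = [1.0 / (eps + s) for s in similarities]
    top = max(raw)
    return [w / top for w in raw]
```

Negatives that closely resemble known positives are the most likely to be mislabeled interactions, so they receive small weights instead of being discarded outright as in hard DeNovo sampling. Many learners accept such per-example weights at training time.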

    Inter-Species/Host-Parasite Protein Interaction Predictions Reviewed

    Background: Host-parasite protein interactions (HPPI) are the interactions that occur between a parasite and its host. Studying them improves our understanding of how a parasite can infect its host. These interactions play an important role in initiating infections, although not all host-parasite interactions result in infection. Identifying the protein-protein interactions (PPIs) that allow a parasite to infect its host is central to discovering possible drug targets. Such PPIs, when altered, would prevent the host from being infected by the parasite and, in some cases, leave the parasite unable to complete specific stages of its life cycle, invariably leading to its death. It is therefore important to understand the workings of host-parasite interactions, which are the major causes of most infectious diseases

    Exploring The Interactions Between SARS-CoV-2 and Host Proteins.

    The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of the current pandemic, Coronavirus Disease 2019 (COVID-19). SARS-CoV-2 is considered to be of zoonotic origin; it originated in non-human animals and was transmitted to humans. Since the early stage of the pandemic, however, evidence of transmission from humans to animals (reverse zoonoses) has been found in multiple animal species including mink, white-tailed deer, and pet and zoo animals. Furthermore, secondary zoonotic events of SARS-CoV-2, that is, transmissions from animals back to humans, have also been reported. It is suggested that non-human hosts can act as SARS-CoV-2 reservoirs where accumulated mutations in viral proteins could change the transmissibility and/or pathogenicity of the virus when it spills over again to human populations. Our goal, therefore, is to examine the SARS-CoV-2 genomic changes in non-human hosts and to identify the changes responsible for the adaptation of the virus in non-human hosts. Changes in the physicochemical properties of viral proteins potentially affect their functions. Therefore, in this study, we compared SARS-CoV-2 proteins among human and non-human hosts and analyzed the differences in their physicochemical properties using principal component analysis. In addition to the viral proteins from bat and pangolin, those from white-tailed deer and mink showed larger differences in the properties. Van der Waals volume, isoelectric point, charge, and thermostability index were found to be the main contributing factors. We next compared protein-protein interaction (PPI) prediction methods that use different features, including physicochemical properties and features based on natural language processing. The comparison showed that Cross-attention PHV had slightly better performance scores than InterSPPI-HVPPI and LGCA-VHPPI. 
Finally, to examine the effect of changes in the physicochemical properties of viral proteins on their interactions with host proteins, PPI prediction was performed using Cross-attention PHV between viral proteins from different SARS-CoV-2 variants and host proteins. The prediction scores between the different variants and host proteins from human and white-tailed deer were highly similar. The results showed that the analysis of physicochemical properties of viral proteins helps to understand how these properties affect viral-host PPIs and how viral proteins evolve to adapt to different host cell environments

    Joint learning from multiple information sources for biological problems

    Thanks to technological advancements, more and more biological data have been generated in recent years. Data availability offers unprecedented opportunities to look at the same problem from multiple aspects. It also unveils a more global view of the problem that takes into account the intricate interplay between the involved molecules/entities. Nevertheless, biological datasets are biased, limited in quantity, and contain many false-positive samples. Such challenges often drastically degrade the performance of a predictive model on unseen data and, thus, limit its applicability in real biological studies. Human learning is a multi-stage process in which we usually start with simple things. Through the accumulated knowledge over time, our cognitive ability extends to more complex concepts. Children learn to speak simple words before being able to formulate sentences. Similarly, being able to speak correct sentences supports our learning to speak correct and meaningful paragraphs, etc. Generally, knowledge acquired from related learning tasks helps boost our learning capability in the current task. Motivated by this phenomenon, in this thesis, we study supervised machine learning models for bioinformatics problems that can improve their performance by exploiting multiple related knowledge sources. More specifically, we are concerned with ways to enrich the supervised models’ knowledge base with publicly available related data to enhance the computational models’ prediction performance. Our work shares commonality with existing works in multimodal learning, multi-task learning, and transfer learning. Nevertheless, there are certain differences in some cases. Besides the proposed architectures, we present large-scale experiment setups with consensus evaluation metrics along with the creation and release of large datasets to showcase our approaches’ superiority. 
Moreover, we add case studies with detailed analyses in which we make no simplifying assumptions to demonstrate the systems’ utility in realistic application scenarios. Finally, we develop and make available an easy-to-use website for non-expert users to query the model’s generated prediction results to facilitate field experts’ assessments and adaptation. We believe that our work serves as one of the first steps in bridging the gap between “Computer Science” and “Biology” that will open a new era of fruitful collaboration between computer scientists and biological field experts

    Modélisation prédictive des interactions entre bactéries et virus bactériophages

    Currently, there is a serious public health problem because bacteria develop resistance to antibiotics, particularly because of the overuse of antibiotics. Whether purchased in pharmacies, consumed in hospitals, or taken in indirectly via the food that humans eat daily, antibiotic consumption continues to increase. Phage therapy, i.e. treatment with bacteriophages, is a promising alternative to antibiotics that uses viruses, literally "eaters" of bacteria, to treat various infections caused by bacteria. This treatment technique has several of the advantages of antibiotics without their drawbacks. Indeed, bacteriophages are highly specific and therefore attack only the bacteria causing the infection, avoiding side effects of antibiotic consumption, e.g. on the intestinal flora. The challenge of this technique is to quickly identify the bacteriophages that attack a particular bacterium, a procedure currently performed in laboratories by testing all possible combinations, which is expensive and requires several days. The solution explored in this project is the use of computational techniques to predict in silico whether a bacterium-bacteriophage pair is able to interact or not. 
Starting from a database containing more than 1,000 positive bacterium-bacteriophage pairs and more than 1,000 negative pairs for which the genomes of both the bacterium and the bacteriophage are known, the following procedure was put in place: 1. Extraction of features to create 19 datasets used to train machine learning models; 2. Selection and training of the algorithms with a large number of configurations; 3. Use of ensemble-learning approaches to develop a voting system; 4. Analysis of the results. The final model achieved more than 90% accuracy, F1 score, sensitivity and specificity on a held-out set (test set) that had never been used for training or for cross-validation. These good results suggest that machine learning is a promising approach to address this problem
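
Step 3 of the procedure, the voting system, can be illustrated with a simple majority vote over the labels predicted by several trained models. The abstract does not say which voting rule was used, so plain majority voting is an assumption here, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine per-model predictions into one label per example.

    `model_predictions` is a list with one entry per model; each entry is
    the list of labels that model predicted, in the same example order.
    Ties go to the label seen first, so an odd number of models is safer
    for binary labels.
    """
    if not model_predictions:
        return []
    lengths = {len(p) for p in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must predict the same number of examples")
    voted = []
    for labels in zip(*model_predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted
```

For example, three models predicting `[1, 0, 1]`, `[1, 1, 0]`, and `[0, 0, 1]` for three bacterium-bacteriophage pairs would be combined position by position, with each pair taking the label that at least two of the three models agree on.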

    Machine-learning-based identification of factors that influence molecular virus-host interactions

    Viruses are the cause of many infectious diseases, including the pandemic diseases acquired immune deficiency syndrome (AIDS) and coronavirus disease 2019 (COVID-19). During the infection cycle, viruses invade host cells and trigger a series of virus-host interactions with different directionality. Some of these interactions disrupt host immune responses or promote the expression of viral proteins and exploitation of the host system, and are thus considered ‘pro-viral’. Other interactions display ‘pro-host’ traits, principally the immune response, to control or inhibit viral replication. Concomitant pro-viral and pro-host molecular interactions on the same host molecule suggest more complex virus-host conflicts and genetic signatures that are crucial to host immunity. In this work, machine-learning-based prediction of virus-host interaction directionality was examined by using data from human immunodeficiency virus type 1 (HIV-1) infection. Host immune responses to viral infections are mediated by interferons (IFNs) in the initial stage of the immune response to infection. IFNs induce the expression of many IFN-stimulated genes (ISGs), which make the host cell refractory to further infection. We propose that there are many features associated with the up-regulation of human genes in the context of IFN-α stimulation, and that these features make ISGs predictable using machine-learning models. To overcome the interference of host immune responses and replicate successfully, viruses adopt multiple strategies to avoid detection by cellular sensors and to hijack the host transcription or translation machinery. Here, the strategy of mimicry of host-like short linear motifs (SLiMs) by the virus was investigated by using the example of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The integration of in silico experiments and analyses in this thesis demonstrates an interactive and intimate relationship between viruses and their hosts. 
Findings here contribute to the identification of host dependency and antiviral factors. They are of great importance not only to the ongoing COVID-19 pandemic but also to the understanding of future disease outbreaks