
    Co-complex protein membership evaluation using Maximum Entropy on GO ontology and InterPro annotation.

    MOTIVATION: Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms have been based on the application of probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and a machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. RESULTS: PPI annotations are built combinatorially using corresponding GO terms and InterPro annotation. We use a S. cerevisiae high-confidence complex dataset as a positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve an area under the ROC curve of up to 0.97, outperforming go2ppi, a previously established PPI prediction tool based on GO annotations. AVAILABILITY AND IMPLEMENTATION: https://github.com/ima23/maxent-ppi. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    Detecting Adverse Drug Events Using a Deep neural network Model

    Adverse drug events represent a key challenge in public health, especially with respect to drug safety profiling and drug surveillance. Drug-drug interactions represent one of the most common types of adverse drug events. Most computational approaches to this problem have applied a variety of machine learning algorithms to different types of drug-related information to predict potential interactions between drugs. In this work, our focus is on the use of genetic information about the drugs, in particular the protein sequence and protein structure of drug protein targets, to predict potential interactions between drugs. We collected information on drug-drug interactions (DDIs) from the DrugBank database and divided it into multiple datasets based on the type of information, such as chemical structure, protein targets, side effects, pathways, protein-protein interactions, protein structure, and indications. We propose a similarity-based neural network framework called the protein sequence-structure similarity network (S3N) and use it to predict novel DDIs. The drug-drug similarities are computed using different categories of drug information based on multiple similarity metrics. We compare the results with those from state-of-the-art methods on this problem. Our results show that the proposed method is quite competitive, at times outperforming the state of the art. Our performance evaluations on different datasets showed the following predictive performance: Precision 91%-98%, Recall 90%-96%, F1 Score 86%-95%, AUC 88%-99%, Accuracy 86%-95%. To further investigate the reliability of the proposed method, we used 158 drugs related to cardiovascular disease to evaluate the performance of our model and to identify new interactions among these drugs. Our model showed 90% accuracy in detecting the existing drug interactions and identified 60 new DDIs for the cardiovascular drugs. Our evaluation demonstrates the effectiveness of S3N in predicting DDIs.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    Prediction of protein-protein interactions using one-class classification methods and integrating diverse data

    This research addresses the problem of predicting protein-protein interactions (PPI) when integrating diverse kinds of biological information. This task has commonly been viewed as a binary classification problem (whether any two proteins do or do not interact), and several different machine learning techniques have been employed to solve it. However, the nature of the data creates two major problems which can affect results. First, the classes are imbalanced, because the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. Second, the selection of negative examples can be based on unreliable assumptions, which could introduce bias into the classification results. Here we propose the use of one-class classification (OCC) methods to deal with the task of PPI prediction. OCC methods utilise examples of just one class to generate a predictive model, which consequently is independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We have designed and carried out a performance evaluation study of several OCC methods for this task, and have found that the Parzen density estimation approach outperforms the rest. We also undertook a comparative performance evaluation between the Parzen OCC method and several conventional learning techniques, considering different scenarios, for example varying the number of negative examples used for training purposes. We found that the Parzen OCC method in general performs competitively with traditional approaches and in many situations outperforms them. Finally, we evaluated the ability of the Parzen OCC approach to predict new potential PPI targets, and validated these results by searching for biological evidence in the literature.
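
As a rough illustration of the one-class idea described in this abstract, the sketch below fits a Gaussian Parzen-window density to positive examples only and accepts a new sample when its estimated density is above a threshold taken from the training scores. It is not the authors' implementation: the class name `ParzenOneClass`, the Gaussian kernel, the `bandwidth` value, and the `reject_fraction` thresholding rule are all assumptions made for this example.

```python
import numpy as np


def parzen_log_density(train, query, bandwidth):
    """Log of a Gaussian Parzen-window density estimate at each query point."""
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    n, d = train.shape
    # Squared Euclidean distance between every query and every training point.
    diff = query[:, None, :] - train[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    log_kernel = -sq_dist / (2.0 * bandwidth ** 2)
    # Normalisation of an isotropic Gaussian kernel, averaged over n points.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2) - np.log(n)
    # Log-sum-exp over training points for numerical stability.
    m = log_kernel.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) + log_norm


class ParzenOneClass:
    """Hypothetical one-class classifier trained on positive examples only."""

    def __init__(self, bandwidth=1.0, reject_fraction=0.05):
        self.bandwidth = bandwidth
        # Assumed rule: the lowest-scoring fraction of training points is rejected.
        self.reject_fraction = reject_fraction

    def fit(self, positives):
        self.train_ = np.asarray(positives, dtype=float)
        scores = parzen_log_density(self.train_, self.train_, self.bandwidth)
        self.threshold_ = np.quantile(scores, self.reject_fraction)
        return self

    def score(self, samples):
        return parzen_log_density(self.train_, samples, self.bandwidth)

    def predict(self, samples):
        # True means "looks like the positive class"; no negatives were needed.
        return self.score(samples) >= self.threshold_
```

In a PPI setting each row would be a feature vector for a protein pair, and only known interacting pairs would be passed to `fit`; the bandwidth would normally be tuned, for example by cross-validated likelihood on the positives.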

    Classification of protein interaction sentences via gaussian processes

    The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text: Gaussian processes (GPs). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of the margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework, makes GPs an appealing alternative worthy of further adoption.