17,611 research outputs found

    Drug-drug interactions: A machine learning approach

    Get PDF
    Automatic detection of drug-drug interaction (DDI) is a difficult problem in pharmaco-surveillance. Recent practice for in vitro and in vivo pharmacokinetic drug-drug interaction studies have been based on carefully selected drug characteristics such as their pharmacological effects, and on drug-target networks, in order to identify and comprehend anomalies in a drug\u27s biochemical function upon co-administration.;In this work, we present a novel DDI prediction framework that combines several drug-attribute similarity measures to construct a feature space from which we train three machine learning algorithms: Support Vector Machine (SVM), J48 Decision Tree and K-Nearest Neighbor (KNN) using a partially supervised classification algorithm called Positive Unlabeled Learning (PU-Learning) tailored specifically to suit our framework.;In summary, we extracted 1,300 U.S. Food and Drug Administration-approved pharmaceutical drugs and paired them to create 1,688,700 feature vectors. Out of 397 drug-pairs known to interact prior to our experiments, our system was able to correctly identify 80% of them and from the remaining 1,688,303 pairs for which no interaction had been determined, we were able to predict 181 potential DDIs with confidence levels greater than 97%. The latter is a set of DDIs unrecognized by our source of ground truth at the time of study.;Evaluation of the effectiveness of our system involved querying the U.S. Food and Drug Administration\u27s Adverse Effect Reporting System (AERS) database for cases involving drug-pairs used in this study. The results returned from the query listed incidents reported for a number of patients, some of whom had experienced severe adverse reactions leading to outcomes such as prolonged hospitalization, diminished medicinal effect of one or more drugs, and in some cases, death

    edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

    Full text link
    Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, \textbf{edge2vec}\ significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.Comment: 10 page

    Text and data mining for human drug understanding

    Full text link
    University of Technology Sydney. Faculty of Engineering and Information Technology.This research employs text and data mining methods to gain valuable knowledge for human drugs. Specifically, computational methods are developed for three topics, namely drug-side-effect prediction, drug-target identification, and drug-drug-interaction detection. The key innovations of the proposed methods lie in the feature space construction using medical domain knowledge, generation of reliable negative samples, and successful application of machine learning algorithms. The drug-side-effect prediction problems are studied in Chapters 3-5. Side-effects are secondary phenotypic responses of human organisms to drug treatments. Side-effect prediction is an important topic for drugs especially in post-marketing surveillance because they cause significant fatality and severe morbidity. To overcome the limitations of existing computational methods such as lack of proper drug representation and reliable negative samples, this thesis presents three novel methods. The first method is to predict side-effects for single drug medication as described in Chapter 3. A comprehensive drug similarity framework is developed by integrating several types of similarities measured by representative features of drugs first. Then reliable negative samples are generated through analyzing the comprehensive drug similarities. Trained with generated reliable negatives, the prediction performance of four classical classifiers are improved significantly, outperforming those state-of-the-art methods. Chapter 4 describes the method proposed to predict side-effects for combined medication of multi-drugs. A scoring method on a drug-disease-gene tripartite network is developed to prioritize interacting drugs, paving a way to generate credible negative samples for side-effect prediction of combined medication. It creatively characterized a drug with its chemical structures, target proteins, substituents, and enriched pathways. The drug-drug pairs are represented as novel feature vectors to train binary classifiers for prediction. This novel representation and the inferred negative samples contribute to the superior performance of the proposed method in drug-drug-side-effect association prediction. Chapter 5 introduces the last method for detecting adverse drug reactions (ADRs, i.e., side-effects) from medical forums. It filters the cause-result relationship between drugs and ADRs using a self-built dictionary and detects drug-ADRs associations by information entropy. Compared with conventional co-occurrence based methods, the proposed method captures both high-frequency and low-frequency ADRs simultaneously. Besides, it returns drug-related ADRs only owing to the self-built relation dictionary. Drug-target identification plays a crucial role in drug discovery. Existing computational methods have achieved remarkable prediction accuracy, however, usually obtain poor prediction efficiency due to computational problems. Chapter 6 presents a method to improve the prediction efficiency using an advanced technique named anchor graph hashing (AGH). AGH embeds data into low-dimensional Hamming space while maintaining the neighbourship. It turns the drug-target identification problem into a binary classification task where inputs are AGH-embedded vectors of drug-target pairs, and labels are judgments of their associations. Ensemble learning with random forest and XGBoost is employed to learn a good decision boundary. The proposed method is demonstrated to be the most efficient method and achieves comparable prediction accuracy with the best literature method. Chapter 7 introduces a novel positive-unlabeled learning method named DDI-PULearn for large-scale detection of drug-drug interactions (DDIs). DDI-PULearn first generates seeds of reliable negatives via OCSVM (one-class support vector machine) under a high-recall constraint and via the cosine-similarity based KNN (k-nearest neighbors) as well. Then trained with all the labeled positives (i.e., validated DDIs) and the generated seed negatives, DDI-PULearn employs an iterative SVM to identify the set of entire reliable negatives from the unlabeled samples. The identified negatives and validated positives are represented as vectors using the bit-wise similarity of corresponding drug pairs to train random forest for prediction. Its excellent performance is confirmed by comparing with two baseline methods and five state-of-the-art methods

    Recent advances in in silico target fishing

    Get PDF
    In silico target fishing, whose aim is to identify possible protein targets for a query molecule, is an emerging approach used in drug discovery due its wide variety of applications. This strategy allows the clarification of mechanism of action and biological activities of compounds whose target is still unknown. Moreover, target fishing can be employed for the identification of off targets of drug candidates, thus recognizing and preventing their possible adverse effects. For these reasons, target fishing has increasingly become a key approach for polypharmacology, drug repurposing, and the identification of new drug targets. While experimental target fishing can be lengthy and difficult to implement, due to the plethora of interactions that may occur for a single small-molecule with different protein targets, an in silico approach can be quicker, less expensive, more efficient for specific protein structures, and thus easier to employ. Moreover, the possibility to use it in combination with docking and virtual screening studies, as well as the increasing number of web-based tools that have been recently developed, make target fishing a more appealing method for drug discovery. It is especially worth underlining the increasing implementation of machine learning in this field, both as a main target fishing approach and as a further development of already applied strategies. This review reports on the main in silico target fishing strategies, belonging to both ligand-based and receptor-based approaches, developed and applied in the last years, with a particular attention to the different web tools freely accessible by the scientific community for performing target fishing studies

    Computational and human-based methods for knowledge discovery over knowledge graphs

    Get PDF
    The modern world has evolved, accompanied by the huge exploitation of data and information. Daily, increasing volumes of data from various sources and formats are stored, resulting in a challenging strategy to manage and integrate them to discover new knowledge. The appropriate use of data in various sectors of society, such as education, healthcare, e-commerce, and industry, provides advantages for decision support in these areas. However, knowledge discovery becomes challenging since data may come from heterogeneous sources with important information hidden. Thus, new approaches that adapt to the new challenges of knowledge discovery in such heterogeneous data environments are required. The semantic web and knowledge graphs (KGs) are becoming increasingly relevant on the road to knowledge discovery. This thesis tackles the problem of knowledge discovery over KGs built from heterogeneous data sources. We provide a neuro-symbolic artificial intelligence system that integrates symbolic and sub-symbolic frameworks to exploit the semantics encoded in a KG and its structure. The symbolic system relies on existing approaches of deductive databases to make explicit, implicit knowledge encoded in a KG. The proposed deductive database DSDS can derive new statements to ego networks given an abstract target prediction. Thus, DSDS minimizes data sparsity in KGs. In addition, a sub-symbolic system relies on knowledge graph embedding (KGE) models. KGE models are commonly applied in the KG completion task to represent entities in a KG in a low-dimensional vector space. However, KGE models are known to suffer from data sparsity, and a symbolic system assists in overcoming this fact. The proposed approach discovers knowledge given a target prediction in a KG and extracts unknown implicit information related to the target prediction. As a proof of concept, we have implemented the neuro-symbolic system on top of a KG for lung cancer to predict polypharmacy treatment effectiveness. The symbolic system implements a deductive system to deduce pharmacokinetic drug-drug interactions encoded in a set of rules through the Datalog program. Additionally, the sub-symbolic system predicts treatment effectiveness using a KGE model, which preserves the KG structure. An ablation study on the components of our approach is conducted, considering state-of-the-art KGE methods. The observed results provide evidence for the benefits of the neuro-symbolic integration of our approach, where the neuro-symbolic system for an abstract target prediction exhibits improved results. The enhancement of the results occurs because the symbolic system increases the prediction capacity of the sub-symbolic system. Moreover, the proposed neuro-symbolic artificial intelligence system in Industry 4.0 (I4.0) is evaluated, demonstrating its effectiveness in determining relatedness among standards and analyzing their properties to detect unknown relations in the I4.0KG. The results achieved allow us to conclude that the proposed neuro-symbolic approach for an abstract target prediction improves the prediction capability of KGE models by minimizing data sparsity in KGs

    Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction

    Get PDF
    The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63-0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited

    Overlap Matrix Completion for Predicting Drug-Associated Indications

    Get PDF
    Identification of potential drug-associated indications is critical for either approved or novel drugs in drug repositioning. Current computational methods based on drug similarity and disease similarity have been developed to predict drug-disease associations. When more reliable drug- or disease-related information becomes available and is integrated, the prediction precision can be continuously improved. However, it is a challenging problem to effectively incorporate multiple types of prior information, representing different characteristics of drugs and diseases, to identify promising drug-disease associations. In this study, we propose an overlap matrix completion (OMC) for bilayer networks (OMC2) and tri-layer networks (OMC3) to predict potential drug-associated indications, respectively. OMC is able to efficiently exploit the underlying low-rank structures of the drug-disease association matrices. In OMC2, first of all, we construct one bilayer network from drug-side aspect and one from disease-side aspect, and then obtain their corresponding block adjacency matrices. We then propose the OMC2 algorithm to fill out the values of the missing entries in these two adjacency matrices, and predict the scores of unknown drug-disease pairs. Moreover, we further extend OMC2 to OMC3 to handle tri-layer networks. Computational experiments on various datasets indicate that our OMC methods can effectively predict the potential drug-disease associations. Compared with the other state-of-the-art approaches, our methods yield higher prediction accuracy in 10-fold cross-validation and de novo experiments. In addition, case studies also confirm the effectiveness of our methods in identifying promising indications for existing drugs in practical applications

    Virtual Affinity Fingerprints for Target Fishing: A New Application of Drug Profile Matching

    Get PDF
    We recently introduced Drug Profile Matching (DPM), a novel virtual affinity fingerprinting bioactivity prediction method. DPM is based on the docking profiles of ca. 1200 FDA-approved small-molecule drugs against a set of nontarget proteins and creates bioactivity predictions based on this pattern. The effectiveness of this approach was previously demonstrated for therapeutic effect prediction of drug molecules. In the current work, we investigated the applicability of DPM for target fishing, i.e. for the prediction of biological targets for compounds. Predictions were made for 77 targets, and their accuracy was measured by Receiver Operating Characteristic (ROC) analysis. Robustness was tested by a rigorous 10-fold cross-validation procedure. This procedure identified targets (N = 45) with high reliability based on DPM performance. These 45 categories were used in a subsequent study which aimed at predicting the off-target profiles of currently approved FDA drugs. In this data set, 79% of the known drug-target interactions were correctly predicted by DPM, and additionally 1074 new drug-target interactions were suggested. We focused our further investigation on the suggested interactions of antipsychotic molecules and confirmed several interactions by a review of the literature
    corecore