1,407 research outputs found

    Drug-drug interactions: A machine learning approach

    Get PDF
    Automatic detection of drug-drug interaction (DDI) is a difficult problem in pharmaco-surveillance. Recent practice for in vitro and in vivo pharmacokinetic drug-drug interaction studies have been based on carefully selected drug characteristics such as their pharmacological effects, and on drug-target networks, in order to identify and comprehend anomalies in a drug\u27s biochemical function upon co-administration.;In this work, we present a novel DDI prediction framework that combines several drug-attribute similarity measures to construct a feature space from which we train three machine learning algorithms: Support Vector Machine (SVM), J48 Decision Tree and K-Nearest Neighbor (KNN) using a partially supervised classification algorithm called Positive Unlabeled Learning (PU-Learning) tailored specifically to suit our framework.;In summary, we extracted 1,300 U.S. Food and Drug Administration-approved pharmaceutical drugs and paired them to create 1,688,700 feature vectors. Out of 397 drug-pairs known to interact prior to our experiments, our system was able to correctly identify 80% of them and from the remaining 1,688,303 pairs for which no interaction had been determined, we were able to predict 181 potential DDIs with confidence levels greater than 97%. The latter is a set of DDIs unrecognized by our source of ground truth at the time of study.;Evaluation of the effectiveness of our system involved querying the U.S. Food and Drug Administration\u27s Adverse Effect Reporting System (AERS) database for cases involving drug-pairs used in this study. The results returned from the query listed incidents reported for a number of patients, some of whom had experienced severe adverse reactions leading to outcomes such as prolonged hospitalization, diminished medicinal effect of one or more drugs, and in some cases, death

    A chemo-centric view of human health and disease

    Get PDF
    Efforts to compile the phenotypic effects of drugs and environmental chemicals offer the opportunity to adopt a chemo-centric view of human health that does not require detailed mechanistic information. Here we consider thousands of chemicals and analyse the relationship of their structures with adverse and therapeutic responses. Our study includes molecules related to the aetiology of 934 health-threatening conditions and used to treat 835 diseases. We first identify chemical moieties that could be independently associated with each phenotypic effect. Using these fragments, we build accurate predictors for approximately 400 clinical phenotypes, finding many privileged and liable structures. Finally, we connect two diseases if they relate to similar chemical structures. The resulting networks of human conditions are able to predict disease comorbidities, as well as identifying potential drug side effects and opportunities for drug repositioning, and show a remarkable coincidence with clinical observations

    Artificial intelligence methods enhance the discovery of RNA interactions

    Get PDF
    Understanding how RNAs interact with proteins, RNAs, or other molecules remains a challenge of main interest in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are starting to be available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods, developed in this broad research area, highlighting feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type

    Computational Tools for the Untargeted Assignment of FT-MS Metabolomics Datasets

    Get PDF
    Metabolomics is the study of metabolomes, the sets of metabolites observed in living systems. Metabolism interconverts these metabolites to provide the molecules and energy necessary for life processes. Many disease processes, including cancer, have a significant metabolic component that manifests as differences in what metabolites are present and in what quantities they are produced and utilized. Thus, using metabolomics, differences between metabolomes in disease and non-disease states can be detected and these differences improve our understanding of disease processes at the molecular level. Despite the potential benefits of metabolomics, the comprehensive investigation of metabolomes remains difficult. A popular analytical technique for metabolomics is mass spectrometry. Advances in Fourier transform mass spectrometry (FT-MS) instrumentation have yielded simultaneous improvements in mass resolution, mass accuracy, and detection sensitivity. In the metabolomics field, these advantages permit more complicated, but more informative experimental designs such as the use of multiple isotope-labeled precursors in stable isotope-resolved metabolomics (SIRM) experiments. However, despite these potential applications, several outstanding problems hamper the use of FT-MS for metabolomics studies. First, artifacts and data quality problems in FT-MS spectra can confound downstream data analyses, confuse machine learning models, and complicate the robust detection and assignment of metabolite features. Second, the assignment of observed spectral features to metabolites remains difficult. Existing targeted approaches for assignment often employ databases of known metabolites; however, metabolite databases are incomplete, thus limiting or biasing assignment results. Additionally, FT-MS provides limited structural information for observed metabolites, which complicates the determination of metabolite class (e.g. lipid, sugar, etc. ) for observed metabolite spectral features, a necessary step for many metabolomics experiments. To address these problems, a set of tools were developed. The first tool identifies artifacts with high peak density observed in many FT-MS spectra and removes them safely. Using this tool, two previously unreported types of high peak density artifact were identified in FT-MS spectra: fuzzy sites and partial ringing. Fuzzy sites were particularly problematic as they confused and reduced the accuracy of machine learning models trained on datasets containing these artifacts. Second, a tool called SMIRFE was developed to assign isotope-resolved molecular formulas to observed spectral features in an untargeted manner without a database of expected metabolites. This new untargeted method was validated on a gold-standard dataset containing both unlabeled and 15N-labeled compounds and was able to identify 18 of 18 expected spectral features. Third, a collection of machine learning models was constructed to predict if a molecular formula corresponds to one or more lipid categories. These models accurately predict the correct one of eight lipid categories on our training dataset of known lipid and non-lipid molecular formulas with precisions and accuracies over 90% for most categories. These models were used to predict lipid categories for untargeted SMIRFE-derived assignments in a non-small cell lung cancer dataset. Subsequent differential abundance analysis revealed a sub-population of non-small cell lung cancer samples with a significantly increased abundance in sterol lipids. This finding implies a possible therapeutic role of statins in the treatment and/or prevention of non-small cell lung cancer. Collectively these tools represent a pipeline for FT-MS metabolomics datasets that is compatible with isotope labeling experiments. With these tools, more robust and untargeted metabolic analyses of disease will be possible

    Multi-task and Multi-view Learning for Predicting Adverse Drug Reactions

    Get PDF
    Adverse drug reactions (ADRs) present a major concern for drug safety and are a major obstacle in modern drug development. They account for about one-third of all late-stage drug failures, and approximately 4% of all new chemical entities are withdrawn from the market due to severe ADRs. Although off-target drug interactions are considered to be the major causes of ADRs, the adverse reaction profile of a drug depends on a wide range of factors such as specific features of drug chemical structures, its ADME/PK properties, interactions with proteins, the metabolic machinery of the cellular environment, and the presence of other diseases and drugs. Hence computational modeling for ADRs prediction is highly complex and challenging. We propose a set of statistical learning models for effective ADRs prediction systematically from multiple perspectives. We first discuss available data sources for protein-chemical interactions and adverse drug reactions, and how the data can be represented for effective modeling. We also employ biological network analysis approaches for deeper understanding of the chemical biological mechanisms underlying various ADRs. In addition, since protein-chemical interactions are an important component for ADRs prediction, identifying these interactions is a crucial step in both modern drug discovery and ADRs prediction. The performance of common supervised learning methods for predicting protein-chemical interactions have been largely limited by insufficient availability of binding data for many proteins. We propose two multi-task learning (MTL) algorithms for jointly predicting active compounds of multiple proteins, and our methods outperform existing states of the art significantly. All these related data, methods, and preliminary results are helpful for understanding the underlying mechanisms of ADRs and further studies. ADRs data are complex and noisy, and in many cases we do not fully understand the molecular mechanisms of ADRs. Due to the noisy and heterogeneous data set available for some ADRs, we propose a sparse multi-view learning (MVL) algorithm for predicting a specific ADR - drug-induced QT prolongation, a major life-threatening adverse drug effect. It is crucial to predict the QT prolongation effect as early as possible in drug development. MVL algorithms work very well when complex data from diverse domains are involved and only limited labeled examples are available. Unlike existing MVL methods that use L2-norm co-regularization to obtain a smooth objective function, we propose an L1-norm co-regularized MVL algorithm for predicting QT prolongation, reformulate the objective function, and obtain its gradient in the analytic form. We optimize the decision functions on all views simultaneously and achieve 3-4 fold higher computational speedup, comparing to previous L2-norm co-regularized MVL methods that alternately optimizes one view with the other views fixed until convergence. L1-norm co-regularization enforces sparsity in the learned mapping functions and hence the results are expected to be more interpretable. The proposed MVL method can only predict one ADR at a time. It would be advantageous to predict multiple ADRs jointly, especially when these ADRs are highly related. Advanced modeling techniques should be investigated to better utilize ADR data for more effective ADRs prediction. We study the quantitative relationship among drug structures, drug-protein interaction profiles, and drug ADRs. We formalize the modeling problem as a multi-view (drug structure data and drug-protein interaction profile data) multi-task (one drug may cause multiple ADRs and each ADR is a task) classification problem. We apply the co-regularized MVL on each ADR and use regularized MTL to increase the total sample size and improve model performance. Experimental studies on the ADR data set demonstrate the effectiveness of our MVMT algorithm. Cluster analysis and significant feature identification using the results of our models reveal interesting hidden insight. In summary, we use computational methods such as biological network analysis, multi-task learning, multi-view learning, and inductive multi-view multi-task learning to systematically investigate the modeling of various ADRs, and construct highly accurate models for ADRs prediction. We also have significant contribution on proposing novel supervised and semi-supervised learning algorithms, which can be applied to many other real-world applications

    Representation Learning for Chemical Activity Predictions

    Full text link
    Computational prediction of a phenotypic response upon the chemical perturbation on a biological system plays an important role in drug discovery and many other applications. Chemical fingerprints derived from chemical structures are a widely used feature to build machine learning models. However, the fingerprints ignore the biological context, thus, they suffer from several problems such as the activity cliff and curse of dimensionality. Fundamentally, the chemical modulation of biological activities is a multi-scale process. It is the genome-wide chemical-target interactions that modulate chemical phenotypic responses. Thus, the genome-scale chemical-target interaction profile will more directly correlate with in vitro and in vivo activities than the chemical structure. Nevertheless, the scope of direct application of the chemical-target interaction profile is limited due to the severe incompleteness, bias, and noisiness of bioassay data. To address the aforementioned problems, we developed two new chemical and protein representation methods in this thesis. The first one is a Latent Target Interaction Profile (LTIP). LTIP embeds chemicals into a low dimensional continuous latent space that represents genome-scale chemical-target interactions. Subsequently, LTIP can be used as a feature to build machine learning models. Using the drug sensitivity of cancer cell lines as a benchmark, we have shown that the LTIP robustly outperforms chemical fingerprints regardless of machine learning algorithms. Moreover, the LTIP is complementary to the chemical fingerprints. We can combine LTIP with other fingerprints to further improve the performance of bioactivity prediction. We also developed a new protein sequence embedding method Distilled Sequence Alignment Embedding (DISAE) to represent proteins. We compared CGKronRLS to other machine learning algorithms including Random Forest and XGBoost for predicting drug-target interactions. We show how the resultant protein deep representations can be used to predict novel drug-protein pairs interactions which can improve drug safety and open many avenues for drug repurposing. Our results demonstrate the potential of LTIP in particular and multi-scale modeling in general in predictive modeling of chemical modulation of biological activities. It also shows the predictive power of DISAE which can further be improved through deep learning models

    Predicting potential drugs and drug-drug interactions for drug repositioning

    Get PDF
    The purpose of drug repositioning is to predict novel treatments for existing drugs. It saves time and reduces cost in drug discovery, especially in preclinical procedures. In drug repositioning, the challenging objective is to identify reasonable drugs with strong evidence. Recently, benefiting from various types of data and computational strategies, many methods have been proposed to predict potential drugs. Signature-based methods use signatures to describe a specific disease condition and match it with drug-induced transcriptomic profiles. For a disease signature, a list of potential drugs is produced based on matching scores. In many studies, the top drugs on the list are identified as potential drugs and verified in various ways. However, there are a few limitations in existing methods: (1) For many diseases, especially cancers, the tissue samples are often heterogeneous and multiple subtypes are involved. It is challenging to identify a signature from such a group of profiles. (2) Genes are treated as independent elements in many methods, while they may associate with each other in the given condition. (3) The disease signatures cannot identify potential drugs for personalized treatments. In order to address those limitations, I propose three strategies in this dissertation. (1) I employ clustering methods to identify sub-signatures from the heterogeneous dataset, then use a weighting strategy to concatenate them together. (2) I utilize human protein complex (HPC) information to reflect the dependencies among genes and identify an HPC signature to describe a specific type of cancer. (3) I use an HPC strategy to identify signatures for drugs, then predict a list of potential drugs for each patient. Besides predicting potential drugs directly, more indications are essential to enhance my understanding in drug repositioning studies. The interactions between biological and biomedical entities, such as drug-drug interactions (DDIs) and drug-target interactions (DTIs), help study mechanisms behind the repurposed drugs. Machine learning (ML), especially deep learning (DL), are frontier methods in predicting those interactions. Network strategies, such as constructing a network from interactions and studying topological properties, are commonly used to combine with other methods to make predictions. However, the interactions may have different functions, and merging them in a single network may cause some biases. In order to solve it, I construct two networks for two types of DDIs and employ a graph convolutional network (GCN) model to concatenate them together. In this dissertation, the first chapter introduces background information, objectives of studies, and structure of the dissertation. After that, a comprehensive review is provided in Chapter 2. Biological databases, methods and applications in drug repositioning studies, and evaluation metrics are discussed. I summarize three application scenarios in Chapter 2. The first method proposed in Chapter 3 considers the issue of identifying a cancer gene signature and predicting potential drugs. The k-means clustering method is used to identify highly reliable gene signatures. The identified signature is used to match drug profiles and identify potential drugs for the given disease. The second method proposed in Chapter 4 uses human protein complex (HPC) information to identify a protein complex signature, instead of a gene signature. This strategy improves the prediction accuracy in the experiments of cancers. Chapter 5 introduces the signature-based method in personalized cancer medicine. The profiles of a given drug are used to identify a drug signature, under the HPC strategy. Each patient has a profile, which is matched with the drug signature. Each patient has a different list of potential drugs. Chapter 6 propose a graph convolutional network with multi-kernel to predict DDIs. This method constructs two DDI kernels and concatenates them in the GCN model. It achieves higher performance in predicting DDIs than three state-of-the-art methods. In summary, this dissertation has proposed several computational algorithms for drug repositioning. Experimental results have shown that the proposed methods can achieve very good performance
    • …
    corecore