67 research outputs found

    Data mining methods for the prediction of intestinal absorption using QSAR

    Oral administration is the most common route of drug administration. With the growing cost of drug discovery, the development of Quantitative Structure-Activity Relationships (QSAR) as computational methods to predict oral absorption is highly desirable for reasons of cost-effectiveness. The aim of this research was to develop highly accurate and interpretable QSAR models for the prediction of oral absorption. The problems addressed in this investigation were datasets with unbalanced class distributions, feature selection, and the effects of solubility and permeability on oral absorption prediction. Firstly, oral absorption models were obtained by overcoming the problem of unbalanced class distributions using two techniques: under-sampling of compounds belonging to the majority class, and the use of different misclassification costs for different types of misclassification. With these methods, models with higher accuracy were produced using regression and linear/non-linear classification techniques. Secondly, the use of several pre-processing feature selection methods in tandem with decision tree classification analysis, including misclassification costs, was found to produce models with better interpretability and higher predictive accuracy. These methods succeeded in selecting the most important molecular descriptors and in overcoming the problem of unbalanced classes. Thirdly, the roles of solubility and permeability in oral absorption were investigated. This involved expansion of oral absorption datasets and collection of in vitro and aqueous solubility data. This work found that the inclusion of predicted and experimental solubility in permeability models can improve model accuracy. However, the impact of solubility on oral absorption prediction was not as influential as expected. Finally, predictive models of permeability and solubility were built to predict a provisional Biopharmaceutics Classification System (BCS) class using two multi-label classification techniques, binary relevance and classifier chain. The classifier chain method was shown to have higher predictive accuracy by using predicted solubility as a molecular descriptor for permeability models, and hence gave a better final provisional BCS prediction. Overall, this research has resulted in predictive and interpretable models that could be useful in a drug discovery context.
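The difference between the two multi-label strategies named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the threshold "models", the descriptor layout (`log_s`, `psa`) and all cut-off values are hypothetical placeholders. Only the mapping from the two binary labels to BCS classes 1 to 4 follows the standard BCS definition.

```python
from typing import List, Sequence


def bcs_class(high_solubility: bool, high_permeability: bool) -> int:
    """Map two binary labels to a provisional BCS class (1-4)."""
    if high_solubility and high_permeability:
        return 1
    if high_permeability:
        return 2  # low solubility, high permeability
    if high_solubility:
        return 3  # high solubility, low permeability
    return 4      # low solubility, low permeability


# Hypothetical stand-in models; x = [log_s, psa]. Thresholds are illustrative only.
def sol_model(x: Sequence[float]) -> bool:
    return x[0] > -4.0


def perm_model(x: Sequence[float]) -> bool:
    return x[1] < 140.0


def chained_perm_model(x: Sequence[float]) -> bool:
    # The last input is the predicted solubility label appended by the chain.
    cutoff = 140.0 if x[-1] == 1.0 else 120.0
    return x[1] < cutoff


def binary_relevance(x: Sequence[float]) -> int:
    """Predict each label independently from the same descriptors."""
    return bcs_class(sol_model(x), perm_model(x))


def classifier_chain(x: Sequence[float]) -> int:
    """Predict solubility first, then feed it to the permeability model."""
    sol = sol_model(x)
    augmented: List[float] = list(x) + [1.0 if sol else 0.0]
    return bcs_class(sol, chained_perm_model(augmented))


compound = [-5.2, 130.0]  # made-up descriptor values
print(binary_relevance(compound), classifier_chain(compound))
```

In binary relevance the two labels never see each other, whereas in the chain the permeability model can condition on the solubility prediction, which is the mechanism the abstract credits for the improved provisional BCS prediction.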

    Prediction of Human Intestinal Absorption by GA Feature Selection and Support Vector Machine Regression

    QSAR (Quantitative Structure-Activity Relationship) models for the prediction of human intestinal absorption (HIA) were built with molecular descriptors calculated by ADRIANA.Code, by Cerius2, and by a combination of the two. A dataset of 552 compounds with experimental HIA values, covering a wide range of current drugs, was investigated. A Genetic Algorithm feature selection method was applied to select suitable descriptors. A Kohonen self-organizing Neural Network (KohNN) map was used to split the whole dataset into a training set of 380 compounds and a test set of 172 compounds. First, the six selected descriptors from ADRIANA.Code and the six selected descriptors from Cerius2 were used as input descriptors for building quantitative models using Partial Least Squares (PLS) analysis and Support Vector Machine (SVM) regression. Then, another two models were built using PLS and SVM, respectively, based on nine descriptors selected from the combined ADRIANA.Code and Cerius2 descriptors. For the three SVM models, correlation coefficients (r) of 0.87, 0.89 and 0.88 and standard deviations (s) of 10.98, 9.72 and 9.14 were obtained for the test set.
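The two test-set statistics quoted in this abstract can be computed as sketched below. This rests on assumptions: r is taken to be the Pearson correlation between experimental and predicted HIA values, and s the root-mean-square of the residuals. The abstract does not state the exact formula used for s (for instance, whether a degrees-of-freedom correction was applied), and the numbers in the usage lines are made up.

```python
import math
from typing import Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def residual_sd(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square residual (assumed definition of s, no d.o.f. correction)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Made-up HIA percentages for three hypothetical test compounds.
hia_observed = [10.0, 50.0, 90.0]
hia_predicted = [20.0, 60.0, 100.0]
print(pearson_r(hia_observed, hia_predicted), residual_sd(hia_observed, hia_predicted))
```

The toy example also shows why both numbers are reported together: a constant offset leaves r at its maximum while s still registers the error.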

    Hybridizing Feature Selection and Feature Learning Approaches in QSAR Modeling for Drug Discovery

    Quantitative structure–activity relationship modeling using machine learning techniques constitutes a complex computational problem, in which identifying the most informative molecular descriptors for predicting a specific target property plays a critical role. Two general approaches can be used for this modeling procedure: feature selection and feature learning. In this paper, a comparative performance study of two state-of-the-art methods related to these two approaches is carried out. In particular, regression and classification models for three different problems are inferred using both methods under different experimental scenarios: two drug-like properties, blood-brain barrier permeation and human intestinal absorption, and enantiomeric excess, a measure of purity used for chiral substances. Beyond the contrastive analysis of feature selection and feature learning as competing approaches, the hybridization of these strategies is also evaluated, based on previous results obtained in materials science. From the experimental results, it can be concluded that there is no clear winner between the two approaches, because performance depends on the characteristics of the compound databases used for modeling. Nevertheless, in several cases the accuracy of the models was improved by combining both approaches when the molecular descriptor sets provided by feature selection and feature learning contained complementary information.
    Author affiliations:
    Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Sebastián Pérez, Víctor. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
    Requena Triguero, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
    Roca, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
    Martínez, María Jimena. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Cravero, Fiorella. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
    Diaz, Monica Fatima. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
    Páez, Juan A. Consejo Superior de Investigaciones Científicas. Instituto de Química Médica; Spain
    Gómez Arrayás, Ramón. Universidad Autónoma de Madrid; Spain
    Adrio, Javier. Universidad Autónoma de Madrid; Spain. Institute for Advanced Research in Chemical Sciences; Spain
    Campillo, Nuria E. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain

    Review of QSAR Models and Software Tools for predicting Biokinetic Properties

    In the assessment of industrial chemicals, cosmetic ingredients, and active substances in pesticides and biocides, metabolites and degradates are rarely tested for their toxicological effects in mammals. In the interests of animal welfare and cost-effectiveness, alternatives to animal testing are needed in the evaluation of these types of chemicals. In this report we review the current status of various types of in silico estimation methods for Absorption, Distribution, Metabolism and Excretion (ADME) properties, which are often important in discriminating between the toxicological profiles of parent compounds and their metabolites/degradation products. The review was performed in a broad sense, with emphasis on QSARs and rule-based approaches and their applicability to estimation of oral bioavailability, human intestinal absorption, blood-brain barrier penetration, plasma protein binding, metabolism and. This revealed a vast and rapidly growing literature and a range of software tools. While it is difficult to give firm conclusions on the applicability of such tools, it is clear that many have been developed with pharmaceutical applications in mind, and as such may not be applicable to other types of chemicals (establishing this would require further investigation). On the other hand, a range of predictive methodologies has been explored and found promising, so there is merit in pursuing their applicability in the assessment of other types of chemicals and products. Many of the software tools are not transparent in terms of their predictive algorithms or underlying datasets. However, the literature identifies a set of commonly used descriptors that have been found useful in ADME prediction, so further research and model development activities could be based on such studies.
    JRC.DG.I.6-Systems toxicology

    Incorporating physiologically relevant mobile phases in micellar liquid chromatography for the prediction of human intestinal absorption

    Micellar liquid chromatography (MLC) is a popular method for determining a compound's lipophilicity. This study describes the use of the micelle/water partition coefficient (log Pmw) obtained by this method in the prediction of human intestinal absorption (HIA). Because the novel composition of the micellar mobile phase closely resembles that of physiological intestinal fluid, prediction was deemed to be highly successful. The unique micellar mobile phase consisted of a mixed micellar mixture of lecithin and six bile salts, i.e. a composition matching that found in the human intestinal environment, prepared in ratios resembling those in the intestine. This is considered to be the first method to use a physiological mixture of biosurfactants in the prediction of HIA. A mathematical model with high predictive ability (R²pred = 81%) was obtained using multiple linear regression. The final optimum model included log Pmw and polar surface area (PSA) as key descriptors, both with high statistical significance for the prediction of HIA. This can be attributed to the nature of the mobile phase used in this study, which contains the lecithin-bile salt complex and so forms a bilayer system that mimics absorption across the intestinal membrane.
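The model form described in this abstract, a multiple linear regression of HIA on log Pmw and PSA, can be sketched as below. This is an illustration under stated assumptions: the abstract gives no coefficients, so the data here are synthetic, generated from arbitrary coefficients purely to show that the fit recovers them. The function names `fit_mlr` and `predict` are hypothetical.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (Z^T Z) coef = Z^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = A[row][col] / A[col][col]
            for j in range(col, k):
                A[row][j] -= factor * A[col][j]
            b[row] -= factor * b[col]
    # Back substitution
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef  # [intercept, slope for log Pmw, slope for PSA]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted linear model for one compound."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Synthetic (log Pmw, PSA) pairs and responses generated from arbitrary
# coefficients; these are NOT values from the study.
X = [(1.0, 60.0), (2.0, 90.0), (0.5, 120.0), (3.0, 40.0), (1.5, 75.0)]
y = [50.0 + 10.0 * log_pmw - 0.2 * psa for log_pmw, psa in X]
coef = fit_mlr(X, y)
print([round(c, 6) for c in coef])
```

In practice a library routine such as a standard least-squares solver would replace the hand-written elimination; it is spelled out here only to keep the sketch dependency-free.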

    Machine learning approach in pharmacokinetics and toxicity prediction

    Ph.D. (Doctor of Philosophy)

    Kernel Functions for Graph Classification

    Graphs are information-rich structures, but their complexity makes them difficult to analyze. Given their broad and powerful representation capacity, the classification of graphs has become an intense area of research. Many established classifiers represent objects with vectors of explicit features. When the number of features grows, however, these vector representations suffer from typical problems of high dimensionality such as overfitting and high computation time. This work instead focuses on using kernel functions to map graphs into implicitly defined spaces that avoid the difficulties of vector representations. The introduction of kernel classifiers has kindled great interest in kernel functions for graph data. By using kernels, the problem of graph classification changes from finding a good classifier to finding a good kernel function. This work explores several novel uses of kernel functions for graph classification. The first technique is the use of structure-based features to add structural information to the kernel function. A strength of this approach is the ability to identify specific structural features that contribute significantly to the classification process. Discriminative structures can then be passed on to domain-specific researchers for additional analysis. The next approach is the use of wavelet functions to represent graph topology as simple real-valued features. This approach achieves order-of-magnitude decreases in kernel computation time by eliminating costly topological comparisons, while retaining competitive classification accuracy. Finally, this work examines the use of even simpler graph representations and their utility for classification. The models produced from the kernel functions presented here yield excellent performance with respect to both efficiency and accuracy, as demonstrated in a variety of experimental studies.

    Development of in silico models for the prediction of toxicity incorporating ADME information

    Drug discovery is a process that requires a significant investment in both time and resources. Although recent developments have reduced the number of drugs failing at the later stages of development due to poor pharmacokinetic and/or toxicokinetic profiles, late-stage attrition of drug candidates remains a problem. Additionally, there is a need to reduce animal testing for toxicological risk assessment for ethical and financial reasons. In silico methods offer an alternative that can address these challenges. A variety of computational approaches have been developed in the last two decades; these must be evaluated to ensure confidence in their use. The research presented in this thesis has assessed a range of existing tools for the prediction of toxicity and absorption, distribution, metabolism and elimination (ADME) parameters, with an emphasis on absorption and xenobiotic metabolism. These two ADME properties largely determine the bioavailability of a drug and, in turn, also influence toxicity. In vitro approaches (Caco-2 cells and the parallel artificial membrane permeation assay) and in silico approaches, such as various druglikeness filters, can be used to estimate human intestinal absorption; a comparison between different methods was performed to identify the relative strengths and weaknesses of the approaches. In terms of xenobiotic metabolism, it is important not only to predict metabolites correctly but also to identify those compounds that can be biotransformed into species that can covalently bind to biomolecules. Structural alerts are routinely used to screen for such potential reactive metabolites. The balance between sensitivity and specificity of such reactive metabolite alerts has been discussed in the context of correctly predicting reactive metabolites of pharmaceuticals (using data available from DrugBank). Off-target toxicity, exemplified by human Ether-à-go-go-Related Gene (hERG) channel inhibition, was also explored. A number of novel structural alerts for hERG toxicity were developed based on groups of structurally similar compounds. Finally, the importance of predicting potential ecotoxicological effects of drugs was also considered. The utility of zebrafish embryos to distinguish between baseline and excess toxicity was investigated. In evaluating this selection of existing tools, improvements to the methods have been proposed where possible.
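The sensitivity/specificity balance mentioned in this abstract can be made concrete with a small sketch. It assumes each compound is reduced to a pair of booleans (whether any alert fired, and whether the compound is known to form a reactive metabolite); the function name and the toy data are hypothetical and are not taken from the thesis or from DrugBank.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for (alert_fired, is_positive) pairs.

    Assumes the data contain at least one true positive case and one true
    negative case; otherwise a ZeroDivisionError is raised.
    """
    tp = fp = tn = fn = 0
    for alert_fired, is_positive in pairs:
        if alert_fired and is_positive:
            tp += 1
        elif alert_fired and not is_positive:
            fp += 1
        elif not alert_fired and is_positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # fraction of true positives that were flagged
    specificity = tn / (tn + fp)  # fraction of true negatives left unflagged
    return sensitivity, specificity


# Toy screening outcome for five hypothetical compounds.
screen = [(True, True), (True, False), (False, False), (False, True), (True, True)]
print(sensitivity_specificity(screen))
```

Broadening an alert tends to raise sensitivity at the expense of specificity, which is the trade-off the abstract refers to.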