56 research outputs found

    A Machine Learning Approach for the Identification of a Treatment against Chagas Disease

    In this final degree project we have presented a machine learning approach to predict the biological activity of FDA approved drugs against T. cruzi. We believe that the proposed methodology will expand the state of the art of machine learning in the Chagas disease drug discovery pipeline. We have obtained performance similar to that of previously presented work, but applied only to FDA approved drugs as a repurposing strategy. A final contribution of this work is the biological evaluation provided by the metabolic pathway analysis. This evaluation allows us to map FDA approved drugs onto T. cruzi metabolic pathways. This validation is useful because it incorporates important information on how the drugs target T. cruzi. Finding a subset of drugs that come up from differently motivated experiments is promising. The fact that our results include drugs that have already been tested against Chagas disease is encouraging evidence that our approaches are able to produce reasonable candidates for drug repurposing. Additionally, the majority of the drugs present in our results were never tested against T. cruzi, confirming the novelty of our approaches. CONACYT – Consejo Nacional de Ciencia y Tecnología; PROCIENCI

    Modelling Cellular Permeability via Carrier Mediated Transport

    The relative importance of passive diffusion and carrier mediated transport processes to membrane permeability of drugs is a subject of current debate. Passive diffusion and carrier mediated transport are the two main methods by which drugs permeate the cell membrane. The permeability of molecules through membranes can have an impact on their absorption, distribution, metabolism and excretion (ADME) properties. It is therefore important to be able to predict the extent to which novel molecules can permeate the cell membrane. In vitro models of human intestinal absorption can be used to predict the likelihood of molecules permeating the human intestinal epithelium. Quantitative structure-activity relationship (QSAR) techniques model the relationship between molecular structure and cellular permeability. Current QSAR methods make use of physicochemical and structural property descriptors. These descriptors are able to predict the membrane permeability of molecules via passive diffusion rather than via membrane transporters. The aim of this study was to develop novel descriptors of carrier mediated transport that can be used in the development of QSAR models of permeability. The concept of metabolite likeness was investigated for its utility as a measure of the likelihood of molecules undergoing carrier mediated transport. This investigation found that approved drugs are generally more similar to human endogenous metabolites than molecules found in commercial databases. The use of a protein target prediction tool, PIDGIN, was also investigated. This study found that a relatively small number of membrane transporters that are expressed in Caco-2 cells have models available in PIDGIN. New QSAR models of membrane permeability were developed using physicochemical and structural property descriptors, both alone and in combination with the novel descriptors of carrier mediated transport.
Novel models for predicting drug efflux ratio were developed and perform well in validation tests. Comparisons of predictive performance between QSAR models generated from physicochemical property descriptors alone and in combination with ‘carrier-mediated transport descriptors’ were carried out. The general observation was that the novel descriptors of carrier mediated transport did not significantly improve the predictive performance of models. However, some substructures from the MACCS keys list, which are relevant to protein binding, were found to be important determinants of Caco-2 permeability of molecules and could potentially be used to identify molecules that may undergo active transport. The performance of logistic regression classification models of efflux ratio was 88%. Few studies have developed QSAR models of efflux ratio. This is a relatively novel approach which could be useful in identifying, and thus helping to avoid, potential substrates of efflux transporters in drug discovery.

    NOVEL ALGORITHMS AND TOOLS FOR LIGAND-BASED DRUG DESIGN

    Computer-aided drug design (CADD) has become an indispensable component in modern drug discovery projects. The prediction of physicochemical properties and pharmacological properties of candidate compounds effectively increases the probability for drug candidates to pass later phases of clinical trials. Ligand-based virtual screening exhibits advantages over structure-based drug design, in terms of its wide applicability and high computational efficiency. The established chemical repositories and reported bioassays form a gigantic knowledgebase from which to derive quantitative structure-activity relationships (QSAR) and structure-property relationships (QSPR). In addition, the rapid advance of machine learning techniques suggests new solutions for data-mining huge compound databases. In this thesis, a novel ligand classification algorithm, Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), was reported for the prediction of diverse categorical pharmacological properties. LiCABEDS was successfully applied to model 5-HT1A ligand functionality, ligand selectivity of cannabinoid receptor subtypes, and blood-brain-barrier (BBB) passage. LiCABEDS was implemented and integrated with a graphical user interface, data import/export, automated model training/prediction, and project management. In addition, a non-linear ligand classifier was proposed, using a novel Topomer kernel function in a support vector machine. With the emphasis on green high-performance computing, graphics processing units are alternative platforms for computationally expensive tasks. A novel GPU algorithm was designed and implemented in order to accelerate the calculation of chemical similarities with dense-format molecular fingerprints. Finally, a compound acquisition algorithm was reported to construct a structurally diverse screening library in order to enhance hit rates in high-throughput screening.
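
As an illustration of what a chemical similarity over dense-format fingerprints can look like, the sketch below computes the Tanimoto coefficient between two bit-vector fingerprints packed into Python integers. It rests on assumptions: the abstract does not name the similarity metric or describe the GPU algorithm, so Tanimoto is used here only because it is a common choice for fingerprints, and this is a plain CPU version. The helper names `pack_bits`, `popcount`, and `tanimoto` are hypothetical and do not come from the thesis.

```python
from typing import Iterable


def pack_bits(on_bits: Iterable[int]) -> int:
    """Pack the indices of set bits into one integer (a dense bit vector)."""
    fingerprint = 0
    for index in on_bits:
        fingerprint |= 1 << index
    return fingerprint


def popcount(value: int) -> int:
    """Count the number of set bits in a non-negative integer."""
    return bin(value).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient: shared set bits divided by bits set in either.

    Assumption: two empty fingerprints are given similarity 0.0 here;
    other conventions exist.
    """
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0
    return popcount(fp_a & fp_b) / union


# Two toy fingerprints sharing bits 2 and 3 out of six distinct set bits.
fp_1 = pack_bits([0, 1, 2, 3])
fp_2 = pack_bits([2, 3, 4, 5])
similarity = tanimoto(fp_1, fp_2)
```

Packing bits into machine words makes each comparison a bitwise AND, a bitwise OR, and two population counts. That is the kind of operation that parallelizes well, although the thesis's actual GPU design is not described in the abstract.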

    A review on machine learning approaches and trends in drug discovery

    Abstract: Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has acquired an important computer science component, driven by the rapid rise of machine learning techniques and their democratization. Given the objectives set by the Precision Medicine initiative and the new challenges they generate, it is necessary to establish robust, standard and reproducible computational methodologies to achieve them. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies. This stage manages to drastically reduce costs and research times in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field will give us an idea of where cheminformatics will develop in the short term, the limitations it presents and the positive results it has achieved. This review will focus mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years. Instituto de Salud Carlos III; PI17/01826. Instituto de Salud Carlos III; PI17/01561. Xunta de Galicia; Ref. ED431D 2017/16. Xunta de Galicia; Ref. ED431D 2017/23. Xunta de Galicia; Ref. ED431C 2018/4

    Development and application of QSAR models for mechanisms related to endocrine disruption.


    Automated de novo metabolite identification with mass spectrometry and cheminformatics

    In this thesis new algorithms and methods that enable the de novo identification of metabolites have been developed. The aim was to find methods to propose candidate structures for unknown metabolites using MSn data as a starting point. These methods have been integrated into a semi-automated pipeline to identify new human metabolites. The discovery of new metabolites will improve our capability to understand disease via its metabolic fingerprint, to develop personalized treatments and to discover new drugs. In addition, the cheminformatics methods presented in this thesis increase our understanding of the properties of human metabolites. The research described in this thesis has shown that the success of de novo metabolite identification relies on the synergy between analytical chemistry methods (i.e. LC-MSn) and cheminformatics tools. Netherlands Organization for Applied Scientific Research (TNO); Netherlands Metabolomics Centre

    Data Science techniques for predicting plant genes involved in secondary metabolites production

    Masters of Science. Plant genome analysis is currently experiencing a boost due to reduced costs associated with the development of next generation sequencing technologies. Knowledge of genetic background can be applied to guide targeted plant selection and breeding, and to facilitate natural product discovery and biological engineering. In medicinal plants, secondary metabolites are of particular interest because they often represent the main active ingredients associated with health-promoting qualities. Plant polyphenols are a highly diverse family of aromatic secondary metabolites that act as antimicrobial agents, UV protectants, and insect or herbivore repellents. Genome mining tools developed to understand genetic material have seldom addressed secondary metabolite genes and biosynthesis pathways. Little significant research has been conducted to study key enzyme factors that can predict a class of secondary metabolite genes from polyketide synthases. The objectives of this study were twofold. Primarily, it aimed to identify the biological properties of secondary metabolite genes and the selection of a specific gene, naringenin-chalcone synthase or chalcone synthase (CHS). The study hypothesized that data science approaches in mining biological data, particularly secondary metabolite genes, would enable the compulsory disclosure of some aspects of secondary metabolite (SM). Secondarily, the aim was to propose a proof of concept for classifying or predicting plant genes involved in polyphenol biosynthesis using data science techniques, and to apply these techniques in computational analysis through machine learning algorithms and mathematical and statistical approaches.
Three specific challenges experienced while analysing secondary metabolite datasets were: 1) class imbalance, which refers to the lack of proportionality among protein sequence classes; 2) high dimensionality, which refers to the large feature space that arises when analysing bioinformatics datasets; and 3) differences in protein sequence length. Given these inherent issues, developing precise classification models and statistical models proves a challenge. Therefore, the prerequisite for effective SM plant gene mining is dedicated data science techniques that can collect, prepare and analyse SM genes.