
7 research outputs found

The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from patents

Author: Antony J. Williams
Daniel M. Lowe
Igor V. Tetko
Publication venue: Springer Nature
Publication date: 01/01/2016

BACKGROUND: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high-quality MP data as well as accurate chemical structure representations with which to develop models. Currently available datasets for MP prediction have been limited to around 50k molecules, while much more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if they were available in an appropriate form, could potentially be used to develop predictive models.

RESULTS: We have developed a pipeline for the automated extraction and annotation of chemical data from published patents. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were solved simultaneously to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task with 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from patents had similar or better prediction accuracy than models built on the highly curated data used in previous publications. Data for chemicals that decomposed rather than melted were separated from those for compounds that underwent a normal melting transition, and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed.

CONCLUSIONS: We have shown that automated tools for the analysis of chemical information have reached a mature stage, allowing the extraction and collection of high-quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826
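The abstract describes separating records for compounds that decomposed from those that melted normally. A minimal Python sketch of how such a split might look for free-text patent snippets follows. The regular expressions, the `PointRecord` type and the `parse_point` function are hypothetical illustrations, not the extraction grammar used in the authors' pipeline, and reporting the midpoint of a range is likewise an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for snippets such as "mp 210-212 °C" or "m.p. 185 °C (dec.)".
# Groups: lower bound, optional upper bound, and the text following the unit.
MP_PATTERN = re.compile(
    r"\bm\.?\s?p\.?\s*:?\s*"
    r"(?P<low>-?\d+(?:\.\d+)?)"
    r"(?:\s*[-–]\s*(?P<high>-?\d+(?:\.\d+)?))?"
    r"\s*°?\s*C(?P<rest>.*)",
    re.IGNORECASE,
)
# Crude heuristic (assumption): a word starting with "dec" after the
# temperature marks the record as a decomposition point.
DECOMP_PATTERN = re.compile(r"\bdec", re.IGNORECASE)


@dataclass
class PointRecord:
    value_c: float  # midpoint of the reported range, in °C
    kind: str       # "melting" or "decomposition"


def parse_point(text: str) -> Optional[PointRecord]:
    """Return a melting or decomposition record, or None if no mp is found."""
    match = MP_PATTERN.search(text)
    if match is None:
        return None
    low = float(match.group("low"))
    high = float(match.group("high")) if match.group("high") else low
    kind = "decomposition" if DECOMP_PATTERN.search(match.group("rest")) else "melting"
    return PointRecord(value_c=(low + high) / 2, kind=kind)


for snippet in ["mp 210-212 °C", "m.p. 185 °C (dec.)", "stirred at 25 °C"]:
    print(snippet, "->", parse_point(snippet))
```

Real patent text is far more varied (solvates, "decomposes above", ranges written with "to"), so a production pipeline would need a much richer grammar than this single pattern.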

Springer - Publisher Connector

PubMed Central

PuSH

FigShare

Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction

Author: Barzilay Regina
Coley Connor Wilson
Green Jr William H
Jaakkola Tommi S
Jensen Klavs F
Publication venue: American Chemical Society (ACS)
Publication date: 01/10/2016

The task of learning an expressive molecular representation is central to developing quantitative structure–activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or the generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.
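The idea of atom feature vectors that absorb information from increasing neighborhood radii can be illustrated with a simplified, hypothetical Python sketch: each atom starts with a small attribute vector, and each round of aggregation adds the vectors of its bonded neighbours, so after `radius` rounds an atom's vector reflects its local environment. Unlike the convolutional model in the paper, this sketch has no learned weights or nonlinearities; the function names and the one-hot atom attributes are assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_features(
    atom_features: List[Vector],
    adjacency: Dict[int, List[int]],
    radius: int,
) -> List[Vector]:
    """Sum each atom's vector with its neighbours' vectors, `radius` times.

    After r rounds, an atom's vector depends on all atoms within r bonds.
    """
    current = [list(vec) for vec in atom_features]
    for _ in range(radius):
        updated = []
        for atom, vec in enumerate(current):
            total = list(vec)  # start from the atom's own features
            for neighbour in adjacency.get(atom, []):
                for k, value in enumerate(current[neighbour]):
                    total[k] += value
            updated.append(total)
        current = updated  # synchronous update: every atom used the previous round
    return current


def molecule_vector(atom_vectors: List[Vector]) -> Vector:
    """Sum-pool atom vectors into one fixed-length molecule vector."""
    return [sum(column) for column in zip(*atom_vectors)]


# Heavy atoms of ethanol (C-C-O); hypothetical attributes [is_carbon, is_oxygen].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = {0: [1], 1: [0, 2], 2: [1]}
atoms_r1 = aggregate_features(features, bonds, radius=1)
print(atoms_r1, molecule_vector(atoms_r1))
```

Sum pooling keeps the molecule vector the same length regardless of molecule size, which is what lets a downstream regressor for properties such as melting point accept molecules of any size.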

DSpace@MIT

Crossref

The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from patents

Author: Antony J. Williams
Daniel M. Lowe
Igor V. Tetko
Publication venue: Springer Science and Business Media LLC

Crossref

Melting Point and Pyrolysis Point Data for Tens of Thousands of Chemicals

Author: Antony Williams (96443)
Daniel Lowe (401476)
Igor Tetko (1262478)

We have developed a pipeline for the automated extraction and annotation of chemical data from published patents. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu).

Two data sets are associated with the resulting publication authored by Tetko et al., "The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from patents". Details are on Kudos at https://www.growkudos.com/publications/10.1186%252Fs13321-016-0113-y

FigShare

MOESM2 of The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from patents

Author: Antony Williams (155)
Daniel Lowe (401476)
Igor Tetko (1262478)

Additional file 2: Table S1. RMSE of LibSVM models calculated with different sets of descriptors
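Table S1 compares models by RMSE. As a hedged illustration of how RMSE is computed, and of why averaging several models into a consensus (as done for the consensus MP models) can reduce error, here is a short Python sketch. The descriptor-set names `set_A` and `set_B` and all numbers are made-up toy values, not results from the table.

```python
import math
from typing import Dict, List, Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predictions and measurements."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def consensus(predictions_by_set: Dict[str, List[float]]) -> List[float]:
    """Average, compound by compound, the predictions of several models."""
    columns = zip(*predictions_by_set.values())
    return [sum(column) / len(column) for column in columns]


# Toy melting points in °C and two hypothetical descriptor-set models.
observed = [100.0, 150.0, 200.0]
predictions = {
    "set_A": [110.0, 140.0, 210.0],
    "set_B": [90.0, 160.0, 190.0],
}
for name, values in predictions.items():
    print(name, rmse(values, observed))
print("consensus", rmse(consensus(predictions), observed))
```

In this contrived example the two models err in opposite directions, so the consensus cancels their errors exactly; in practice, averaging helps only to the extent that the individual models' errors are not strongly correlated.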

FigShare

From Knowledgebases to Toxicity Prediction and Promiscuity Assessment

Author: Siramshetty Vishal Babu
Publication date: 01/01/2019

Polypharmacology marked a paradigm shift in drug discovery from the traditional ‘one drug, one target’ approach to a multi-target perspective, indicating that highly effective drugs favorably modulate multiple biological targets. This ability of drugs to show activity towards many targets is referred to as promiscuity, an essential phenomenon that may also lead to undesired side effects. While activity at therapeutic targets provides the desired biological response, toxicity often results from non-specific modulation of off-targets. Safety, efficacy and pharmacokinetics have been the primary concerns behind the failure of the majority of candidate drugs. Computer-based (in silico) models that can predict pharmacological and toxicological profiles complement the ongoing efforts to lower the high attrition rates. High-confidence bioactivity data are a prerequisite for the development of robust in silico models. Additionally, data quality has been a key concern when integrating data from publicly accessible bioactivity databases. The majority of bioactivity data originate from high-throughput screening campaigns and the medicinal chemistry literature. However, many screening hits are considered false positives for a number of reasons. In stark contrast, many compounds do not demonstrate biological activity despite being tested in hundreds of assays. This thesis employs cheminformatics approaches to address these diverse yet closely related aspects, which are crucial to rationalizing and expediting drug discovery. Knowledgebase resources of approved and withdrawn drugs were established and enriched with information integrated from multiple databases. These resources are useful not only in small-molecule discovery and optimization, but also in the elucidation of mechanisms of action and off-target effects.
In silico models were developed to predict the effects of small molecules on nuclear receptor and stress response pathways and on the potassium channel encoded by the human Ether-à-go-go-Related Gene (hERG). Chemical similarity and machine-learning-based methods were evaluated, highlighting the challenges involved in developing robust models from public-domain bioactivity data. Furthermore, the true promiscuity of potential frequent-hitter compounds was determined, and their mechanisms of action were explored at the molecular level by investigating target-ligand complexes. Finally, the chemical and biological spaces of the extensively tested, yet inactive, compounds were investigated to reconfirm their potential as promising candidates.

Polypharmacology describes a paradigm shift from "one drug, one target" to "one drug, many targets" and at the same time shows that highly effective drugs can only develop their full effect through interaction with multiple targets. The biological activity of a drug is directly associated with its side effects, which can be explained by interactions with therapeutic targets or off-targets (promiscuity). An imbalance in these interactions often results in insufficient efficacy, toxicity or unfavorable pharmacokinetics, which accounts for the failure of several potential drug candidates in their preclinical and clinical development phases. Early prediction of pharmacological and toxicological profiles from chemical structure using computer-based (in silico) models can help improve the drug development process. Reliable bioactivity data are a prerequisite for successful prediction. However, data quality is often a central problem in data integration.
This is caused by the use of different bioassays and "readouts", whose data are largely obtained from primary and confirmatory bioassays. While a large proportion of hits from primary assays are classified as false positives, some substances show no biological activity even though they have been extensively tested in both assay types ("extensively assayed compounds"). In this work, various cheminformatics methods were developed and applied to address the problems mentioned above, to outline possible solutions, and ultimately to accelerate drug discovery. To this end, non-redundant, manually validated knowledgebases of approved and withdrawn drugs were created and enriched with additional information to advance the discovery and optimization of small organic molecules. A key element here is the elucidation of their mechanisms of action and off-target interactions. For the further characterization of side effects, the main focus was placed on nuclear receptors, pathways involving stress receptors, and the hERG channel, which were simulated with in silico models. These models were built using an integrative approach combining state-of-the-art algorithms such as similarity comparisons and machine learning. To ensure a high level of predictive quality, explicit attention was paid to data quality and chemical diversity when evaluating the datasets. Furthermore, the in silico models were extended by examining substructure filters more closely in order to distinguish genuine mechanisms of action from non-specific binding behavior (false-positive substances).
Finally, the chemical and biological space of extensively tested yet inactive small organic molecules ("extensively assayed compounds") was investigated and compared with currently approved drugs to confirm their potential as promising candidates.
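The thesis evaluates chemical similarity methods alongside machine learning. One common similarity approach, sketched here under the assumption that fingerprints are represented as sets of on-bit indices, is a Tanimoto nearest-neighbour prediction. The function names, the 0.5 threshold and the toy fingerprints are hypothetical and do not reproduce the models built in the thesis.

```python
from typing import FrozenSet, List, Optional, Tuple

Fingerprint = FrozenSet[int]  # indices of bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared bits / bits set in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbour_label(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, str]],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the label of the most similar training compound, or None if
    no training compound reaches the similarity threshold."""
    best_sim, best_label = 0.0, None
    for fingerprint, label in training:
        sim = tanimoto(query, fingerprint)
        if sim > best_sim:
            best_sim, best_label = sim, label
    return best_label if best_sim >= threshold else None


training_set = [
    (frozenset({1, 2, 3, 5}), "active"),
    (frozenset({7, 8, 9}), "inactive"),
]
print(nearest_neighbour_label(frozenset({1, 2, 3, 4}), training_set))
print(nearest_neighbour_label(frozenset({10, 11}), training_set))
```

Returning None below the threshold is a simple way to flag queries that fall outside the chemical space covered by the training data instead of forcing a prediction.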

Institutional Repository of the Freie Universität Berlin

Recommended from our members

Prediction of Human Intestinal Absorption

Author: Patel Raj B.
Publication venue: The University of Arizona.
Publication date: 01/01/2017

The proposed human intestinal absorption prediction model was applied to over 900 pharmaceuticals and correctly predicts about 82.5% of them. This study provides a screening tool that can differentiate well-absorbed from poorly absorbed drugs in the early stages of drug discovery and development. The model is based on fundamental physicochemical properties and can be applied to virtual compounds. The maximum well-absorbed dose (i.e., the maximum dose that will be more than 50 percent absorbed) calculated using this model can be used as a guideline for drug design, synthesis, and pre-clinical studies.

Release after 22-Dec-201

The University of Arizona