66 research outputs found

    Ab initio machine learning in chemical compound space

    Chemical compound space (CCS), the set of all theoretically conceivable combinations of chemical elements and (meta-)stable geometries that make up matter, is colossal. The first-principles-based virtual sampling of this space, for example in search of novel molecules or materials which exhibit desirable properties, is therefore prohibitive for all but the smallest subsets and simplest properties. We review studies aimed at tackling this challenge using modern machine learning techniques based on (i) synthetic data, typically generated using quantum mechanics based methods, and (ii) model architectures inspired by quantum mechanics. Such Quantum mechanics based Machine Learning (QML) approaches combine the numerical efficiency of statistical surrogate models with an ab initio view on matter. They rigorously reflect the underlying physics in order to reach universality and transferability across CCS. While state-of-the-art approximations to quantum problems impose severe computational bottlenecks, recent QML based developments indicate the possibility of substantial acceleration without sacrificing the predictive power of quantum mechanics.

    Review of QSAR Models and Software Tools for predicting Biokinetic Properties

    In the assessment of industrial chemicals, cosmetic ingredients, and active substances in pesticides and biocides, metabolites and degradates are rarely tested for their toxicological effects in mammals. In the interests of animal welfare and cost-effectiveness, alternatives to animal testing are needed in the evaluation of these types of chemicals. In this report we review the current status of various types of in silico estimation methods for Absorption, Distribution, Metabolism and Excretion (ADME) properties, which are often important in discriminating between the toxicological profiles of parent compounds and their metabolites/degradation products. The review was performed in a broad sense, with emphasis on QSARs and rule-based approaches and their applicability to estimation of oral bioavailability, human intestinal absorption, blood-brain barrier penetration, plasma protein binding, and metabolism. This revealed a vast and rapidly growing literature and a range of software tools. While it is difficult to give firm conclusions on the applicability of such tools, it is clear that many have been developed with pharmaceutical applications in mind, and as such may not be applicable to other types of chemicals (this would require further research). On the other hand, a range of predictive methodologies have been explored and found promising, so there is merit in pursuing their applicability in the assessment of other types of chemicals and products. Many of the software tools are not transparent in terms of their predictive algorithms or underlying datasets. However, the literature identifies a set of commonly used descriptors that have been found useful in ADME prediction, so further research and model development activities could be based on such studies.

    Artificial intelligence and chemical kinetics enabled property-oriented fuel design for internal combustion engine

    The Fuel Genome Project aims to address the forward problem of fuel property prediction and the inverse problems of molecule design, retrosynthesis and reaction condition prediction. This work primarily addresses the forward problem by integrating feature engineering theory, artificial intelligence (AI) technologies and gas-phase chemical kinetics. The group contribution method (GCM) is used to establish the GCM-UOB (University of Birmingham) 1.0 system with 22 molecular descriptors, and the surrogate is formulated by minimizing the difference in functional group fragments between the target fuel and the surrogate. The improved QSPR (quantitative structure–property relationship)-UOB 2.0 system, with 32 molecular features, is coupled with machine learning (ML) algorithms to establish regression models for fuel ignition quality prediction. The QSPR-UOB 3.0 scheme expands to 42 molecular descriptors to improve the molecular resolution of aromatics and specific fuel types. The resulting structural features, combined with ML algorithms, enable the prediction of 15 physicochemical properties with high fidelity and efficiency. In addition to the ML-QSPR route, a second route based on a deep learning convolutional neural network (DL-CNN) is proposed for property prediction, with the yield sooting index (YSI) taken as a case study. The prediction accuracy of DL-CNN is currently inferior to that of the ML-QSPR model, but its automated feature extraction and rapid advances in classification problems make it a promising approach for regression problems. High-throughput fuel screening is performed to identify molecules with desired properties for both spark ignition (SI) and compression ignition (CI) engines; it comprises Tier 1 physicochemical property screening (based on the ML-QSPR models) and Tier 2 chemical kinetic screening (based on detailed chemical mechanisms). Polyoxymethylene dimethyl ether 3 (PODE3) and diethoxymethane (DEM) are promising carbon-neutral fuels for CI engines and are recommended by the virtual screening results. The ignition delay time, laminar flame speed and dominant reactions of PODE3 and DEM are examined using chemical kinetics, and a new DEM mechanism including both low- and high-temperature reactions is constructed. Concluding remarks and research prospects are summarized in the final section.
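The surrogate formulation mentioned in this abstract (minimizing the difference in functional group fragments between target fuel and surrogate) can be illustrated with a minimal sketch. It is not the GCM-UOB method itself: it assumes a binary n-heptane/iso-octane blend, four alkane groups only, a squared-difference objective, and a simple grid search. The target group counts and all function names are invented for illustration.

```python
# Illustrative sketch only; not the GCM-UOB implementation.
GROUPS = ("CH3", "CH2", "CH", "C")

# Functional-group counts per molecule, read off the structural formulas.
N_HEPTANE = {"CH3": 2, "CH2": 5, "CH": 0, "C": 0}
ISO_OCTANE = {"CH3": 5, "CH2": 1, "CH": 1, "C": 1}  # 2,2,4-trimethylpentane


def mixture_groups(x_heptane):
    """Mole-fraction-weighted group counts of the binary blend."""
    return {
        g: x_heptane * N_HEPTANE[g] + (1.0 - x_heptane) * ISO_OCTANE[g]
        for g in GROUPS
    }


def mismatch(target, x_heptane):
    """Sum of squared differences between target and surrogate group counts."""
    blend = mixture_groups(x_heptane)
    return sum((target[g] - blend[g]) ** 2 for g in GROUPS)


def fit_surrogate(target, steps=100):
    """Grid-search the n-heptane mole fraction that minimizes the mismatch."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda x: mismatch(target, x))


# Invented target: average group counts of a hypothetical fuel.
target_fuel = {"CH3": 3.5, "CH2": 3.0, "CH": 0.5, "C": 0.5}
best_x = fit_surrogate(target_fuel)
print(f"n-heptane mole fraction: {best_x:.2f}")
```

A real surrogate would typically involve more components and a constrained optimizer; the grid search is used here only to keep the sketch dependency-free.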

    Quantum Chemistry Calculations for Metabolomics

    A primary goal of metabolomics studies is to fully characterize the small-molecule composition of complex biological and environmental samples. However, despite advances in analytical technologies over the past two decades, the majority of small molecules in complex samples are not readily identifiable due to the immense structural and chemical diversity present within the metabolome. Current gold-standard identification methods rely on reference libraries built using authentic chemical materials (“standards”), which are not available for most molecules. Computational quantum chemistry methods, which can be used to calculate chemical properties that are then measured by analytical platforms, offer an alternative route for building reference libraries, i.e., in silico libraries for “standards-free” identification. In this review, we cover the major roadblocks currently facing metabolomics and discuss applications where quantum chemistry calculations offer a solution. Several successful examples for nuclear magnetic resonance spectroscopy, ion mobility spectrometry, infrared spectroscopy, and mass spectrometry methods are reviewed. Finally, we consider current best practices and sources of error, and provide an outlook for quantum chemistry calculations in metabolomics studies. We expect this review will inspire researchers in the field of small-molecule identification to accelerate adoption of in silico methods for generation of reference libraries and to add quantum chemistry calculations as another tool at their disposal to characterize complex samples.

    Development of quantitative structure property relationships to support non-target LC-HRMS screening

    Over the last decade, a large number of emerging contaminants have been detected and identified in surface waters and wastewaters; these could threaten the aquatic environment because of their pseudo-persistence. As described in Chapters 1 and 2, liquid chromatography high-resolution mass spectrometry (LC-HRMS) can be used as an efficient tool for their screening. Simultaneous screening of these samples by hydrophilic interaction liquid chromatography (HILIC) and reversed-phase (RP) chromatography helps with the full identification of suspect and unknown compounds. However, to confirm the identity of the most relevant suspect or unknown compounds, their chemical properties, such as retention time behavior, MSn fragmentation and ionization behavior, should be investigated. Chapter 3 of this thesis discusses the development of a comprehensive workflow to study the retention time behavior of large groups of compounds belonging to emerging contaminants. A dataset of more than 2500 compounds was used for RP/HILIC-LC-HRMS, and their retention times were derived in both positive and negative electrospray ionization modes (+/-ESI). These in silico approaches were then applied to the identification of 10 new transformation products of tramadol, furosemide and niflumic acid (under ozonation treatment). Chapter 4 discusses the development of a first retention time index (RTI) system for LC-HRMS. Some practical applications of this RTI system in suspect and non-target screening in collaborative trials are presented as well. Chapter 5 describes the development of in silico toxicity models to estimate the acute toxicity of emerging pollutants in the aquatic environment. This helps link the suspect/non-target screening results to the tentative environmental risk by predicting the toxicity of newly, tentatively identified compounds. Chapter 6 introduces an automatic and systematic way to perform suspect and non-target screening in LC-HRMS data. This saves time, reduces the data analysis load, and enables the routine application of non-target screening for regulatory or monitoring purposes.

    Quantitative Structure-Property Relationship Modeling & Computer-Aided Molecular Design: Improvements & Applications

    The objective of this work was to develop an integrated capability to design molecules with desired properties. An automated robust genetic algorithm (GA) module has been developed to facilitate the rapid design of new molecules. The generated molecules were scored for the relevant thermophysical properties using non-linear quantitative structure-property relationship (QSPR) models. The descriptor reduction and model development for the QSPR models were implemented using evolutionary algorithms (EA) and artificial neural networks (ANNs). QSPR models for octanol-water partition coefficients (Kow), melting points (MP), normal boiling points (NBP), Gibbs energy of formation, universal quasi-chemical (UNIQUAC) model parameters, and infinite-dilution activity coefficients of cyclohexane and benzene in various organic solvents were developed in this work. To validate the current design methodology, new chemical penetration enhancers (CPEs) for transdermal insulin delivery and new solvents for extractive distillation of the cyclohexane + benzene system were designed. In general, the use of non-linear QSPR models developed in this work provided predictions better than or as good as existing literature models. In particular, the current models for NBP, Gibbs energy of formation, UNIQUAC model parameters, and infinite-dilution activity coefficients have lower errors on external test sets than the literature models. The current models for MP and Kow are comparable with the best models in the literature. The GA-based design framework implemented in this work successfully identified new CPEs for transdermal delivery of insulin, with permeability values comparable to the best CPEs in the literature. Also, new solvents for extractive distillation of cyclohexane/benzene with selectivities two to four times that of the existing solvents were identified. 
These two case studies validate the ability of the current design framework to identify new molecules with desired target properties.
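The design loop described in this abstract (a genetic algorithm proposing candidates that a property model then scores) can be sketched in a few lines. This is a toy under loud assumptions, not the authors' framework: candidates are plain vectors of group counts with no check for chemical validity, the property model is an invented additive function standing in for the non-linear ANN-based QSPR models, and the contribution values, target and GA settings are all hypothetical.

```python
import random

# Hypothetical group contributions to some property; values are invented.
CONTRIB = {"CH3": 1.5, "CH2": 1.0, "OH": 4.0, "C=O": 3.0}
GROUPS = list(CONTRIB)
MAX_COUNT = 6   # upper bound on each group count (assumption)
TARGET = 17.5   # desired property value (invented)


def predict(candidate):
    """Stand-in property model: additive sum of group contributions."""
    return sum(CONTRIB[g] * n for g, n in zip(GROUPS, candidate))


def fitness(candidate):
    """Higher is better: negative distance between prediction and target."""
    return -abs(predict(candidate) - TARGET)


def mutate(candidate, rng):
    """Change one randomly chosen group count by +/-1, clamped to bounds."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] = min(MAX_COUNT, max(0, child[i] + rng.choice((-1, 1))))
    return child


def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations=40, pop_size=20, seed=0):
    """Run an elitist GA; return the best candidate and per-generation best fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, MAX_COUNT) for _ in GROUPS] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=fitness), history


best, history = evolve()
print(best, predict(best))
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next; a practical design tool would also need structural feasibility checks on each candidate.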