123 research outputs found

    Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error

    We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of 13 electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves which report out-of-sample errors as a function of training set size with up to ∼118k distinct molecules. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory come from the QM9 database [Ramakrishnan et al. Sci. Data 2014, 1, 140022] and include enthalpies and free energies of atomization, HOMO/LUMO energies and gap, dipole moment, polarizability, zero point vibrational energy, heat capacity, and the highest fundamental vibrational frequency. Various molecular representations have been studied (Coulomb matrix, bag of bonds, BAML and ECFP4, molecular graphs (MG)), as well as newly developed distribution based variants including histograms of distances (HD), angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR), and two types of neural networks, graph convolutions (GC) and gated graph networks (GG). Out-of-sample errors are strongly dependent on the choice of representation, regressor, and molecular property. Electronic properties are typically best accounted for by MG and GC, while energetic properties are better described by HDAD and KRR. The specific combinations with the lowest out-of-sample errors in the ∼118k training set size limit are (free) energies and enthalpies of atomization (HDAD/KRR), HOMO/LUMO eigenvalue and gap (MG/GC), dipole moment (MG/GC), static polarizability (MG/GG), zero point vibrational energy (HDAD/KRR), heat capacity at room temperature (HDAD/KRR), and highest fundamental vibrational frequency (BAML/RF).
We present numerical evidence that ML model predictions deviate from DFT (B3LYP) less than DFT (B3LYP) deviates from experiment for all properties. Furthermore, out-of-sample prediction errors with respect to hybrid DFT reference are on par with, or close to, chemical accuracy. The results suggest that ML models could be more accurate than hybrid DFT if explicitly electron correlated quantum (or experimental) data were available.
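
The abstract above lists the Coulomb matrix among the representations studied but does not define it. As a rough illustration only, the sketch below uses the commonly cited form (diagonal entries 0.5 * Z_i^2.4, off-diagonal entries Z_i * Z_j / |R_i - R_j|). That definition, the function name `coulomb_matrix`, the use of bohr for coordinates, and the H2 test geometry are all assumptions made here; the paper may use a different variant, for example a sorted or padded one.

```python
import math


def coulomb_matrix(charges, coords):
    """Return an unsorted Coulomb matrix as a list of lists.

    charges: nuclear charges Z_i.
    coords:  (x, y, z) positions, assumed here to be in bohr.
    """
    n = len(charges)
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i ** 2.4 (assumed standard form).
                cm[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: nuclear repulsion between atoms i and j.
                cm[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return cm


# Hypothetical example: an H2 molecule with a 1.4 bohr bond length.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

A fixed-length feature vector for a regressor such as KRR would typically be built from the entries of this matrix, though how the paper handles molecules of different sizes is not stated in the abstract.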

    Trust in Nanotechnology? On Trust as Analytical Tool in Social Research on Emerging Technologies

    Trust has become an important aspect of evaluating the relationship between the lay public and technology implementation. Experience has shown that a focus on trust provides a richer understanding of the reasons for backlashes against technology in society than a mere focus on public understanding of risks and science communication. Therefore, trust is also widely used as a key concept for understanding and predicting trust or distrust in emerging technologies. But whereas trust broadens the scope for understanding established technologies with well-defined questions and controversies, it easily fails to do so with emerging technologies, where there are no shared questions, a lack of public familiarity with the technology in question, and a restricted understanding amongst social researchers as to where distrust is likely to arise and how and in which form the technology will actually be implemented. On the contrary, ‘trust’ might sometimes even direct social research into fixed structures that make it even more difficult for social research to provide socially robust knowledge. This article therefore suggests that if trust is to maintain its important role in evaluating emerging technologies, the approach has to be widened and initially focus not on people’s motivations for trust, but rather on the object of trust itself, so as to predict how and where distrust might appear, how the object is established as an object of trust, and how it is established in relation to the public.

    Artificial intelligence in biological activity prediction

    Artificial intelligence has become an indispensable resource in chemoinformatics. Numerous machine learning algorithms for activity prediction have recently emerged, becoming an indispensable approach to mine chemical information from large compound datasets. These approaches enable the automation of compound discovery to find biologically active molecules with important properties. Here, we present a review of some of the main machine learning studies in biological activity prediction of compounds, in particular for sweetness prediction. We discuss some of the most used compound featurization techniques and the major databases of chemical compounds relevant to these tasks. This study was supported by the European Commission through project SHIKIFACTORY100 - Modular cell factories for the production of 100 compounds from the shikimate pathway (Reference 814408), and by the Portuguese FCT under the scope of the strategic funding of UID/BIO/04469/2019 unit and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020.

    Beyond Implications and Applications: the Story of ‘Safety by Design’

    Using long-term anthropological observations at the Center for Biological and Environmental Nanotechnology in Houston, Texas, the article demonstrates in detail the creation of new objects, new venues and new modes of veridiction which have reoriented the disciplines of materials chemistry and nanotoxicology. Beginning with the confusion surrounding the meaning of ‘implications’ and ‘applications’, the article explores the creation of new venues (CBEN and its offshoot the International Council on Nanotechnology); it then demonstrates how the demands for a responsible, safe or ethical science were translated into new research and experiment in and through these venues. Finally, it shows how ‘safety by design’ emerged as a way to go beyond implications and applications, even as it introduced a whole new array of controversies concerning its viability, validity and legitimacy.