755 research outputs found

    Uncertainty estimation for QSAR models using machine learning methods


    Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method.

    We report an evaluation of the semi-empirical quantum chemical method PM7 from the perspective of uncertainty quantification. Specifically, we apply Bound-to-Bound Data Collaboration, an uncertainty quantification framework, to characterize (a) the variability of PM7 model parameter values consistent with the uncertainty in the training data and (b) uncertainty propagation from the training data to the model predictions. Experimental heats of formation of a homologous series of linear alkanes are used as the property of interest. The training data are chemically accurate, i.e., they have very low uncertainty by the standards of computational chemistry. The analysis does not find evidence that PM7 is consistent with the entire data set considered, as no single set of parameter values is found that captures the experimental uncertainties of all training data. A set of parameter values for PM7 was able to capture the training data within ±1 kcal/mol, but not to the smaller level of uncertainty in the reported data. Nevertheless, PM7 was found to be consistent for subsets of the training data. In such cases, uncertainty propagation from the chemically accurate training data to the predicted values preserves error within the bounds of chemical accuracy if predictions are made for molecules of comparable size. Otherwise, the error grows linearly with the relative size of the molecules.

    Industry-scale application and evaluation of deep learning for drug target prediction

    Artificial intelligence (AI) is undergoing a revolution thanks to the breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent works on publicly available pharmaceutical data showed that AI methods are highly promising for drug target prediction. However, the quality of public data might differ from that of industry data due to different labs reporting measurements, different measurement techniques, fewer samples and less diverse and specialized assays. As part of a European-funded project (ExCAPE) that brought together expertise from the pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data can be transferred to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that models derived by deep learning outperformed comparable models trained by other machine learning algorithms when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study evaluating the potential of machine learning, and especially deep learning, directly at the level of industry-scale settings, and moreover investigating the transferability of publicly learned target prediction models to industrial bioactivity prediction pipelines.

    Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction

    The task of learning an expressive molecular representation is central to developing quantitative structure–activity and property relationships. Traditional approaches rely on group additivity rules, empirical measurements or parameters, or generation of thousands of descriptors. In this paper, we employ a convolutional neural network for this embedding task by treating molecules as undirected graphs with attributed nodes and edges. Simple atom and bond attributes are used to construct atom-specific feature vectors that take into account the local chemical environment using different neighborhood radii. By working directly with the full molecular graph, there is a greater opportunity for models to identify important features relevant to a prediction task. Unlike other graph-based approaches, our atom featurization preserves molecule-level spatial information that significantly enhances model performance. Our models learn to identify important features of atom clusters for the prediction of aqueous solubility, octanol solubility, melting point, and toxicity. Extensions and limitations of this strategy are discussed.
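The idea of atom-specific feature vectors built over several neighborhood radii can be illustrated with a minimal sketch. This is not the network described in the abstract: it assumes a fixed sum over neighbors where a trained model would apply learned weights, it ignores bond attributes, and the names `atoms_within_radius`, `atom_feature_vectors`, and the two-element atom attributes are hypothetical choices made for illustration.

```python
from collections import deque


def atoms_within_radius(adjacency, start, radius):
    """Return the atoms reachable from `start` in at most `radius` bonds (BFS)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if depth[atom] == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[atom]:
            if neighbor not in depth:
                depth[neighbor] = depth[atom] + 1
                queue.append(neighbor)
    return set(depth)


def atom_feature_vectors(atom_attrs, bonds, radii=(0, 1, 2)):
    """Concatenate, for each atom, the summed attributes of its neighborhood at each radius."""
    n_atoms = len(atom_attrs)
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b in bonds:  # undirected graph: store each bond in both directions
        adjacency[a].append(b)
        adjacency[b].append(a)
    width = len(atom_attrs[0])
    features = []
    for i in range(n_atoms):
        vector = []
        for radius in radii:
            hood = atoms_within_radius(adjacency, i, radius)
            vector.extend(sum(atom_attrs[j][k] for j in hood) for k in range(width))
        features.append(vector)
    return features


# Hypothetical example: heavy atoms of ethanol, C0-C1-O2.
# Assumed attributes per atom: [is_carbon, is_oxygen].
ETHANOL_ATOMS = [[1, 0], [1, 0], [0, 1]]
ETHANOL_BONDS = [(0, 1), (1, 2)]
ETHANOL_FEATURES = atom_feature_vectors(ETHANOL_ATOMS, ETHANOL_BONDS)
```

Each atom ends up with one block of summed attributes per radius, so atoms with identical attributes but different surroundings (here the two carbons) receive different vectors.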

    Quantitative structure fate relationships for multimedia environmental analysis

    Key physicochemical properties for a wide spectrum of chemical pollutants are unknown. This thesis analyses the prospect of assessing the environmental distribution of chemicals directly from supervised learning algorithms using molecular descriptors, rather than from multimedia environmental models (MEMs) using several physicochemical properties estimated from QSARs. Dimensionless compartmental mass ratios of 468 validation chemicals were compared, in logarithmic units, between: a) SimpleBox 3, a Level III MEM, propagating random property values within statistical distributions of widely recommended QSARs; and b) Support Vector Regressions (SVRs), acting as Quantitative Structure-Fate Relationships (QSFRs), linking mass ratios to molecular weight and constituent counts (atoms, bonds, functional groups and rings) for training chemicals. The best predictions were obtained for test and validation chemicals found to be within the domain of applicability of the QSFRs, as evidenced by low MAE and high q2 values (in air, MAE≤0.54 and q2≥0.92; in water, MAE≤0.27 and q2≥0.92).
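The two statistics quoted in the abstract above, MAE and q2, can be computed as in the following sketch. It assumes the common definition q2 = 1 - PRESS / SS_total with the mean taken over the observed values being evaluated; the thesis may use a different variant. The function names and the numeric values are hypothetical and are not data from the study.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def q_squared(observed, predicted):
    """Predictive q2 = 1 - PRESS / SS_total (assumed definition, see lead-in)."""
    mean_observed = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - press / ss_total


# Made-up log mass ratios, for illustration only.
observed = [-2.0, -1.0, 0.0, 1.0, 2.0]
predicted = [-1.5, -1.0, 0.0, 1.0, 1.5]

mae = mean_absolute_error(observed, predicted)
q2 = q_squared(observed, predicted)
```

A low MAE together with a q2 close to 1 indicates that the predictions track the observed values closely relative to the spread of the data.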

    Global Antifungal Profile Optimization of Chlorophenyl Derivatives against Botrytis cinerea and Colletotrichum gloeosporioides

    Twenty-two aromatic derivatives bearing a chlorine atom and a different chain in the para or meta position were prepared and evaluated for their in vitro antifungal activity against the phytopathogenic fungi Botrytis cinerea and Colletotrichum gloeosporioides. The results showed that maximum inhibition of the growth of these fungi was exhibited by the S and R enantiomers of 1-(4'-chlorophenyl)-2-phenylethanol (3 and 4). Furthermore, their antifungal activity showed a clear structure-activity relationship (SAR) trend, confirming the importance of the benzyl hydroxyl group in the inhibitory mechanism of the compounds studied. Additionally, a multiobjective optimization study of the global antifungal profile of chlorophenyl derivatives was conducted in order to establish a rational strategy for the filtering of new fungicide candidates from combinatorial libraries. The MOOPDESIRE methodology was used for this purpose, providing reliable ranking models that can be used later