
    In Silico Prediction of Physicochemical Properties

    This report provides a critical review of computational models, in particular (quantitative) structure-property relationship (QSPR) models, that are available for predicting physicochemical properties. The review emphasizes the usefulness of the models for the regulatory assessment of chemicals, particularly for the purposes of the new European legislation for the Registration, Evaluation, Authorisation and Restriction of CHemicals (REACH), which entered into force in the European Union (EU) on 1 June 2007. It is estimated that some 30,000 chemicals will need to be further assessed under REACH. Clearly, the cost of determining the toxicological and ecotoxicological effects, distribution and fate of 30,000 chemicals would be enormous. However, the legislation makes it clear that testing need not be carried out if adequate data can be obtained through information exchange between manufacturers, from in vitro testing, and from in silico predictions. The effects of a chemical on a living organism, and its distribution in the environment, are controlled by its physicochemical properties. Important physicochemical properties in this respect include the partition coefficient, aqueous solubility, vapour pressure and dissociation constant. Whilst all of these properties can be measured, it is much quicker and cheaper, and in many cases just as accurate, to calculate them using dedicated software packages or QSPRs. These in silico approaches are critically reviewed in this report.
    JRC.I.3-Toxicology and chemical substance
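The report lists the partition coefficient and the dissociation constant among the properties that govern a chemical's distribution. As an illustration of how two such properties combine (a textbook relationship, not a method taken from the report), the pH-dependent distribution coefficient log D of a monoprotic compound can be estimated from log P and pKa. The sketch assumes that only the neutral species partitions into octanol; the example compound is hypothetical.

```python
import math


def log_d(log_p: float, pka: float, ph: float, kind: str = "acid") -> float:
    """Octanol-water distribution coefficient (log D) of a monoprotic compound.

    Assumption: only the neutral species partitions into octanol
    (ion-pair partitioning is ignored).
    """
    if kind == "acid":
        # Ionized/neutral ratio for an acid: 10**(pH - pKa)
        ratio = 10.0 ** (ph - pka)
    elif kind == "base":
        # Ionized/neutral ratio for a base: 10**(pKa - pH)
        ratio = 10.0 ** (pka - ph)
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    # log D = log P + log10(fraction neutral) = log P - log10(1 + ratio)
    return log_p - math.log10(1.0 + ratio)


if __name__ == "__main__":
    # Hypothetical weak acid (log P = 3.0, pKa = 4.0) at physiological pH 7.4
    print(round(log_d(3.0, 4.0, 7.4), 2))
```

At pH = pKa half of the compound is neutral, so log D is log P minus log10(2).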

    Quantitative Structure-Property Relationship Modeling & Computer-Aided Molecular Design: Improvements & Applications

    The objective of this work was to develop an integrated capability to design molecules with desired properties. An automated, robust genetic algorithm (GA) module was developed to facilitate the rapid design of new molecules. The generated molecules were scored for the relevant thermophysical properties using non-linear quantitative structure-property relationship (QSPR) models. Descriptor reduction and model development for the QSPR models were implemented using evolutionary algorithms (EA) and artificial neural networks (ANNs). QSPR models were developed in this work for octanol-water partition coefficients (Kow), melting points (MP), normal boiling points (NBP), Gibbs energy of formation, universal quasi-chemical (UNIQUAC) model parameters, and infinite-dilution activity coefficients of cyclohexane and benzene in various organic solvents. To validate the design methodology, new chemical penetration enhancers (CPEs) for transdermal insulin delivery and new solvents for extractive distillation of the cyclohexane + benzene system were designed. In general, the non-linear QSPR models developed in this work gave predictions as good as or better than existing literature models. In particular, the current models for NBP, Gibbs energy of formation, UNIQUAC model parameters, and infinite-dilution activity coefficients have lower errors on external test sets than the literature models. The current models for MP and Kow are comparable with the best models in the literature. The GA-based design framework implemented in this work successfully identified new CPEs for transdermal delivery of insulin, with permeability values comparable to those of the best CPEs in the literature. New solvents for extractive distillation of cyclohexane/benzene were also identified, with selectivities two to four times those of existing solvents. 
These two case studies validate the ability of the current design framework to identify new molecules with desired target properties.
    Chemical Engineering
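The abstract does not give implementation details of the GA module. The loop below is a hypothetical sketch of the general idea only: candidate "molecules" are fixed-length strings of fragments, each scored by a stand-in additive model, then evolved by tournament selection, one-point crossover, mutation and elitism. The fragment names, their contributions and the target value are invented for illustration; they are not the author's non-linear QSPR models.

```python
import random

# Hypothetical fragment alphabet with made-up additive contributions to a
# target property (a stand-in for a trained non-linear QSPR model).
FRAGMENTS = {"CH3": 0.5, "CH2": 0.3, "OH": -1.1, "C=O": -0.6, "NH2": -1.0, "Ph": 1.9}
TARGET = 2.0   # desired property value (hypothetical)
LENGTH = 6     # fragments per candidate "molecule"


def predict(candidate):
    """Stand-in QSPR: sum of fragment contributions."""
    return sum(FRAGMENTS[f] for f in candidate)


def fitness(candidate):
    """Higher is better: negative distance from the target property."""
    return -abs(predict(candidate) - TARGET)


def tournament(pop, rng, k=3):
    """Pick the fittest of k randomly sampled candidates."""
    return max(rng.sample(pop, k), key=fitness)


def crossover(a, b, rng):
    """One-point crossover of two parents."""
    cut = rng.randrange(1, LENGTH)
    return a[:cut] + b[cut:]


def mutate(candidate, rng, rate=0.2):
    """Replace each fragment with a random one with probability `rate`."""
    names = list(FRAGMENTS)
    return [rng.choice(names) if rng.random() < rate else f for f in candidate]


def evolve(generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    names = list(FRAGMENTS)
    pop = [[rng.choice(names) for _ in range(LENGTH)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        elite = max(pop, key=fitness)
        children = [elite]  # elitism: the best candidate survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng))
        pop = children
        history.append(fitness(elite))
    return max(pop, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print(best, round(predict(best), 2))
```

Because the elite is copied forward unchanged, the best fitness can never decrease from one generation to the next.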

    Artificial intelligence and chemical kinetics enabled property-oriented fuel design for internal combustion engine

    The Fuel Genome Project aims to address the forward problem of fuel property prediction and the inverse problems of molecule design, retrosynthesis and reaction condition prediction. This work primarily addresses the forward problem by integrating feature engineering theory, artificial intelligence (AI) technologies and gas-phase chemical kinetics. A group contribution method (GCM) is used to establish the GCM-UOB (University of Birmingham) 1.0 system with 22 molecular descriptors, and the surrogate formulation minimizes the difference in functional group fragments between the target fuel and the surrogate. The improved QSPR (quantitative structure–property relationship)-UOB 2.0 system, with 32 molecular features, is coupled with machine learning (ML) algorithms to establish regression models for predicting fuel ignition quality. The QSPR-UOB 3.0 scheme expands to 42 molecular descriptors to improve the molecular resolution of aromatics and specific fuel types. Combined with ML algorithms, the resulting structural features enable 15 physicochemical properties to be predicted with high fidelity and efficiency. In addition to the ML-QSPR route, a deep learning convolutional neural network (DL-CNN) route is proposed for property prediction, with the yield sooting index (YSI) taken as a case study. The prediction accuracy of the DL-CNN is currently inferior to that of the ML-QSPR model, but its automated feature extraction and rapid progress on classification problems make it a promising approach for regression problems. A high-throughput fuel screening is performed to identify molecules with desired properties for both spark ignition (SI) and compression ignition (CI) engines; it comprises Tier 1 physicochemical property screening (based on the ML-QSPR models) and Tier 2 chemical kinetic screening (based on detailed chemical mechanisms). 
Polyoxymethylene dimethyl ether 3 (PODE3) and diethoxymethane (DEM) are promising carbon-neutral fuels for CI engines and are recommended by the virtual screening results. The ignition delay times, laminar flame speeds and dominant reactions of PODE3 and DEM are examined using chemical kinetics, and a new DEM mechanism including both low- and high-temperature reactions is constructed. Concluding remarks and research prospects are summarized in the final section.
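The surrogate formulation step is described as minimizing the difference in functional-group fragments between the target fuel and the surrogate. A minimal sketch of that idea follows, assuming a least-squares mismatch and a coarse grid search over mole fractions. The three components, their group counts and the target group vector are illustrative assumptions; the actual 22 GCM-UOB 1.0 descriptors are not reproduced here.

```python
from itertools import product


def blend_groups(fractions, components):
    """Mole-fraction-weighted functional-group vector of a candidate surrogate."""
    return [sum(x * comp[g] for x, comp in zip(fractions, components))
            for g in range(len(components[0]))]


def mismatch(fractions, components, target):
    """Sum of squared differences between surrogate and target group vectors."""
    blend = blend_groups(fractions, components)
    return sum((b - t) ** 2 for b, t in zip(blend, target))


def fit_surrogate(components, target, step=0.05):
    """Exhaustive grid search over mole fractions that sum to one."""
    n = round(1.0 / step)
    best, best_err = None, float("inf")
    for counts in product(range(n + 1), repeat=len(components) - 1):
        if sum(counts) > n:
            continue  # outside the simplex
        fractions = [c / n for c in counts] + [(n - sum(counts)) / n]
        err = mismatch(fractions, components, target)
        if err < best_err:
            best, best_err = fractions, err
    return best, best_err


if __name__ == "__main__":
    # Group order: [CH3, CH2, CH, C, aromatic CH, aromatic C]
    components = [
        [2, 5, 0, 0, 0, 0],  # n-heptane
        [5, 1, 1, 1, 0, 0],  # iso-octane (2,2,4-trimethylpentane)
        [1, 0, 0, 0, 5, 1],  # toluene
    ]
    # Hypothetical target fuel: average groups per molecule
    target = [2.6, 2.3, 0.3, 0.3, 1.5, 0.3]
    print(fit_surrogate(components, target))
```

A grid search is used only for transparency; with more components a constrained least-squares solver would be the practical choice.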

    Prediction of the physical properties of pure chemical compounds through different computational methods.

    Ph.D. thesis, University of KwaZulu-Natal, Durban, 2014.
    Liquid thermal conductivities, viscosities, thermal decomposition temperatures, electrical conductivities, normal boiling point temperatures, sublimation and vaporization enthalpies, saturated liquid speeds of sound, standard molar chemical exergies, refractive indices, and freezing point temperatures of pure organic compounds and ionic liquids are important thermophysical properties needed for the design and optimization of products and chemical processes. Because sufficiently purifying compounds and experimentally measuring their thermophysical properties are costly and time-consuming, predictive models are of great importance in engineering. Liquid thermal conductivity of pure organic compounds was the first property investigated in this study; for it, a general model, a quantitative structure property relationship, and a group contribution method were developed. The novel gene expression programming mathematical strategy [1, 2], first introduced by our group for the development of non-linear models for thermophysical properties, was successfully implemented to develop an explicit model for the thermal conductivity of approximately 1600 liquids at various temperatures and atmospheric pressure. The statistical parameters of the resulting correlation show an absolute average relative deviation of about 9% from the corresponding DIPPR 801 data [3]. It should be noted that gene expression programming is a complex mathematical algorithm that requires significant computing power, and this is the largest thermophysical property database that has been successfully handled by this strategy. The quantitative structure property relationship was developed using the sequential search algorithm and the same database as in the previous step. 
The model shows an average absolute relative deviation (AARD), standard deviation error and root mean square error of 7.4%, 0.01 and 0.01, respectively, over the training, validation and test sets. The same database was used to develop a group contribution model for liquid thermal conductivity. Statistical analysis shows that this model deviates from the corresponding DIPPR 801 [4] data by an absolute average relative deviation of approximately 7.1%. In the next stage, an extensive database of viscosities of 443 ionic liquids was compiled from the literature (more than 200 articles) and employed to develop a group contribution model. Using this model, a training set of 1336 experimental data points was correlated with a low AARD of about 6.3%. A test set of 336 data points was used to validate the model, giving an AARD of 6.8%. In the next part of this study, an extensive database of thermal decomposition temperatures of 586 ionic liquids was compiled from the literature and used to develop a quantitative structure property relationship. The proposed quantitative structure property relationship gives an acceptable AARD of less than 5.2% over all 586 experimental data values. The updated database of thermal decomposition temperatures, including 613 ionic liquids, was subsequently used to develop a group contribution model. Using this model, a training set of 489 data points was correlated with a low AARD of 4.5%. A test set of 124 data points was employed to test its capability; the model shows an AARD of 4.3% for the test set. Electrical conductivity of ionic liquids was the next property investigated. Initially, a database of electrical conductivities of 54 ionic liquids was collected from the literature. 
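Nearly every result in this abstract is reported as an average absolute relative deviation (AARD). The abstract does not spell out the formula, so the sketch below assumes the standard definition, AARD% = (100/N) Σ |pred − exp| / |exp|, alongside the root mean square error; the sample numbers are made up.

```python
import math


def aard_percent(predicted, experimental):
    """Average absolute relative deviation in percent:
    AARD% = (100 / N) * sum(|pred_i - exp_i| / |exp_i|)."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - e) / abs(e) for p, e in zip(predicted, experimental))
    return 100.0 * total / len(predicted)


def rmse(predicted, experimental):
    """Root mean square error, in the units of the property."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)


if __name__ == "__main__":
    # Made-up predictions against made-up experimental values
    print(aard_percent([110.0, 90.0], [100.0, 100.0]), rmse([110.0, 90.0], [100.0, 100.0]))
```

AARD is dimensionless and comparable across properties, which is why it is the headline statistic here; RMSE keeps the property's units.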
Then, it was used to develop two models: a quantitative structure property relationship and a group contribution model. Because the electrical conductivity of ionic liquids has a complicated dependence on temperature and chemical structure, the least-squares support vector machine (LSSVM) strategy was used as a non-linear regression tool to correlate it. The deviation of the quantitative structure property relationship from the 783 experimental data points used in its development (training set) is 1.8%. The validity of the model was then evaluated using another data set of 97 experimental data points (deviation: 2.5%). Finally, the reproducibility and reliability of the model were assessed using the last data set of 97 experimental data points (deviation: 2.7%). Using the group contribution model, a training set of 863 experimental data points was correlated with a low AARD of about 3.1%. The model was then validated using a data set of 107 experimental data points, with a low AARD of 3.6%. Finally, a test set of 107 data points was used, giving an AARD of 4.9%. In the next stage, the most comprehensive database of normal boiling point temperatures, covering approximately 18000 pure organic compounds, was compiled and used to develop a quantitative structure property relationship. To develop the model, the sequential search algorithm was first used to select the best subset of molecular descriptors; a three-layer feed-forward artificial neural network was then used as the regression tool for the final model. This appears to be the first time that the quantitative structure property relationship technique has been successfully applied to a database as large as the one used here for normal boiling point temperatures of pure organic compounds. 
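The abstract names least-squares support vector machines (LSSVM) as the non-linear regression tool for electrical conductivity. Below is a minimal sketch of standard LSSVM regression, where training reduces to a single linear system [[0, 1ᵀ], [1, K + I/γ]] [b, α] = [0, y]. The RBF kernel and the hyperparameters `gamma` and `sigma2` are assumptions; the thesis's descriptors and tuning are not reproduced.

```python
import math


def rbf_kernel(x, z, sigma2):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / sigma2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / sigma2)


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=100.0, sigma2=1.0):
    """Train LSSVM regression; returns bias b, dual weights alpha and a predictor."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # first row enforces sum(alpha) = 0
    for i in range(n):
        A.append([1.0] + [rbf_kernel(X[i], X[j], sigma2) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    solution = solve(A, [0.0] + list(y))
    b, alpha = solution[0], solution[1:]

    def predict(x):
        return b + sum(a * rbf_kernel(x, xi, sigma2) for a, xi in zip(alpha, X))

    return b, alpha, predict


if __name__ == "__main__":
    # Synthetic one-descriptor data, not ionic-liquid measurements
    X = [[i / 10.0] for i in range(11)]
    y = [math.sin(3.0 * x[0]) for x in X]
    b, alpha, predict = lssvm_fit(X, y, gamma=1000.0, sigma2=0.5)
    print(round(predict([0.55]), 3))
```

Unlike a standard SVM, the LSSVM replaces the quadratic program with one linear solve. The price is that every training point gets a nonzero weight, and the training residual at point i equals αᵢ/γ.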
Handling large databases of compounds has always been a challenge in quantitative structure property relationship modeling because of the large number of chemical structures to be processed (particularly their structure optimization), the high demand for computational power and the very high failure rate of software packages. As a result, this study is regarded as a major step forward for quantitative structure property relationship modeling. A comprehensive database of sublimation enthalpies of 1269 pure organic compounds at 298.15 K was compiled from the literature and used to develop an accurate group contribution model. The model predicts the sublimation enthalpies of organic compounds at 298.15 K with an acceptable average absolute relative deviation of 6.4% between predicted and experimental values. Vaporization enthalpies of organic compounds at 298.15 K were also investigated. An extensive database of 2530 pure organic compounds was used to develop a comprehensive group contribution model, which shows an acceptable AARD of 3.7% from experimental data. Speed of sound in the saturated liquid phase was the next property investigated. A collection of 1667 experimental data points for 74 pure chemical compounds was extracted from the ThermoData Engine of the National Institute of Standards and Technology [5], and a least-squares support vector machine-group contribution model was developed. The model shows a low AARD of 0.5% from the corresponding experimental data. In the next part of this study, a simple group contribution model was presented for the prediction of the standard molar chemical exergy of pure organic compounds. It predicts the standard chemical exergy of pure organic compounds with an acceptable average absolute relative deviation of 1.6% from the literature data of 133 organic compounds. 
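Several models in this thesis are group contribution (GC) models, but the abstract does not list their group sets or functional forms. The sketch below shows only the generic additive form, property ≈ Σₖ nₖcₖ (no constant term, an assumption), with the contributions cₖ fitted by ordinary least squares through the normal equations. The groups and the data are synthetic and generated from known contributions, so the fit should recover them.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_contributions(counts, values):
    """Least-squares contributions c minimizing ||N c - y||^2 (normal equations)."""
    k = len(counts[0])
    ntn = [[sum(row[i] * row[j] for row in counts) for j in range(k)] for i in range(k)]
    nty = [sum(row[i] * v for row, v in zip(counts, values)) for i in range(k)]
    return solve(ntn, nty)


def predict(counts_row, contributions):
    """Additive GC estimate: sum of group occurrences times contributions."""
    return sum(n * c for n, c in zip(counts_row, contributions))


if __name__ == "__main__":
    # Group order: [CH3, CH2, OH]; ethane, propane, butane, methanol, ethanol, 1-propanol
    counts = [[2, 0, 0], [2, 1, 0], [2, 2, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
    true_c = [3.0, 2.0, 10.0]  # synthetic contributions
    values = [predict(row, true_c) for row in counts]
    print(fit_contributions(counts, values))
```

Real GC models usually add a constant term, higher-order groups or a non-linear link (the thesis also couples GC with LSSVM). The design matrix must still have linearly independent columns for the fit to be unique.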
The largest refractive index databank reported to date, covering approximately 12 000 pure organic compounds, was then compiled. A novel computational scheme coupling the sequential search strategy with the genetic function approximation (GFA) strategy was used to develop a model for the refractive indices of pure organic compounds. The strategy was found to combine the ability to handle large databases (the advantage of the sequential search algorithm over other subset variable selection methods) with the ability to choose the most accurate subset of variables (the advantage of genetic algorithm-based subset variable selection methods such as GFA). The model shows a promising average absolute relative deviation of 0.9% from the corresponding literature values. Subsequently, a group contribution model was developed based on the same database; it shows an average absolute relative deviation of 0.83% from the corresponding literature values. Freezing point temperature of organic compounds was the last property investigated. Initially, the largest databank reported in the open literature for freezing points, covering more than 16 500 pure organic compounds, was compiled. The sequential search algorithm was then applied to derive a model, which shows an average absolute relative deviation of 12.6% from the corresponding literature values. The same database was used to develop a group contribution model, which shows an average absolute relative deviation of 10.76%, an accuracy adequate for many practical applications.
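The abstract does not detail the sequential search algorithm used to pick descriptor subsets for the refractive index and freezing point models. As an assumption, the sketch below implements greedy forward selection, one common sequential scheme, scored by the residual sum of squares of an ordinary least-squares fit. The thesis pairs selection with GFA or a neural network, and neither is reproduced. The data are synthetic, with only descriptors 0 and 2 carrying signal.

```python
import random


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on the chosen columns plus an intercept."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((t - sum(bj * v for bj, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def forward_select(X, y, max_descriptors):
    """Greedily add the descriptor that most lowers the RSS, one at a time."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < max_descriptors:
        best = min(remaining, key=lambda c: rss(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.uniform(0.0, 1.0) for _ in range(4)] for _ in range(50)]
    y = [1.0 + 2.0 * x[0] - 3.0 * x[2] for x in X]  # synthetic target
    print(forward_select(X, y, 2))
```

Forward selection costs roughly one fit per candidate per step, which is what makes sequential schemes tractable for large databases. A genetic search such as GFA explores descriptor combinations more thoroughly at a much higher cost.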

    Study of the quantitative structure-property relationship (QSPR) using topological descriptors for aliphatic carbonyl compounds

    Doctoral thesis - Universidade Federal de Santa Catarina, Centro de Ciências Físicas e Matemáticas. Curso de Pós-Graduação em Química.
    In this work, the quantitative structure-activity relationship was applied, using different molecular descriptors, to estimate the fruity odour of aliphatic esters. The statistical parameters obtained for the ester equations using multiple linear regression were of good quality. The resulting model had a high predictive ability, as established by the cross-validation coefficient. The semi-empirical topological method (IET) was extended to estimate the chromatographic retention of linear and branched esters, aldehydes and ketones on low-polarity stationary phases. The statistical parameters of the simple linear regressions between the Kováts retention indices and the IET were excellent for all compounds. The quantitative structure-retention relationship models obtained with a single descriptor had high predictive ability and also showed better precision and accuracy than the multiple linear regression methods. The IET was then applied to estimate the boiling points of aldehydes and ketones and the odour threshold values of ketones with camphoraceous and fruity odours. The boiling points of 35 aldehydes and ketones were accurately estimated by simple linear regression, and the odour thresholds of 27 ketones were estimated by a quadratic polynomial function. Thus, the semi-empirical topological method, based on the general chromatographic retention behaviour of esters, aldehydes and ketones and using a single descriptor, represents a major advance in quantitative structure-property relationship (QSPR) studies.
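The abstract regresses Kováts retention indices against the IET. The IET itself is not defined in the abstract and is not reproduced here. For reference, the sketch below computes the standard isothermal Kováts index from the adjusted retention times of the analyte and of the two bracketing n-alkanes with n and n+1 carbon atoms. The retention times in the example are made up.

```python
import math


def kovats_index(t_x, t_n, t_n1, n, t_m=0.0):
    """Isothermal Kováts retention index.

    I = 100 * [n + (log t'_x - log t'_n) / (log t'_(n+1) - log t'_n)],
    where t' = t - t_m are retention times adjusted by the hold-up time t_m,
    and the analyte x elutes between the n-alkanes with n and n+1 carbons.
    """
    adjusted = [t - t_m for t in (t_x, t_n, t_n1)]
    if min(adjusted) <= 0.0:
        raise ValueError("adjusted retention times must be positive")
    lx, ln, ln1 = (math.log10(t) for t in adjusted)
    return 100.0 * (n + (lx - ln) / (ln1 - ln))


if __name__ == "__main__":
    # Made-up times (min): octane 4.0, nonane 9.0, analyte 6.0, hold-up 0
    print(round(kovats_index(6.0, 4.0, 9.0, 8), 1))
```

An analyte eluting at the geometric mean of the two alkane times sits exactly halfway between them on the index scale.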

    Semi-empirical topological index: development and application of a new molecular descriptor in quantitative structure-property relationship (QSPR) studies

    Doctoral thesis - Universidade Federal de Santa Catarina, Centro de Ciências Físicas e Matemáticas. Programa de Pós-Graduação em Química.
    In this study, a new molecular descriptor, the Semi-Empirical Topological Index (IET), was developed in order to establish quantitative structure-property relationships (QSPR) for different classes of compounds. The index was developed and optimized to predict the chromatographic retention of branched alkenes, methyl-branched alkanes produced by insects, and saturated alcohols on low-polarity stationary phases. The ability of the IET to predict the chromatographic retention of alcohols, aldehydes and ketones on more polar stationary phases was also evaluated. Preliminary studies applying the IET to different properties/activities gave promising results for the future application of this new method. For alkenes and alcohols, correlations were obtained between the IET and several properties (normal boiling point, molar refraction, molar volume, heat of combustion, molar heat of vaporization and octanol/water partition coefficient), with r > 0.94. Quantitative structure-activity relationships (QSAR) were tested for saturated alcohols; the biological activities investigated were narcotic activity on barnacle larvae, toxicity to spiders and tomatoes, and odour (r > 0.88). The quality of the results obtained in this work for predicting different properties/activities, using the IET as a molecular descriptor, can be considered an important step towards future QSAR/QSPR/QSRR studies.
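Both IET abstracts judge single-descriptor models by the correlation coefficient r, and the carbonyl-compound thesis also cites a cross-validation coefficient. Neither abstract says which cross-validation statistic was used. The sketch below assumes leave-one-out q² = 1 − PRESS/SS_total and computes it together with Pearson's r for a simple linear regression; the data are synthetic.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson_r(x, y):
    """Pearson correlation coefficient between descriptor x and property y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def q2_loo(x, y):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(x)):
        # Refit without point i, then predict point i
        slope, intercept = linear_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    my = sum(y) / len(y)
    return 1.0 - press / sum((b - my) ** 2 for b in y)


if __name__ == "__main__":
    # Synthetic descriptor/property pairs, not IET data
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
    print(round(pearson_r(x, y), 4), round(q2_loo(x, y), 4))
```

For least-squares fits, q²(LOO) can never exceed r², so a large gap between the two is a warning sign of overfitting or influential outliers.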

    Development of a Predictive Equation of State for Equilibrium and Volumetric Properties of Diverse Molecules and Their Mixtures

    Accurate prediction of the phase equilibrium and volumetric properties of pure fluids and their mixtures is essential for chemical process design and related applications. Although experiments provide accurate data at specific phase conditions, such data are limited and cannot meet the ever-expanding industrial needs for process design and development. Therefore, models are needed that can provide accurate predictions of a wide range of thermodynamic properties. Cubic equations of state (CEOS) are widely used to calculate thermodynamic properties; however, they often require experimental data for system-specific model tuning. An attractive alternative is to develop predictive equations of state that estimate these properties based solely on molecular structure, the most basic information that is generally available. In this work, the Peng-Robinson (PR) EOS is the focus of such development. The two main objectives of this study are to (1) develop improved generalized models for predicting critical properties, acentric factors and vapor-liquid equilibria (VLE) using a theory-framed quantitative structure-property relationship (QSPR) modeling approach and (2) develop a new volume-translation function with a scaling-law correction to predict liquid densities of pure fluids and mixtures of diverse molecules. To facilitate model development, a comprehensive database of experimental measurements was assembled for pure-fluid critical properties, acentric factors and liquid densities, as well as VLE and liquid densities of binary mixtures. QSPR models were then developed to provide a priori predictions of the critical properties, acentric factor and VLE properties. The newly developed QSPR models for the critical properties provided predictions within twice the experimental errors. 
Similarly, for VLE predictions, the QSPR models gave errors approximately twice those obtained through data regression analyses of the VLE systems considered. In addition, a new volume-translation method for the PR EOS was developed. The volume-translation function parameter was generalized in terms of the molecular properties of each fluid. The volume-translated PR EOS was then extended to predict liquid densities of diverse mixtures using conventional EOS mixing rules. The volume-translation approach developed in this work has been shown to provide accurate predictions of liquid densities in both the saturated and single-phase regions for pure fluids and mixtures over large ranges of pressure and temperature. Specifically, the new volume-translated PR EOS yielded errors three to six times lower than the corresponding predictions from the untranslated model.
    Chemical Engineering
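For orientation, the sketch below evaluates the standard Peng-Robinson EOS for a pure fluid, solving the cubic in Z analytically and then applying a volume translation v = v_PR − c. The thesis's generalized translation function and its scaling-law correction are not given in the abstract. Here `c_shift` is a hypothetical constant placeholder, and the sign convention (subtracting c) is an assumption. The propane constants are approximate.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def cubic_real_roots(a2, a1, a0):
    """Real roots of z**3 + a2*z**2 + a1*z + a0 = 0 (Cardano / trigonometric form)."""
    p = a1 - a2 ** 2 / 3.0
    q = 2.0 * a2 ** 3 / 27.0 - a2 * a1 / 3.0 + a0
    shift = -a2 / 3.0
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > 0.0:  # one real root
        s = math.sqrt(disc)
        u = math.copysign(abs(-q / 2.0 + s) ** (1.0 / 3.0), -q / 2.0 + s)
        v = math.copysign(abs(-q / 2.0 - s) ** (1.0 / 3.0), -q / 2.0 - s)
        return [u + v + shift]
    r = math.sqrt(-p / 3.0)  # three real roots (some may coincide)
    if r == 0.0:
        return [shift]
    phi = math.acos(max(-1.0, min(1.0, -q / (2.0 * r ** 3))))
    return sorted(2.0 * r * math.cos((phi + 2.0 * math.pi * k) / 3.0) + shift for k in range(3))


def pr_molar_volume(T, P, Tc, Pc, omega, phase="liquid", c_shift=0.0):
    """Molar volume (m^3/mol) from the Peng-Robinson EOS, minus a constant translation."""
    kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega ** 2
    alpha = (1.0 + kappa * (1.0 - math.sqrt(T / Tc))) ** 2
    a = 0.45724 * (R * Tc) ** 2 / Pc * alpha
    b = 0.07780 * R * Tc / Pc
    A = a * P / (R * T) ** 2
    B = b * P / (R * T)
    # Z^3 - (1 - B) Z^2 + (A - 3B^2 - 2B) Z - (AB - B^2 - B^3) = 0
    roots = [z for z in cubic_real_roots(B - 1.0, A - 3.0 * B ** 2 - 2.0 * B,
                                         -(A * B - B ** 2 - B ** 3)) if z > B]
    z = min(roots) if phase == "liquid" else max(roots)
    return z * R * T / P - c_shift


if __name__ == "__main__":
    Tc, Pc, omega = 369.8, 4.248e6, 0.152  # propane, approximate constants
    print(pr_molar_volume(300.0, 2.0e6, Tc, Pc, omega, phase="liquid"))
    print(pr_molar_volume(300.0, 2.0e6, Tc, Pc, omega, phase="liquid", c_shift=5.0e-6))
```

A constant shift leaves the PR vapor pressures unchanged (for pure fluids) while moving liquid volumes. This is the usual motivation for volume translation; a temperature-dependent c, as in the thesis, can correct the larger errors near the critical point.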

    Aqueous solubility of drug-like compounds

    New, effective experimental techniques in medicinal chemistry and pharmacology have resulted in a vast increase in the number of pharmacologically interesting compounds. However, there is still room for improvement in producing drug candidates with optimal biopharmaceutical and pharmacokinetic properties. A large fraction of typical drug candidates is poorly soluble in water, which results in low drug concentrations in gastrointestinal fluids and, consequently, low drug absorption. Therefore, knowledge of how to improve the solubility of compounds is indispensable for developing compounds with drug-like properties. The main objective of this thesis was to investigate whether computer-based models derived from calculated molecular descriptors and structural fragments can be used to predict the aqueous solubility of drug-like compounds with similar structures. For this purpose, both experimental and computational studies were performed. In the experimental work, a novel crystallization method for weak acids and bases was developed, and a European patent was applied for. The obtained crystalline materials could be used for solubility measurements. A novel recognition method was developed to evaluate the tendency of compounds to form amorphous solids. This method could be used to ensure that only solubilities of crystalline materials were collected for developing solubility prediction models. In developing improved in silico solubility models, lipophilicity was confirmed as the major driving factor for solubility and descriptors related to crystal information as the second most important factor. Reasons for the limited precision of commercial solubility prediction tools were identified. A general solubility model of high accuracy was obtained for drug-like compounds in congeneric series when lipophilicity was used as a descriptor in combination with the structural fragments. 
Rules derived from the solubility prediction models can be used by chemists and other interested scientists as a rough guideline to the contribution of structural fragments to solubility: aliphatic and polar fragments with high dipole moments are always considered solubility-enhancing, and strong acids and bases usually have lower intrinsic solubility than neutral compounds. In summary, an improved solubility prediction method for congeneric series was developed using high-quality solubility data for drugs and drug precursors as input. By focusing on structurally related series, the derived model aims to overcome the limitations of commercially available solubility prediction tools, and it showed higher predictive power for drug-like compounds than those tools. Parts of the results of this work were protected by a patent application filed by F. Hoffmann-La Roche Ltd on August 30, 2005.
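The thesis finds lipophilicity to be the main driver of solubility and crystal-related descriptors to be the second factor, but its own model is not given in the abstract. As a well-known, swapped-in illustration of the same two-factor idea (not the thesis method), the General Solubility Equation of Yalkowsky and co-workers uses log Kow for lipophilicity and the melting point as a proxy for crystal-lattice effects. It applies to non-electrolytes; the example compound is hypothetical.

```python
def gse_log_solubility(log_kow: float, melting_point_c: float) -> float:
    """General Solubility Equation (non-electrolytes):
    log10 S [mol/L] = 0.5 - 0.01 * (MP - 25) - log Kow, with MP in deg C.
    For liquids (MP below 25 deg C) the melting-point term is taken as zero.
    """
    crystal_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - crystal_term - log_kow


if __name__ == "__main__":
    # Hypothetical crystalline compound: log Kow = 2.0, MP = 125 deg C
    print(gse_log_solubility(2.0, 125.0))
```

Each additional 100 °C of melting point lowers the predicted solubility by one log unit at fixed lipophilicity. This is the crystal contribution that purely lipophilicity-based models miss.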