172 research outputs found

    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main advances in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods to build QSAR models, current evolutionary feature selection methods and applications in QSAR, and future trends in joint or multi-task feature selection methods. Funding: Instituto de Salud Carlos III (PIO52048, RD07/0067/0005); Ministerio de Industria, Comercio y Turismo (TSI-020110-2009-53); Galicia, Consellería de Economía e Industria (10SIN105004P).
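    As an illustration of the review's central theme, the sketch below shows one common way evolutionary feature selection is wired together in QSAR work: candidate descriptor subsets are encoded as bit masks, and a genetic algorithm evolves them using the cross-validated fit of a simple regression model as the fitness. The data, population settings, and model choice are hypothetical and not taken from the review.

```python
# Minimal sketch (not from the review) of GA-based descriptor selection for QSAR:
# individuals are bit masks over the descriptor columns, and fitness is the
# cross-validated R^2 of a linear model fitted on the selected descriptors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data: 200 molecules x 50 descriptors, plus an activity vector.
X = rng.normal(size=(200, 50))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=200)

def fitness(mask):
    """Cross-validated R^2 of a linear QSAR model on the selected descriptors."""
    if mask.sum() == 0:
        return -np.inf
    model = LinearRegression()
    return cross_val_score(model, X[:, mask.astype(bool)], y, cv=5, scoring="r2").mean()

def evolve(pop_size=30, n_gen=20, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, X.shape[1]))
    for _ in range(n_gen):
        scores = np.array([fitness(ind) for ind in pop])
        # Tournament selection: keep the better of two randomly drawn individuals.
        parents = pop[[max(rng.integers(0, pop_size, 2), key=lambda i: scores[i])
                       for _ in range(pop_size)]]
        # One-point crossover between consecutive parents (same cut per generation).
        cut = rng.integers(1, X.shape[1])
        children = np.vstack([np.concatenate([parents[i, :cut],
                                              parents[(i + 1) % pop_size, cut:]])
                              for i in range(pop_size)])
        # Bit-flip mutation.
        flips = rng.random(children.shape) < p_mut
        pop = np.where(flips, 1 - children, children)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[scores.argmax()], scores.max()

best_mask, best_r2 = evolve()
print("selected descriptors:", np.flatnonzero(best_mask), "cv R2:", round(best_r2, 3))
```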

    Prediction of the physical properties of pure chemical compounds through different computational methods.

    Ph.D. University of KwaZulu-Natal, Durban, 2014. Liquid thermal conductivities, viscosities, thermal decomposition temperatures, electrical conductivities, normal boiling point temperatures, sublimation and vaporization enthalpies, saturated liquid speeds of sound, standard molar chemical exergies, refractive indices, and freezing point temperatures of pure organic compounds and ionic liquids are important thermophysical properties needed for the design and optimization of products and chemical processes. Since sufficient purification of pure compounds and experimental measurement of their thermophysical properties are costly and time consuming, predictive models are of great importance in engineering.

    The liquid thermal conductivity of pure organic compounds was the first property investigated in this study, for which a general model, a quantitative structure property relationship, and a group contribution method were developed. The novel gene expression programming mathematical strategy [1, 2], first introduced by our group for the development of non-linear models of thermophysical properties, was successfully implemented to develop an explicit model for the thermal conductivity of approximately 1600 liquids at different temperatures but atmospheric pressure. The statistical parameters of the obtained correlation show about 9% absolute average relative deviation of the results from the corresponding DIPPR 801 data [3]. It should be mentioned that the gene expression programming technique is a complicated mathematical algorithm requiring significant computing power, and this is the largest thermophysical property database that has been successfully managed by this strategy. The quantitative structure property relationship was developed using the sequential search algorithm and the same database used in the previous step. The model shows an average absolute relative deviation (AARD%), standard deviation error, and root mean square error of 7.4%, 0.01, and 0.01 over the training, validation, and test sets, respectively. The same database was then used to develop a group contribution model for liquid thermal conductivity; the statistical analysis of its performance shows approximately a 7.1% absolute average relative deviation of the results from the corresponding DIPPR 801 data [4].

    In the next stage, an extensive database of viscosities of 443 ionic liquids was compiled from the literature (more than 200 articles) and used to develop a group contribution model. Using this model, a training set composed of 1336 experimental data points was correlated with a low AARD% of about 6.3. A test set consisting of 336 data points was used to validate this model; it shows an AARD% of 6.8 for the test set. In the next part of this study, an extensive database of thermal decomposition temperatures of 586 ionic liquids was compiled from the literature and used to develop a quantitative structure property relationship. The proposed quantitative structure property relationship produces an acceptable average absolute relative deviation (AARD) of less than 5.2% over all 586 experimental data values. The updated database of thermal decomposition temperatures, including 613 ionic liquids, was subsequently used to develop a group contribution model. Using this model, a training set comprising 489 data points was correlated with a low AARD of 4.5%. A test set consisting of 124 data points was employed to test its capability; the model shows an AARD of 4.3% for the test set.

    Electrical conductivity of ionic liquids was the next property investigated. Initially, a database of electrical conductivities of 54 ionic liquids was collected from the literature and used to develop two models: a quantitative structure property relationship and a group contribution model. Since the electrical conductivity of ionic liquids has a complicated temperature and chemical structure dependency, the least squares support vector machines strategy was used as a non-linear regression tool to correlate the electrical conductivity of ionic liquids. The deviation of the quantitative structure property relationship from the 783 experimental data points used in its development (training set) is 1.8%. The validity of the model was then evaluated using another experimental data set comprising 97 experimental data points (deviation: 2.5%). Finally, the reproducibility and reliability of the model were successfully assessed using a last experimental data set of 97 experimental data points (deviation: 2.7%). Using the group contribution model, a training set composed of 863 experimental data points was correlated with a low AARD of about 3.1% from the corresponding experimental data. The model was then validated using a data set composed of 107 experimental data points with a low AARD of 3.6%. Finally, a test set consisting of 107 data points was used for its validation; it shows an AARD of 4.9% for the test set.

    In the next stage, the most comprehensive database of normal boiling point temperatures, covering approximately 18000 pure organic compounds, was compiled and used to develop a quantitative structure property relationship. To develop the model, the sequential search algorithm was initially used to select the best subset of molecular descriptors; a three-layer feed-forward artificial neural network was then used as a regression tool to develop the final model. This appears to be the first time that the quantitative structure property relationship technique has successfully been used to handle a database as large as the one used for normal boiling point temperatures of pure organic compounds. Handling large databases of compounds has always been a challenge in the quantitative structure property relationship field because of the large number of chemical structures to be handled (particularly the optimization of the chemical structures), the high demand for computational power, and the very high failure rates of the software packages. As a result, this study is regarded as a long step forward in the quantitative structure property relationship field.

    A comprehensive database of sublimation enthalpies of 1269 pure organic compounds at 298.15 K was successfully compiled from the literature and used to develop an accurate group contribution model. The model is capable of predicting the sublimation enthalpies of organic compounds at 298.15 K with an acceptable average absolute relative deviation between predicted and experimental values of 6.4%. Vaporization enthalpies of organic compounds at 298.15 K were also studied: an extensive database of 2530 pure organic compounds was used to develop a comprehensive group contribution model, which demonstrates an acceptable AARD% of 3.7% from experimental data.

    Speeds of sound in the saturated liquid phase were the next property investigated. Initially, a collection of 1667 experimental data points for 74 pure chemical compounds was extracted from the ThermoData Engine of the National Institute of Standards and Technology [5]. A least squares support vector machines group contribution model was then developed; it shows a low AARD% of 0.5% from the corresponding experimental data. In the next part of this study, a simple group contribution model was presented for the prediction of the standard molar chemical exergy of pure organic compounds. It is capable of predicting the standard chemical exergy of pure organic compounds with an acceptable average absolute relative deviation of 1.6% from the literature data for 133 organic compounds.

    The largest databank reported to date for refractive indices, covering approximately 12 000 pure organic compounds, was then compiled. A novel computational scheme based on coupling the sequential search strategy with the genetic function approximation (GFA) strategy was used to develop a model for the refractive indices of pure organic compounds. This strategy combines the capability of handling large databases (the advantage of the sequential search algorithm over other subset variable selection methods) with the ability to choose the most accurate subset of variables (the advantage of genetic-algorithm-based subset variable selection methods such as GFA). The model shows a promising average absolute relative deviation of 0.9% from the corresponding literature values. Subsequently, a group contribution model was developed on the same database; it shows an average absolute relative deviation of 0.83% from the corresponding literature values.

    Freezing point temperatures of organic compounds were the last property investigated. Initially, the largest databank reported in the open literature for freezing points, covering more than 16 500 pure organic compounds, was compiled. The sequential search algorithm was then successfully applied to derive a model, which shows an average absolute relative deviation of 12.6% from the corresponding literature values. The same database was used to develop a group contribution model, which demonstrates an average absolute relative deviation of 10.76%, an accuracy adequate for many practical applications.
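    The figure of merit quoted throughout this abstract is the average absolute relative deviation, AARD% = (100/N) Σ |y_pred − y_exp| / y_exp. The short sketch below computes it for a toy group contribution model; the group contributions and offset are hypothetical placeholders, not values from the thesis, while the reference values are approximate normal boiling points in kelvin.

```python
# Minimal sketch (hypothetical group contributions, not from the thesis) of a
# group contribution prediction and the AARD% metric used throughout the abstract.
import numpy as np

# Hypothetical contributions of structural groups to the predicted property.
contributions = {"CH3": 23.6, "CH2": 22.9, "OH": 92.9}

def predict(group_counts, offset=198.0):
    """Property = offset + sum over groups of (count * contribution)."""
    return offset + sum(n * contributions[g] for g, n in group_counts.items())

# Molecules described only by group counts, paired with approximate boiling points (K).
molecules = {
    "ethanol":    ({"CH3": 1, "CH2": 1, "OH": 1}, 351.4),
    "1-propanol": ({"CH3": 1, "CH2": 2, "OH": 1}, 370.3),
    "butane":     ({"CH3": 2, "CH2": 2}, 272.6),
}

pred = np.array([predict(groups) for groups, _ in molecules.values()])
exp = np.array([val for _, val in molecules.values()])

# AARD% = 100/N * sum(|pred - exp| / exp)
aard = 100.0 * np.mean(np.abs(pred - exp) / exp)
print(f"AARD% = {aard:.1f}")
```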

    (Q)SAR Modelling of Nanomaterial Toxicity - A Critical Review

    There is increasing recognition that nanomaterials pose a risk to human health, and the novel engineered nanomaterials (ENMs) of the nanotechnology industry, with their growing industrial usage, pose the most immediate problem for hazard assessment, as many of them remain untested. The large number of materials and their variants (different sizes and coatings, for instance) that require testing, together with ethical pressure towards non-animal testing, means that expensive animal bioassays are precluded, and the use of (quantitative) structure-activity relationship ((Q)SAR) models as an alternative source of hazard information should be explored. (Q)SAR modelling can be applied to fill critical knowledge gaps by making the best use of existing data, prioritizing the physicochemical parameters driving toxicity, and providing practical solutions to the risk assessment problems caused by the diversity of ENMs. This paper covers the core components required for successful application of (Q)SAR technologies to ENM toxicity prediction, summarizes the published nano-(Q)SAR studies, and outlines the challenges ahead for nano-(Q)SAR modelling. It provides a critical review of (1) the present availability of ENM characterization and toxicity data, (2) the characterization of nanostructures that meets the needs of (Q)SAR analysis, (3) a summary of published nano-(Q)SAR studies and their limitations, (4) the in silico tools for (Q)SAR screening of nanotoxicity, and (5) the prospective directions for the development of nano-(Q)SAR models.

    Quantitative Structure-Property Relationship Modeling & Computer-Aided Molecular Design: Improvements & Applications

    The objective of this work was to develop an integrated capability to design molecules with desired properties. An automated, robust genetic algorithm (GA) module was developed to facilitate the rapid design of new molecules. The generated molecules were scored for the relevant thermophysical properties using non-linear quantitative structure-property relationship (QSPR) models. Descriptor reduction and model development for the QSPR models were implemented using evolutionary algorithms (EAs) and artificial neural networks (ANNs). QSPR models for octanol-water partition coefficients (Kow), melting points (MP), normal boiling points (NBP), Gibbs energy of formation, universal quasi-chemical (UNIQUAC) model parameters, and infinite-dilution activity coefficients of cyclohexane and benzene in various organic solvents were developed in this work. To validate the design methodology, new chemical penetration enhancers (CPEs) for transdermal insulin delivery and new solvents for extractive distillation of the cyclohexane + benzene system were designed. In general, the non-linear QSPR models developed in this work provided predictions better than or as good as existing literature models. In particular, the models for NBP, Gibbs energy of formation, UNIQUAC model parameters, and infinite-dilution activity coefficients have lower errors on external test sets than the literature models, while the models for MP and Kow are comparable with the best models in the literature. The GA-based design framework implemented in this work successfully identified new CPEs for transdermal delivery of insulin, with permeability values comparable to the best CPEs in the literature. It also identified new solvents for extractive distillation of cyclohexane/benzene with selectivities two to four times those of existing solvents. These two case studies validate the ability of the design framework to identify new molecules with desired target properties.
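    A rough sketch of the generate-and-score loop that such a GA-based design framework relies on is given below: a non-linear QSPR surrogate (here a small feed-forward neural network) scores candidates, and the best scorers seed the next round. Everything in it, including the use of plain descriptor vectors instead of real molecular structures and feasibility constraints, is a simplifying assumption for illustration rather than the authors' implementation.

```python
# Minimal sketch (not the authors' implementation) of "generate and score" in
# molecular design: candidates are scored with a trained non-linear QSPR model
# and the best ones are carried forward to the next round.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Hypothetical training data: descriptor vectors with a known property value.
X_train = rng.normal(size=(500, 8))
y_train = X_train @ rng.normal(size=8) + 0.05 * rng.normal(size=500)

# Non-linear QSPR surrogate: a small feed-forward ANN.
qspr = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

def design_loop(n_candidates=200, n_keep=20, n_rounds=10, step=0.3):
    """Propose candidates, score them with the QSPR model, keep the best, perturb, repeat."""
    pool = rng.normal(size=(n_candidates, 8))
    for _ in range(n_rounds):
        scores = qspr.predict(pool)
        elite = pool[np.argsort(scores)[-n_keep:]]   # highest predicted property
        # "Mutate" the elite candidates to generate the next pool.
        pool = np.repeat(elite, n_candidates // n_keep, axis=0)
        pool = pool + step * rng.normal(size=pool.shape)
    return elite, qspr.predict(elite)

best, best_scores = design_loop()
print("best predicted property values:", np.round(np.sort(best_scores)[-5:], 2))
```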

    Evaluation of the availability and applicability of computational approaches in the safety assessment of nanomaterials: Final report of the Nanocomput project

    This is the final report of the Nanocomput project, the main aims of which were to review the current status of computational methods that are potentially useful for predicting the properties of engineered nanomaterials, and to assess their applicability in order to provide advice on the use of these approaches for the purposes of the REACH regulation. Since computational methods cover a broad range of models and tools, emphasis was placed on Quantitative Structure-Property Relationship (QSPR) and Quantitative Structure-Activity Relationship (QSAR) models and their potential role in predicting nanomaterial properties. In addition, the status of a diverse array of compartment-based mathematical models was assessed; these comprised toxicokinetic (TK), toxicodynamic (TD), in vitro and in vivo dosimetry, and environmental fate models. Finally, based on systematic reviews of the scientific literature as well as the outputs of EU-funded research projects, recommendations for further research and development were made. The Nanocomput project was carried out by the European Commission’s Joint Research Centre (JRC) for the Directorate-General (DG) for Internal Market, Industry, Entrepreneurship and SMEs (DG GROW) under the terms of an Administrative Arrangement between the JRC and DG GROW. The project lasted 39 months, from January 2014 to March 2017, and was supported by a steering group with representatives from DG GROW, DG Environment and the European Chemicals Agency (ECHA).

    SMARTS Approach to Chemical Data Mining and Physicochemical Property Prediction.

    The calculation of physicochemical and biological properties is essential to facilitate modern drug discovery. Chemical spaces dimensionalized by these descriptors have been used to scaffold-hop in order to discover new lead and drug-like molecules. Broadening the boundaries of structure-based drug design, such molecules are expected to share the same physiological target and have similar efficacy, as do known drug molecules sharing the same region of chemical property space. In the past few decades, physicochemical and ADMET (absorption, distribution, metabolism, elimination, and toxicity) property predictors have received increasing attention in academia and the pharmaceutical industry. Given this attention to data mining and property prediction, we first discuss the sources of experimental pKa values and the current methodologies used for pKa prediction in proteins and small molecules, with particular attention to the scope, statistical validity, overall accuracy, and predictive power of these methods; the concerns raised are not limited to predicting pKa but apply to all empirical predictive methodologies. In a bottom-up approach, we explored the influence of freely generated SMARTS string representations of molecular fragments on chelation and cytotoxicity. Later investigations, involving the derivation of predictive models, used stepwise regression to determine the optimal pool of SMARTS strings having the greatest influence on the property of interest. By applying a unique scoring system to sets of highly generalized SMARTS strings, we constructed well-balanced regression trees with predictive accuracy exceeding that of many published and commercially available models for cytotoxicity, pKa, and aqueous solubility. The methodology is robust, extremely adaptable, and can handle any molecular dataset with experimental data. This work details our efforts in data gathering and curation and the development of a machine learning methodology able to derive and validate highly accurate regression trees capable of extremely fast property predictions. Regression trees created by our method are well suited to calculating descriptors for large in silico molecular libraries, facilitating data mining of chemical spaces in search of new lead molecules in drug discovery. Ph.D., Medicinal Chemistry, University of Michigan, Horace H. Rackham School of Graduate Studies.
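    The core idea, counting SMARTS fragment matches and feeding the counts to a regression tree, can be sketched in a few lines with RDKit and scikit-learn. The SMARTS patterns, molecules, and property values below are illustrative placeholders, not the curated sets or the scoring system described in the thesis.

```python
# Minimal sketch (not the authors' pipeline) of SMARTS-based featurization:
# count SMARTS fragment matches per molecule and fit a regression tree
# against a measured property.
from rdkit import Chem
from sklearn.tree import DecisionTreeRegressor

# Placeholder SMARTS pool: hydroxyl, primary amine, benzene ring, carboxylic acid.
smarts_pool = ["[OX2H]", "[NX3;H2]", "c1ccccc1", "[CX3](=O)[OX2H1]"]
patterns = [Chem.MolFromSmarts(s) for s in smarts_pool]

def featurize(smiles):
    """Vector of SMARTS match counts for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return [len(mol.GetSubstructMatches(p)) for p in patterns]

# Hypothetical training data: SMILES paired with a measured property (placeholder values).
train = [("CCO", -0.77), ("c1ccccc1O", -0.04), ("CC(=O)O", 1.22), ("CCCCCC", -3.96)]
X = [featurize(smi) for smi, _ in train]
y = [val for _, val in train]

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(tree.predict([featurize("CCCO")]))   # predict for 1-propanol
```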

    Computational approaches to virtual screening in human central nervous system therapeutic targets

    Over the past several years of drug design, advanced high-throughput synthetic and analytical chemical technologies have been continuously producing large numbers of compounds. These large collections of chemical structures have resulted in many public and commercial molecular databases, and the availability of larger data sets has provided the opportunity to develop new knowledge mining or virtual screening (VS) methods. This research work is therefore motivated by one of the main interests of the modern drug discovery process: the development of new methods to predict compounds with broad therapeutic profiles (multi-targeting activity), which is essential for the discovery of novel drug candidates against complex multifactorial diseases such as central nervous system (CNS) disorders. This work aims to advance VS approaches by providing a deeper understanding of the relationship between chemical structure and pharmacological properties and by designing new, fast, and robust tools for drug design against different targets and pathways.

    To accomplish these goals, the first challenge is dealing with big data sets of diverse molecular structures to derive a correlation between structure and activity. An extendable, customizable, fully automated in silico quantitative structure-activity relationship (QSAR) modeling framework was therefore developed in the first phase of this work. QSAR models are computationally fast and powerful tools for screening huge databases of compounds to determine the biological properties of chemical molecules based on their chemical structure. The generated framework reliably implemented a full QSAR modeling pipeline from data preparation to model building and validation. Its main distinctive features are (a) efficient data curation, (b) prior estimation of data modelability, and (c) an optimized variable selection methodology able to identify the most biologically relevant features responsible for compound activity. Since the underlying principle of QSAR modeling is the assumption that the structures of molecules are mainly responsible for their pharmacological activity, the accuracy with which different structural representation approaches decode molecular structural information largely influences model predictability. To find the best approach for QSAR modeling, a comparative analysis of the two main categories of molecular representations, descriptor-based (vector space) and distance-based (metric space) methods, was carried out. Results obtained from five QSAR data sets showed that the distance-based method was superior at capturing the structural elements relevant for the accurate characterization of molecular properties in highly diverse data sets (remote chemical space regions). This finding further assisted the development of a novel tool for molecular space visualization, intended to increase the understanding of structure-activity relationships (SAR) in drug discovery projects by exploring the diversity of large heterogeneous chemical data. In the proposed visual approach, four nonlinear dimensionality reduction (DR) methods were tested to represent molecules in a lower-dimensional (2D projected) space, on which a non-parametric 2D kernel density estimation (KDE) was applied to map the most likely activity regions (activity surfaces). The analysis of the resulting probabilistic surfaces of molecular activities (PSMAs) for the four datasets showed that these maps have both descriptive and predictive power and can therefore be used as spatial classification models, tools to perform VS using only the structural similarity of molecules.

    The QSAR modeling approach described above was complemented with molecular docking, an approach that predicts the best mode of drug-target interaction. Both approaches were integrated to develop a rational and reusable polypharmacology-based VS pipeline with an improved hit identification rate. To validate the pipeline, a dual-targeting drug design model against Parkinson’s disease (PD) was derived to identify novel inhibitors for improving the motor functions of PD patients by enhancing the bioavailability of dopamine and avoiding neurotoxicity. The proposed approach can easily be extended to more complex multi-targeting disease models containing several targets and anti-targets/off-targets to achieve increased efficacy and reduced toxicity in multifactorial diseases such as CNS disorders and cancer. This thesis addresses several issues in cheminformatics methods (e.g., molecular structure representation, machine learning, and molecular similarity analysis) to improve and design new computational approaches for chemical data mining. Moreover, an integrative drug design pipeline is presented to improve the polypharmacology-based VS approach, one that can identify the most promising multi-targeting candidates for experimental validation of drug-target networks at the systems biology level in the drug discovery process.
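    A minimal sketch of the activity-surface idea follows: fingerprints are projected to 2D with a nonlinear dimensionality reduction method, and a kernel density estimate fitted on the active compounds turns the projection into a map of likely activity regions. The fingerprints, labels, choice of t-SNE, and bandwidth are assumptions for illustration, not the thesis's four DR methods or datasets.

```python
# Minimal sketch (assumptions noted, not the thesis implementation) of an "activity
# surface": project molecular fingerprints to 2D with a nonlinear DR method, then fit
# a 2D kernel density estimate on the active compounds so that density can be read as
# a likelihood of activity at each point of the projected space.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Hypothetical data: 300 binary fingerprints (1024 bits) with activity labels.
fps = rng.integers(0, 2, size=(300, 1024)).astype(float)
active = rng.random(300) < 0.3

# Nonlinear dimensionality reduction to a 2D projected space.
xy = TSNE(n_components=2, random_state=0).fit_transform(fps)

# Kernel density estimate over the projections of the active molecules only.
kde = KernelDensity(kernel="gaussian", bandwidth=2.0).fit(xy[active])

def activity_likelihood(points_2d):
    """Relative likelihood that a 2D-projected molecule falls in an 'active' region."""
    return np.exp(kde.score_samples(points_2d))

# Score the projection of the first (hypothetical) query molecule.
print(activity_likelihood(xy[:1]))
```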

    Machine learning applications to essential oils and natural extracts

    Machine Learning (ML) is a branch of Artificial Intelligence (AI) that allows computers to learn without being explicitly programmed. ML has many applications in the pharmaceutical sciences, especially the prediction of chemical bioactivity and physical properties, and has become an integral component of the drug discovery process. ML is characterized by three learning paradigms that differ in the type of task or problem the algorithm is intended to solve: supervised, unsupervised, and reinforcement learning. In chapter 2, supervised learning methods were applied to extracts of Lycium barbarum L. fruits to develop a QSPR model that predicts zeaxanthin and carotenoid content from routine colorimetric analyses performed on homogenized samples, providing a useful tool for the food industry. In chapters 3 and 4, ML was applied to the chemical composition of essential oils and correlated with the experimentally determined biofilm modulation effect, which was either positive or negative. These two studies demonstrated that biofilm growth is influenced by the presence of essential oils extracted from different plants harvested in different seasons. ML classification techniques were used to develop a Quantitative Activity-Composition Relationship (QCAR) to discover the chemical components mainly responsible for the anti-biofilm activity. The derived models demonstrated that machine learning is a valuable tool for investigating complex chemical mixtures, enabling scientists to understand each component's contribution to the activity. These classification models can therefore describe and predict the activity of chemical mixtures and guide the composition of artificial essential oils with a desired biological activity. In chapter 5, unsupervised learning models were developed and applied to clinical strains of bacteria that cause infections in cystic fibrosis. The most severe recurring infections in cystic fibrosis are due to S. aureus and P. aeruginosa, and intensive use of antimicrobial drugs to fight lung infections leads to the development of antibiotic-resistant bacterial strains, so new antimicrobial compounds must be identified to overcome antibiotic resistance in patients. Sixty-one essential oils were studied against a panel of 40 clinical strains of S. aureus and P. aeruginosa isolated from cystic fibrosis patients, and unsupervised machine learning algorithms were applied to select a small number of representative strains (clusters of strains) from the panel of 40, rapidly identifying three essential oils that strongly inhibit antibiotic-resistant bacterial growth.
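    The strain-selection step in chapter 5 can be illustrated with a small clustering sketch: strains are clustered by their response profiles across the oils, and the strain nearest each cluster centre is kept as a representative. The susceptibility matrix, the choice of k-means, and the number of clusters are assumptions for illustration, not the protocol used in the thesis.

```python
# Minimal sketch (hypothetical data, not the thesis protocol) of using unsupervised
# clustering to pick a few representative bacterial strains: cluster the strains by
# their susceptibility profiles across the essential oils, then keep the strain
# closest to each cluster centre as the representative to test further.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical susceptibility matrix: 40 strains x 61 essential oils
# (e.g., growth inhibition values scaled to [0, 1]).
profiles = rng.random((40, 61))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(profiles)

representatives = []
for k in range(kmeans.n_clusters):
    members = np.flatnonzero(kmeans.labels_ == k)
    dists = np.linalg.norm(profiles[members] - kmeans.cluster_centers_[k], axis=1)
    representatives.append(members[dists.argmin()])   # strain nearest the centre

print("representative strain indices:", representatives)
```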

    The determination of petroleum reservoir fluid properties: application of robust modeling approaches.

    Doctor of Philosophy in Chemical Engineering, University of KwaZulu-Natal, Durban, 2016. Abstract available in PDF file.