Evolutionary Computation and QSAR Research
[Abstract] The successful high-throughput screening of molecule libraries for a specific biological property is one of the main advances in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR, and future trends in joint or multi-task feature selection methods.
Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia, Consellería de Economía e Industria, 10SIN105004P
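The genetic-algorithm variable selection surveyed above can be sketched in miniature. The following is an illustrative toy, not any specific published method: the descriptors, activities, correlation-based fitness and subset-size penalty are all invented for the example.

```python
import random

random.seed(0)

# Toy data: 40 "molecules" with 8 descriptors each; the activity depends
# only on descriptors 1 and 4 (all values are synthetic).
n_mol, n_desc = 40, 8
X = [[random.gauss(0, 1) for _ in range(n_desc)] for _ in range(n_mol)]
y = [row[1] + 2 * row[4] + random.gauss(0, 0.1) for row in X]

def fitness(mask):
    """Correlation of a naive sum-prediction with the activity, minus a
    penalty that discourages large descriptor subsets."""
    pred = [sum(x[j] for j in range(n_desc) if mask[j]) for x in X]
    n = len(y)
    mp, my = sum(pred) / n, sum(y) / n
    cov = sum((p - mp) * (t - my) for p, t in zip(pred, y))
    sp = sum((p - mp) ** 2 for p in pred) ** 0.5
    sy = sum((t - my) ** 2 for t in y) ** 0.5
    if sp == 0:
        return 0.0
    return cov / (sp * sy) - 0.05 * sum(mask)

def crossover(a, b):
    cut = random.randrange(1, n_desc)   # single-point crossover
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.1):
    # flip each bit with a small probability
    return [bit ^ (random.random() < rate) for bit in mask]

# Evolve a population of bit-mask chromosomes with elitism.
pop = [[random.randint(0, 1) for _ in range(n_desc)] for _ in range(30)]
for _ in range(40):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]
    pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                   for _ in range(20)]

best = max(pop, key=fitness)
print("selected descriptors:", [j for j, b in enumerate(best) if b])
```

The penalty term plays the role of the parsimony pressure used in real QSAR feature selection, where smaller descriptor subsets are preferred for interpretability.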
Designing algorithms to aid discovery by chemical robots
Recently, automated robotic systems have become very efficient, thanks to improved coupling between sensor systems and algorithms, the latter gaining significance thanks to the increase in computing power over the past few decades. However, intelligent automated chemistry platforms for discovery-oriented tasks need to be able to cope with the unknown, which is a profoundly hard problem. In this Outlook, we describe how recent advances in the design and application of algorithms, coupled with the increased amount of chemical data available and with automation and control systems, may allow more productive chemical research and the development of chemical robots able to target discovery. This is shown through examples of workflow and data processing with automation and control, and through the use of both well-established and cutting-edge algorithms illustrated using recent studies in chemistry. Finally, several algorithms are presented in relation to chemical robots and chemical intelligence for knowledge discovery.
Use of Cell Viability Assay Data Improves the Prediction Accuracy of Conventional Quantitative Structure–Activity Relationship Models of Animal Carcinogenicity
Background: To develop efficient approaches for rapid evaluation of chemical toxicity and human health risk of environmental compounds, the National Toxicology Program (NTP), in collaboration with the National Center for Chemical Genomics, has initiated a project on high-throughput screening (HTS) of environmental chemicals. The first HTS results for a set of 1,408 compounds tested for their effects on cell viability in six different cell lines have recently become available via PubChem.
Objectives: We have explored these data in terms of their utility for predicting adverse health effects of the environmental agents.
Methods and results: Initially, the classification k nearest neighbor (kNN) quantitative structure–activity relationship (QSAR) modeling method was applied to the HTS data only, for a curated data set of 384 compounds. The resulting models had prediction accuracies for training, test (containing 275 compounds together), and external validation (109 compounds) sets as high as 89%, 71%, and 74%, respectively. We then asked if HTS results could be of value in predicting rodent carcinogenicity. We identified 383 compounds for which data were available from both the Berkeley Carcinogenic Potency Database and NTP–HTS studies. We found that compounds classified by HTS as "actives" in at least one cell line were likely to be rodent carcinogens (sensitivity 77%); however, HTS "inactives" were far less informative (specificity 46%). Using chemical descriptors only, kNN QSAR modeling resulted in 62.3% prediction accuracy for rodent carcinogenicity applied to this data set. Importantly, the prediction accuracy of the model was significantly improved (72.7%) when chemical descriptors were augmented by HTS data, which were regarded as biological descriptors.
Conclusions: Our studies suggest that combining NTP–HTS profiles with conventional chemical descriptors could considerably improve the predictive power of computational approaches in toxicology.
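The idea of augmenting chemical descriptors with HTS readouts treated as biological descriptors can be illustrated with a minimal kNN classifier. All compounds, descriptor values and labels below are synthetic and purely illustrative.

```python
import math
from collections import Counter

# Each synthetic "compound": chemical descriptors, HTS cell-viability
# readouts used as biological descriptors, and a binary toxicity label.
train = [
    ([0.9, 0.1], [0.8, 0.7], 1),
    ([0.8, 0.2], [0.9, 0.6], 1),
    ([0.2, 0.9], [0.1, 0.2], 0),
    ([0.1, 0.8], [0.2, 0.1], 0),
    ([0.7, 0.3], [0.7, 0.8], 1),
    ([0.3, 0.7], [0.2, 0.3], 0),
]

def knn_predict(chem, bio, k=3, use_bio=True):
    """Majority vote among the k nearest training compounds.

    When use_bio is True, the chemical and biological descriptors are
    concatenated into one augmented vector, mirroring the idea of
    treating HTS profiles as extra descriptors."""
    q = chem + bio if use_bio else chem
    dists = []
    for c, b, label in train:
        v = c + b if use_bio else c
        dists.append((math.dist(q, v), label))
    dists.sort()
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_predict([0.85, 0.15], [0.75, 0.7]))  # → 1
```

In the real study the augmented descriptor space simply has more columns per compound; the kNN machinery is unchanged.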
Chemometric Analysis of Ligand-Receptor Complementarity: Identifying Complementary Ligands Based on Receptor Information (CoLiBRI)
We have developed a novel structure-based approach to search for Complementary Ligands Based on Receptor Information (CoLiBRI). CoLiBRI is based on the representation of both receptor binding sites and their respective ligands in a space of universal chemical descriptors. The binding site atoms involved in the interaction with ligands are identified by means of a computational geometry technique known as Delaunay tessellation, as applied to X-ray-characterized ligand-receptor complexes. TAE/RECON multiple chemical descriptors are calculated independently for each ligand as well as for its active site atoms. The representation of both ligands and active sites using chemical descriptors allows the application of well-known chemometric techniques in order to correlate chemical similarities between active sites and their respective ligands. From these calculations, we have established a protocol to map patterns of nearest neighbor active site vectors in a multidimensional TAE/RECON space onto those of their complementary ligands, and vice versa. This protocol affords the prediction of a virtual complementary ligand vector in the ligand chemical space from the position of a known active site vector. This prediction is followed by chemical similarity calculations between this virtual ligand vector and those calculated for molecules in a chemical database to identify real compounds most similar to the virtual ligand. Consequently, knowledge of the receptor active site structure affords straightforward and efficient identification of its complementary ligands in large databases of chemical compounds using rapid chemical similarity searches. Conversely, starting from the ligand chemical structure, one may identify possible complementary receptor cavities as well. We have applied the CoLiBRI approach to a dataset of 800 X-ray-characterized ligand-receptor complexes in the PDBbind database.
Using a k nearest neighbor (kNN) pattern recognition approach and variable selection, we have shown that knowledge of the active site structure affords identification of its complementary ligand among the top 1% of a large chemical database in over 90% of all test active sites when a binding site of the same protein family was present in the training set. In the case where test receptors are highly dissimilar and not present among the receptor families in the training set, the prediction accuracy is decreased; however, CoLiBRI was still able to quickly eliminate 75% of the chemical database as improbable ligands. The CoLiBRI approach provides an efficient prescreening tool for large chemical databases prior to traditional, yet much more computationally intensive, three-dimensional docking approaches.
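The core of the protocol above can be sketched with made-up descriptor vectors: the nearest known active sites contribute an averaged virtual ligand vector, which then drives a similarity search over a toy compound database (names such as cmpd_A are placeholders, not real compounds).

```python
import math

# Known complexes pair an active-site descriptor vector with its
# ligand's descriptor vector in the same descriptor space (synthetic).
complexes = [
    ([1.0, 0.2, 0.1], [0.9, 0.3, 0.2]),   # (site vector, ligand vector)
    ([0.9, 0.3, 0.2], [0.8, 0.4, 0.1]),
    ([0.1, 0.9, 0.8], [0.2, 0.8, 0.9]),
    ([0.2, 0.8, 0.9], [0.1, 0.9, 0.8]),
]

ligand_db = {
    "cmpd_A": [0.85, 0.35, 0.15],
    "cmpd_B": [0.15, 0.85, 0.85],
    "cmpd_C": [0.50, 0.50, 0.50],
}

def virtual_ligand(query_site, k=2):
    """Average the ligand vectors of the k nearest known active sites."""
    neigh = sorted(complexes, key=lambda c: math.dist(query_site, c[0]))[:k]
    dim = len(neigh[0][1])
    return [sum(lig[i] for _, lig in neigh) / k for i in range(dim)]

def screen(query_site):
    """Rank database compounds by distance to the virtual ligand."""
    v = virtual_ligand(query_site)
    return sorted(ligand_db, key=lambda name: math.dist(v, ligand_db[name]))

print(screen([0.95, 0.25, 0.15]))  # cmpd_A ranks first
```

The real method works in a high-dimensional TAE/RECON space with variable selection, but the nearest-neighbor mapping and similarity ranking follow this same pattern.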
Development and Application of Chemometric Methods for Modelling Metabolic Spectral Profiles
The interpretation of metabolic information is crucial to understanding the functioning of a biological system. Latent information about the metabolic state of a sample can be acquired using analytical chemistry methods, which generate spectroscopic profiles. Thus, nuclear magnetic resonance spectroscopy and mass spectrometry techniques can be employed to generate vast amounts of highly complex data on the metabolic content of biofluids and tissue, and this thesis discusses ways to process, analyse and interpret these data successfully.

The evaluation of J-resolved spectroscopy in magnetic resonance profiling and the statistical techniques required to extract maximum information from the projections of these spectra are studied. In particular, data processing is evaluated, and correlation and regression methods are investigated with respect to enhanced model interpretation and biomarker identification. Additionally, it is shown that non-linearities in metabonomic data can be effectively modelled with kernel-based orthogonal partial least squares, for which an automated optimisation of the kernel parameter with nested cross-validation is implemented. The interpretation of orthogonal variation and predictive ability enabled by this approach are demonstrated in regression and classification models for applications in toxicology and parasitology. Finally, the vast amount of data generated with mass spectrometry imaging is investigated in terms of data processing, and the benefits of applying multivariate techniques to these data are illustrated, especially in terms of interpretation and visualisation using colour-coding of images. The advantages of methods such as principal component analysis, self-organising maps and manifold learning over univariate analysis are highlighted.

This body of work therefore demonstrates new means of increasing the amount of biochemical information that can be obtained from a given set of samples in biological applications using spectral profiling. Various analytical and statistical methods are investigated and illustrated with applications drawn from diverse biomedical areas.
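The automated kernel-parameter optimisation with nested cross-validation described above can be sketched as follows. A Nadaraya-Watson kernel smoother stands in for the kernel-based OPLS model, and the data, fold counts and candidate widths are all illustrative.

```python
import math
import random

random.seed(1)

# Synthetic 1-D regression problem: y = sin(2*pi*x) plus noise.
X = [i / 20 for i in range(60)]
y = [math.sin(2 * math.pi * x) + random.gauss(0, 0.2) for x in X]
data = list(zip(X, y))
random.shuffle(data)

def predict(train, x, width):
    """Nadaraya-Watson smoother with a Gaussian kernel of given width."""
    w = [math.exp(-((x - xi) ** 2) / (2 * width ** 2)) for xi, _ in train]
    s = sum(w)
    return sum(wi * yi for wi, (_, yi) in zip(w, train)) / s if s else 0.0

def cv_error(subset, width, n_folds):
    """Mean squared CV error of a kernel width on a data subset."""
    err, n = 0.0, 0
    for f in range(n_folds):
        test = subset[f::n_folds]
        train = [d for i, d in enumerate(subset) if i % n_folds != f]
        err += sum((predict(train, x, t_) - t_) ** 2 for x, t_ in test)
        n += len(test)
    return err / n

widths = [0.01, 0.05, 0.1, 0.3, 1.0]
K = 5
outer_errors = []
for f in range(K):                       # outer loop: unbiased estimate
    test = data[f::K]
    train = [d for i, d in enumerate(data) if i % K != f]
    best = min(widths, key=lambda w: cv_error(train, w, 3))  # inner loop
    mse = sum((predict(train, x, best) - t) ** 2
              for x, t in test) / len(test)
    outer_errors.append(mse)

print("nested-CV MSE estimate:", sum(outer_errors) / K)
```

The key point is that the kernel width is chosen only on inner-loop training folds, so the outer-loop error is an honest estimate of generalisation performance.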
Quantitative Structure-Property Relationship Modeling & Computer-Aided Molecular Design: Improvements & Applications
The objective of this work was to develop an integrated capability to design molecules with desired properties. An automated robust genetic algorithm (GA) module has been developed to facilitate the rapid design of new molecules. The generated molecules were scored for the relevant thermophysical properties using non-linear quantitative structure-property relationship (QSPR) models. The descriptor reduction and model development for the QSPR models were implemented using evolutionary algorithms (EA) and artificial neural networks (ANNs). QSPR models for octanol-water partition coefficients (Kow), melting points (MP), normal boiling points (NBP), Gibbs energy of formation, universal quasi-chemical (UNIQUAC) model parameters, and infinite-dilution activity coefficients of cyclohexane and benzene in various organic solvents were developed in this work. To validate the current design methodology, new chemical penetration enhancers (CPEs) for transdermal insulin delivery and new solvents for extractive distillation of the cyclohexane + benzene system were designed. In general, the use of non-linear QSPR models developed in this work provided predictions better than or as good as existing literature models. In particular, the current models for NBP, Gibbs energy of formation, UNIQUAC model parameters, and infinite-dilution activity coefficients have lower errors on external test sets than the literature models. The current models for MP and Kow are comparable with the best models in the literature. The GA-based design framework implemented in this work successfully identified new CPEs for transdermal delivery of insulin, with permeability values comparable to the best CPEs in the literature. Also, new solvents for extractive distillation of cyclohexane/benzene with selectivities two to four times that of the existing solvents were identified. 
These two case studies validate the ability of the current design framework to identify new molecules with desired target properties.
Predicting Complexation Thermodynamic Parameters of β-Cyclodextrin with Chiral Guests by Using Swarm Intelligence and Support Vector Machines
The Particle Swarm Optimization (PSO) and Support Vector Machines (SVMs) approaches are used for predicting the thermodynamic parameters for the 1:1 inclusion complexation of chiral guests with β-cyclodextrin. A PSO is adopted for descriptor selection in the quantitative structure-property relationships (QSPR) of a dataset of 74 chiral guests due to its simplicity, speed, and consistency. The modified PSO is then combined with SVMs, chosen for their good approximating properties, to generate a QSPR model with the selected features. Linear, polynomial, and Gaussian radial basis functions are used as kernels in the SVMs. All models demonstrated impressive performance, with R2 higher than 0.8.
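A minimal binary-PSO descriptor selection sketch follows, assuming toy data; the SVM scoring step is replaced by a simple correlation-based fitness for brevity, and the swarm size, inertia and acceleration constants are illustrative choices.

```python
import math
import random

random.seed(2)

# Toy data: 30 "guests" with 6 descriptors; the property depends only on
# descriptors 0 and 3 (all values are synthetic).
n_mol, n_desc = 30, 6
X = [[random.gauss(0, 1) for _ in range(n_desc)] for _ in range(n_mol)]
y = [row[0] + 2 * row[3] + random.gauss(0, 0.1) for row in X]

def fitness(mask):
    """Correlation-based score with a size penalty (stands in for the
    SVM model quality used in the actual study)."""
    pred = [sum(x[j] for j in range(n_desc) if mask[j]) for x in X]
    n = len(y)
    mp, my = sum(pred) / n, sum(y) / n
    cov = sum((p - mp) * (t - my) for p, t in zip(pred, y))
    sp = sum((p - mp) ** 2 for p in pred) ** 0.5
    sy = sum((t - my) ** 2 for t in y) ** 0.5
    return cov / (sp * sy) - 0.05 * sum(mask) if sp else 0.0

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Binary PSO: velocities are real-valued; each bit is re-sampled with a
# probability given by the sigmoid of its velocity.
n_part, iters = 20, 50
pos = [[random.randint(0, 1) for _ in range(n_desc)] for _ in range(n_part)]
vel = [[0.0] * n_desc for _ in range(n_part)]
pbest = [p[:] for p in pos]
gbest = max(pos, key=fitness)[:]

for _ in range(iters):
    for i in range(n_part):
        for d in range(n_desc):
            r1, r2 = random.random(), random.random()
            vel[i][d] = (0.7 * vel[i][d]
                         + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                         + 1.5 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] = 1 if random.random() < sigmoid(vel[i][d]) else 0
        if fitness(pos[i]) > fitness(pbest[i]):
            pbest[i] = pos[i][:]
        if fitness(pos[i]) > fitness(gbest):
            gbest = pos[i][:]

print("selected descriptors:", [d for d, b in enumerate(gbest) if b])
```

Compared with the GA, PSO keeps a memory of each particle's personal best and of the global best, which is what gives it the speed and consistency the abstract mentions.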
ARTIFICIAL NEURAL NETWORKS: FUNCTIONING AND APPLICATIONS IN PHARMACEUTICAL INDUSTRY
Artificial Neural Network (ANN) technology is a group of computer-designed algorithms that simulate neurological processing to process information and produce outcomes much as humans do when learning, making decisions and solving problems. The uniqueness of ANNs is their ability to deliver desirable results even from incomplete or historical data, without the need for a structured experimental design, through modeling and pattern recognition. An ANN learns from data through repetition under a suitable learning model, similarly to humans, without explicit programming. Its processing elements receive user-supplied inputs, transform them through transfer functions, and deliver the result as output. Moreover, the present output of an ANN is the combined effect of data collected from previous inputs and the current responsiveness of the system. Technically, ANNs are typically supervised networks trained with the backpropagation learning rule. Owing to their exceptional predictive ability, ANNs can be applied to many more disciplines in science that require multivariate data analysis. In pharmaceutical processing, this flexible tool is used to simulate various non-linear relationships. It also finds application in the enhancement of pre-formulation parameters for predicting physicochemical properties of drug substances, as well as in pharmaceutical research, medicinal chemistry, QSAR studies and pharmaceutical instrumental engineering. Its multi-objective concurrent optimization is also adopted in the drug discovery process, protein structure studies and rational data analysis.
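The supervised backpropagation training described above can be sketched with a tiny network; the 2-4-1 architecture, learning rate and XOR task are arbitrary illustrative choices, not drawn from the article.

```python
import math
import random

random.seed(0)

# XOR training set: a classic task that needs a hidden layer.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

H = 4                                     # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.5

losses = []
for _ in range(5000):
    total = 0.0
    for x, t in data:
        # forward pass
        h = [sig(sum(w1[j][i] * x[i] for i in range(2)) + b1[j])
             for j in range(H)]
        o = sig(sum(w2[j] * h[j] for j in range(H)) + b2)
        total += (o - t) ** 2
        # backward pass: output delta, then hidden deltas
        do = (o - t) * o * (1 - o)
        dh = [do * w2[j] * h[j] * (1 - h[j]) for j in range(H)]
        for j in range(H):
            w2[j] -= lr * do * h[j]
            b1[j] -= lr * dh[j]
            for i in range(2):
                w1[j][i] -= lr * dh[j] * x[i]
        b2 -= lr * do
    losses.append(total / len(data))

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each pass propagates the prediction error backward through the weights, which is the "back propagation learning standard" the abstract refers to.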
Evaluation of pesticide toxicity: a hierarchical QSAR approach to model the acute aquatic toxicity and avian oral toxicity of pesticides
The thesis aimed to extract information relevant to the hazard and risk assessment of pesticides. In particular, quantitative structure-activity relationship (QSAR) approaches have been used to build mathematical models able to predict the acute aquatic toxicity (LC50) and the avian oral toxicity (LD50) of pesticides. Ecotoxicological values were collected from several databases, and screened according to quality criteria.
A hierarchical QSAR approach was applied for the prediction of acute aquatic toxicity. Chemical structures were encoded into molecular descriptors by an automated, seamless procedure available within the OpenMolGRID system. Different linear and non-linear regression techniques were used to obtain reliable and thoroughly validated QSARs. The final model was developed by a counter-propagation neural network coupled with genetic algorithms for variable selection. The proposed QSAR is consistent with McFarland's principle for biological activity and makes use of seven molecular descriptors. The model was assessed thoroughly on the test (R2 = 0.8) and validation (R2 = 0.72) sets, with the y-scrambling test and with a sensitivity/stability test.
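The y-scrambling test mentioned above can be sketched as follows: refit the model to randomly permuted activities and check that the scores collapse. The one-descriptor least-squares model and the data are illustrative only.

```python
import random

random.seed(3)

# Synthetic data with a genuine linear structure-activity relationship.
n = 50
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * xi + random.gauss(0, 0.3) for xi in x]

def r2_fit(xs, ys):
    """R2 of an ordinary least-squares line fitted to (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2
                 for a, b in zip(xs, ys))
    ss_tot = sum((b - my) ** 2 for b in ys)
    return 1 - ss_res / ss_tot

true_r2 = r2_fit(x, y)
# y-scrambling: refit against 100 random permutations of the response.
scrambled = [r2_fit(x, random.sample(y, n)) for _ in range(100)]
print(f"true R2 = {true_r2:.2f}, max scrambled R2 = {max(scrambled):.2f}")
```

If the scrambled scores approached the true score, the original model would be suspected of chance correlation rather than a real structure-activity relationship.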
The second endpoint considered in this thesis was avian oral toxicity. As before, the chemical description of the compounds was generated automatically by the OpenMolGRID system. The best classification model was chosen on the basis of its performance on a validation set of 19 data points, and was obtained from a support vector machine using 94 data points and nine variables selected by genetic algorithms (Error Rate(training) = 0.021, Error Rate(validation) = 0.158). The model allowed for a mechanistic interpretation of the toxicological action. In fact, several descriptors selected for the final classification model encode for the interaction of the pesticides with other molecules. The presence of hetero-atoms, e.g. sulphur atoms, is correlated with toxicity, and the pool of descriptors selected generally depends on the 3D conformation of the structures. These findings suggest that, in the case of avian oral toxicity, pesticides probably exert their toxic action through interaction with some macromolecule and/or protein of the biological system.