
    Evolutionary Computation and QSAR Research

    [Abstract] The successful high-throughput screening of molecule libraries for a specific biological property is one of the main advances in drug discovery. Virtual molecular filtering and screening rely heavily on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods to build QSAR models, current evolutionary feature selection methods and applications in QSAR, and future trends in joint or multi-task feature selection methods.
    Funding: Instituto de Salud Carlos III (PIO52048, RD07/0067/0005); Ministerio de Industria, Comercio y Turismo (TSI-020110-2009-53); Galicia, Consellería de Economía e Industria (10SIN105004P).
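    As a rough illustration of the variable selection step described above, the sketch below evolves a bit mask over a toy descriptor matrix with a genetic algorithm. The data and the correlation-based fitness are invented for illustration; real QSAR fitness functions typically score a fitted model rather than raw correlations.

```python
import random

random.seed(0)

# Toy dataset: 30 molecules x 8 descriptors; activity depends on descriptors 0 and 3.
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(30)]
y = [row[0] + 2.0 * row[3] + random.gauss(0, 0.1) for row in X]

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((z - mb) ** 2 for z in b) ** 0.5
    return cov / (va * vb)

def fitness(mask):
    # Reward descriptors correlated with activity, penalise large subsets.
    sel = [j for j, bit in enumerate(mask) if bit]
    if not sel:
        return -1.0
    score = sum(abs(corr([row[j] for row in X], y)) for j in sel) / len(sel)
    return score - 0.01 * len(sel)

def evolve(pop_size=20, gens=40, p_mut=0.1):
    pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 2]            # elitist selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, 8)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

best = evolve()
print(best)
```

    With this filter-style fitness the population converges on the strongly correlated descriptor; swapping in a cross-validated model score turns the same loop into the wrapper-style selection the review surveys.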

    Designing algorithms to aid discovery by chemical robots

    Recently, automated robotic systems have become very efficient thanks to improved coupling between sensor systems and algorithms, the latter gaining significance with the increase in computing power over the past few decades. However, intelligent automated chemistry platforms for discovery-oriented tasks need to be able to cope with the unknown, which is a profoundly hard problem. In this Outlook, we describe how recent advances in the design and application of algorithms, coupled with the increased amount of chemical data available and with automation and control systems, may allow more productive chemical research and the development of chemical robots able to target discovery. This is shown through examples of workflow and data processing with automation and control, and through the use of both well-established and cutting-edge algorithms, illustrated using recent studies in chemistry. Finally, several algorithms are presented in relation to chemical robots and chemical intelligence for knowledge discovery.

    Use of Cell Viability Assay Data Improves the Prediction Accuracy of Conventional Quantitative Structure–Activity Relationship Models of Animal Carcinogenicity

    Background: To develop efficient approaches for rapid evaluation of chemical toxicity and human health risk of environmental compounds, the National Toxicology Program (NTP), in collaboration with the National Center for Chemical Genomics, has initiated a project on high-throughput screening (HTS) of environmental chemicals. The first HTS results, for a set of 1,408 compounds tested for their effects on cell viability in six different cell lines, have recently become available via PubChem.
    Objectives: We have explored these data in terms of their utility for predicting adverse health effects of the environmental agents.
    Methods and results: Initially, the classification k nearest neighbor (kNN) quantitative structure–activity relationship (QSAR) modeling method was applied to the HTS data only, for a curated data set of 384 compounds. The resulting models had prediction accuracies for the training and test sets (275 compounds combined) and the external validation set (109 compounds) as high as 89%, 71%, and 74%, respectively. We then asked whether HTS results could be of value in predicting rodent carcinogenicity. We identified 383 compounds for which data were available from both the Berkeley Carcinogenic Potency Database and NTP–HTS studies. We found that compounds classified by HTS as "actives" in at least one cell line were likely to be rodent carcinogens (sensitivity 77%); however, HTS "inactives" were far less informative (specificity 46%). Using chemical descriptors only, kNN QSAR modeling resulted in 62.3% prediction accuracy for rodent carcinogenicity applied to this data set. Importantly, the prediction accuracy of the model was significantly improved (72.7%) when the chemical descriptors were augmented by HTS data, which were regarded as biological descriptors.
    Conclusions: Our studies suggest that combining NTP–HTS profiles with conventional chemical descriptors could considerably improve the predictive power of computational approaches in toxicology.
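    The idea of augmenting chemical descriptors with HTS outcomes can be illustrated with a minimal kNN classifier. All vectors and labels below are invented, and the descriptor layout (two chemical values followed by two HTS viability flags) is an assumption for illustration, not the paper's actual data.

```python
import math

# Hypothetical training set: each row is [chemical descriptors..., HTS flags...],
# label 1 = rodent carcinogen, 0 = non-carcinogen (toy values only).
train = [
    ([0.2, 1.1, 1, 1], 1),
    ([0.3, 0.9, 1, 0], 1),
    ([1.5, 0.1, 0, 0], 0),
    ([1.4, 0.2, 0, 1], 0),
    ([0.1, 1.0, 1, 1], 1),
    ([1.6, 0.0, 0, 0], 0),
]

def knn_predict(x, k=3):
    # Rank training compounds by Euclidean distance in the augmented
    # (chemical + biological) descriptor space, then take a majority vote.
    nearest = sorted(train, key=lambda t: math.dist(x, t[0]))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

print(knn_predict([0.25, 1.0, 1, 1]))  # near the carcinogen cluster -> 1
```

    Dropping the last two components of each vector recovers the chemical-descriptors-only model; the paper's reported accuracy gain comes from keeping them.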

    Chemometric Analysis of Ligand Receptor Complementarity:  Identifying Complementary Ligands Based on Receptor Information (CoLiBRI)

    We have developed a novel structure-based approach to search for Complementary Ligands Based on Receptor Information (CoLiBRI). CoLiBRI is based on the representation of both receptor binding sites and their respective ligands in a space of universal chemical descriptors. The binding-site atoms involved in the interaction with ligands are identified by means of a computational geometry technique known as Delaunay tessellation, applied to X-ray-characterized ligand-receptor complexes. TAE/RECON multiple chemical descriptors are calculated independently for each ligand as well as for its active-site atoms. Representing both ligands and active sites with chemical descriptors allows the application of well-known chemometric techniques to correlate chemical similarities between active sites and their respective ligands. From these calculations, we have established a protocol to map patterns of nearest-neighbor active-site vectors in a multidimensional TAE/RECON space onto those of their complementary ligands, and vice versa. This protocol affords the prediction of a virtual complementary ligand vector in the ligand chemical space from the position of a known active-site vector. The prediction is followed by chemical similarity calculations between this virtual ligand vector and the vectors calculated for molecules in a chemical database, to identify real compounds most similar to the virtual ligand. Consequently, knowledge of the receptor active-site structure affords straightforward and efficient identification of its complementary ligands in large databases of chemical compounds using rapid chemical similarity searches. Conversely, starting from the ligand chemical structure, one may identify possible complementary receptor cavities as well. We have applied the CoLiBRI approach to a dataset of 800 X-ray-characterized ligand-receptor complexes in the PDBbind database.
    Using a k nearest neighbor (kNN) pattern recognition approach and variable selection, we have shown that knowledge of the active-site structure affords identification of its complementary ligand among the top 1% of a large chemical database for over 90% of all test active sites when a binding site of the same protein family was present in the training set. Where test receptors are highly dissimilar to the receptor families in the training set, the prediction accuracy decreases; however, CoLiBRI was still able to quickly eliminate 75% of the chemical database as improbable ligands. The CoLiBRI approach thus provides an efficient prescreening tool for large chemical databases prior to traditional, yet much more computationally intensive, three-dimensional docking approaches.
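    The mapping from an active-site vector to a virtual ligand vector, followed by a similarity search, can be sketched as follows. Toy two-dimensional vectors and a made-up three-compound database stand in for the TAE/RECON descriptor space; the averaging of nearest-neighbor ligand vectors is a simplified stand-in for the paper's kNN protocol.

```python
import math

# Toy training pairs: (active-site descriptor vector, ligand descriptor vector).
site_ligand_pairs = [
    ([1.0, 0.1], [0.9, 0.2]),
    ([0.9, 0.2], [1.0, 0.1]),
    ([0.1, 1.0], [0.2, 0.9]),
    ([0.2, 0.9], [0.1, 1.0]),
]

# Hypothetical screening database of ligand descriptor vectors.
database = {"mol_A": [0.95, 0.15], "mol_B": [0.15, 0.95], "mol_C": [0.5, 0.5]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def virtual_ligand(site, k=2):
    # Average the ligand vectors of the k most similar training active sites.
    nearest = sorted(site_ligand_pairs, key=lambda p: -cosine(site, p[0]))[:k]
    return [sum(lig[i] for _, lig in nearest) / k for i in range(len(site))]

def screen(site):
    # Rank the database by similarity to the predicted virtual ligand vector.
    v = virtual_ligand(site)
    return max(database, key=lambda name: cosine(v, database[name]))

print(screen([1.0, 0.15]))  # -> mol_A
```

    The same machinery run in reverse (ligand vector to virtual active-site vector) gives the "conversely" direction mentioned in the abstract.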

    Development and Application of Chemometric Methods for Modelling Metabolic Spectral Profiles

    The interpretation of metabolic information is crucial to understanding the functioning of a biological system. Latent information about the metabolic state of a sample can be acquired using analytical chemistry methods, which generate spectroscopic profiles. Thus, nuclear magnetic resonance spectroscopy and mass spectrometry techniques can be employed to generate vast amounts of highly complex data on the metabolic content of biofluids and tissue, and this thesis discusses ways to process, analyse and interpret these data successfully. The evaluation of J-resolved spectroscopy in magnetic resonance profiling and the statistical techniques required to extract maximum information from the projections of these spectra are studied. In particular, data processing is evaluated, and correlation and regression methods are investigated with respect to enhanced model interpretation and biomarker identification. Additionally, it is shown that non-linearities in metabonomic data can be effectively modelled with kernel-based orthogonal partial least squares, for which an automated optimisation of the kernel parameter with nested cross-validation is implemented. The interpretation of orthogonal variation and predictive ability enabled by this approach are demonstrated in regression and classification models for applications in toxicology and parasitology. Finally, the vast amount of data generated with mass spectrometry imaging is investigated in terms of data processing, and the benefits of applying multivariate techniques to these data are illustrated, especially in terms of interpretation and visualisation using colour-coding of images. The advantages of methods such as principal component analysis, self-organising maps and manifold learning over univariate analysis are highlighted.
This body of work therefore demonstrates new means of increasing the amount of biochemical information that can be obtained from a given set of samples in biological applications using spectral profiling. Various analytical and statistical methods are investigated and illustrated with applications drawn from diverse biomedical areas.
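    The nested cross-validation used to tune a kernel parameter can be sketched with a Gaussian-kernel smoother standing in for kernel-based orthogonal partial least squares; the inner folds pick the kernel width, the outer folds estimate generalisation error. The data and estimator are simplified stand-ins, not the thesis's method.

```python
import math
import random

random.seed(1)
# Toy 1-D data: y = sin(x) + noise, an illustrative stand-in for spectral features.
xs = [i * 0.3 for i in range(40)]
ys = [math.sin(x) + random.gauss(0, 0.05) for x in xs]
data = list(zip(xs, ys))

def kernel_predict(train, x, sigma):
    # Nadaraya-Watson estimator with a Gaussian kernel of width sigma.
    w = [math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) for xi, _ in train]
    s = sum(w)
    return sum(wi * yi for wi, (_, yi) in zip(w, train)) / s if s else 0.0

def cv_error(train, sigma, folds=4):
    # Inner cross-validation: mean squared error over held-out folds.
    err, n = 0.0, len(train)
    for f in range(folds):
        test = train[f::folds]
        fit = [p for i, p in enumerate(train) if i % folds != f]
        err += sum((kernel_predict(fit, x, sigma) - y) ** 2 for x, y in test)
    return err / n

def nested_cv(data, candidates, outer_folds=5):
    outer_err, chosen = 0.0, []
    for f in range(outer_folds):
        test = data[f::outer_folds]
        train = [p for i, p in enumerate(data) if i % outer_folds != f]
        best = min(candidates, key=lambda s: cv_error(train, s))  # inner CV picks sigma
        chosen.append(best)
        outer_err += sum((kernel_predict(train, x, best) - y) ** 2 for x, y in test)
    return outer_err / len(data), chosen

mse, sigmas = nested_cv(data, [0.05, 0.2, 1.0, 5.0])
print(round(mse, 4), sigmas)
```

    Because the parameter is chosen inside each outer training fold, the outer error is an unbiased estimate of performance, which is the point of nesting the two loops.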

    Quantitative Structure-Property Relationship Modeling & Computer-Aided Molecular Design: Improvements & Applications

    The objective of this work was to develop an integrated capability to design molecules with desired properties. An automated robust genetic algorithm (GA) module has been developed to facilitate the rapid design of new molecules. The generated molecules were scored for the relevant thermophysical properties using non-linear quantitative structure-property relationship (QSPR) models. The descriptor reduction and model development for the QSPR models were implemented using evolutionary algorithms (EA) and artificial neural networks (ANNs). QSPR models for octanol-water partition coefficients (Kow), melting points (MP), normal boiling points (NBP), Gibbs energy of formation, universal quasi-chemical (UNIQUAC) model parameters, and infinite-dilution activity coefficients of cyclohexane and benzene in various organic solvents were developed in this work. To validate the current design methodology, new chemical penetration enhancers (CPEs) for transdermal insulin delivery and new solvents for extractive distillation of the cyclohexane + benzene system were designed. In general, the use of non-linear QSPR models developed in this work provided predictions better than or as good as existing literature models. In particular, the current models for NBP, Gibbs energy of formation, UNIQUAC model parameters, and infinite-dilution activity coefficients have lower errors on external test sets than the literature models. The current models for MP and Kow are comparable with the best models in the literature. The GA-based design framework implemented in this work successfully identified new CPEs for transdermal delivery of insulin, with permeability values comparable to the best CPEs in the literature. Also, new solvents for extractive distillation of cyclohexane/benzene with selectivities two to four times that of the existing solvents were identified. 
These two case studies validate the ability of the current design framework to identify new molecules with desired target properties.
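    The generate-and-score loop at the heart of GA-based molecular design can be caricatured as follows. The fragment alphabet and the scoring function are placeholders for a real molecular representation and a trained QSPR model; no actual chemistry is encoded.

```python
import random

random.seed(4)
fragments = "CNOS"  # hypothetical fragment alphabet, not real chemistry

def score(mol):
    # Stand-in "QSPR" scorer: rewards molecules rich in 'N' and 8 fragments long.
    return mol.count("N") - abs(len(mol) - 8)

def random_mol():
    return "".join(random.choice(fragments) for _ in range(random.randint(4, 12)))

def mutate(mol):
    # Substitute one fragment at a random position.
    i = random.randrange(len(mol))
    return mol[:i] + random.choice(fragments) + mol[i + 1:]

def crossover(a, b):
    # Splice the first half of one parent onto the second half of the other.
    return a[:len(a) // 2] + b[len(b) // 2:]

def design(pop_size=30, gens=60):
    pop = [random_mol() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=score, reverse=True)
        elite = pop[:10]
        pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                       for _ in range(pop_size - 10)]
    return max(pop, key=score)

best = design()
print(best, score(best))
```

    In the actual framework, the scorer is a non-linear QSPR model (EA-selected descriptors fed to an ANN), so each candidate evaluation is a property prediction rather than a string count.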

    Predicting Complexation Thermodynamic Parameters of β-Cyclodextrin with Chiral Guests by Using Swarm Intelligence and Support Vector Machines

    The Particle Swarm Optimization (PSO) and Support Vector Machines (SVMs) approaches are used to predict the thermodynamic parameters of the 1:1 inclusion complexation of chiral guests with β-cyclodextrin. PSO is adopted for descriptor selection in the quantitative structure-property relationship (QSPR) modeling of a dataset of 74 chiral guests owing to its simplicity, speed, and consistency. The modified PSO is then combined with SVMs, chosen for their good approximation properties, to generate a QSPR model from the selected features. Linear, polynomial, and Gaussian radial basis functions are used as kernels in the SVMs. All models demonstrate an impressive performance, with R² higher than 0.8.
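    A binary PSO for descriptor selection can be sketched as below. Each particle is a bit mask over descriptors, and a sigmoid of the velocity gives the probability of each bit. A simple correlation-based fitness on invented data stands in for the SVM cross-validation score used in the paper.

```python
import math
import random

random.seed(2)
# Toy descriptor matrix: 25 guests x 6 descriptors; the modelled property
# depends on descriptors 1 and 4 (all values invented).
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(25)]
y = [2 * row[1] - row[4] + random.gauss(0, 0.1) for row in X]

def corr(j):
    col = [row[j] for row in X]
    n = len(col)
    mc, my = sum(col) / n, sum(y) / n
    cov = sum((a - mc) * (b - my) for a, b in zip(col, y))
    return cov / (sum((a - mc) ** 2 for a in col) ** 0.5
                  * sum((b - my) ** 2 for b in y) ** 0.5)

def fitness(mask):
    sel = [j for j, bit in enumerate(mask) if bit]
    if not sel:
        return -1.0
    return sum(abs(corr(j)) for j in sel) / len(sel) - 0.02 * len(sel)

def binary_pso(n=6, swarm=15, iters=50, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.randint(0, 1) for _ in range(n)] for _ in range(swarm)]
    vel = [[0.0] * n for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = max(pos, key=fitness)[:]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(n):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                # Sigmoid transfer: velocity sets the probability of the bit being 1.
                pos[i][d] = 1 if random.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = max(pbest, key=fitness)[:]
    return gbest

best = binary_pso()
print(best)
```

    Replacing `fitness` with the cross-validated R² of an SVM trained on the selected columns turns this filter sketch into the wrapper scheme the paper describes.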

    ARTIFICIAL NEURAL NETWORKS: FUNCTIONING AND APPLICATIONS IN PHARMACEUTICAL INDUSTRY

    Artificial Neural Network (ANN) technology is a family of computer algorithms that simulate neurological processing, handling information and producing outcomes in a way that resembles human learning, decision making and problem solving. A distinctive strength of ANNs is their ability to deliver useful results even from incomplete or historical data, without the need for a structured experimental design, by relying on modeling and pattern recognition. Like humans, an ANN learns from data through repetition under a suitable learning model rather than through explicit programming. Its processing elements are connected to the user-supplied inputs, transform them through transfer functions and produce an output. Moreover, the current output of an ANN reflects both the data accumulated from previous inputs and the present state of the system. Technically, an ANN is typically a supervised network trained with the backpropagation learning rule. Owing to its predictive power, ANN methodology can be applied to many disciplines in science that require multivariate data analysis. In pharmaceutical processing, this flexible tool is used to model various non-linear relationships. It is also applied to the optimisation of pre-formulation parameters for predicting the physicochemical properties of drug substances, and finds further applications in pharmaceutical research, medicinal chemistry, QSAR studies and pharmaceutical instrumental engineering. Its multi-objective simultaneous optimization is also adopted in drug discovery, protein structure analysis and rational data analysis.
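    A minimal example of the supervised network with backpropagation described above: one hidden layer of sigmoid units trained by per-sample gradient descent on the XOR problem, a classic non-linear relationship that a single-layer model cannot capture. No real pharmaceutical data are involved; the network size and learning rate are arbitrary illustrative choices.

```python
import math
import random

random.seed(3)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# XOR truth table: inputs and target outputs.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 4                                    # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0
lr = 0.5                                 # learning rate

def forward(x):
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(H)]
    return h, sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)

def mse():
    return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

mse_before = mse()
for _ in range(5000):                    # plain per-sample backpropagation
    for x, t in data:
        h, o = forward(x)
        do = (o - t) * o * (1 - o)       # gradient at the output sigmoid
        for j in range(H):
            dh = do * w2[j] * h[j] * (1 - h[j])   # gradient at hidden unit j
            w2[j] -= lr * do * h[j]
            b1[j] -= lr * dh
            for i in range(2):
                w1[j][i] -= lr * dh * x[i]
        b2 -= lr * do
mse_after = mse()
print(round(mse_before, 3), round(mse_after, 3))
```

    The training error falls as the weights are repeatedly adjusted against the targets, which is the "learning through repetition" the abstract refers to.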