
    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main advances in drug discovery. Virtual molecular filtering and screening relies heavily on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. It explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods to build QSAR models, current evolutionary feature selection methods and their applications in QSAR, and future trends in joint or multi-task feature selection methods. Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia, Consellería de Economía e Industria, 10SIN105004P.
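    As a rough illustration of the evolutionary feature selection idea reviewed above, the sketch below runs a small genetic algorithm over binary descriptor masks, scoring each subset by the cross-validated fit of a ridge regression model. The dataset, descriptor count, ridge model and all GA settings are assumptions chosen for a runnable example, not the review's own protocol.

```python
# Minimal sketch: genetic-algorithm selection of molecular descriptors for a QSAR model.
# X/y below are random placeholders standing in for a descriptor matrix and an activity.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                               # placeholder descriptors
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)   # synthetic activity

def fitness(mask):
    """Cross-validated R^2 of a ridge model restricted to the selected descriptors."""
    if mask.sum() == 0:
        return -np.inf
    return cross_val_score(Ridge(alpha=1.0), X[:, mask], y, cv=5, scoring="r2").mean()

pop_size, n_gen, n_feat = 30, 40, X.shape[1]
population = rng.random((pop_size, n_feat)) < 0.2            # random binary chromosomes

for gen in range(n_gen):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_feat)                        # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_feat) < 0.02                     # bit-flip mutation
        children.append(np.logical_xor(child, flip))
    population = np.vstack([parents, np.array(children)])

best = population[np.argmax([fitness(ind) for ind in population])]
print("selected descriptor indices:", np.flatnonzero(best))
```

    Real QSAR applications typically add a parsimony penalty to the fitness so the GA does not keep redundant descriptors; that is omitted here for brevity.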

    Bayesian neural network learning for repeat purchase modelling in direct marketing.

    We focus on purchase incidence modelling for a European direct mail company. Response models based on statistical and neural network techniques are contrasted. MacKay's evidence framework is used as an example implementation of Bayesian neural network learning, a method that is fairly robust with respect to problems typically encountered when implementing neural networks. The automatic relevance determination (ARD) method, an integrated feature of this framework, allows the relative importance of the inputs to be assessed. The basic response models use operationalisations of the traditionally discussed Recency, Frequency and Monetary (RFM) predictor categories. In a second experiment, the RFM response framework is enriched by the inclusion of other (non-RFM) customer profiling predictors. We contribute to the literature by providing experimental evidence that: (1) Bayesian neural networks offer a viable alternative for purchase incidence modelling; (2) a combined use of all three RFM predictor categories is advocated by the ARD method; (3) the inclusion of non-RFM variables significantly augments the predictive power of the constructed RFM classifiers; (4) this rise is mainly attributable to the inclusion of customer/company interaction variables and a variable measuring whether a customer uses the credit facilities of the direct mailing company.
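    To make the ARD idea concrete, the sketch below uses scikit-learn's ARDRegression, a Bayesian linear model with per-feature relevance priors, as a simple stand-in for the MacKay evidence-framework neural network described in the abstract. The RFM-style predictors, the synthetic target and the model choice are all illustrative assumptions.

```python
# Illustration only: automatic relevance determination (ARD) on hypothetical RFM-style
# predictors; ARDRegression is a Bayesian *linear* model, not the paper's neural network.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(1)
n = 1000
recency   = rng.exponential(90, n)      # days since last purchase (synthetic)
frequency = rng.poisson(3, n)           # number of past purchases (synthetic)
monetary  = rng.gamma(2.0, 50.0, n)     # total spend (synthetic)
noise_var = rng.normal(size=n)          # irrelevant predictor for comparison

# Synthetic purchase-propensity target driven by frequency and recency only.
y = 0.8 * frequency - 0.01 * recency + rng.normal(scale=0.5, size=n)

X = np.column_stack([recency, frequency, monetary, noise_var])
ard = ARDRegression().fit(X, y)

for name, w, lam in zip(["recency", "frequency", "monetary", "noise"],
                        ard.coef_, ard.lambda_):
    # A large weight precision lambda_ means the weight is pruned toward zero,
    # i.e. ARD judges that input to have low relevance.
    print(f"{name:9s} weight={w:+.3f} precision={lam:.2f}")
```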

    Intelligent Modelling of the Environmental Behaviour of Chemicals

    In view of the new European Union chemical policy REACH (Registration, Evaluation, and Authorization of Chemicals), interest in "non-animal" methods for assessing the risk potential of chemicals towards human health and the environment has increased. The inability of classical modelling approaches to handle the complex and ill-defined problems of modelling the environmental behaviour of chemicals, together with the availability of large computing power, has raised interest in computational models inspired by approaches from artificial intelligence. This thesis is devoted to promoting the application of neuro/fuzzy techniques in assessing the environmental behaviour of chemicals. Some of the bottlenecks in neuro/fuzzy modelling of chemicals' behaviour towards the environment have been identified, and solutions have been provided based on techniques of computational intelligence. [German abstract, translated:] This dissertation concerns the application of neural and fuzzy networks to assess the environmental behaviour of chemicals. The problems of modelling the behaviour of chemicals towards the environment are identified and solutions are offered, based on artificial intelligence techniques. The quality of the modelling techniques depends on several factors, e.g. the input, the structure and so on. In many cases no suitable results are obtained, which amounts to developing a model with low generalisation ability.
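    For readers unfamiliar with the neuro/fuzzy building blocks the thesis refers to, the sketch below shows a zero-order Takagi-Sugeno fuzzy inference step with Gaussian memberships. The rule centres, widths and consequents are invented for illustration; in a neuro-fuzzy system they would be fitted to data (e.g. by gradient descent), not set by hand.

```python
# Minimal zero-order Takagi-Sugeno fuzzy inference sketch (illustrative parameters only).
import numpy as np

centers = np.array([[0.2, 0.3], [0.7, 0.8]])   # rule centres in a 2-D descriptor space
widths  = np.array([0.2, 0.3])                 # Gaussian membership widths per rule
outputs = np.array([1.5, 4.0])                 # constant consequent of each rule

def fuzzy_predict(x):
    """Weighted average of rule outputs, weights = Gaussian rule activations."""
    act = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2 * widths ** 2))
    return float(np.dot(act, outputs) / act.sum())

print(fuzzy_predict(np.array([0.25, 0.35])))   # near rule 1 -> close to 1.5
print(fuzzy_predict(np.array([0.65, 0.75])))   # near rule 2 -> close to 4.0
```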

    The application of stochastic machine learning methods in the prediction of skin penetration

    Original article available at http://www.sciencedirect.com. Copyright Elsevier. Improving predictions of skin permeability is a significant problem for which mathematical solutions have been sought for around twenty years. However, current approaches are limited by the nature of the models chosen and the nature of the dataset. This is an important problem, particularly with the increased use of transdermal and topical drug delivery systems. In this work, we apply K-nearest-neighbour regression, single layer networks, mixture of experts and Gaussian processes to predict the skin permeability coefficient of penetrants. A considerable improvement over current quantitative structure-permeability relationships (QSPRs) was found, both statistically and in terms of the accuracy of predictions. Gaussian processes provided the most accurate predictions when compared to experimentally generated results. It was also shown that using five molecular descriptors - molecular weight, solubility parameter, lipophilicity, and the numbers of hydrogen bond acceptor and donor groups - can produce better predictions than using only lipophilicity and molecular weight, an approach commonly found with QSPRs. Gaussian process regression with five compound features gave the best performance in this work. Gaussian processes would therefore appear to provide a viable alternative for developing predictive models of skin absorption and to underpin a more realistic mechanistic understanding of the physical process of percutaneous absorption of exogenous chemicals. (C) 2010 Elsevier B.V. All rights reserved. Peer reviewed.
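    A rough sketch of the kind of Gaussian process regression described above is shown below: a permeability coefficient is predicted from five molecular descriptors with an anisotropic RBF kernel, which also yields a predictive uncertainty. The data are random placeholders and the kernel settings are assumptions, not the published model.

```python
# Hedged sketch: Gaussian process regression of a permeability coefficient (log kp)
# on five molecular descriptors; all data below are synthetic stand-ins.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Columns: molecular weight, solubility parameter, log P, H-bond acceptors, H-bond donors.
X = rng.normal(size=(100, 5))
y = -2.7 + 0.7 * X[:, 2] - 0.006 * X[:, 0] + rng.normal(scale=0.3, size=100)  # synthetic log kp

X_scaled = StandardScaler().fit_transform(X)
# Anisotropic RBF: one length scale per descriptor, plus a noise term.
kernel = 1.0 * RBF(length_scale=np.ones(5)) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_scaled, y)

mean, std = gp.predict(X_scaled[:5], return_std=True)   # predictive mean and uncertainty
print(np.round(mean, 2), np.round(std, 2))
```

    The per-descriptor length scales learned by the anisotropic kernel give a crude indication of which of the five descriptors the model actually relies on.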

    Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

    In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews of the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms combined with bagging as an ensemble strategy were evaluated. The Friedman's aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods. SMOTEENN with bagging became less effective when the imbalance ratio (IR) exceeded a certain threshold (e.g., >40). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Lastly, current features used for QSAR based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors/features using the latent space embedding from a multi-head self-attention model. The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features correlate with biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based Quantitative Structure-Activity/Property Relationship (QSAR) efforts for enhanced drug discovery and toxicology assessments.
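    The imbalance-handling step described above can be sketched as SMOTE + Edited Nearest Neighbours (SMOTEENN) resampling applied inside a simple bagging loop, as shown below. It requires the imbalanced-learn package; the synthetic dataset, the choice of decision-tree base learners and the ensemble size are assumptions for illustration, not the dissertation's exact setup.

```python
# Hedged sketch: SMOTEENN resampling inside a bagging ensemble for imbalanced data.
import numpy as np
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)            # roughly 19:1 imbalance ratio
rng = np.random.default_rng(0)
members = []
for _ in range(25):                                   # bagging ensemble of 25 trees
    idx = rng.integers(0, len(X), size=len(X))        # bootstrap sample
    X_res, y_res = SMOTEENN(random_state=0).fit_resample(X[idx], y[idx])
    members.append(DecisionTreeClassifier().fit(X_res, y_res))

# Average the members' class-1 probabilities to score compounds as active/inactive.
proba = np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)
print("predicted actives:", int((proba > 0.5).sum()), "of", int(y.sum()), "true actives")
```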