5,655 research outputs found

    Use of Statistical and Neural Net Approaches in Predicting Toxicity of Chemicals

    Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.

    Modeling of the Acute Toxicity of Benzene Derivatives by Complementary QSAR Methods

    A data set containing acute toxicity values (96-h LC50) of 69 substituted benzenes for fathead minnow (Pimephales promelas) was investigated with two Quantitative Structure-Activity Relationship (QSAR) models, either using or not using molecular descriptors, respectively. Recursive Neural Networks (RNN) derive a QSAR by direct treatment of the molecular structure, described through an appropriate graphical tool (variable-size labeled rooted ordered trees) by defining suitable representation rules. The input trees are encoded by an adaptive process able to learn, by tuning its free parameters, from a given set of structure-activity training examples. Owing to the use of a flexible encoding approach, the model is target invariant and does not need a priori definition of molecular descriptors. The results obtained in this study were analyzed together with those of a model based on molecular descriptors, i.e. a Multiple Linear Regression (MLR) model using CROatian MultiRegression selection of descriptors (CROMRsel). The comparison revealed interesting similarities that could lead to the development of a combined approach, exploiting the complementary characteristics of the two approaches.

    Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

    In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, combined with bagging as an ensemble strategy, were evaluated. The Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods. SMOTEENN with bagging became less effective when the imbalance ratio (IR) exceeded a certain threshold (e.g., >40). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. Second, deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Lastly, current features used for QSAR based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors/features using the latent space embedding from a multi-head self-attention. The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features have a correlation to biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based Quantitative Structure-Activity/Property Relationship (QSAR) efforts for enhanced drug discovery and toxicology assessments.
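
As a rough illustration of two ideas named in this abstract, the sketch below computes an imbalance ratio and generates one SMOTE-style synthetic point. It is not the thesis's code: the abstract does not define IR, so the majority-to-minority count ratio used here is an assumed (though common) convention, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def smote_sample(x, neighbor, rng):
    """Create one synthetic minority point on the segment between x and a
    minority-class neighbor (the core interpolation step of SMOTE)."""
    gap = rng.random()  # uniform draw in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical toy data: 82 inactive (0) and 2 active (1) compounds.
labels = [0] * 82 + [1] * 2
ir = imbalance_ratio(labels)  # 82 / 2 = 41.0, just above the ">40" example threshold

rng = random.Random(0)
synthetic = smote_sample([0.0, 1.0], [1.0, 3.0], rng)
```

A full SMOTEENN pipeline would also select nearest neighbors and then remove noisy samples with Edited Nearest Neighbor; those steps are omitted here.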

    In silico prediction of acute chemical toxicity of biocides in marine crustaceans using machine learning

    Biocides are a heterogeneous group of chemical substances intended to control the growth of, or kill, undesired organisms. Due to their extensive use, they enter marine ecosystems via non-point sources and may pose a threat to ecologically important non-target organisms. Consequently, industries and regulatory agencies have recognized the ecotoxicological hazard potential of biocides. However, the prediction of biocide chemical toxicity on marine crustaceans has not been previously evaluated. This study aims to provide in silico models capable of classifying structurally diverse biocidal chemicals into different toxicity categories and predicting acute chemical toxicity (LC50) in marine crustaceans using a set of calculated 2D molecular descriptors. The models were built following the guidelines recommended by the OECD (Organization for Economic Cooperation and Development) and validated through stringent processes (internal and external validation). Six machine learning (ML) models were built and compared (linear regression: LR; support vector machine: SVM; random forest: RF; feed-forward backpropagation-based artificial neural network: ANN; decision trees: DT and naïve Bayes: NB) for regression and classification analysis to predict toxicities. All the models displayed encouraging results with high generalisability: the feed-forward-based backpropagation method showed the best results with determination coefficient R² values of 0.82 and 0.94, respectively, for training set (TS) and validation set (VS). For classification-based modelling, the DT model performed the best with an accuracy (ACC) of 100 % and an area under curve (AUC) value of 1 for both TS and VS. These models showed the potential to replace animal testing for the chemical hazard assessment of untested biocides if they fall within the applicability domain of the proposed models. In general, the models are highly interpretable and robust, with good predictive performance. The models also displayed a trend indicating that toxicity is largely influenced by factors such as lipophilicity, branching, non-polar bonding and saturation of molecules.
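
For readers unfamiliar with the determination coefficient reported above, here is a minimal sketch of the standard definition, R² = 1 - SS_res / SS_tot. The function name and the numbers are hypothetical and are not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted toxicity values (arbitrary units).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
score = r_squared(observed, predicted)  # SS_res = 0.5, SS_tot = 5.0
```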

    Mapping Cumulative Environmental Risks: Examples from The EU NoMiracle Project

    We present examples of cumulative chemical risk mapping methods developed within the NoMiracle project. The different examples illustrate the application of the concentration addition (CA) approach to pesticides at different scales, the integration in space of cumulative risks to individual organisms under the CA assumption, and two techniques to (1) integrate risks using data-driven, parametric statistical methods, and (2) cluster together areas with similar occurrence of different risk factors, respectively. The examples are used to discuss some general issues, particularly on the conventional nature of cumulative risk maps, and may provide some suggestions for the practice of cumulative risk mapping.
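
The abstract does not spell out which CA formulation the project used. One common form sums the toxic units of the mixture components, each being a concentration divided by that substance's effect concentration. The sketch below assumes that form; the function name and values are hypothetical.

```python
def toxic_unit_sum(concentrations, effect_concentrations):
    """Concentration addition as a sum of toxic units: sum_i (c_i / EC50_i).

    Both sequences must use the same units and the same substance order.
    """
    if len(concentrations) != len(effect_concentrations):
        raise ValueError("one effect concentration is needed per substance")
    return sum(c / ec for c, ec in zip(concentrations, effect_concentrations))


# Hypothetical two-pesticide mixture in one map cell (same units throughout).
cell_concentrations = [1.0, 2.0]
cell_ec50s = [4.0, 16.0]
cell_risk = toxic_unit_sum(cell_concentrations, cell_ec50s)  # 0.25 + 0.125 = 0.375
```

Under this convention, a sum of 1 or more means the mixture is expected to reach the effect level even when every single component is below its own effect concentration.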

    CATMoS: Collaborative Acute Toxicity Modeling Suite.

    BACKGROUND: Humans are exposed to tens of thousands of chemical substances that need to be assessed for their potential toxicity. Acute systemic toxicity testing serves as the basis for regulatory hazard classification, labeling, and risk management. However, it is cost- and time-prohibitive to evaluate all new and existing chemicals using traditional rodent acute toxicity tests. In silico models built using existing data facilitate rapid acute toxicity predictions without using animals. OBJECTIVES: The U.S. Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) Acute Toxicity Workgroup organized an international collaboration to develop in silico models for predicting acute oral toxicity based on five different end points: Lethal Dose 50 (LD50) value, U.S. Environmental Protection Agency hazard (four) categories, Globally Harmonized System for Classification and Labeling hazard (five) categories, very toxic chemicals (LD50 ≤ 50 mg/kg), and nontoxic chemicals (LD50 > 2,000 mg/kg). METHODS: An acute oral toxicity data inventory for 11,992 chemicals was compiled, split into training and evaluation sets, and made available to 35 participating international research groups that submitted a total of 139 predictive models. Predictions that fell within the applicability domains of the submitted models were evaluated using external validation sets. These were then combined into consensus models to leverage strengths of individual approaches. RESULTS: The resulting consensus predictions, which leverage the collective strengths of each individual model, form the Collaborative Acute Toxicity Modeling Suite (CATMoS). CATMoS demonstrated high performance in terms of accuracy and robustness when compared with in vivo results. DISCUSSION: CATMoS is being evaluated by regulatory agencies for its utility and applicability as a potential replacement for in vivo rat acute oral toxicity studies. CATMoS predictions for more than 800,000 chemicals have been made available via the National Toxicology Program's Integrated Chemical Environment tools and data sets (ice.ntp.niehs.nih.gov). The models are also implemented in a free, standalone, open-source tool, OPERA, which allows predictions of new and untested chemicals to be made. https://doi.org/10.1289/EHP8495
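
The two binary end points in this abstract are defined by LD50 cutoffs, so they can be derived from an LD50 value with a simple threshold check. The sketch below only illustrates those two cutoffs. It is not CATMoS or OPERA code, and the function and constant names are hypothetical.

```python
VERY_TOXIC_MAX_MG_PER_KG = 50.0   # very toxic: LD50 <= 50 mg/kg (from the abstract)
NONTOXIC_MIN_MG_PER_KG = 2000.0   # nontoxic: LD50 > 2,000 mg/kg (from the abstract)


def binary_end_points(ld50_mg_per_kg):
    """Map an acute oral LD50 (mg/kg) to the two binary end points."""
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be positive")
    return {
        "very_toxic": ld50_mg_per_kg <= VERY_TOXIC_MAX_MG_PER_KG,
        "nontoxic": ld50_mg_per_kg > NONTOXIC_MIN_MG_PER_KG,
    }


# Hypothetical LD50 values, chosen to sit on and around the cutoffs.
examples = {ld50: binary_end_points(ld50) for ld50 in (50.0, 300.0, 2000.0, 2500.0)}
```

Note that the boundaries are asymmetric: 50 mg/kg counts as very toxic, while exactly 2,000 mg/kg does not count as nontoxic.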