
    Prediction Of Carboxylic Acid Toxicity Using Machine Learning Model

    Carboxylic acids are organic compounds characterized by a carboxyl functional group that can donate a proton and form carboxylate ions in aqueous solution. Carboxylic acids are widely used in manufacturing and medical applications. The rapid growth in carboxylic acid use has established a need to predict their toxicity. The purpose of this paper is to build predictive models of carboxylic acid toxicity from five molecular descriptors (refractive index, octanol/water partition coefficient (log P), acid dissociation constant (pKa), density, and dipole moment) using machine learning algorithms. Three types of model were compared for accuracy: Decision Tree, Random Forest, and k-Nearest Neighbour (k-NN). Among these, the decision tree was found to be the best model for predicting the toxicity of carboxylic acids. This finding indicates that the decision tree model achieves an acceptable level of performance for toxicity prediction.
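As a rough illustration of one of the three model types named in this abstract, the sketch below implements a k-NN classifier over the five descriptors and scores it with leave-one-out accuracy. It is not the authors' code: the descriptor values, the "low"/"high" toxicity labels, and the function names are all hypothetical placeholders, and only the Python standard library is used.

```python
import math
from collections import Counter

# Descriptor order: refractive index, log P, pKa, density, dipole moment.
# Every row is a made-up placeholder for illustration, NOT measured data.
TOY_DATA = [
    ((1.37, -0.2, 4.8, 1.05, 1.7), "low"),
    ((1.38, 0.3, 4.9, 0.99, 1.6), "low"),
    ((1.39, 0.8, 4.8, 0.96, 1.5), "low"),
    ((1.43, 3.0, 2.9, 1.40, 2.9), "high"),
    ((1.44, 3.4, 2.7, 1.45, 3.1), "high"),
    ((1.45, 3.9, 2.5, 1.50, 3.0), "high"),
]


def standardize(rows):
    """Scale each descriptor column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each row in turn and predict it from the remaining rows."""
    # Simplification: scaling uses all rows, so the held-out row leaks into
    # the scaler. A stricter evaluation would refit the scaler per fold.
    xs = standardize([x for x, _ in data])
    ys = [y for _, y in data]
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return hits / len(xs)


if __name__ == "__main__":
    print(leave_one_out_accuracy(TOY_DATA, k=3))
```

Standardizing first matters here because the descriptors sit on very different scales; without it, log P and pKa would dominate the Euclidean distance. The same accuracy function could be reused to compare other classifiers on equal footing.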

    Computational methods for prediction of in vitro effects of new chemical structures

    Background: With a constant increase in the number of new chemicals synthesized every year, it becomes important to employ the most reliable and fast in silico screening methods to predict their safety and activity profiles. In recent years, in silico prediction methods have received great attention in an attempt to reduce animal experiments for the evaluation of various toxicological endpoints, complementing the theme of replace, reduce and refine. Various computational approaches have been proposed for the prediction of compound toxicity, ranging from quantitative structure-activity relationship modeling to molecular similarity-based methods and machine learning. Within the “Toxicology in the 21st Century” screening initiative, a crowd-sourcing platform was established for the development and validation of computational models to predict the interference of chemical compounds with nuclear receptor and stress response pathways, based on a training set containing more than 10,000 compounds tested in high-throughput screening assays.
    Results: Here, we present the results of various molecular similarity-based and machine-learning-based methods over an independent evaluation set containing 647 compounds as provided by the Tox21 Data Challenge 2014. The Random Forest approach based on MACCS molecular fingerprints and a subset of 13 molecular descriptors, selected based on statistical and literature analysis, performed best in terms of the area under the receiver operating characteristic curve. Further, we compared the individual and combined performance of different methods. In retrospect, we also discuss the reasons behind the superior performance of an ensemble approach, combining a similarity search method with the Random Forest algorithm, compared to individual methods, while explaining the intrinsic limitations of the latter.
    Conclusions: Our results suggest that, although prediction methods were optimized individually for each modelled target, an ensemble of similarity and machine-learning approaches provides promising performance, indicating its broad applicability in toxicity prediction.
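The pieces this abstract refers to (fingerprint similarity search, an ensemble with a model score, and ranking by area under the ROC curve) can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the paper's method: fingerprints are represented as sets of on-bit indices, `model_prob` stands in for a Random Forest probability, and the weighted average in `ensemble_score` is an assumed combination rule, since the abstract does not say how the two methods were combined.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_score(query, actives):
    """Score a query by its maximum similarity to any known active."""
    return max(tanimoto(query, fp) for fp in actives)


def ensemble_score(sim_score, model_prob, weight=0.5):
    """Assumed combination rule: weighted average of the two scores."""
    return weight * sim_score + (1 - weight) * model_prob


def roc_auc(labels, scores):
    """AUC as the chance that a random active outranks a random inactive.

    Ties between an active and an inactive count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical fingerprints and model probability, for illustration only.
    actives = [{1, 2, 3}, {4, 5, 6}]
    query = {2, 3, 4}
    sim = similarity_score(query, actives)
    print(ensemble_score(sim, model_prob=0.4))
```

The pairwise AUC above is quadratic in the number of compounds, which is fine for an evaluation set of a few hundred compounds but would be replaced by a rank-based formula for larger sets.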

    Data Quality in Predictive Toxicology: Identification of Chemical Structures and Calculation of Chemical Descriptors

    Every technique for toxicity prediction and for the detection of structure–activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss the potential sources of error associated with the identification of compounds, the representation of their structures, and the calculation of chemical descriptors. The discussion is based on a case study in which machine learning techniques were applied to data from noncongeneric compounds and a complex toxicologic end point (carcinogenicity). We propose methods applicable to the routine quality control of large chemical datasets, but our main intention is to raise awareness of this topic and to open a discussion about quality assurance in predictive toxicology. The accuracy and reproducibility of toxicity data will be reported in another paper.