Efficient Toxicity Prediction via Simple Features Using Shallow Neural Networks and Decision Trees

Author: Karim Abdul
Mishra Avinash
Newton M A Hakim
Sattar Abdul
Publication date: 01/01/2019

Toxicity prediction of chemical compounds is a grand challenge. Recent methods have made significant progress in accuracy, but they rely on a huge set of features, complex black-box techniques such as deep neural networks, and enormous computational resources. In this paper, we strongly argue for models and methods that are simple in their machine learning characteristics, efficient in computing resource usage, and powerful enough to achieve very high accuracy. To demonstrate this, we develop a single-task chemical toxicity prediction framework that uses only 2D features, which are less compute-intensive. We use a decision tree to obtain an optimal number of features from a collection of thousands. We use a shallow neural network and jointly optimize it with the decision tree, taking both the network parameters and the input features into account. Our model needs only a minute on a single CPU for training, whereas existing methods using deep neural networks need about 10 min on an NVIDIA Tesla K40 GPU. Despite the much smaller compute budget, we obtain similar or better performance on several toxicity benchmark tasks. We also develop a cumulative feature ranking method that enables us to identify features that can help chemists prescreen toxic compounds effectively.

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare
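
The Karim et al. abstract above says a decision tree is used to pick a small subset from thousands of 2D features before a shallow network is trained, but it does not give the exact procedure or the cumulative ranking formula. As a rough illustration only, the Python sketch below ranks each feature by the largest Gini impurity decrease that a single threshold split (a decision stump) can achieve on it, then keeps the top k columns. This stump-based score is a stand-in chosen for brevity, not the authors' method, and the names `gini`, `best_split_gain`, `rank_features`, and `select_top_k`, as well as the toy data, are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest Gini impurity decrease from one threshold split on a single feature.

    Assumption: this decision-stump score is an illustrative stand-in for
    tree-based feature importance, not the procedure used in the paper.
    """
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can fall between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


def rank_features(X, y):
    """Return (feature indices sorted by descending gain, per-feature gains)."""
    n_features = len(X[0])
    gains = [best_split_gain([row[j] for row in X], y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return order, gains


def select_top_k(X, y, k):
    """Keep only the k highest-ranked feature columns of X."""
    order, _ = rank_features(X, y)
    keep = order[:k]
    return keep, [[row[j] for j in keep] for row in X]


# Hypothetical toy data: 4 compounds x 3 descriptors, binary toxicity labels.
X = [[0, 5, 1], [1, 1, 1], [0, 3, 0], [1, 4, 0]]
y = [0, 1, 0, 1]
order, gains = rank_features(X, y)
print(order)  # [0, 1, 2]
keep, X_small = select_top_k(X, y, 2)
```

In this toy data, feature 0 separates the two classes perfectly, so it ranks first. The reduced matrix returned by `select_top_k` is what a shallow network would then be trained on. A full decision tree would also capture interactions between features, which a one-split score ignores.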

Machine Learning Toxicity Prediction: Latest Advances by Toxicity End Point

Author: Cavasotto Claudio Norberto
Scardino Valeria
Publication venue: American Chemical Society
Publication date: 01/12/2022

Machine learning (ML) models to predict the toxicity of small molecules have garnered great attention and have become widely used in recent years. Computational toxicity prediction is particularly advantageous in the early stages of drug discovery, where it can filter out molecules with a high probability of failing in clinical trials. This has been helped by the growing number of large toxicology databases. However, because this is a recent area of application, a better understanding of the scope and applicability of ML methods is still needed. Various toxic end points have been predicted in silico; acute oral toxicity, hepatotoxicity, cardiotoxicity, mutagenicity, and the 12 Tox21 data end points are among the most commonly investigated. Machine learning methods perform differently on different data sets because of differences in complexity, class distribution, and the chemical space covered, which makes it hard to compare algorithms across toxic end points. The general pipeline for predicting toxicity with ML has already been analyzed in various reviews. In this contribution, we focus on recent progress in the area and the outstanding challenges, giving a detailed description of the state-of-the-art models implemented for each toxic end point. The molecular representation, the algorithm, and the evaluation metric used in each work are explained and analyzed. We also describe the end points that are usually predicted, their clinical relevance, the available databases, and the challenges they bring to the field.
Affiliation: Cavasotto, Claudio Norberto. Universidad Austral. Facultad de Ciencias Biomédicas. Instituto de Investigaciones en Medicina Traslacional. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones en Medicina Traslacional; Argentina
Affiliation: Scardino, Valeria. Universidad Austral; Argentina

CONICET Digital

Machine Learning Approaches for Improving Prediction Performance of Structure-Activity Relationship Models

Author: Idakwo Gabriel
Publication venue: The Aquila Digital Community
Publication date: 01/08/2020

In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) modeling are used to correlate the structure of a molecule with its biological properties in drug design and toxicological studies. In this body of work, I started with two in-depth reviews of the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies. First, to improve the prediction accuracy of learning from imbalanced data, the Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, combined with bagging as an ensemble strategy, were evaluated. Friedman's aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that this method significantly outperformed other conventional methods. SMOTEENN with bagging became less effective when the imbalance ratio (IR) exceeded a certain threshold (e.g., >40). The ability to separate the few active compounds from the vast number of inactive ones is of great importance in computational toxicology. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA), by 22-27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Lastly, the features currently used for QSAR-based machine learning are often very sparse and limited by the logic and mathematical processes used to compute them. Transformer embedding features (TEF) were developed as new continuous vector descriptors using the latent space embedding from multi-head self-attention.
The significance of TEF as new descriptors was evaluated by applying them to tasks such as predictive modeling, clustering, and similarity search. An accuracy of 84% on the Ames mutagenicity test indicates that these new features correlate with biological activity. Overall, the findings in this study can be applied to improve the performance of machine learning based QSAR efforts for enhanced drug discovery and toxicology assessments.

Aquila Digital Community
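
The SMOTEENN strategy in the Idakwo abstract combines two resampling steps: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors, and ENN then removes samples whose label disagrees with the majority label of their nearest neighbors. The pure-Python sketch below illustrates those two steps plus the imbalance ratio (majority count divided by minority count), assuming numeric feature vectors, binary labels, Euclidean distance, and at least two minority samples. It is not the thesis's implementation; the function names and toy data are hypothetical, and bagging is left out (in a bagging ensemble, each bootstrap sample would typically be resampled this way before a base learner is trained on it). Real projects would more likely use a maintained library such as imbalanced-learn.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def imbalance_ratio(y):
    """Majority class count divided by minority class count."""
    counts = Counter(y).values()
    return max(counts) / min(counts)


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: each synthetic sample lies on the segment between
    a random minority sample and one of its k nearest minority neighbors.
    Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: euclidean(base, m))[:k]
        partner = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


def enn(X, y, k=3):
    """Edited Nearest Neighbors: keep a sample only if its label matches the
    majority label among its k nearest neighbors."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = sorted(
            (euclidean(xi, xj), y[j]) for j, xj in enumerate(X) if j != i
        )[:k]
        majority_label = Counter(label for _, label in nearest).most_common(1)[0][0]
        if majority_label == yi:
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


def smoteenn(X, y, minority_label, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class up to the majority count, then clean with ENN.
    Assumes binary labels."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    synthetic = smote(minority, max(n_new, 0), k=k_smote, seed=seed)
    return enn(X + synthetic, y + [minority_label] * len(synthetic), k=k_enn)


# Hypothetical toy data: 6 inactive (0) and 2 active (1) compounds, 2 features.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0, 0.5], [5, 5], [5, 6]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
print(imbalance_ratio(y))  # 3.0
X_res, y_res = smoteenn(X, y, minority_label=1)
```

Here the two classes are well separated, so ENN keeps every point and the result is balanced at 6 versus 6. With overlapping classes, ENN would also drop ambiguous samples near the class boundary, which is the point of pairing it with SMOTE.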

Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment

Author: Afantitis Antreas
Cattelani Luca
Choi Jang-Sik
Federico Antonio
Fratello Michele
Grafström Roland
Greco Dario
Gulumian Mary
Ha My Kieu
Jagiello Karolina
Kinaret Pia Anneli Sofia
Kohonen Pekka
Liampa Irene
Melagraki Georgia
Nymark Penny
Puzyn Tomasz
Sanabria Natasha
Sarimveis Haralambos
Serra Angela
Yoon Tae-Hyun
Publication venue: Multidisciplinary Digital Publishing Institute
Publication date: 08/04/2020

Transcriptomics data are relevant for addressing a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the number of publicly available omics datasets is constantly increasing, together with a plethora of methods made available to facilitate their analysis and interpretation and the generation of accurate and stable predictive models. In this review, we present the state of the art of data modelling applied to transcriptomics data in TGx. We show how benchmark dose (BMD) analysis can be applied to TGx data. We review read-across and adverse outcome pathway (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or to identify specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models, and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.

Helsingin yliopiston digitaalinen arkisto
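
The benchmark dose (BMD) mentioned in the review is the dose at which a fitted dose-response curve departs from the control response by a predefined amount, the benchmark response (BMR). The abstract does not say which models or BMR definitions the review covers, so the sketch below assumes a Hill curve with hypothetical parameters and a BMR of a 10% increase over the control level, and finds the BMD by bisection. For the Hill model a closed form also exists, included here as a cross-check. In a TGx setting this calculation would usually be repeated for each gene's fitted expression curve.

```python
def hill(dose, bottom, top, ec50, n):
    """Hill dose-response curve rising from `bottom` (control) toward `top`."""
    return bottom + (top - bottom) * dose**n / (ec50**n + dose**n)


def benchmark_dose(curve, bmr, d_low=0.0, d_high=1000.0, tol=1e-9):
    """Dose at which curve(dose) - curve(0) reaches `bmr`, found by bisection.

    Assumes `curve` is increasing on [d_low, d_high]; returns None if the
    benchmark response is not reached within that range.
    """
    baseline = curve(0.0)

    def excess(d):
        return curve(d) - baseline - bmr

    if excess(d_high) < 0:
        return None  # BMR never reached in the searched dose range
    lo, hi = d_low, d_high
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


def hill_bmd_closed_form(bottom, top, ec50, n, bmr):
    """Analytical BMD for the Hill model: solve (top - bottom) * d^n / (ec50^n + d^n) = bmr."""
    frac = bmr / (top - bottom)
    return ec50 * (frac / (1 - frac)) ** (1 / n)


# Hypothetical fitted parameters for one gene; BMR = 10% above the control level.
params = dict(bottom=1.0, top=3.0, ec50=10.0, n=2.0)
bmr = 0.10 * params["bottom"]
bmd = benchmark_dose(lambda d: hill(d, **params), bmr)
print(round(bmd, 3))  # 2.294
```

Bisection is used because it works for any monotone fitted model, not only those with an analytical inverse; the closed-form check applies to the Hill curve alone.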

Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment

Author: Afantitis Antreas
Cattelani Luca
Choi Jang-Sik
Federico Antonio
Fratello Michele
Grafström Roland
Greco Dario
Gulumian Mary
Ha My Kieu
Jagiello Karolina
Kinaret Pia Anneli Sofia
Kohonen Pekka
Liampa Irene
Melagraki Georgia
Nymark Penny
Puzyn Tomasz
Sanabria Natasha
Sarimveis Haralambos
Serra Angela
Yoon Tae-Hyun
Publication date: 01/01/2020

Institutional Repository Universiteit Antwerpen

Helsingin yliopiston digitaalinen arkisto

Advanced machine-learning techniques in drug discovery

Author: Basit AW
Elbadawi M
Gaisford S
Publication date: 01/03/2021

The popularity of machine learning (ML) across drug discovery continues to grow, yielding impressive results. As the use of ML increases, its limitations are also becoming apparent. These limitations include the need for big data, data sparsity, and a lack of interpretability. It has also become apparent that the techniques are not truly autonomous and require retraining even after deployment. In this review, we detail the use of advanced techniques to circumvent these challenges, with examples drawn from drug discovery and allied disciplines. In addition, we present emerging techniques and their potential role in drug discovery. The techniques presented herein are anticipated to expand the applicability of ML in drug discovery.

UCL Discovery