248 research outputs found

    Using Machine Learning On Diverse Datasets To Predict Drug-Induced Liver Injury

    A major challenge in drug development is safety and toxicity concerns due to drug side effects. One such side effect, drug-induced liver injury (DILI), is considered a primary factor in regulatory clearance. The Critical Assessment of Massive Data Analysis (CAMDA) 2020 CMap Drug Safety Challenge was established with the goal of developing DILI prediction models based on gene perturbation of six preselected cell lines (CMap L1000), extended structural information (MOLD2), toxicity data (TOX21), and FDA reporting of adverse events (FAERS). Four types of DILI classes were targeted, including two clinically relevant scores and two control classifications, designed by the CAMDA organizers. The L1000 gene expression data had variable drug coverage across cell lines, with only 247 out of 617 drugs in the study measured in all six cell types. We addressed this coverage issue by using Kru-Bor ranked merging to generate a single drug expression signature across all six cell lines. These merged signatures were then narrowed down to the top and bottom 100, 250, 500, or 1,000 genes most perturbed by drug treatment. These signatures were subjected to feature selection using Fisher's exact test to identify genes predictive of DILI status. Models based solely on expression signatures had varying results for clinical DILI subtypes, with accuracy ranging from 0.49 to 0.67 and Matthews Correlation Coefficient (MCC) values ranging from -0.03 to 0.1. Models built using FAERS, MOLD2 and TOX21 had similar results in predicting clinical DILI scores, with accuracy ranging from 0.56 to 0.67 and MCC scores ranging from 0.12 to 0.36. To incorporate these various data types with expression-based models, we utilized soft, hard, and weighted ensemble voting methods using the top three performing models for each DILI classification. These voting models achieved a balanced accuracy up to 0.54 and 0.60 for the clinically relevant DILI subtypes.
Overall, from our experiment, traditional machine learning approaches may not be optimal as a classification method for the current data.
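
The abstract above names hard, soft, and weighted ensemble voting without describing them. The following is a minimal sketch of how the three schemes are commonly defined for binary classifiers. It is not the authors' implementation. The function names, the 0.5 threshold, and the example probabilities and weights are all illustrative assumptions.

```python
# Minimal sketch of three common ensemble voting schemes for binary labels.
# All names, thresholds, and numbers here are hypothetical illustrations,
# not taken from the study summarized above.

def hard_vote(probs, threshold=0.5):
    """Each model casts one vote (1 if its probability >= threshold); strict majority wins."""
    votes = sum(1 for p in probs if p >= threshold)
    return 1 if votes * 2 > len(probs) else 0


def soft_vote(probs, threshold=0.5):
    """Average the predicted probabilities, then apply the threshold."""
    return 1 if sum(probs) / len(probs) >= threshold else 0


def weighted_vote(probs, weights, threshold=0.5):
    """Weighted average of probabilities, e.g. weights from validation performance."""
    score = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return 1 if score >= threshold else 0


if __name__ == "__main__":
    # Assumed positive-class probabilities from three models for one drug.
    probs = [0.9, 0.4, 0.45]
    print("hard:", hard_vote(probs))
    print("soft:", soft_vote(probs))
    print("weighted:", weighted_vote(probs, [0.1, 0.45, 0.45]))
```

The schemes can disagree on the same inputs. One confident model can carry a soft vote even when it is outvoted under hard voting, and the weights decide how far that confidence counts.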

    Computational Approaches for Drug-Induced Liver Injury (DILI) Prediction: State of the Art and Challenges

    Drug-induced liver injury (DILI) is one of the prevailing causes of fulminant hepatic failure. It is estimated that three idiosyncratic drug reactions out of four result in liver transplantation or death. Additionally, DILI is the most common reason for withdrawal of an approved drug from the market. Therefore, the development of methods for the early identification of hepatotoxic drug candidates is of crucial importance. This review focuses on the current state of cheminformatics strategies being applied for the early in silico prediction of DILI. Herein, we discuss key issues associated with DILI modelling in terms of the data size, imbalance and quality, complexity of mechanisms, and the different levels of hepatotoxicity to model, going from general hepatotoxicity to the molecular initiating events of DILI.

    Toward predictive models for drug-induced liver injury in humans: are we there yet?

    Drug-induced liver injury (DILI) is a frequent cause for the termination of drug development programs and a leading reason for drug withdrawal from the marketplace. Unfortunately, the current preclinical testing strategies, including the regulatory-required animal toxicity studies or simple in vitro tests, are insufficiently powered to predict DILI in patients reliably. Notably, the limited predictive power of such testing strategies is mostly attributed to the complex nature of DILI, a poor understanding of its mechanism, a scarcity of human hepatotoxicity data and inadequate bioinformatics capabilities. With the advent of high-content screening assays, toxicogenomics and bioinformatics, multiple end points can be studied simultaneously to improve prediction of clinically relevant DILIs. This review focuses on the current state of efforts in developing predictive models from diverse data sources for potential use in detecting human hepatotoxicity, and also aims to provide perspectives on how to further improve DILI prediction.

    Applicability domains of neural networks for toxicity prediction

    In this paper, the term "applicability domain" refers to the range of chemical compounds for which the statistical quantitative structure-activity relationship (QSAR) model can accurately predict their toxicity. This is a crucial concept in the development and practical use of these models. First, a multidisciplinary review is provided regarding the theory and practice of applicability domains in the context of toxicity problems using the classical QSAR model. Then, the advantages and improved performance of neural networks (NNs), which are the most promising machine learning algorithms, are reviewed. Within the domain of medicinal chemistry, nine different methods using NNs for toxicity prediction were compared with 29 alternative artificial intelligence (AI) techniques. Similarly, seven NN-based toxicity prediction methodologies were compared to six other AI techniques within the realm of food safety, 11 NN-based methodologies were compared to 16 different AI approaches in the environmental sciences category, and four specific NN-based toxicity prediction methodologies were compared to nine alternative AI techniques in the field of industrial hygiene. Within the reviewed approaches, given known toxic compound descriptors and behaviors, we observed a difficulty in extrapolating and predicting the effects of untested chemical compounds. Different methods can be used for unsupervised clustering, such as distance-based approaches and consensus-based decision methods. Additionally, the importance of model validation has been highlighted within a regulatory context according to the Organization for Economic Co-operation and Development (OECD) principles: to predict the toxicity of potential new drugs in medicinal chemistry, to determine the limits of detection for harmful substances in food, to predict the toxicity limits of chemicals in the environment, and to predict the exposure limits to harmful substances in the workplace.
Despite its importance, a thorough application of toxicity models is still restricted in the field of medicinal chemistry and is virtually overlooked in other scientific domains. Consequently, only a small proportion of the toxicity studies conducted in medicinal chemistry consider the applicability domain in their mathematical models, thereby limiting their predictive power to untested drugs. Conversely, the applicability of these models is crucial; however, this has not been sufficiently assessed in toxicity prediction or in other related areas such as food science, environmental science, and industrial hygiene. Thus, this review sheds light on the prevalent use of Neural Networks in toxicity prediction, thereby serving as a valuable resource for researchers and practitioners across these multifaceted domains that could be extended to other fields in future research.
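
The abstract above mentions distance-based approaches to the applicability domain. As a rough illustration, one common variant treats a query compound as inside the domain when its mean distance to its k nearest training compounds does not exceed a threshold learned from the training set. The sketch below assumes Euclidean distance on numeric descriptor vectors and a mean-plus-z-standard-deviations threshold. The function names, `k`, `z`, and the toy descriptors are hypothetical and are not drawn from the reviewed paper.

```python
import math
import statistics

# Minimal sketch of a distance-based applicability domain check.
# Assumptions (not from the source): Euclidean distance on descriptor
# vectors, k-nearest-neighbour mean distance as the score, and a
# threshold of mean + z * std of the training compounds' own scores.

def knn_mean_distance(x, others, k):
    """Mean Euclidean distance from x to its k nearest points in `others`."""
    dists = sorted(math.dist(x, o) for o in others)
    return sum(dists[:k]) / k


def fit_threshold(train, k=2, z=1.0):
    """Leave-one-out kNN scores on the training set give the domain threshold."""
    scores = [
        knn_mean_distance(x, train[:i] + train[i + 1:], k)
        for i, x in enumerate(train)
    ]
    return statistics.mean(scores) + z * statistics.pstdev(scores)


def in_domain(x, train, threshold, k=2):
    """True if x lies close enough to the training compounds."""
    return knn_mean_distance(x, train, k) <= threshold


if __name__ == "__main__":
    # Toy two-dimensional descriptors for five training compounds.
    train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
    threshold = fit_threshold(train)
    print(in_domain((0.4, 0.6), train, threshold))  # query near the training data
    print(in_domain((5.0, 5.0), train, threshold))  # query far from the training data
```

Predictions for compounds flagged as outside the domain would normally be reported as unreliable, not discarded silently. The choice of `k` and `z` trades coverage against reliability.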

    Toxicity prediction using multi-disciplinary data integration and novel computational approaches

    Current predictive tools used for human health assessment of potential chemical hazards rely primarily on either chemical structural information (i.e., cheminformatics) or bioassay data (i.e., bioinformatics). Emerging data sources such as chemical libraries, high throughput assays and health databases offer new possibilities for evaluating chemical toxicity as an integrated system and overcome the limited predictivity of current fragmented efforts; yet, few studies have combined the new data streams. This dissertation tested the hypothesis that integrative computational toxicology approaches drawing upon diverse data sources would improve the prediction and interpretation of chemically induced diseases. First, chemical structures and toxicogenomics data were used to predict hepatotoxicity. Compared with conventional cheminformatics or toxicogenomics models, interpretation was enriched by the chemical and biological insights even though prediction accuracy did not improve. This motivated the second project that developed a novel integrative method, chemical-biological read-across (CBRA), that led to predictive and interpretable models amenable to visualization. CBRA was consistently among the most accurate models on four chemical-biological data sets. It highlighted chemical and biological features for interpretation and the visualizations aided transparency. Third, we developed an integrative workflow that interfaced cheminformatics prediction with pharmacoepidemiology validation using a case study of Stevens Johnson Syndrome (SJS), an adverse drug reaction (ADR) of major public health concern. Cheminformatics models first predicted potential SJS inducers and non-inducers, prioritizing them for subsequent pharmacoepidemiology evaluation, which then confirmed that predicted non-inducers were statistically associated with fewer SJS occurrences. 
By combining cheminformatics' ability to predict SJS as soon as drug structures are known, and pharmacoepidemiology's statistical rigor, we have provided a universal scheme for more effective study of SJS and other ADRs. Overall, this work demonstrated that integrative approaches could deliver more predictive and interpretable models. These models can then reliably prioritize high risk chemicals for further testing, allowing optimization of testing resources. A broader implication of this research is the growing role we envision for integrative methods that will take advantage of the various emerging data sources. Doctor of Philosoph