    Using Machine Learning On Diverse Datasets To Predict Drug-Induced Liver Injury

    A major challenge in drug development is safety and toxicity concerns due to drug side effects. One such side effect, drug-induced liver injury (DILI), is considered a primary factor in regulatory clearance. The Critical Assessment of Massive Data Analysis (CAMDA) 2020 CMap Drug Safety Challenge was established with the ultimate goal of developing DILI prediction models based on gene perturbation of six preselected cell lines (CMap L1000), extended structural information (MOLD2), toxicity data (TOX21), and FDA reporting of adverse events (FAERS). Four types of DILI classes were targeted, including two clinically relevant scores and two control classifications, designed by the CAMDA organizers. The L1000 gene expression data had variable drug coverage across cell lines, with only 247 out of 617 drugs in the study measured in all six cell types. We addressed this coverage issue by using Kru-Bor ranked merging to generate a single drug expression signature across all six cell lines. These merged signatures were then narrowed down to the top and bottom 100, 250, 500, or 1,000 genes most perturbed by drug treatment. These signatures were subjected to feature selection using Fisher’s exact test to identify genes predictive of DILI status. Models based solely on expression signatures had varying results for clinical DILI subtypes, with accuracy ranging from 0.49 to 0.67 and Matthews Correlation Coefficient (MCC) values ranging from -0.03 to 0.1. Models built using FAERS, MOLD2 and TOX21 had similar results in predicting clinical DILI scores, with accuracy ranging from 0.56 to 0.67 and MCC scores ranging from 0.12 to 0.36. To incorporate these various data types with expression-based models, we utilized soft, hard, and weighted ensemble voting methods using the top three performing models for each DILI classification. These voting models achieved a balanced accuracy up to 0.54 and 0.60 for the clinically relevant DILI subtypes. 
Overall, our experiments suggest that traditional machine learning approaches may not be optimal as a classification method for the current data.
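
The metrics and voting schemes named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the example inputs are assumptions made only for illustration.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def hard_vote(labels):
    """Majority vote over the 0/1 predictions of several models."""
    return int(sum(labels) * 2 > len(labels))

def soft_vote(probs, weights=None):
    """Average predicted probabilities (optionally weighted), then threshold at 0.5."""
    if weights is None:
        weights = [1.0] * len(probs)
    avg = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
    return int(avg >= 0.5)
```

With three hypothetical models predicting probabilities 0.9, 0.2 and 0.3, unweighted soft voting yields the negative class, while weighting the first model three times as heavily flips the decision. This is the kind of difference a weighted ensemble is meant to exploit.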

    Computational models for predicting liver toxicity in the deep learning era

    Drug-induced liver injury (DILI) is a severe adverse reaction caused by drugs and may result in acute liver failure and even death. Many efforts have centered on mitigating risks associated with potential DILI in humans. Among these, quantitative structure-activity relationship (QSAR) modeling has proven to be a valuable tool for early-stage hepatotoxicity screening. Its advantages include no requirement for physical substances and rapid delivery of results. Deep learning (DL) has made rapid advances recently and has been used for developing QSAR models. This review discusses the use of DL in predicting DILI, focusing on the development of QSAR models employing extensive chemical structure datasets alongside their corresponding DILI outcomes. We undertake a comprehensive evaluation of various DL methods, comparing them with traditional machine learning (ML) approaches, and explore the strengths and limitations of DL techniques regarding their interpretability, scalability, and generalization. Overall, our review underscores the potential of DL methodologies to enhance DILI prediction and provides insights into future avenues for developing predictive models to mitigate DILI risk in humans.

    Comparative analysis of classification techniques for topic-based biomedical literature categorisation

    Introduction: Scientific articles serve as vital sources of biomedical information, but with the yearly growth in publication volume, processing such vast amounts of information has become increasingly challenging. This difficulty is particularly pronounced when it requires the expertise of highly qualified professionals. Our research focused on classifying domain-specific articles to determine whether they contain information about drug-induced liver injury (DILI). DILI is a clinically significant condition and one of the reasons for drug registration failures. The rapid and accurate identification of drugs that may cause such conditions can prevent side effects in millions of patients. Methods: Developing a text classification method can help regulators, such as the FDA, identify evidence of potential DILI for specific drugs much faster and at massive scale. In our study, we compared several text classification methodologies, including transformers, LSTMs, information theory, and statistics-based methods. We devised a simple and interpretable text classification method that is as fast as Naïve Bayes while delivering superior performance for topic-oriented text categorisation. Moreover, we revisited techniques and methodologies to handle the imbalance of the data. Results: Transformers achieve the best results when the distribution of classes and the semantics of the test data match the training set. But in cases of imbalanced data, simple models based on statistics and information theory can surpass complex transformers, bringing more interpretable results, which are so important for the biomedical domain. 
As our results show, neural networks can achieve better results if they are pre-trained on domain-specific data and the loss function is designed to reflect the class distribution. Discussion: Overall, transformers are a powerful architecture; however, in certain cases, such as topic classification, their usage can be redundant, and simple statistical approaches can achieve comparable results while being much faster and more explainable. However, we see potential in combining results from both worlds. Development of new neural network architectures, loss functions and training procedures that bring stability to unbalanced data is a promising topic of development.
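
The abstract does not say which loss the authors used. One common way to make a loss "reflect the class distribution" is binary cross-entropy weighted by inverse class frequency, sketched below. The weighting formula and both function names are assumptions for illustration, not the paper's method.

```python
import math

def inverse_frequency_weights(labels):
    """Weight each class by n / (n_classes * class_count), so rare classes count more."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}

def weighted_bce(labels, probs, class_weights, eps=1e-12):
    """Class-weighted binary cross-entropy averaged over samples.

    labels: 0/1 ground truth; probs: predicted probability of class 1.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -class_weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)
```

With three negatives and one positive, the positive class receives weight 2.0 and each negative about 0.67, so the single positive example contributes as much to the loss as all three negatives combined.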

    An ensemble learning approach for modeling the systems biology of drug-induced injury

    Background: Drug-induced liver injury (DILI) is an adverse reaction caused by the intake of commonly used drugs that produces liver damage. DILI is estimated to affect around 20 in 100,000 inhabitants worldwide each year. Despite being one of the main causes of liver failure, the pathophysiology and mechanisms of DILI are poorly understood. In the present study, we developed an ensemble learning approach based on different features (CMap gene expression, chemical structures, drug targets) to predict drugs that might cause DILI and gain a better understanding of the mechanisms linked to the adverse reaction. Results: We searched for gene signatures in CMap gene expression data by using two approaches: phenotype-gene associations data from DisGeNET, and a non-parametric test comparing gene expression of DILI-Concern and No-DILI-Concern drugs (as per DILIrank definitions). The average accuracy of the classifiers in both approaches was 69%. We used chemical structures as features, obtaining an accuracy of 65%. The combination of both types of features produced an accuracy around 63%, but improved the independent hold-out test up to 67%. The use of drug-target associations as a feature obtained the best accuracy (70%) in the independent hold-out test. Conclusions: When using CMap gene expression data, searching for a specific gene signature among the landmark genes improves the quality of the classifiers, but it is still limited by the intrinsic noise of the dataset. When using chemical structures as a feature, the structural diversity of the known DILI-causing drugs hampers the prediction, which is a similar problem as for the use of gene expression information. The combination of both features did not improve the quality of the classifiers but increased the robustness, as shown on independent hold-out tests. 
The use of drug-target associations as a feature improved the prediction, especially the specificity, and the results were comparable to previous research studies. The authors received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreements TransQST and eTRANSAFE (refs: 116030, 777365). This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA companies in kind contribution. The authors also received support from Spanish Ministry of Economy (MINECO, refs: BIO2017–85329-R (FEDER, EU), RYC-2015-17519) as well as EU H2020 Programme 2014–2020 under grant agreement No. 676559 (Elixir-Excelerate) and from Agència de Gestió D’ajuts Universitaris i de Recerca Generalitat de Catalunya (AGAUR, ref.: 2017SGR01020). L.I.F. received support from ISCIII-FEDER (ref: CPII16/00026). The Research Programme on Biomedical Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII and is supported by grant PT13/0001/0023, of the PE I + D + i 2013–2016, funded by ISCIII and FEDER. The DCEXS is a “Unidad de Excelencia María de Maeztu”, funded by the MINECO (ref: MDM-2014-0370). J.A.P. received support from the CAMDA Travel Fellowship.

    Integrated Study of Liver Fibrosis: Modeling and Clinical Detection

    The liver is a vital organ that carries out over 500 essential tasks, including fat metabolism, blood filtering, bile production, and some protein production. Although the structure of the liver and the role of each type of cell in the liver are well known, the biomedical and mechanical interplays within liver tissues remain unclear. Chronic liver diseases are a significant public health challenge. All chronic liver diseases lead to liver fibrosis due to excessive fiber accumulation, resulting in cirrhosis and loss of liver function. Only early-stage liver fibrosis is reversible. However, early-stage liver fibrosis is difficult to diagnose. How the progression of fibrosis changes the mechanical properties of the liver tissue and alters the dynamics of blood flow is still not well understood. The objective of this dissertation is to integrate the understanding of liver diseases and mechanical modeling to develop several models relating liver fibrosis to blood flow. In collaboration with clinicians specialized in hepatic fibrosis, we integrated computational modeling and clinicopathologic image analysis and proposed a new technology for early-stage fibrosis detection. The key results of this research include: (1) A mathematical model of liver fibrosis progression connecting the cellular and molecular mechanisms of fibrosis to tissue rigidity; (2) A novel machine learning-based algorithm to automatically stage liver fibrosis based on pathology images; (3) A physics model to illustrate how the liver stiffness affects the blood flow pattern, predicting a direct relationship between fibrosis stage and ultrasound Doppler measurement of liver blood flow; (4) Statistical analysis of clinical ultrasound Doppler data from fibrosis patients confirming our model prediction. These results lead to a novel noninvasive technology for detecting early stages of liver fibrosis with high accuracy.

    Computational Approaches for Drug-Induced Liver Injury (DILI) Prediction: State of the Art and Challenges

    Drug-induced liver injury (DILI) is one of the prevailing causes of fulminant hepatic failure. It is estimated that three idiosyncratic drug reactions out of four result in liver transplantation or death. Additionally, DILI is the most common reason for withdrawal of an approved drug from the market. Therefore, the development of methods for the early identification of hepatotoxic drug candidates is of crucial importance. This review focuses on the current state of cheminformatics strategies being applied for the early in silico prediction of DILI. Herein, we discuss key issues associated with DILI modelling in terms of the data size, imbalance and quality, complexity of mechanisms, and the different levels of hepatotoxicity to model going from general hepatotoxicity to the molecular initiating events of DILI

    Measuring Chemotherapy Response in Breast Cancer Using Optical and Ultrasound Spectroscopy

    Purpose: This study comprises two subprojects. In subproject one, the study purpose was to evaluate response to neoadjuvant chemotherapy (NAC) using quantitative ultrasound (QUS) and diffuse optical spectroscopy imaging (DOS) in locally advanced breast cancer (LABC) during chemotherapy. In subproject two, DOS-based functional maps were analysed with texture-based image features to predict breast cancer response before the start of NAC. Patients and Measurements: The institution’s ethics review board approved this study. For subproject one, subjects (n=22) gave written consent before participating in the study. Participants underwent non-invasive DOS and QUS imaging. Data were acquired at weeks 0 (i.e. baseline), 1, 4, 8 and before surgical removal of the tumour (mastectomy and/or lumpectomy), corresponding to chemotherapy schedules. QUS parameters including the midband fit (MBF), 0-MHz intercept (SI), and the spectral slope (SS) were determined from tumour ultrasound data using spectral analysis. In the same patients, DOS was used to measure parameters relating to tumour haemoglobin and tissue composition such as %Water and %Lipids. Discriminant analysis and receiver-operating characteristic (ROC) analyses were used to correlate the measured imaging parameters to Miller-Payne pathological response during treatment. Additionally, multivariate analysis was carried out for pairwise DOS and QUS parameter combinations to determine if an increase in the classification accuracy could be obtained using combination DOS and QUS parametric models. For subproject two, 15 additional patients were recruited after first giving their written informed consent. A pooled analysis was completed for all DOS baseline data (subproject 1 and subproject 2; n=37 patients). LABC patients planned for NAC had functional DOS maps and associated textural features generated. 
A grey-level co-occurrence matrix (texture) analysis was completed for parameters associated with haemoglobin, tissue composition, and optical properties (deoxy-haemoglobin [Hb], oxy-haemoglobin [HbO2], total haemoglobin [HbT], %Lipids, %Water, scattering power [SP], and scattering amplitude [SA]) prior to treatment. Textural features included contrast (con), correlation (cor), energy (ene), and homogeneity (hom). Patients were classified as ‘responders’ or ‘non-responders’ using Miller-Payne pathological response criteria after treatment completion. In order to test if baseline univariate texture features could predict treatment response, a receiver operating characteristic (ROC) analysis was performed, and the optimal sensitivity, specificity and area under the curve (AUC) were calculated using Youden’s index (Q-point) from the ROC. Multivariate analysis was conducted to test 40 DOS-texture features and all possible bivariate combinations using a naïve Bayes model, and k-nearest neighbour (k-NN) model classifiers were included in the analysis. Using these machine-learning algorithms, the pretreatment DOS-texture parameters underwent dataset training, testing, and validation, and ROC analysis was performed to find the maximum sensitivity and specificity of bivariate DOS-texture features. Results: For subproject one, individual DOS and QUS parameters, including the spectral intercept (SI), oxy-haemoglobin (HbO2), and total haemoglobin (HbT) were significant markers for response outcome after one week of treatment (p<0.01). Multivariate (pairwise) combinations increased the sensitivity, specificity and AUC at this time; the SI+HbO2 showed a sensitivity/specificity of 100%, and an AUC of 1.0 after one week of treatment. For subproject two, the results indicated that textural characteristics of pre-treatment DOS parametric maps can differentiate treatment response outcomes. 
The HbO2-homogeneity resulted in the highest accuracy amongst univariate parameters in predicting response to chemotherapy: sensitivity (%Sn) and specificity (%Sp) = 86.5 and 89.0%, respectively and an accuracy of 87.8%. The highest predictors using multivariate (binary) combination features were the Hb-Contrast + HbO2-Homogeneity which resulted in a %Sn = 78.0, a %Sp = 81.0% and an accuracy of 79.5% using the naïve Bayes model. Conclusion: DOS and QUS demonstrated potential as coincident markers for treatment response and may potentially facilitate response-guided therapies. Also, the results of this study demonstrated that DOS-texture analysis can be used to predict breast cancer response groups prior to starting NAC using baseline DOS measurements
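
Youden's index, which this abstract uses to pick the optimal operating point on the ROC curve, is J = sensitivity + specificity - 1. The sketch below scans candidate thresholds and returns the one maximising J. The function name and the convention that scores at or above the threshold are called positive are assumptions for illustration, and the example data are made up rather than taken from the study.

```python
def youden_optimal_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity, J) maximising J = Sn + Sp - 1.

    scores: continuous marker values; labels: 1 = responder, 0 = non-responder.
    A sample is called positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sn, sp = tp / pos, tn / neg
        j = sn + sp - 1
        if best is None or j > best[3]:
            best = (t, sn, sp, j)
    return best

# Hypothetical, perfectly separable marker values.
result = youden_optimal_threshold([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 0, 1, 1, 1])
```

When several thresholds tie on J, this sketch keeps the lowest one. A real analysis would state its tie-breaking rule explicitly.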

    A systems toxicology framework for improving the identification of paracetamol overdose

    Paracetamol (APAP) overdose is the leading cause of acute liver failure and a concerning global health issue. However, the current clinical treatment framework is heavily criticized for its sub-optimality. Within this thesis, a systems toxicology approach is taken in an attempt to provide further insight into the APAP overdose problem, and propose potential improvements to the current treatment framework. In Chapter 2, a proof-of-concept pre-clinical pharmacokinetic-pharmacodynamic (PKPD) model describing APAP metabolism and corresponding toxicity biomarkers (ALT, HMGB1, full K18, fragmented K18) is defined. A statistical model is combined with the PKPD framework to simulate in silico population groups with the aim of predicting initial APAP dose, time since overdose, and probability of liver injury. In Chapter 3, an identifiability analysis is performed on the PKPD model to identify parameter uncertainties. The model is also extended, enabling predictions for individuals deemed both “healthy” and “high-risk”. In 2017 I was awarded the In Vitro Toxicology Society mini-fellowship award, which funded 4 weeks of training in experimental wet-lab techniques. The training was used to investigate the effects of various combinations of APAP and its antidote, N-acetylcysteine (NAC), on in vitro hepatocyte functionality. Subsequently, in Chapter 4, the effect of the antidote (NAC) is incorporated into the PKPD model structure, and an additional toxicity measure is defined, describing severe loss of cell functionality. Different NAC regimens are tested, investigating their effect on both of our proposed toxicity measures. Through collaboration with the Royal Infirmary, Edinburgh, we obtained access to a clinical dataset of approximately 3,600 APAP overdose patients. In Chapter 5, a population-pharmacokinetic (Pop-PK) APAP model is defined, with PK parameters optimised based on this dataset. 
The framework has the ability to account for random inter-individual differences in PK parameter values. Current clinical toxicity thresholds are investigated and compared to those proposed by our model for various demographic groups

    USING MACHINE LEARNING TO PREDICT ACUTE KIDNEY INJURIES AMONG PATIENTS TREATED WITH EMPIRIC ANTIBIOTICS

    Acute kidney injury (AKI) is a significant adverse effect of many medications that leads to increased morbidity, cost, and mortality among hospitalized patients. Recent literature supports a strong link between empiric combination antimicrobial therapy and increased AKI risk. As briefly summarized below, the following chapters describe my research conducted in this area. Chapter 1 presents and summarizes the published literature connecting combination antimicrobial therapy with increased AKI incidence. This chapter sets the specific aims of my dissertation project. Chapter 2 describes a study of patients receiving vancomycin (VAN) in combination with piperacillin-tazobactam (TZP) or cefepime (CFP). I matched over 1,600 patients receiving these combinations and found a significantly lower incidence of AKI among patients receiving the CFP+VAN combination when controlling for confounders. The conclusion of this study is that VAN+TZP has a significantly increased risk of AKI compared to CFP+VAN, confirming the results of previous literature. Chapter 3 presents a study of patients receiving VAN in combination with meropenem (MEM) or TZP. This study included over 10,000 patients and used inverse probability of treatment weighting to conserve data for this population. After controlling for confounders, VAN+TZP was associated with significantly more AKI than VAN+MEM. This study demonstrates that MEM is a clinically viable alternative to TZP in empiric antimicrobial therapy. Chapter 4 describes a study in which patients receiving TZP or ampicillin-sulbactam (SAM) with or without VAN were analyzed for AKI incidence. The purpose of this study was to identify whether the addition of a beta-lactamase inhibitor to a beta-lactam increased the risk of AKI. 
This study included more than 2,400 patients receiving either agent and found that there were no differences in AKI among patients receiving SAM or TZP; however, AKI was significantly more common in the TZP group when stratified by VAN exposure. This study shows that comparisons of TZP to other beta-lactams without beta-lactamase inhibitors are valid. Chapter 5 presents a study of almost 30,000 patients who received combination antimicrobial therapy over an 8-year period. This study demonstrates similar AKI incidence to previous literature and the studies presented in the previous chapters. Additionally, the results of the predictive models suggest that further work in this research area is needed. The studies conducted present a clear message that patients receiving VAN+TZP are at significantly greater risk of AKI than alternative regimens for empiric coverage of infection
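
Inverse probability of treatment weighting, mentioned for the Chapter 3 study, reweights each patient by the inverse of the probability of receiving the treatment they actually got, so that no patients are discarded as they would be in matching. The sketch below assumes the propensity scores have already been estimated by some model. All names and numbers are hypothetical and are not drawn from the dissertation.

```python
def iptw_weights(treated, propensity):
    """Weight = 1/e for treated patients, 1/(1-e) for controls.

    treated: 1/0 treatment indicator; propensity: estimated P(treated = 1 | covariates).
    """
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e)
            for t, e in zip(treated, propensity)]

def weighted_risk(outcome, weights, treated, arm):
    """Weighted proportion of patients with the outcome (e.g. AKI) in one arm."""
    num = sum(w * y for y, w, t in zip(outcome, weights, treated) if t == arm)
    den = sum(w for w, t in zip(weights, treated) if t == arm)
    return num / den

# Hypothetical toy cohort: 1 = one regimen, 0 = the comparator.
treated = [1, 1, 0, 0]
propensity = [0.5, 0.25, 0.5, 0.2]
outcome = [1, 0, 0, 1]
weights = iptw_weights(treated, propensity)
risk_difference = (weighted_risk(outcome, weights, treated, 1)
                   - weighted_risk(outcome, weights, treated, 0))
```

In practice the weights are often stabilised or truncated to limit the influence of patients with extreme propensity scores. That refinement is omitted here.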