9 research outputs found

    Application of variations of non-linear CCA for feature selection in drug sensitivity prediction

    Cancer arises from genetic alterations in a patient's DNA. Many studies indicate that these alterations vary among patients and can dramatically affect the therapeutic effect of cancer treatment. Therefore, extensive studies focus on understanding these alterations and their effects. Pre-clinical models play an important role in cancer drug discovery, and cancer cell lines are one of the main ingredients of these pre-clinical studies, since they capture many different aspects of the multi-omics properties of cancer cells. However, the assessment of cancer cell line responses to different drugs is error-prone and laborious. Therefore, in-silico models that accurately predict drug sensitivity values enhance cancer drug discovery. In the past decade, many computational methods have achieved high performance by studying the similarity between cancer cell lines and between drug compounds, and by using these similarities to obtain an accurate predictive model for unknown instances. In this thesis, we study the effect of non-linear feature selection through two variations of canonical correlation analysis, KCCA and HSIC-SCCA, on the prediction of drug sensitivity. To estimate the performance of these features, we use pairwise kernel ridge regression to predict the drug sensitivity, measured as IC50 values. The data set under study is a subset of Genomics of Drug Sensitivity in Cancer comprising 124 cell lines and 124 drug compounds. The high diversity among cell line and drug compound samples and the high dimension of the data matrices reduce the accuracy of the model obtained by pairwise kernel ridge regression. This accuracy was reduced further when the HSIC-SCCA method was employed as a dimension reduction step, since HSIC-SCCA increased the differences among the samples by employing different projection vectors for samples in different folds of cross-validation. Therefore, the obtained variables are rotated to provide more homogeneous samples. This step slightly improved the accuracy of the model.
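As a rough illustration of the pairwise kernel ridge regression mentioned in the abstract above, here is a minimal sketch. It assumes a product (Kronecker-style) pairwise kernel built from RBF kernels on cell-line and drug features, which is a common choice but is not confirmed by the abstract. All function names, feature dimensions, hyperparameters and the synthetic data are hypothetical stand-ins, not the thesis's actual setup.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between all rows of X."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def pairwise_kernel(K_cell, K_drug, pairs_a, pairs_b):
    """Assumed product kernel: k((c,d),(c',d')) = K_cell[c,c'] * K_drug[d,d']."""
    return (K_cell[np.ix_(pairs_a[:, 0], pairs_b[:, 0])]
            * K_drug[np.ix_(pairs_a[:, 1], pairs_b[:, 1])])

def fit_pairwise_krr(K_cell, K_drug, train_pairs, y, lam):
    """Solve (K_pair + lam * I) alpha = y for the observed (cell, drug) pairs."""
    K_pair = pairwise_kernel(K_cell, K_drug, train_pairs, train_pairs)
    return np.linalg.solve(K_pair + lam * np.eye(len(y)), y)

def predict_pairwise_krr(K_cell, K_drug, train_pairs, alpha, new_pairs):
    """Predict responses for new (cell, drug) index pairs."""
    return pairwise_kernel(K_cell, K_drug, new_pairs, train_pairs) @ alpha

# Synthetic stand-in data (hypothetical sizes; not the 124 x 124 GDSC subset).
rng = np.random.default_rng(0)
cell_feats = rng.normal(size=(5, 8))   # 5 cell lines, 8 features each
drug_feats = rng.normal(size=(4, 6))   # 4 drugs, 6 features each
K_cell = rbf_kernel(cell_feats, gamma=0.1)
K_drug = rbf_kernel(drug_feats, gamma=0.1)

all_pairs = np.array([(c, d) for c in range(5) for d in range(4)])
y_all = rng.normal(size=len(all_pairs))  # placeholder for IC50 values

train_pairs, test_pairs = all_pairs[:15], all_pairs[15:]
y_train = y_all[:15]

alpha = fit_pairwise_krr(K_cell, K_drug, train_pairs, y_train, lam=1e-2)
y_hat = predict_pairwise_krr(K_cell, K_drug, train_pairs, alpha, test_pairs)
```

In this sketch a feature selection step such as KCCA or HSIC-SCCA would act before the kernels are computed, by replacing `cell_feats` and `drug_feats` with their projected versions.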

    DrugComb update: a more comprehensive drug sensitivity data repository and analysis portal

    Combinatorial therapies that target multiple pathways have shown great promise for treating complex diseases. DrugComb (https://drugcomb.org/) is a web-based portal for the deposition and analysis of drug combination screening datasets. Since its first release, DrugComb has received continuous updates on the coverage of data resources, as well as on the functionality of the web server to improve the analysis, visualization and interpretation of drug combination screens. Here, we report significant updates of DrugComb, including: (i) manual curation and harmonization of more comprehensive drug combination and monotherapy screening data, not only for cancers but also for other diseases such as malaria and COVID-19; (ii) enhanced algorithms for assessing the sensitivity and synergy of drug combinations; (iii) network modelling tools to visualize the mechanisms of action of drugs or drug combinations for a given cancer sample and (iv) state-of-the-art machine learning models to predict drug combination sensitivity and synergy. These improvements have been provided with a more user-friendly graphical interface and a faster database infrastructure, which make DrugComb the most comprehensive web-based resource for the study of drug sensitivities for multiple diseases.
    Peer reviewed

    SynergyFinder Plus: Toward Better Interpretation and Annotation of Drug Combination Screening Datasets

    Combinatorial therapies have recently been proposed to improve the efficacy of anticancer treatment. The SynergyFinder R package is a software tool used to analyze pre-clinical drug combination datasets. Here, we report the major updates to the SynergyFinder R package for improved interpretation and annotation of drug combination screening results. Unlike the existing implementations, the updated SynergyFinder R package includes five main innovations. 1) We extend the mathematical models to higher-order drug combination data analysis and implement dimension reduction techniques for visualizing the synergy landscape. 2) We provide a statistical analysis of drug combination synergy and sensitivity with confidence intervals and P values. 3) We incorporate a synergy barometer to harmonize multiple synergy scoring methods to provide a consensus metric for synergy. 4) We evaluate drug combination synergy and sensitivity to provide an unbiased interpretation of the clinical potential. 5) We enable fast annotation of drugs and cell lines, including their chemical and target information. These annotations will improve the interpretation of the mechanisms of action of drug combinations. To facilitate the use of the R package within the drug discovery community, we also provide a web server at www.synergyfinderplus.org as a user-friendly interface to enable a more flexible and versatile analysis of drug combination data.
    Peer reviewed

    Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans

    Machine learning methods offer great promise for fast and accurate detection and prognostication of coronavirus disease 2019 (COVID-19) from standard-of-care chest radiographs (CXR) and chest computed tomography (CT) images. Many articles were published in 2020 describing new machine learning-based models for both of these tasks, but it is unclear which are of potential clinical utility. In this systematic review, we consider all published papers and preprints, for the period from 1 January 2020 to 3 October 2020, which describe new machine learning models for the diagnosis or prognosis of COVID-19 from CXR or CT images. All manuscripts uploaded to bioRxiv, medRxiv and arXiv, along with all entries in EMBASE and MEDLINE in this timeframe, are considered. Our search identified 2,212 studies, of which 415 were included after initial screening and, after quality screening, 62 studies were included in this systematic review. Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases. This is a major weakness, given the urgency with which validated COVID-19 models are needed. To address this, we give many recommendations which, if followed, will solve these issues and lead to higher-quality model development and well-documented manuscripts.

    A pipeline to further enhance quality, integrity and reusability of the NCCID clinical data

    The National COVID-19 Chest Imaging Database (NCCID) is a centralized UK database of thoracic imaging and corresponding clinical data. It is made available by the National Health Service Artificial Intelligence (NHS AI) Lab to support the development of machine learning tools focused on Coronavirus Disease 2019 (COVID-19). A bespoke cleaning pipeline for NCCID, developed by NHSx, was introduced in 2021. We present an extension to the original cleaning pipeline for the clinical data of the database. It has been adjusted to correct additional systematic inconsistencies in the raw data, such as in patient sex, oxygen levels and date values. The most important changes are discussed in this paper, whilst the code and further explanations are made publicly available on GitLab. The suggested cleaning will allow users worldwide to work with more consistent data for the development of machine learning tools without requiring expert knowledge. In addition, it highlights some of the challenges of working with clinical multi-center data and includes recommendations for similar future initiatives.
    Peer reviewed

    The impact of imputation quality on machine learning classifiers for datasets with missing values

    Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods, followed by classification of the now complete samples. The focus of the machine learning researcher is to optimise the classifier’s performance.
    Peer reviewed