49 research outputs found

    Matching anticancer compounds and tumor cell lines by neural networks with ranking loss

    Computational drug sensitivity models have the potential to improve therapeutic outcomes by identifying targeted drug components that are likely to achieve the highest efficacy for a cancer cell line at hand at a therapeutic dose. State-of-the-art drug sensitivity models use regression techniques to predict the inhibitory concentration of a drug for a tumor cell line. This regression objective is not directly aligned with either of these principal goals, namely identifying the most effective drugs and doing so at a therapeutic dose. We argue that drug sensitivity modeling should be seen as a ranking problem with an optimization criterion that quantifies a drug’s inhibitory capacity for the cancer cell line at hand relative to its toxicity for healthy cells. We derive an extension to the well-established drug sensitivity regression model PaccMann that employs a ranking loss and focuses on the ratio of inhibitory concentration and therapeutic dosage range. We find that the ranking extension significantly enhances the model’s capability to identify the most effective anticancer drugs for unseen tumor cell profiles based on in-vitro data.
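
To make the contrast between a regression objective and a ranking objective concrete, here is a minimal sketch of a pairwise hinge ranking loss over the drugs scored for one cell line. It is not the loss used in the PaccMann extension, whose exact form the abstract does not give. The function name, the `margin` parameter, and the convention that a higher target means a more effective drug are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_hinge_ranking_loss(scores, targets, margin=1.0):
    """Mean hinge loss over all drug pairs whose target values differ.

    scores  -- model scores, one per drug (higher = predicted more effective)
    targets -- reference efficacy values for the same drugs (higher = better;
               this orientation is an assumption of the sketch)
    """
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if targets[i] == targets[j]:
            continue  # ties carry no ordering information
        # Orient the pair so that `hi` indexes the truly better drug.
        hi, lo = (i, j) if targets[i] > targets[j] else (j, i)
        # Penalize the pair unless the better drug wins by at least `margin`.
        losses.append(max(0.0, margin - (scores[hi] - scores[lo])))
    return sum(losses) / len(losses) if losses else 0.0
```

Unlike a squared-error objective, this loss depends only on the order of the scores, so a model is not penalized for mispredicting absolute values as long as the better drug is ranked first by a sufficient margin.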

    Machine learning and feature selection for drug response prediction in precision oncology applications

    In-depth modeling of the complex interplay among multiple omics data measured from cancer cell lines or patient tumors is providing new opportunities toward identification of tailored therapies for individual cancer patients. Supervised machine learning algorithms are increasingly being applied to the omics profiles as they enable integrative analyses among the high-dimensional data sets, as well as personalized predictions of therapy responses using multi-omics panels of response-predictive biomarkers identified through feature selection and cross-validation. However, technical variability and frequent missingness in input "big data" require the application of dedicated data preprocessing pipelines that often lead to some loss of information and a compressed view of the biological signal. We describe here the state-of-the-art machine learning methods for anti-cancer drug response modeling and prediction, and give our perspective on further opportunities to make better use of high-dimensional multi-omics profiles, along with knowledge about cancer pathways targeted by anti-cancer compounds, when predicting their phenotypic responses.

    Scalable large margin pairwise learning algorithms

    2019 Summer. Includes bibliographical references. Classification is a major task in machine learning and data mining applications. Many of these applications involve building a classification model using a large volume of imbalanced data. In such an imbalanced learning scenario, the area under the ROC curve (AUC) has proven to be a reliable performance measure to evaluate a classifier. Therefore, it is desirable to develop scalable learning algorithms that maximize the AUC metric directly. Kernelized AUC maximization machines have established a superior generalization ability compared to linear AUC machines. However, the computational cost of the kernelized machines hinders their scalability. To address this problem, we propose a large-scale nonlinear AUC maximization algorithm that learns a batch linear classifier on an approximate feature space computed via the k-means Nyström method. The proposed algorithm is shown empirically to achieve AUC classification performance comparable to or even better than that of the kernel AUC machines, while its training time is faster by several orders of magnitude. However, the computational complexity of the linear batch model compromises its scalability when training on sizable datasets. Hence, we develop second-order online AUC maximization algorithms based on a confidence-weighted model. The proposed algorithms exploit second-order information to improve the convergence rate and implement a fixed-size buffer to address the multivariate nature of the AUC objective function. We also extend our online linear algorithms to consider an approximate feature map constructed using random Fourier features in an online setting. The results show that our proposed algorithms outperform or are at least comparable to the competing online AUC maximization methods. Despite their scalability, we notice that online first- and second-order AUC maximization methods are prone to suboptimal convergence. This can be attributed to the limitation of the hypothesis space. A potential improvement can be attained by learning stochastic online variants. However, the vanilla stochastic methods also suffer from slow convergence because of the high variance introduced by the stochastic process. We address the problem of slow convergence by developing a fast-converging stochastic AUC maximization algorithm. The proposed stochastic algorithm is accelerated using a unique combination of scheduled regularization update and scheduled averaging. The experimental results show that the proposed algorithm performs better than the state-of-the-art online and stochastic AUC maximization methods in terms of AUC classification accuracy. Moreover, we develop a proximal variant of our accelerated stochastic AUC maximization algorithm. The proposed method applies the proximal operator to the hinge loss function. Therefore, it evaluates the gradient of the loss function at the approximated weight vector. Experiments on several benchmark datasets show that our proximal algorithm converges to the optimal solution faster than the previous AUC maximization algorithms.
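
The pairwise nature of the AUC objective mentioned above can be illustrated with a short sketch. It computes the empirical AUC as the fraction of positive-negative pairs that are ordered correctly, and a pairwise hinge surrogate that a large margin learner could minimize instead. This is a generic textbook formulation, not the thesis's algorithms; the function names and the label convention (1 for positive, anything else for negative) are assumptions.

```python
def _split_by_label(scores, labels):
    """Separate scores into positives (label == 1) and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos, neg = _split_by_label(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pairwise_hinge_surrogate(scores, labels):
    """Convex surrogate for 1 - AUC: mean of max(0, 1 - (s_pos - s_neg))."""
    pos, neg = _split_by_label(scores, labels)
    total = sum(max(0.0, 1.0 - (p - n)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))
```

Both functions loop over every positive-negative pair, which is quadratic in the number of examples. That cost is one reason online methods keep only a fixed-size buffer of past examples rather than the full pair set.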

    Wavelet-Based Cancer Drug Recommender System

    The molecular nature of cancer underpins systematic studies of cancer genomes, providing valuable insights and enabling the development of clinical treatments. Above all, these studies are driving the clinical use of genomic information to select treatments that would otherwise not be expected for patients with various types of cancer, making precision medicine possible. With this in mind, in this project we combine image processing techniques, for data enhancement, with recommender systems to propose a personalized ranking of anticancer drugs. The system is implemented in Python and tested on a database of drug sensitivity records with more than 310,000 IC50 values, which in turn describe the response of more than 300 anticancer drugs across 987 cancer cell lines. After several preprocessing tasks, two experiments are carried out. The first uses the original DNA microarray images and the second uses the same images after a wavelet transform. The experiments confirm that wavelet-transformed DNA microarray images improve the performance of the recommender system, optimizing the search for cancer cell lines with a profile similar to that of the new cell line. Furthermore, we conclude that DNA microarray images with appropriate wavelet transforms not only provide richer information for the search for similar users, but also compress these images efficiently, optimizing computational resources. To the best of our knowledge, this project is novel in its use of wavelet-transformed DNA microarray images to profile cell lines in a personalized anticancer drug recommender system.
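
As a rough illustration of what a wavelet transform does to a signal, the sketch below implements a single level of the orthonormal Haar transform on a 1-D list, together with its inverse. The abstract does not say which wavelet family or how many decomposition levels the project uses, so the choice of Haar, the 1-D setting, and the function names are assumptions. An image would be handled by applying the same step along rows and then columns.

```python
import math

_SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approx, detail): scaled pairwise sums and differences.
    The approximation half is a coarse, half-length view of the signal.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / _SQRT2 for a, b in pairs]
    detail = [(a - b) / _SQRT2 for a, b in pairs]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Rebuild the original signal from one level of Haar coefficients."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / _SQRT2, (a - d) / _SQRT2])
    return signal
```

Keeping only the approximation coefficients, or discarding small detail coefficients, gives a compressed representation. This is the general mechanism by which a wavelet transform can shrink an image while retaining its coarse structure.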

    Data augmentation for recommender system: A semi-supervised approach using maximum margin matrix factorization

    Collaborative filtering (CF) has become a popular method for developing recommender systems (RS), where a user's ratings for new items are predicted based on her past preferences and the available preference information of other users. Despite the popularity of CF-based methods, their performance is often greatly limited by the sparsity of observed entries. In this study, we explore the previously uninvestigated data augmentation and refinement aspects of Maximum Margin Matrix Factorization (MMMF), a widely accepted CF technique for rating prediction. We exploit the inherent characteristics of CF algorithms to assess the confidence level of individual ratings and propose a semi-supervised approach for rating augmentation based on self-training. We hypothesize that any CF algorithm's predictions with low confidence are due to some deficiency in the training data and hence, the performance of the algorithm can be improved by adopting a systematic data augmentation strategy. We iteratively use some of the ratings predicted with high confidence to augment the training data and remove low-confidence entries through a refinement process. By repeating this process, the system learns to improve prediction accuracy. Our method is experimentally evaluated on several state-of-the-art CF algorithms and leads to informative rating augmentation, improving the performance of the baseline approaches. Comment: 20 pages
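
The augmentation loop described in this abstract can be sketched as a single self-training round: predict the missing ratings, then keep only the predictions whose confidence clears a threshold. This is a loose sketch under stated assumptions, not the paper's method. The names `self_training_round` and `item_mean_predictor`, the `threshold` parameter, and the confidence formula `n / (n + 1)` are all hypothetical, and the toy predictor stands in for MMMF. The refinement step that removes low-confidence entries is omitted.

```python
def self_training_round(observed, candidates, predict, threshold=0.75):
    """Return a copy of `observed` augmented with confident predictions.

    observed   -- dict mapping (user, item) to a known rating
    candidates -- iterable of (user, item) pairs to try to fill in
    predict    -- callable(observed, key) -> (rating, confidence in [0, 1])
    """
    augmented = dict(observed)
    for key in candidates:
        if key in observed:
            continue  # never overwrite a real rating
        rating, confidence = predict(observed, key)
        if confidence >= threshold:
            augmented[key] = rating  # pseudo-label used in the next round
    return augmented


def item_mean_predictor(observed, key):
    """Toy stand-in for a CF model: predict the item's mean observed rating.

    Hypothetical confidence: n / (n + 1), where n is the number of
    ratings already observed for the item.
    """
    _, item = key
    ratings = [r for (_, i), r in observed.items() if i == item]
    if not ratings:
        return 0.0, 0.0
    n = len(ratings)
    return sum(ratings) / n, n / (n + 1)
```

Calling `self_training_round` repeatedly, each time passing the previous result as `observed`, gives the iterative scheme. In a real system the model would be retrained on the augmented data between rounds.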

    Bayesian Hybrid Matrix Factorisation for Data Integration

    We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values. The model is very general and can be used to integrate many datasets across different entity types, including repeated experiments, similarity matrices, and very sparse datasets. We apply our method to two biological applications, and extensively compare it to state-of-the-art machine learning and matrix factorisation models. For in-matrix predictions on drug sensitivity datasets we obtain consistently better performance than existing methods. This is especially the case when we increase the sparsity of the datasets. Furthermore, we perform out-of-matrix predictions on methylation and gene expression datasets, and obtain the best results on two of the three datasets, especially when the predictivity of the datasets is high. This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC), grant reference EP/M506485/1; and Methods for Integrated analysis of Multiple Omics datasets (MIMOmics, 305280).
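
For readers unfamiliar with in-matrix prediction, the sketch below shows the basic idea using plain, non-Bayesian matrix factorisation trained by stochastic gradient descent on the observed entries only. It is deliberately much simpler than HMF: there are no priors, no hybrid of factorisation types, and only a single dataset. The function names and all hyperparameter defaults are assumptions chosen for illustration.

```python
import random


def predict(U, V, i, j):
    """Predicted entry (i, j): dot product of row and column factors."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def rmse(entries, U, V):
    """Root mean squared error over a dict of {(i, j): value}."""
    squared = [(value - predict(U, V, i, j)) ** 2 for (i, j), value in entries.items()]
    return (sum(squared) / len(squared)) ** 0.5


def factorize(entries, n_rows, n_cols, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit U (n_rows x rank) and V (n_cols x rank) to the observed entries."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    observed = list(entries.items())
    for _ in range(epochs):
        rng.shuffle(observed)
        for (i, j), value in observed:
            err = value - predict(U, V, i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with L2 regularization.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V
```

After fitting, `predict(U, V, i, j)` can be evaluated for a pair `(i, j)` that was absent from `entries`, which is the in-matrix prediction setting. Out-of-matrix prediction, for a row or column with no observed entries at all, needs side information of the kind HMF integrates and is beyond this sketch.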