
    Development and evaluation of machine learning algorithms for biomedical applications

    Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps physicians identify effective treatments for patients. Both problems have been widely studied, though current solutions are far from perfect, and more research is needed to improve the accuracy of existing approaches. This dissertation develops machine learning and data mining algorithms and applies them to these two biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; (ii) a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For the drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; (iii) a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over the existing approaches.

    Advanced machine-learning techniques in drug discovery

    The popularity of machine learning (ML) across drug discovery continues to grow, yielding impressive results. As the use of ML techniques increases, their limitations become more apparent. These limitations include the need for big data, sparsity in data, and a lack of interpretability. It has also become apparent that the techniques are not truly autonomous, requiring retraining even after deployment. In this review, we detail the use of advanced techniques to circumvent these challenges, with examples drawn from drug discovery and allied disciplines. In addition, we present emerging techniques and their potential role in drug discovery. The techniques presented herein are anticipated to expand the applicability of ML in drug discovery.

    Deep transfer learning for drug response prediction

    The goal of precision oncology is to make accurate predictions for cancer patients from the omics data of individual patients. The major challenges for computational methods of drug response prediction are that labeled clinical data is very limited, not publicly available, or covers drug response for only one or two drugs. These challenges have been addressed by generating large-scale pre-clinical datasets such as cancer cell lines or patient-derived xenografts (PDX). These pre-clinical datasets provide multi-omics characterization of samples and are often screened with hundreds of drugs, which makes them viable resources for precision oncology. However, they raise new questions: How can we integrate different data types? How can we handle the data discrepancy between pre-clinical and clinical datasets that exists due to basic biological differences? And how can we make the best use of unlabeled samples in drug response prediction, where labeling is especially challenging? In this thesis, we propose methods based on deep neural networks to answer these questions. First, we propose a method of multi-omics integration. Second, we propose a transfer learning method to address the data discrepancy between cell lines, patients, and PDX models in the input and output space. Finally, we propose a semi-supervised method of out-of-distribution generalization to predict drug response using labeled and unlabeled samples. The proposed methods show promising performance when compared to the state of the art and may guide precision oncology more accurately.

    Translational Applications of Artificial Intelligence and Machine Learning for Diagnostic Pathology in Lymphoid Neoplasms: A Comprehensive and Evolutive Analysis

    Genomic analysis and digitalization of medical records have led to a big data scenario within hematopathology. Artificial intelligence and machine learning tools are increasingly used to integrate clinical, histopathological, and genomic data in lymphoid neoplasms. In this study, we identified the global trends and the cognitive and social framework of this field from 1990 to 2020. Metadata were obtained from the Clarivate Analytics Web of Science database in January 2021. A total of 525 documents were assessed by document type, research areas, source titles, organizations, and countries. The SciMAT and VOSviewer packages were used to perform scientific mapping analysis. Geographical distribution showed the USA and the People’s Republic of China as the most productive countries, reporting up to 190 (36.19%) of all documents. A third-degree polynomial equation predicts that global production in this area will reach three times the current number around 2031. Thematically, current research is focused on the integration of digital image analysis and genomic sequencing in non-Hodgkin lymphomas, prediction of chemotherapy response, and validation of new prognostic models. These findings can help pathology departments outline future clinical and research avenues, and can help public institutions and administrations promote synergies and optimize funding allocation. Funding: Andalusia Health System - RH-0145-2020; EU FEDER ITI Grant for Cadiz Province PI-0032-201

    Machine learning and data mining frameworks for predicting drug response in cancer: An overview and a novel in silico screening process based on association rule mining


    LABRAD : Vol 39, Issue 1 - September 2013

    Immunophenotyping by Flowcytometry; Chronic Lymphocytic Leukaemia: Diagnosis and Prognostic Factors; Tumour Markers; Role of Chemical Pathology in Screening and Diagnosis of Multiple Myeloma; 1p/19q Deletion: Favourable Prognostic Marker for Oligodendroglioma; EGFR Mutation Screening Test for Lung Cancer Patients; Clinical Utility of BCR-ABL1 Kinase Domain Mutational Analysis; Molecular Cytogenetic Testing for Acute Myeloid Leukaemia; Diffuse Large B-Cell Lymphoma (DLBCL) Subgroups have Different Phenotype

    Deep multiple-instance learning for detecting multiple myeloma in CT scans of large bones

    The use of computer-aided diagnosis (CAD) systems for the interpretation of medical images has become an increasingly popular topic with the arrival of modern machine learning algorithms. Convolutional neural networks currently perform exceptionally well in various pattern recognition tasks, including image classification. In this thesis we examine the capabilities of a convolutional neural network binary classifier as a CAD system for detecting abnormalities in CT images of femurs. We focus on the diagnosis of multiple myeloma, which is characterized by bone marrow lesions commonly observable through computed tomography screening. Different approaches to the problem, including multiple instance learning (MIL), were tested. The classifier showed solid performance in our fully supervised experimental setting; however, it exhibited a serious inability to learn from multiple instances. We conclude that the proposed neural model needs a stronger error signal in order to converge in the standard MIL setting, and we suggest potential improvements for further work in this area.

    Establishment of predictive blood-based signatures in medical large scale genomic data sets : Development of novel diagnostic tests

    The growing volume of data has led to tremendous success in discovering molecular biomarkers from high-throughput data. However, the translation of these so-called genomic signatures into clinical practice has been limited. The complexity and volume of genomic profiling require heightened attention to robust design, methodological details, and avoidance of bias. In this thesis, novel strategies aimed at closing the gap between initially promising pilot studies and the clinical application of novel biomarkers are evaluated. First, a conventional process for genomic biomarker development comprising feature selection, algorithm and parameter optimization, and performance assessment was established. Using this approach, an RNA-stabilized whole blood diagnostic classifier for non-small cell lung cancer was built in a training set; it can be used as a biomarker to discriminate between patients and control samples. Subsequently, this optimized classifier was successfully applied to two independent and blinded validation sets. Extensive permutation analysis using random feature lists supports the specificity of the established transcriptional classifier. Next, it was demonstrated that a combined approach of clinical trial simulation and adaptive learning strategies can be used to speed up biomarker development. As a model, genome-wide expression data derived from over 4,700 individuals in 37 studies addressing four clinical endpoints were used to assess over 1,800,000 classifiers. Beyond current approaches that determine optimal classifiers within a defined study setting, randomized clinical trial simulation unequivocally uncovered the overall variance in the prediction performance of potential disease classifiers when predicting the outcome of a large biomarker validation study from a pilot trial. Furthermore, the most informative features were identified by feature ranking according to an individual classification performance score. Applying an adaptive learning strategy based on data extrapolation led to a data-driven prediction of the study size required for larger validation studies based on small pilot trials, and to an estimate of the expected statistical performance during validation. With these significant improvements, exceedingly robust and clinically applicable gene signatures for the diagnosis and detection of acute myeloid leukemia, active tuberculosis, HIV infection, and non-small cell lung cancer were established; the obtained signatures showed disease-related enrichment and phenotype-related feature ranking. In further research, platform requirements for blood-based biomarker development were examined using microRNA expression profiling as an example. Performance and technical sample handling were investigated to provide reliable strategies for platform implementation in clinical applications. Overall, the introduced methods improve and accelerate the development of biomarker signatures for molecular diagnostics and can easily be extended to other high-throughput data and other disease settings.

    A transfer learning approach to drug resistance classification in mixed HIV dataset

    Funding: This research is funded by the Tertiary Education Trust Fund (TETFund), Nigeria. As we advance towards individualized therapy, the ‘one-size-fits-all’ regimen is gradually giving way to adaptive techniques that address the complexities of failed treatments. Treatment failure is associated with factors such as poor drug adherence, adverse side effects or reactions, co-infection, lack of follow-up, drug-drug interaction, and more. This paper implements a transfer learning approach that classifies patients' response to failed treatments due to adverse drug reactions. The research is motivated by the need for early detection of patients' response to treatments and the generation of domain-specific datasets to balance under-represented classification data, typical of low-income countries located in Sub-Saharan Africa. A soft computing model was pre-trained to cluster CD4+ counts and viral loads of treatment change episodes (TCEs) processed from two disparate sources: the Stanford HIV drug resistance database (https://hivdb.stanford.edu), or control dataset, and locally sourced patients' records from selected health centers in Akwa Ibom State, Nigeria, or mixed dataset. Experiments were run on both datasets using a traditional 2-layer neural network (NN) and a 5-layer deep neural network (DNN), with odd dropout neurons distribution resulting in the following configurations: NN (Parienti et al., 2004) [32], NN (Deniz et al., 2018) [53] and DNN [9 7 5 3 1]. To discern knowledge of failed treatment, DNN1 [9 7 5 3 1] and DNN2 [9 7 5 3 1] were introduced to model both datasets and only TCEs of patients at risk of drug resistance, respectively. Classification results revealed fewer misclassifications, with the DNN architecture yielding the best performance measures. However, the transfer learning approach with the DNN2 [9 7 3 1] configuration produced superior classification results when compared to other variants and configurations, with a classification accuracy of 99.40% and RMSE values of 0.0056, 0.0510, and 0.0362 for the test, train, and overall datasets, respectively. The proposed system therefore indicates good generalization and is vital as decision-making support for clinicians and physicians predicting patients at risk of adverse drug reactions. Although imbalanced feature classification is typical of disease problems and diminishes dependence on classification accuracy, the proposed system still compared favorably with the literature and can be hybridized to improve its precision and recall rates.