2,902 research outputs found
Development and evaluation of machine learning algorithms for biomedical applications
Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps physicians identify effective treatments for their patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches.
This dissertation develops machine learning and data mining algorithms and applies them to these two biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques for selecting topological features suitable for link prediction in gene networks; (ii) a graph sparsification method for network sampling; (iii) combined supervised and unsupervised methods to infer gene networks; and (iv) sampling and boosting techniques for reverse engineering gene networks. For the drug sensitivity prediction problem, the dissertation presents (i) an instance selection technique and hybrid method for drug sensitivity prediction; (ii) a link prediction approach to drug sensitivity prediction; (iii) a noise-filtering method for drug sensitivity prediction; and (iv) transfer learning approaches for enhancing the performance of drug sensitivity prediction. Substantial experiments are conducted to evaluate the effectiveness and efficiency of the proposed algorithms. Experimental results demonstrate the feasibility of the algorithms and their superiority over the existing approaches.
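The abstract above does not say which topological features the dissertation selects. As a minimal sketch, assuming two widely used link-prediction features (common-neighbor count and Jaccard coefficient), the following scores every unlinked gene pair in a toy network. The gene names and the network itself are hypothetical.

```python
from itertools import combinations

# Toy undirected gene network as an adjacency map (hypothetical gene names).
network = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3", "g4"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g2", "g3"},
}


def common_neighbors(graph, u, v):
    """Number of neighbors shared by nodes u and v."""
    return len(graph[u] & graph[v])


def jaccard(graph, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def score_missing_links(graph):
    """Compute both features for every node pair that is not yet linked."""
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v not in graph[u]:
            scores[(u, v)] = (common_neighbors(graph, u, v), jaccard(graph, u, v))
    return scores


link_scores = score_missing_links(network)
```

In a supervised setting, feature tuples like these would typically be fed to a classifier trained on known links; that step is omitted here.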
Advanced machine-learning techniques in drug discovery
The popularity of machine learning (ML) across drug discovery continues to grow, yielding impressive results. As the use of ML techniques increases, their limitations become more apparent. Such limitations include their need for big data, sparsity in data, and their lack of interpretability. It has also become apparent that the techniques are not truly autonomous, requiring retraining even post deployment. In this review, we detail the use of advanced techniques to circumvent these challenges, with examples drawn from drug discovery and allied disciplines. In addition, we present emerging techniques and their potential role in drug discovery. The techniques presented herein are anticipated to expand the applicability of ML in drug discovery.
Deep transfer learning for drug response prediction
The goal of precision oncology is to make accurate predictions for cancer patients from the omics data of individual patients. Major challenges for computational methods of drug response prediction are that labeled clinical data are very limited, not publicly available, or cover response to only one or two drugs. These challenges have been addressed by generating large-scale pre-clinical datasets such as cancer cell lines or patient-derived xenografts (PDX). These pre-clinical datasets have multi-omics characterization of samples and are often screened with hundreds of drugs, which makes them viable resources for precision oncology. However, they raise new questions: how can we integrate different data types, how can we handle the discrepancy between pre-clinical and clinical datasets that exists due to basic biological differences, and how can we make the best use of unlabeled samples in drug response prediction, where labeling is especially challenging? In this thesis, we propose methods based on deep neural networks to answer these questions. First, we propose a method of multi-omics integration. Second, we propose a transfer learning method to address data discrepancy between cell lines, patients, and PDX models in the input and output space. Finally, we propose a semi-supervised method of out-of-distribution generalization to predict drug response using labeled and unlabeled samples. The proposed methods have promising performance when compared to the state-of-the-art and may guide precision oncology more accurately.
Translational Applications of Artificial Intelligence and Machine Learning for Diagnostic Pathology in Lymphoid Neoplasms: A Comprehensive and Evolutive Analysis
Genomic analysis and digitalization of medical records have led to a big data scenario within hematopathology. Artificial intelligence and machine learning tools are increasingly used to integrate clinical, histopathological, and genomic data in lymphoid neoplasms. In this study, we identified the global trends and the cognitive and social framework of this field from 1990 to 2020. Metadata were obtained from the Clarivate Analytics Web of Science database in January 2021. A total of 525 documents were assessed by document type, research areas, source titles, organizations, and countries. The SciMAT and VOSviewer packages were used to perform scientific mapping analysis. Geographical distribution showed the USA and People’s Republic of China as the most productive countries, reporting up to 190 (36.19%) of all documents. A third-degree polynomial equation predicts that global production in this area will reach three-fold the current number by around 2031. Thematically, current research is focused on the integration of digital image analysis and genomic sequencing in Non-Hodgkin lymphomas, prediction of chemotherapy response, and validation of new prognostic models. These findings can help pathology departments chart future clinical and research avenues, and can help public institutions and administrations promote synergies and optimize funding allocation.
Funding: Andalusia Health System - RH-0145-2020; EU FEDER ITI Grant for Cadiz Province PI-0032-201
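The forecast described in this abstract rests on fitting a third-degree polynomial to yearly output and extrapolating it. The sketch below shows one way such a fit could be done with ordinary least squares in plain Python. The yearly counts are synthetic, generated from a known cubic so the fit can be checked; they are not the study's data, and the function names are illustrative.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Synthetic yearly document counts (NOT the study's data), generated from
# the known cubic y = 2 + 0.5x + 0.1x^3 so the fit can be verified.
years = list(range(0, 11))  # years since the first observation
counts = [2 + 0.5 * x + 0.1 * x ** 3 for x in years]

coeffs = fit_polynomial(years, counts, degree=3)
forecast = evaluate(coeffs, 15)  # extrapolate five years past the data
```

Cubic extrapolation is sensitive to the last few data points, so a forecast of this kind is best read as a trend indication rather than a precise prediction.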
Personalized Medicine: Studies of Pharmacogenomics in Yeast and Cancer
Advances in microarray and sequencing technology enable the era of personalized medicine. With increasing availability of genomic assays, clinicians have started to utilize genetics and gene expression of patients to guide clinical care. Signatures of gene expression and genetic variation in genes have been associated with disease risks and response to clinical treatment. It is therefore not difficult to envision a future where each patient will have clinical care that is optimized based on his or her genetic background and genomic profiles. However, many challenges stand in the way of fully realizing the potential of personalized medicine, and we have yet to gain a good understanding of how to associate genomic data with phenotype. First, the human genome is very complex: more than 50 million sequence variants and more than 20,000 genes have been reported. Second, many efforts have been devoted to genome-wide association studies (GWAS) in the last decade, associating common genetic variants with common complex traits and diseases. While many associations have been identified by GWAS, most phenotypic variation remains unexplained, both at the level of the variants involved and the underlying mechanism. Finally, interaction between genetics and environment presents an additional layer of complexity governing phenotypic variation. Currently, there is much research developing computational methods to help associate genomic features with phenotypic variation. Modeling techniques such as machine learning have been very useful in uncovering the intricate relationships between genomics and phenotype. Despite some early successes, the performance of most models is disappointing. Many models lack robustness and predictions do not replicate. In addition, many successful models work as a black box, giving good predictions of phenotypic variation but unable to reveal the underlying mechanism.
In this thesis I propose two methods addressing this challenge. First, I describe an algorithm that focuses on identifying causal genomic features of phenotype. My approach assumes genomic features predictive of phenotype are more likely to be causal. The algorithm builds models that not only accurately predict the traits, but also uncover molecular mechanisms that are responsible for these traits. The algorithm gains its power by combining regularized linear regression, causality testing, and Bayesian statistics. I demonstrate the application of the algorithm on a yeast dataset, where genotype and gene expression are used to predict drug sensitivity and elucidate the underlying mechanisms. The accuracy and robustness of the algorithm are evaluated statistically and validated experimentally. The second part of the thesis takes on a much more complicated system: cancer. Genomic and drug sensitivity data of cancer cell lines have recently been made available. The challenge here is not only the increasing complexity of the system (e.g. size of genome), but also the fundamental differences between cancers and tissues. Different cancers or tissues provide different contexts influencing regulatory networks and signaling pathways. In order to account for this, I propose a method to associate contextual genomic features with drug sensitivity. The algorithm is based on information theory, Bayesian statistics, and transfer learning. The algorithm demonstrates the importance of context specificity in predictive modeling of cancer pharmacogenomics. The two complementary algorithms highlight the challenges faced in personalized medicine and the potential solutions. This thesis details the results and analysis that demonstrate the importance of causality and context specificity in predictive modeling of drug response, which will be crucial for bringing personalized medicine into practice.
LABRAD : Vol 39, Issue 1 - September 2013
Immunophenotyping by Flowcytometry; Chronic Lymphocytic Leukaemia: Diagnosis and Prognostic Factors; Tumour Markers; Role of Chemical Pathology in Screening and Diagnosis of Multiple Myeloma; 1p/19q Deletion: Favourable Prognostic Marker for Oligodendroglioma; EGFR Mutation Screening Test for Lung Cancer Patients; Clinical Utility of BCR-ABL1 Kinase Domain Mutational Analysis; Molecular Cytogenetic Testing for Acute Myeloid Leukaemia; Diffuse Large B-Cell Lymphoma (DLBCL) Subgroups have Different Phenotype. https://ecommons.aku.edu/labrad/1006/thumbnail.jp
Deep multiple-instance learning for detecting multiple myeloma in CT scans of large bones
The employment of computer aided diagnosis (CAD) systems for interpretation of medical images has become an increasingly popular topic with the arrival of modern machine learning algorithms. Convolutional neural networks currently perform exceptionally well in various pattern recognition tasks, including image classification. In this thesis we examine the capabilities of a convolutional neural network binary classifier as a CAD system for detection of abnormalities in CT images of femurs. We focus on the diagnosis of multiple myeloma, which is characterized by symptomatic bone marrow lesions commonly observable through computed tomography screening. Different approaches to the problem, including multiple instance learning (MIL), were tested. The classifier showed solid performance in our fully supervised experimental setting; however, it exhibited a serious inability to learn from multiple instances.
We conclude that the proposed neural model needs a stronger error signal in order to converge in the standard MIL setting, and we suggest potential improvements for further work in this area.
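The abstract does not state how instance predictions are pooled into a bag prediction. As a minimal sketch, assuming the standard MIL assumption (a bag is positive if at least one instance is positive) with max pooling, the following shows why the error signal can be weak: the bag loss depends on a single instance, so only that instance would receive a gradient. The patch scores are hypothetical.

```python
import math


def bag_score(instance_scores):
    """Max pooling: the bag is as positive as its most positive instance."""
    return max(instance_scores)


def bag_loss(instance_scores, bag_label, eps=1e-12):
    """Binary cross-entropy computed on the max-pooled bag score."""
    p = min(max(bag_score(instance_scores), eps), 1.0 - eps)
    return -(bag_label * math.log(p) + (1 - bag_label) * math.log(1.0 - p))


def instance_receiving_gradient(instance_scores):
    """Index of the only instance the max-pooled loss depends on."""
    return max(range(len(instance_scores)), key=lambda i: instance_scores[i])


# Hypothetical per-patch lesion probabilities for one CT scan (one bag).
patch_scores = [0.10, 0.80, 0.30]
positive_bag_loss = bag_loss(patch_scores, bag_label=1)
top_patch = instance_receiving_gradient(patch_scores)
```

Smoother pooling functions (for example mean or log-sum-exp pooling) spread the signal over more instances, which is one commonly discussed way to strengthen it; whether the thesis pursues this is not stated in the abstract.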
Establishment of predictive blood-based signatures in medical large scale genomic data sets : Development of novel diagnostic tests
The growing volume of data has led to tremendous success in discovering molecular biomarkers based on high throughput data. However, the translation of these so-called genomic signatures into clinical practice has been limited. The complexity and volume of genomic profiling require heightened attention to robust design, methodological details, and avoidance of bias. In this thesis, novel strategies aimed at closing the gap from initially promising pilot studies to the clinical application of novel biomarkers are evaluated. First, a conventional process for genomic biomarker development comprising feature selection, algorithm and parameter optimization, and performance assessment was established. Using this approach, an RNA-stabilized whole blood diagnostic classifier for non-small cell lung cancer was built in a training set that can be used as a biomarker to discriminate between patients and control samples. Subsequently, this optimized classifier was successfully applied to two independent and blinded validation sets. Extensive permutation analysis using random feature lists supports the specificity of the established transcriptional classifier. Next, it was demonstrated that a combined approach of clinical trial simulation and adaptive learning strategies can be used to speed up biomarker development. As a model, genome-wide expression data derived from over 4,700 individuals in 37 studies addressing four clinical endpoints were used to assess over 1,800,000 classifiers. In addition to current approaches determining optimal classifiers within a defined study setting, randomized clinical trial simulation unequivocally uncovered the overall variance in the prediction performance of potential disease classifiers to predict the outcome of a large biomarker validation study from a pilot trial. Furthermore, most informative features were identified by feature ranking according to an individual classification performance score.
Applying an adaptive learning strategy based on data extrapolation led to a data-driven prediction of the study size required for larger validation studies based on small pilot trials and an estimate of the expected statistical performance during validation. With these significant improvements, exceedingly robust and clinically applicable gene signatures for the diagnosis and detection of acute myeloid leukemia, active tuberculosis, HIV infection, and non-small cell lung cancer are established, which demonstrate disease-related enrichment of the obtained signatures and phenotype-related feature ranking. In further research, platform requirements for blood-based biomarker development were examined, using microRNA expression profiling as an example. Performance and technical sample handling were investigated to provide reliable strategies for platform implementation in clinical applications. Overall, all introduced methods improve and accelerate the development of biomarker signatures for molecular diagnostics and can easily be extended to other high throughput data and other disease settings.
A transfer learning approach to drug resistance classification in mixed HIV dataset
Funding: This research is funded by the Tertiary Education Trust Fund (TETFund), Nigeria. As we advance towards individualized therapy, the ‘one-size-fits-all’ regimen is gradually giving way to adaptive techniques that address the complexities of failed treatments. Treatment failure is associated with factors such as poor drug adherence, adverse side effects or reactions, co-infection, lack of follow-up, drug-drug interaction, and more. This paper implements a transfer learning approach that classifies patients' response to failed treatments due to adverse drug reactions. The research is motivated by the need for early detection of patients' response to treatments and the generation of domain-specific datasets to balance under-represented classification data, typical of low-income countries located in Sub-Saharan Africa. A soft computing model was pre-trained to cluster CD4+ counts and viral loads of treatment change episodes (TCEs) processed from two disparate sources: the Stanford HIV drug resistant database (https://hivdb.stanford.edu), or control dataset, and locally sourced patients' records from selected health centers in Akwa Ibom State, Nigeria, or mixed dataset. Both datasets were run through a traditional 2-layer neural network (NN) and a 5-layer deep neural network (DNN), with odd dropout neurons distribution resulting in the following configurations: NN (Parienti et al., 2004) [32], NN (Deniz et al., 2018) [53] and DNN [9 7 5 3 1]. To discern knowledge of failed treatment, DNN1 [9 7 5 3 1] and DNN2 [9 7 5 3 1] were introduced to model both datasets and only TCEs of patients at risk of drug resistance, respectively. Classification results revealed fewer misclassifications, with the DNN architecture yielding the best performance measures.
However, the transfer learning approach with DNN2 [9 7 3 1] configuration produced superior classification results when compared to other variants/configurations, with classification accuracy of 99.40%, and RMSE values of 0.0056, 0.0510, and 0.0362, for test, train, and overall datasets, respectively. The proposed system therefore indicates good generalization and is vital as decision-making support to clinicians/physicians for predicting patients at risk of adverse drug reactions. Although imbalanced features classification is typical of disease problems and diminishes dependence on classification accuracy, the proposed system still compared favorably with the literature and can be hybridized to improve its precision and recall rates.
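The closing remark about imbalanced classification can be illustrated with a small example: on a skewed dataset, a classifier that never predicts the minority class still scores high accuracy while its precision and recall are zero. The labels below are synthetic and are not taken from the paper's datasets; returning 0.0 for an undefined precision or recall is a convention chosen for this sketch.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Convention for this sketch: undefined ratios are reported as 0.0.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Synthetic imbalanced labels: 5 at-risk patients (1) among 100 records.
y_true = [1] * 5 + [0] * 95
# A degenerate classifier that always predicts "not at risk".
y_pred = [0] * 100

metrics = classification_metrics(y_true, y_pred)
```

Here accuracy is 95% even though no at-risk patient is identified, which is why precision and recall are usually reported alongside accuracy for problems of this kind.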