    Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review

    Cancer remains one of the most challenging diseases to treat. Machine learning has enabled in-depth analysis of rich multi-omics profiles and medical imaging for cancer diagnosis and prognosis. Despite these advances, machine learning models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensional data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to surmounting these challenges is to integrate biomedical knowledge into data-driven models, which has shown the potential to improve the accuracy, robustness, and interpretability of model results. Here, we review state-of-the-art machine learning studies that fuse biomedical knowledge and data, termed knowledge-informed machine learning, for cancer diagnosis and prognosis. Emphasizing the properties inherent in four primary data types (clinical, imaging, molecular, and treatment data), we highlight modeling considerations relevant to these contexts. We provide an overview of diverse forms of knowledge representation and current strategies for integrating knowledge into machine learning pipelines, with concrete examples. We conclude by discussing future directions for advancing cancer research through knowledge-informed machine learning. Comment: 41 pages, 4 figures, 2 tables.

    Effective Feature Representation for Clinical Text Concept Extraction

    Crucial information about the practice of healthcare is recorded only in free-form text, which creates an enormous opportunity for high-impact NLP. However, annotated healthcare datasets tend to be small and expensive to obtain, which raises the question of how to make maximally efficient use of the available data. To this end, we develop an LSTM-CRF model that combines unsupervised word representations with hand-built feature representations derived from publicly available healthcare ontologies. We show that this combined model yields superior performance on five datasets of diverse kinds of healthcare text (clinical, social, scientific, commercial), each involving the labeling of complex, multi-word spans that pick out different healthcare concepts. We also introduce a new labeled dataset for identifying treatment relations between drugs and diseases.
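
    A minimal sketch of the kind of model described above: a BiLSTM-CRF tagger whose per-token input concatenates pretrained word embeddings with hand-built ontology features. It assumes PyTorch and the pytorch-crf package; all dimensions and the feature extraction itself are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class LstmCrfTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim, onto_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # unsupervised word representations
        self.lstm = nn.LSTM(embed_dim + onto_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)     # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)          # models tag-transition structure

    def forward(self, token_ids, onto_feats, tags=None, mask=None):
        # concatenate word embeddings with hand-built ontology features per token
        x = torch.cat([self.embed(token_ids), onto_feats], dim=-1)
        emissions = self.proj(self.lstm(x)[0])
        if tags is not None:
            return -self.crf(emissions, tags, mask=mask)     # training: negative log-likelihood
        return self.crf.decode(emissions, mask=mask)          # inference: best tag sequences
```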

    Knowledge-Rich Self-Supervision for Biomedical Entity Linking

    Entity linking faces significant challenges, such as prolific variations and prevalent ambiguities, especially in high-value domains with myriad entities. Standard classification approaches suffer from the annotation bottleneck and cannot effectively handle unseen entities. Zero-shot entity linking has emerged as a promising direction for generalizing to new entities, but it still requires example gold entity mentions during training and canonical descriptions for all entities, both of which are rarely available outside of Wikipedia. In this paper, we explore Knowledge-RIch Self-Supervision (KRISS) for biomedical entity linking, leveraging readily available domain knowledge. In training, it generates self-supervised mention examples on unlabeled text using a domain ontology and trains a contextual encoder using contrastive learning. For inference, it samples self-supervised mentions as prototypes for each entity and conducts linking by mapping the test mention to the most similar prototype. Our approach can easily incorporate entity descriptions and gold mention labels if available. We conducted extensive experiments on seven standard datasets spanning biomedical literature and clinical notes. Without using any labeled information, our method produces KRISSBERT, a universal entity linker for four million UMLS entities that attains a new state of the art, outperforming prior self-supervised methods by as much as 20 absolute points in accuracy.
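
    A hedged sketch of the prototype-based inference step described above: embed self-supervised mention examples for each entity, keep a prototype per entity, and link a test mention to the entity whose prototype is most similar. For brevity this averages a single prototype per entity (the paper keeps multiple sampled prototypes), and `encode` stands in for any contextual mention encoder; both are assumptions, not the authors' implementation.

```python
import numpy as np

def build_prototypes(mentions_by_entity, encode):
    """mentions_by_entity: dict entity_id -> list of self-supervised mention strings."""
    protos = {}
    for entity_id, mentions in mentions_by_entity.items():
        vecs = encode(mentions)                              # (n_mentions, dim) contextual embeddings
        vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        protos[entity_id] = vecs.mean(axis=0)                # one averaged prototype per entity
    return protos

def link(mention, protos, encode):
    q = encode([mention])[0]
    q = q / np.linalg.norm(q)
    # cosine similarity against every entity prototype; return the most similar entity
    return max(protos, key=lambda e: float(np.dot(q, protos[e])))
```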

    Deep transfer learning for drug response prediction

    The goal of precision oncology is to make accurate predictions for individual cancer patients from their omics data. Major challenges for computational drug response prediction are that labeled clinical data are very limited, often not publicly available, and typically cover responses to only one or two drugs. These challenges have been addressed by generating large-scale pre-clinical datasets such as cancer cell lines and patient-derived xenografts (PDX). These pre-clinical datasets provide multi-omics characterization of samples and are often screened with hundreds of drugs, which makes them viable resources for precision oncology. However, they raise new questions: How can we integrate different data types? How can we handle the discrepancy between pre-clinical and clinical datasets that arises from basic biological differences? And how can we make the best use of unlabeled samples in drug response prediction, where labeling is especially challenging? In this thesis, we propose methods based on deep neural networks to answer these questions. First, we propose a method for multi-omics integration. Second, we propose a transfer learning method to address the data discrepancy between cell lines, patients, and PDX models in the input and output spaces. Finally, we propose a semi-supervised out-of-distribution generalization method to predict drug response using both labeled and unlabeled samples. The proposed methods show promising performance compared to the state of the art and may guide precision oncology more accurately.
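
    A minimal sketch of the transfer-learning idea described above, assuming PyTorch: pretrain a drug-response regressor on abundant pre-clinical (cell-line) screens, then fine-tune only the final layer on the much smaller clinical cohort. The architecture, freezing choice, and hyperparameters are illustrative assumptions, not the thesis methods.

```python
import torch
import torch.nn as nn

def make_model(n_features):
    # simple fully connected regressor producing a drug-response score
    return nn.Sequential(
        nn.Linear(n_features, 256), nn.ReLU(),
        nn.Linear(256, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )

def run_epochs(model, loader, epochs, lr):
    opt = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x).squeeze(-1), y).backward()
            opt.step()

def pretrain(model, cell_line_loader):
    # step 1: learn drug-response structure from abundant pre-clinical screens
    run_epochs(model, cell_line_loader, epochs=20, lr=1e-3)

def finetune(model, patient_loader):
    # step 2: freeze the early layers and adapt the head to scarce clinical data
    for p in model[:-1].parameters():
        p.requires_grad = False
    run_epochs(model, patient_loader, epochs=5, lr=1e-4)
```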

    ALEC: Active learning with ensemble of classifiers for clinical diagnosis of coronary artery disease

    Invasive angiography is the reference standard for coronary artery disease (CAD) diagnosis but is expensive and associated with certain risks. Machine learning (ML) using clinical and noninvasive imaging parameters can be used for CAD diagnosis, avoiding the side effects and cost of angiography. However, ML methods require labeled samples for efficient training. The scarcity of labeled data and high labeling costs can be mitigated by active learning, which selectively queries challenging samples for labeling. To the best of our knowledge, active learning has not previously been used for CAD diagnosis. We propose an Active Learning with Ensemble of Classifiers (ALEC) method for CAD diagnosis, consisting of four classifiers. Three of these classifiers determine whether each of a patient's three main coronary arteries is stenotic; the fourth predicts whether the patient has CAD. ALEC is first trained using labeled samples. For each unlabeled sample, if the outputs of the classifiers are consistent, the sample along with its predicted label is added to the pool of labeled samples; inconsistent samples are manually labeled by medical experts before being added to the pool. Training is then performed once more using the samples labeled so far, and these interleaved phases of labeling and training are repeated until all samples are labeled. Compared with 19 other active learning algorithms, ALEC combined with a support vector machine classifier attained superior performance with 97.01% accuracy. Our method is also justified mathematically. We further analyze the CAD dataset used in this paper: pairwise feature correlations are computed, the top 15 features contributing to CAD and to stenosis of the three main coronary arteries are determined, the relationship between stenosis of the main arteries is presented using conditional probabilities, and the effect of considering the number of stenotic arteries on sample discrimination is investigated. The discriminative power over the dataset samples is visualized, taking each of the three main coronary arteries in turn as the sample label and the two remaining arteries as sample features.
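
    A hedged sketch of the labeling-and-retraining loop described above, using scikit-learn. To keep it short it uses a generic diverse ensemble predicting a single label (in the paper, three classifiers predict per-artery stenosis and a fourth predicts CAD), and `query_expert` and `expert_batch` are illustrative stand-ins for manual labeling by clinicians; none of this is the authors' exact implementation.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def active_learning_loop(X_lab, y_lab, X_unlab, query_expert, expert_batch=10):
    X_lab, y_lab, X_unlab = map(np.asarray, (X_lab, y_lab, X_unlab))
    # a diverse ensemble so that disagreement flags the challenging samples
    clfs = [SVC(), DecisionTreeClassifier(), KNeighborsClassifier(), GaussianNB()]
    while len(X_unlab) > 0:
        preds = np.vstack([c.fit(X_lab, y_lab).predict(X_unlab) for c in clfs])
        agree = np.all(preds == preds[0], axis=0)          # ensemble is unanimous
        auto_X, auto_y = X_unlab[agree], preds[0][agree]   # keep predicted labels
        hard_idx = np.flatnonzero(~agree)[:expert_batch]   # send a batch to the expert
        hard_X = X_unlab[hard_idx]
        hard_y = np.array([query_expert(x) for x in hard_X])
        X_lab = np.vstack([X_lab, auto_X, hard_X])
        y_lab = np.concatenate([y_lab, auto_y, hard_y])
        keep = np.ones(len(X_unlab), dtype=bool)
        keep[agree] = False
        keep[hard_idx] = False
        X_unlab = X_unlab[keep]                            # repeat until everything is labeled
    return clfs
```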

    A modeling platform to predict cancer survival and therapy outcomes using tumor tissue derived metabolomics data.

    Cancer is a complex and broad disease that is challenging to treat, partially due to the vast molecular heterogeneity among patients even within the same subtype. Currently, no reliable method exists to determine which potential first-line therapy would be most effective for a specific patient, as randomized clinical trials have concluded that no single regimen may be significantly more effective than others. One ongoing challenge in the field of oncology is the search for personalization of cancer treatment based on patient data. With an interdisciplinary approach, we show that tumor tissue-derived metabolomics data, analyzed with machine-learning techniques, can predict clinical response to systemic therapy, classified as disease control vs. progressive disease, and pathological stage, classified as stage I/II/III vs. stage IV (AUROC = 0.970 and 0.902, respectively). Patient survival was also analyzed with statistical methods and machine learning, both of which show that tumor tissue-derived metabolomics data can risk-stratify patients into long vs. short survival (test-set OS AUROC = 0.940; test-set PFS AUROC = 0.875). A set of key metabolites as potential biomarkers, and their associated metabolic pathways, was also identified for each outcome, which may yield insight into the underlying biological mechanisms. Additionally, we developed a methodology to calibrate tumor-growth-related parameters in a well-established mathematical model of cancer to help predict the nuances of chemotherapeutic response. The proposed methodology produces results consistent with clinical observations in predicting individual patient response to systemic therapy and lays the foundation for further investigation into calibrating mathematical models of cancer with patient tissue-derived molecular data. Chapters 6 and 8 were published in the Annals of Biomedical Engineering. Chapters 2, 3, and 7 were published in Metabolomics, Lung Cancer, and Pharmaceutical Research, respectively. Chapter 4 has been accepted for publication in the journal Metabolomics (in press), Chapter 5 is in review at the same journal, and Chapter 9 is currently being prepared for submission.
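
    An illustrative sketch, not the authors' pipeline: a cross-validated classifier over a tumor tissue-derived metabolite feature matrix, scored with AUROC as in the outcomes reported above. The model choice, scaling step, and fold count are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def auroc_for_outcome(metabolite_matrix, outcome_labels, folds=5):
    """metabolite_matrix: (n_patients, n_metabolites); outcome_labels: binary outcome,
    e.g. disease control vs. progressive disease, or stage I/II/III vs. stage IV."""
    model = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=500))
    scores = cross_val_score(model, metabolite_matrix, outcome_labels,
                             cv=folds, scoring="roc_auc")
    return scores.mean(), scores.std()
```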

    Ono: an open platform for social robotics

    In recent times, the focal point of research in robotics has shifted from industrial robots toward robots that interact with humans in an intuitive and safe manner. This evolution has resulted in the subfield of social robotics, which pertains to robots that function in a human environment and can communicate with humans in an intuitive way, e.g. with facial expressions. Social robots have the potential to impact many different aspects of our lives, but one particularly promising application is the use of robots in therapy, such as the treatment of children with autism. Unfortunately, many existing social robots are suited neither for practical use in therapy nor for large-scale studies, mainly because they are expensive, one-of-a-kind robots that are hard to modify to suit a specific need. We created Ono, a social robotics platform, to tackle these issues. Ono is composed entirely of off-the-shelf components and cheap materials, and can be built at a local FabLab at a fraction of the cost of other robots. Ono is also entirely open source, and its modular design further encourages modification and reuse of parts of the platform.

    Drug-drug interactions: A machine learning approach

    Automatic detection of drug-drug interactions (DDIs) is a difficult problem in pharmacosurveillance. Recent practice for in vitro and in vivo pharmacokinetic drug-drug interaction studies has been based on carefully selected drug characteristics, such as their pharmacological effects, and on drug-target networks, in order to identify and understand anomalies in a drug's biochemical function upon co-administration. In this work, we present a novel DDI prediction framework that combines several drug-attribute similarity measures to construct a feature space over which we train three machine learning algorithms, Support Vector Machine (SVM), J48 decision tree, and k-nearest neighbors (KNN), using a partially supervised classification approach called positive-unlabeled learning (PU-learning) tailored specifically to our framework. In summary, we extracted 1,300 U.S. Food and Drug Administration-approved pharmaceutical drugs and paired them to create 1,688,700 feature vectors. Out of 397 drug pairs known to interact prior to our experiments, our system correctly identified 80%, and from the remaining 1,688,303 pairs for which no interaction had been determined, we predicted 181 potential DDIs with confidence levels greater than 97%; the latter is a set of DDIs unrecognized by our source of ground truth at the time of the study. To evaluate the effectiveness of our system, we queried the U.S. Food and Drug Administration's Adverse Event Reporting System (AERS) database for cases involving the drug pairs used in this study. The query returned incidents reported for a number of patients, some of whom had experienced severe adverse reactions leading to outcomes such as prolonged hospitalization, diminished medicinal effect of one or more drugs, and, in some cases, death.
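
    A hedged sketch of one common two-step positive-unlabeled (PU) learning scheme, in the spirit of the framework above but not the authors' exact algorithm: treat unlabeled drug pairs as tentative negatives, keep the most confidently negative ones as reliable negatives, then retrain on positives vs. those. The drug-pair feature construction and the `reliable_frac` cutoff are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def pu_two_step(X_pos, X_unlab, reliable_frac=0.3):
    """X_pos: feature vectors for known interacting drug pairs;
    X_unlab: feature vectors for pairs with no known interaction."""
    X = np.vstack([X_pos, X_unlab])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_unlab))])
    first = SVC(probability=True).fit(X, y)            # step 1: positives vs. all unlabeled
    p_interact = first.predict_proba(X_unlab)[:, 1]
    cutoff = np.quantile(p_interact, reliable_frac)
    reliable_neg = X_unlab[p_interact <= cutoff]       # lowest-scoring pairs as reliable negatives
    X2 = np.vstack([X_pos, reliable_neg])
    y2 = np.concatenate([np.ones(len(X_pos)), np.zeros(len(reliable_neg))])
    return SVC(probability=True).fit(X2, y2)           # step 2: final DDI classifier
```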