    Using Similarity Metrics on Real World Data and Patient Treatment Pathways to Recommend the Next Treatment

    Non-small-cell lung cancer (NSCLC) is one of the most prevalent types of lung cancer and continues to have an ominous five-year survival rate. Considerable work has been done to analyze the viability of the treatments offered to NSCLC patients; however, while many of these treatments have performed better across populations of diagnosed NSCLC patients, a specific treatment may not be the most effective therapy for a given patient. By coupling patient similarity, measured with the Gower similarity metric, with prior treatment knowledge, we demonstrate how patient analytics can complement clinical efforts in recommending the next best treatment. Our retrospective and exploratory results indicate that a majority of patients are not recommended the best-surviving therapy once they require a new therapy. This investigation lays the groundwork for treatment recommendation using analytics, but more investigation is required to analyze patient outcomes beyond survival.
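The Gower similarity named above handles mixed numeric and categorical patient attributes. Below is a minimal sketch, assuming each patient is a dict of features; the recommendation rule (take the k most similar prior patients and pick the next therapy with the longest mean survival) is a hypothetical illustration, not the authors' method, and the cohort, feature names and therapies are made up.

```python
from collections import defaultdict


def feature_ranges(records, numeric_features):
    """Range (max - min) of each numeric feature over the cohort, ignoring missing values."""
    ranges = {}
    for k in numeric_features:
        values = [r[k] for r in records if r.get(k) is not None]
        ranges[k] = max(values) - min(values) if values else 0.0
    return ranges


def gower_similarity(a, b, ranges, categorical):
    """Gower similarity between two patient records (dicts of feature -> value).

    A numeric feature contributes 1 - |a_k - b_k| / range_k, a categorical feature
    contributes 1 on an exact match and 0 otherwise; features missing in either
    record are skipped and the result is the mean over the compared features.
    """
    total, compared = 0.0, 0
    for k in set(ranges) | set(categorical):
        if a.get(k) is None or b.get(k) is None:
            continue
        if k in categorical:
            total += 1.0 if a[k] == b[k] else 0.0
        else:
            r = ranges[k]
            total += 1.0 - (abs(a[k] - b[k]) / r if r else 0.0)
        compared += 1
    return total / compared if compared else 0.0


def recommend_next_therapy(query, cohort, ranges, categorical, k=3):
    """Hypothetical rule: among the k most similar prior patients, return the
    next therapy with the longest mean survival."""
    neighbours = sorted(
        cohort,
        key=lambda p: gower_similarity(query, p["features"], ranges, categorical),
        reverse=True,
    )[:k]
    survival = defaultdict(list)
    for p in neighbours:
        survival[p["next_therapy"]].append(p["survival_months"])
    return max(survival, key=lambda t: sum(survival[t]) / len(survival[t]))


# Toy, made-up cohort (not data from the study).
cohort = [
    {"features": {"age": 62, "stage": "IIIB", "smoker": True}, "next_therapy": "A", "survival_months": 14},
    {"features": {"age": 65, "stage": "IIIB", "smoker": True}, "next_therapy": "B", "survival_months": 20},
    {"features": {"age": 70, "stage": "IV", "smoker": False}, "next_therapy": "C", "survival_months": 30},
    {"features": {"age": 60, "stage": "IIIB", "smoker": True}, "next_therapy": "A", "survival_months": 12},
]
categorical = {"stage", "smoker"}
ranges = feature_ranges([p["features"] for p in cohort], ["age"])
query = {"age": 63, "stage": "IIIB", "smoker": True}
recommended = recommend_next_therapy(query, cohort, ranges, categorical)
```

Here the three nearest neighbours received therapies A (mean 13 months) and B (20 months), so the sketch returns "B". Dividing numeric differences by the cohort range keeps every feature's contribution in [0, 1], which is what lets Gower average numeric and categorical attributes on one scale.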

    Rule Mining and Sequential Pattern Based Predictive Modeling with EMR Data

    Electronic medical record (EMR) data is collected daily at hospitals and other healthcare facilities to track patients’ health situations, including conditions, treatments (medications, procedures), diagnostics (labs) and associated healthcare operations. Besides being useful for individual patient care and hospital operations (e.g., billing, triaging), EMRs can also be exploited in secondary data analyses to glean discriminative patterns that hold across patient cohorts for different phenotypes. These patterns in turn can yield high-level insights into disease progression with interventional potential. In this dissertation, using a large-scale, realistic EMR dataset of over one million patients visiting University of Kentucky healthcare facilities, we explore data mining and machine learning methods for association rule (AR) mining and predictive modeling, with mood and anxiety disorders as use cases. Our first work analyzes existing quantitative measures of rule interestingness to assess how well they align with a practicing psychiatrist’s sense of novelty or surprise for ARs identified from EMRs. Our second effort mines causal ARs with depression and anxiety disorders as target conditions, using matching methods that account for computationally identified confounding attributes. Our final effort involves an efficient GPU implementation of contrast pattern mining and its application to predictive modeling for mental conditions, using various representational methods and recurrent neural networks. Overall, we demonstrate the effectiveness of rule mining methods in secondary analyses of EMR data for identifying causal associations and building predictive models for diseases.
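The abstract does not list which interestingness measures were compared. As a hedged illustration, the sketch below computes four widely used ones (support, confidence, lift, leverage) for a rule over patient "transactions", assuming each patient is represented as the set of conditions recorded for them; the records are made up.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, lift and leverage of the rule antecedent -> consequent.

    Each transaction is the set of items (e.g. recorded conditions) for one patient.
    """
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # patients with the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # patients with the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # patients with both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    p_c = n_c / n
    lift = confidence / p_c if p_c else 0.0
    # Leverage: observed joint support minus the support expected under independence.
    leverage = support - (n_a / n) * p_c
    return {"support": support, "confidence": confidence, "lift": lift, "leverage": leverage}


# Made-up patient records, one set of recorded conditions per patient.
patients = [
    {"insomnia", "anxiety", "depression"},
    {"insomnia", "depression"},
    {"anxiety"},
    {"insomnia", "anxiety", "depression"},
    {"hypertension"},
]
measures = rule_measures(patients, {"insomnia"}, {"depression"})
```

For {insomnia} -> {depression} this gives support 0.6, confidence 1.0, lift about 1.67 and leverage 0.24. These scores are purely statistical; the first study's question is how well such numbers track what a clinician actually finds novel or surprising.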

    Multi-stream Longitudinal Data Analysis using Deep Learning

    Longitudinal healthcare data encompasses all settings in which patient information is collected at multiple follow-up times. Analyzing this data is critical for addressing many real-world healthcare problems, such as disease prediction and prevention. In this thesis, technical challenges in analyzing longitudinal administrative claims data are addressed, and novel deep learning models are proposed for multi-stream data analysis and disease prediction. These algorithms and frameworks are assessed mainly on substance use disorder prediction tasks and are specifically designed to tackle these disorders. Substance use disorder is a public health crisis costing the US an estimated $740 billion annually in healthcare, lost workplace productivity, and crime. Early identification and engagement of individuals at risk of developing a substance use disorder is a critical unmet need in healthcare, which can be addressed by automatic, artificial-intelligence-based tools trained on big healthcare data. Indeed, healthcare data can be harnessed together with artificial intelligence and machine learning to advance our understanding of the factors that increase the propensity for developing different diseases, as well as those that aid in treating these disorders. Herein, a disease prediction framework based on recurrent neural networks is first proposed. This framework includes three components: 1) data pre-processing, 2) disease prediction using long short-term memory models, and 3) hypothesis exploration by varying the models and the inputs. The framework is assessed on two use cases: substance use disorder prediction and mild cognitive impairment prediction. Experimental results show that the proposed model can efficiently analyze patients' data and produce effective disease prediction tools.
Second, the limitations of current deep learning models, including long short-term memory models, in claims data analysis are identified and addressed, and a novel model based on transformers is proposed. Leveraging real-world longitudinal claims data, a multi-stream transformer model is proposed for predicting opioid use disorder, an important case of substance use disorder. This model is designed to analyze multiple types of data streams simultaneously, such as medications, diagnoses, procedures and demographics, by attending to segments within and across these data streams. When tested on the IBM MarketScan data, the proposed model performed significantly better than traditional models and recently developed deep learning models.
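One way to let a model attend "within and across" data streams is to tag each stream's tokens with a stream-type embedding, concatenate them, and run self-attention over the combined sequence. The NumPy sketch below shows a single untrained attention head under that assumption; the abstract does not describe the thesis model's actual layers, heads, positional encoding or segment definition, and the stream sizes are arbitrary.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_stream_self_attention(streams, d_model=8, seed=0):
    """Single-head self-attention over tokens pooled from several data streams.

    streams maps a stream name to token embeddings of shape (T_s, d_model).
    Each stream gets its own type embedding so tokens remain distinguishable after
    concatenation; every token can then attend to tokens in its own and other streams.
    Weights are random here (untrained); a real model would learn them.
    """
    rng = np.random.default_rng(seed)
    names = sorted(streams)
    type_emb = {name: rng.normal(size=d_model) for name in names}
    x = np.concatenate([streams[name] + type_emb[name] for name in names], axis=0)
    w_q, w_k, w_v = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(3))
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(d_model))  # (total_tokens, total_tokens)
    return weights @ v, weights, names


rng = np.random.default_rng(1)
streams = {
    "medications": rng.normal(size=(4, 8)),
    "diagnoses": rng.normal(size=(3, 8)),
    "procedures": rng.normal(size=(2, 8)),
    "demographics": rng.normal(size=(1, 8)),  # static data as a single token
}
output, attention, order = multi_stream_self_attention(streams)
```

Each row of `attention` is a distribution over all ten tokens, so a medication token can draw directly on diagnosis or demographic tokens. That cross-stream access is the main contrast with feeding each stream to a separate recurrent model.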

    Deep learning for precision medicine

    As a result of the recent trend towards digitization, an increasing amount of information is recorded in clinics and hospitals, and this increasingly overwhelms human decision makers. This is one of the main reasons why Machine Learning (ML) is gaining attention in the medical domain: ML algorithms can make use of all the available information to predict the most likely future events for each individual patient. Physicians can include these predictions in their decision processes, which can lead to improved outcomes. Eventually, ML can also be the basis for a decision support system that provides personalized recommendations for each individual patient. It is also worth noting that medical datasets are becoming both longer (i.e. more samples collected over time) and wider (i.e. more variables stored). Therefore we need ML algorithms capable of modelling complex relationships among a large number of time-evolving variables. One class of models that can capture very complex relationships is Deep Neural Networks, which have proven successful in other areas of ML, for example Language Modelling, a use case that shares some similarities with the medical one. However, the medical domain has a set of characteristics that make it an almost unique scenario: multiple events can occur at the same time, there are multiple sequences (i.e. multiple patients), each sequence has an associated set of static variables, both inputs and outputs can combine different data types, etc. For these reasons we need approaches specifically designed for the medical use case. In this work we design and develop different kinds of Neural Network models suitable for modelling medical datasets. We also tackle different medical tasks and datasets, showing which models work best in each case.
The first dataset we use was collected from patients who suffered kidney failure. The data was collected at the Charité hospital in Berlin and is the largest data collection of its kind in Europe. Once the kidney has failed, patients face lifelong treatment and periodic clinic visits for the rest of their lives. Until the hospital finds a new kidney for the patient, he or she must attend the clinic multiple times per week to receive dialysis, a treatment that replaces many of the functions of the kidney. After the transplant has been performed, the patient receives immunosuppressive therapy to prevent rejection of the transplanted kidney. Patients must be monitored periodically to check the status of the kidney, adjust the treatment and take care of associated diseases, such as those that arise from the immunosuppressive therapy. This dataset started being recorded more than 30 years ago and comprises more than 4000 patients who underwent a renal transplantation or are waiting for one. The database has been the basis for many past studies. Our first goal with the nephrology dataset is to develop a system that predicts the next events to be recorded in each patient's electronic medical record, and thus to lay the basis for a future clinical decision support system. Specifically, we model three aspects of patient evolution: medication prescriptions, laboratory tests ordered and laboratory test results. In addition, there is a set of endpoints that can occur after a transplantation, and it would be very valuable for physicians to know beforehand when one of them is going to happen. Specifically, we also predict whether the patient will die, the transplant will be rejected, or the transplant will be lost. For each visit a patient makes to the clinic, we anticipate which of those three events (if any) will occur within 6 months and within 12 months after the visit.
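The per-visit endpoint targets described above can be built as one boolean label per (endpoint, horizon) pair. The sketch below assumes each endpoint is recorded with a single date and approximates the 6- and 12-month horizons as 183 and 365 days; both choices, the label names and the example dates are hypothetical and not taken from the thesis.

```python
from datetime import date, timedelta

ENDPOINTS = ("death", "rejection", "graft_loss")
# Assumption: horizons approximated in days rather than calendar months.
HORIZONS_DAYS = {"6m": 183, "12m": 365}


def endpoint_labels(visit_dates, endpoint_dates):
    """For each visit, flag which endpoints occur within each horizon after it.

    endpoint_dates maps an endpoint name to the date it occurred; endpoints that
    never occurred are simply absent. Returns one dict of boolean labels per visit.
    """
    labels = []
    for visit in visit_dates:
        row = {}
        for endpoint in ENDPOINTS:
            when = endpoint_dates.get(endpoint)
            for horizon, days in HORIZONS_DAYS.items():
                # True only if the endpoint falls strictly after the visit and within the horizon.
                row[f"{endpoint}_{horizon}"] = (
                    when is not None and visit < when <= visit + timedelta(days=days)
                )
        labels.append(row)
    return labels


# Hypothetical patient: two visits, graft rejection recorded on 2020-11-15.
labels = endpoint_labels([date(2020, 1, 10), date(2020, 9, 1)], {"rejection": date(2020, 11, 15)})
```

The rejection falls 310 days after the first visit (positive only for the 12-month label) and 75 days after the second (positive for both), which shows why the two horizons are predicted as separate targets.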
The second dataset used in this thesis was collected by the MEmind Wellness Tracker and contains information about psychiatric patients. Suicide is the second leading cause of death in the 15-29 age group, and its prevention is one of the top public health priorities. Traditionally, psychiatric patients have been assessed through self-reports, but these suffer from recall bias. To improve data quantity and quality, the MEmind Wellness Tracker provides a mobile application that enables patients to send daily reports about their status, giving physicians information about patients in their natural environments. This dataset therefore contains sequential information generated by the MEmind application, sequential information generated during medical visits, and static information about each patient. Our goal with this dataset is to predict the suicidal ideation value that each patient will report next. To model both datasets, we developed a set of predictive Machine Learning models based on Neural Networks capable of integrating multiple sequences of data with the background information of each patient. We compare the performance of these approaches with that of classical ML algorithms. For the task of predicting the next events observed in the nephrology dataset, we obtained the best performance with a Feedforward Neural Network containing a representation layer. For the tasks of endpoint prediction in nephrology patients and of suicidal ideation prediction, on the other hand, we obtained the best performance with a model that combines a Feedforward Neural Network with one or multiple Recurrent Neural Networks (RNNs) using Gated Recurrent Units. We hypothesize that models of this kind, which include RNNs, perform best when the dataset contains long-term dependencies.
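The best-performing architecture is described only as a Feedforward Neural Network combined with one or more GRU-based RNNs that integrate multiple sequences with each patient's static background. The NumPy sketch below shows one plausible wiring of that idea as an untrained forward pass: one GRU per sequence (so sequences may differ in length), a dense branch for static features, and a shared output layer. The class names, layer sizes and sigmoid output (suited to binary endpoints; a regression target such as an ideation score would use a linear output) are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """Single-layer GRU (one common formulation), randomly initialised and untrained."""

    def __init__(self, n_in, n_hidden, rng):
        s = 1.0 / np.sqrt(n_hidden)
        self.W = rng.uniform(-s, s, size=(3, n_hidden, n_in))      # input weights: update, reset, candidate
        self.U = rng.uniform(-s, s, size=(3, n_hidden, n_hidden))  # recurrent weights
        self.b = np.zeros((3, n_hidden))
        self.n_hidden = n_hidden

    def run(self, seq):
        h = np.zeros(self.n_hidden)
        for x in seq:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])  # update gate
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])  # reset gate
            h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1 - z) * h + z * h_tilde
        return h  # final hidden state summarises the whole sequence


class StaticPlusSequences:
    """Hypothetical model: feedforward branch for static data plus one GRU per sequence."""

    def __init__(self, n_static, seq_dims, n_hidden=8, n_out=3, seed=0):
        rng = np.random.default_rng(seed)
        self.grus = [GRU(d, n_hidden, rng) for d in seq_dims]
        self.W_s = rng.normal(scale=0.1, size=(n_hidden, n_static))
        self.W_o = rng.normal(scale=0.1, size=(n_out, n_hidden * (len(seq_dims) + 1)))

    def predict(self, static, sequences):
        parts = [np.tanh(self.W_s @ static)]  # dense layer on static background data
        parts += [gru.run(seq) for gru, seq in zip(self.grus, sequences)]
        return sigmoid(self.W_o @ np.concatenate(parts))


rng = np.random.default_rng(42)
model = StaticPlusSequences(n_static=5, seq_dims=[4, 6])
static = rng.normal(size=5)
app_reports = rng.normal(size=(10, 4))  # e.g. daily app reports
visits = rng.normal(size=(3, 6))        # e.g. clinic visit records
probs = model.predict(static, [app_reports, visits])
```

Concatenating the static branch with each GRU's final state keeps background data from having to be repeated at every time step, and giving each stream its own GRU lets streams sampled at different rates (daily reports versus occasional visits) be processed independently.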
To our knowledge, our work is the first to develop this kind of deep network, combining static information with several sources of dynamic information. These models can be useful for many other medical datasets and even for datasets in other domains. We show some examples where our approach is successfully applied to non-medical datasets that also present multiple variables evolving in time. In addition, we installed the endpoint prediction model as a standalone system in the Charité hospital in Berlin. For this purpose, we developed a web-based user interface that physicians can use, and an API that can connect our predictive system with other IT systems in the hospital. These systems can be seen as recommender systems; however, they do not necessarily generate valid prescriptions. For example, for a certain patient, a system can predict very high probabilities for all antibiotics in the dataset. Obviously, this patient should not take all antibiotics, but only one of them. Therefore, we need a human decision maker on top of our recommender system. To model this decision process, we used an architecture based on a Generative Adversarial Network (GAN). GANs are Neural Network-based systems that make better generative models than regular Neural Networks. We therefore trained a GAN that works on top of a regular Neural Network and show how the quality of the prescriptions improves. We ran this experiment on a synthetic dataset created for this purpose. The architectures we developed are specially designed for modelling medical data, but they can also be useful in other use cases. We ran experiments showing how to train them to model the readings of a sensor network and to build a movie recommendation engine.

    CAPTURE AND ANALYSIS OF SENSOR DATA FOR ASTHMA PATIENTS

    Worldwide, more than 230 million people suffer from asthma. Reliable and timely guidance for individuals to minimize their risk of asthma attacks is not available. This is largely because asthma symptoms are often caused by multiple environmental and personal factors, many of which are neither captured nor systematically analysed. This is addressed by the project ActOnAir, which aims at a comprehensive capture of the health factors and environmental exposure of individuals, together with subsequent analysis in real time. For this purpose the ActOnAir system provides a mobile sensor box for data collection, a sensor data integration and processing platform, a data mining component and a smartphone application for patients. This contribution outlines the design objectives of the ActOnAir system and discusses the corresponding key requirements. The related system architecture is introduced and first results from a prototype implementation are sketched.

    Using Big Data Analytics and Statistical Methods for Improving Drug Safety

    This dissertation includes three studies, all focusing on using Big Data and statistical methods to improve one of the most important aspects of health care, namely drug safety. In these studies we develop data analytics methodologies to inspect, clean, and model data with the aim of fulfilling the three main goals of drug safety: detection, understanding, and prediction of adverse drug effects. In the first study, we develop a methodology combining analytics and statistical methods to detect associations between drugs and adverse events from historical patient records. In particular, we show the applicability of the developed methodology by investigating the potential confounding role of common diabetes drugs in the development of acute renal failure in diabetic patients. While traditional signal detection methods mostly consider one drug and one adverse event at a time, our proposed methodology takes into account the effect of drug-drug interactions by identifying groups of drugs frequently prescribed together. In the second study, two independent methodologies are developed to investigate the role of prescription sequence in the likelihood of developing adverse events; this study thus focuses on using data analytics to understand drug-event associations. Our analyses of the historical medication records of a group of diabetic patients using the proposed approaches revealed that the sequence in which drugs are prescribed and administered does significantly matter in the development of the adverse events associated with those drugs. The third study uses a chronological approach to develop a network of approved drugs and their known adverse events.
It then uses a set of network metrics, both similarity- and centrality-based, to build and train machine learning models that predict the likely adverse events of newly discovered drugs before their approval and introduction to the market. For this purpose, data on known drug-event associations from a large biomedical publication database (PubMed) is used to construct the network. The results indicate significant improvements in the accuracy of predicting drug-event associations compared with similar approaches.
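The abstract names similarity- and centrality-based network metrics as model features but does not say which ones. The sketch below assumes an undirected drug-event network and computes three hypothetical features for a candidate (drug, event) link: the highest Jaccard similarity between the drug's known events and those of drugs already linked to the event, plus normalised degree centrality of both nodes. In a chronological setup these would be computed on associations known before a cutoff and used to predict later ones. All drug and event names are made up.

```python
from collections import defaultdict


def build_network(associations):
    """Undirected drug-event network as an adjacency dict (node -> set of neighbours)."""
    adj = defaultdict(set)
    for drug, event in associations:
        adj[drug].add(event)
        adj[event].add(drug)
    return adj


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(adj, drug, event):
    """Similarity- and centrality-based features for a candidate (drug, event) link."""
    n = len(adj)
    drug_events = adj.get(drug, set())
    event_drugs = adj.get(event, set())
    # Similarity: how closely the drug's known events match those of drugs already linked to the event.
    similarity = max((jaccard(drug_events, adj[d]) for d in event_drugs if d != drug), default=0.0)
    return {
        "max_jaccard": similarity,
        # Degree centrality normalised by n - 1, the maximum possible degree.
        "drug_degree_centrality": len(drug_events) / (n - 1),
        "event_degree_centrality": len(event_drugs) / (n - 1),
    }


# Made-up associations known before some cutoff date.
known = [
    ("drugA", "nausea"), ("drugA", "headache"),
    ("drugB", "nausea"), ("drugB", "headache"), ("drugB", "rash"),
    ("drugC", "rash"),
    ("drugNew", "nausea"),
]
network = build_network(known)
headache = pair_features(network, "drugNew", "headache")
rash = pair_features(network, "drugNew", "rash")
```

Because drugNew shares nausea with drugA, which is also linked to headache, the candidate headache link scores 0.5 on similarity versus 1/3 for rash. A classifier trained on such feature vectors would then rank candidate events for a new drug.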