21 research outputs found

    Deep learning for precision medicine

    As a result of the recent trend towards digitization, an increasing amount of information is recorded in clinics and hospitals, and this increasingly overwhelms the human decision maker. This is one of the main reasons why Machine Learning (ML) is gaining attention in the medical domain: ML algorithms can make use of all the available information to predict the most likely future events for each individual patient. Physicians can include these predictions in their decision processes, which can lead to improved outcomes. Eventually, ML can also be the basis for a decision support system that provides personalized recommendations for each individual patient. It is also worth noting that medical datasets are becoming both longer (i.e. more samples are collected through time) and wider (i.e. more variables are stored). Therefore we need ML algorithms capable of modelling complex relationships among a large number of time-evolving variables. Deep Neural Networks are a kind of model that can capture very complex relationships, and they have proven successful in other areas of ML such as Language Modelling, a use case that has some similarities with the medical one. However, the medical domain has a set of characteristics that make it an almost unique scenario: multiple events can occur at the same time, there are multiple sequences (i.e. multiple patients), each sequence has an associated set of static variables, both inputs and outputs can be a combination of different data types, etc. For these reasons we need to develop approaches specifically designed for the medical use case. In this work we design and develop different kinds of models based on Neural Networks that are suitable for modelling medical datasets. In addition, we tackle different medical tasks and datasets, showing which models work best in each case. 
The first dataset we use was collected from patients who suffered from kidney failure. The data was collected at the Charité hospital in Berlin and it is the largest data collection of its kind in Europe. Once the kidney has failed, patients face lifelong treatment and periodic visits to the clinic. Until the hospital finds a new kidney for the patient, he or she must attend the clinic multiple times per week to receive dialysis, a treatment that replaces many of the functions of the kidney. After the transplant has been performed, the patient receives immunosuppressive therapy to avoid rejection of the transplanted kidney. Patients must be monitored periodically to check the status of the kidney, adjust the treatment and take care of associated diseases, such as those that arise from the immunosuppressive therapy. Recording of this dataset started more than 30 years ago, and it comprises more than 4000 patients who underwent a renal transplantation or are waiting for one. The database has been the basis for many studies in the past. Our first goal with the nephrology dataset is to develop a system that predicts the next events that will be recorded in the electronic medical record of each patient, and thus to develop the basis for a future clinical decision support system. Specifically, we model three aspects of the patient evolution: medication prescriptions, laboratory tests ordered and laboratory test results. In addition, there is a set of endpoints that can occur after a transplantation, and it would be very valuable for physicians to know beforehand when one of these is going to happen. Specifically, we also predict whether the patient will die, the transplant will be rejected, or the transplant will be lost. For each visit that a patient makes to the clinic, we predict which of those three events (if any) will occur within 6 months and within 12 months after the visit. 
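The endpoint task described above can be read as assigning each visit one binary label per endpoint and horizon. The following is a minimal sketch of that labelling under stated assumptions, not the thesis' own code: the endpoint names, the `endpoint_labels` helper and the approximation of 6 and 12 months as 183 and 365 days are all hypothetical choices made for illustration.

```python
from datetime import date, timedelta

# Hypothetical names for the three endpoints mentioned in the abstract.
ENDPOINTS = ("death", "rejection", "graft_loss")

def endpoint_labels(visit, events, horizons_days=(183, 365)):
    """Return {(endpoint, horizon_in_days): 0 or 1} for a single visit.

    `events` maps an endpoint name to the dates on which it was recorded.
    A label is 1 when the endpoint occurs after the visit and no later than
    the horizon. 183 and 365 days approximate 6 and 12 months (assumption).
    """
    labels = {}
    for name in ENDPOINTS:
        for horizon in horizons_days:
            limit = visit + timedelta(days=horizon)
            hit = any(visit < d <= limit for d in events.get(name, []))
            labels[(name, horizon)] = int(hit)
    return labels

# Illustrative, made-up patient history.
visit = date(2010, 1, 1)
events = {"rejection": [date(2010, 4, 1)], "graft_loss": [date(2010, 10, 1)]}
labels = endpoint_labels(visit, events)
print(labels)
```

In this made-up example the rejection falls inside both horizons, while the graft loss falls only inside the 12-month one, so a model trained on such labels would see six binary targets per visit.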
The second dataset that we use in this thesis is the one collected by the MEmind Wellness Tracker, which contains information related to psychiatric patients. Suicide is the second leading cause of death in the 15-29 years age group, and its prevention is one of the top public health priorities. Traditionally, psychiatric patients have been assessed by self-reports, but these suffer from recall bias. To improve data quantity and quality, the MEmind Wellness Tracker provides a mobile application that enables patients to send daily reports about their status. Thus, this application enables physicians to get information about patients in their natural environments. This dataset therefore contains sequential information generated by the MEmind application, sequential information generated during medical visits and static information about each patient. Our goal with this dataset is to predict the suicidal ideation value that each patient will report next. In order to model both datasets, we have developed a set of predictive Machine Learning models based on Neural Networks capable of integrating multiple sequences of data with the background information of each patient. We compare the performance achieved by these approaches with that obtained with classical ML algorithms. For the task of predicting the next events that will be observed in the nephrology dataset, we obtained the best performance with a Feedforward Neural Network containing a representation layer. On the other hand, for the tasks of endpoint prediction in nephrology patients and suicidal ideation prediction, we obtained the best performance with a model that combines a Feedforward Neural Network with one or multiple Recurrent Neural Networks (RNNs) using Gated Recurrent Units. We hypothesize that models of this kind, which include RNNs, provide the best performance when the dataset contains long-term dependencies. 
To our knowledge, our work is the first to develop this kind of deep network, which combines static information with several sources of dynamic information. These models can be useful in many other medical datasets and even in datasets from other domains. We show some examples where our approach is successfully applied to non-medical datasets that also present multiple variables evolving in time. In addition, we installed the endpoint prediction model as a standalone system in the Charité hospital in Berlin. For this purpose, we developed a web-based user interface that physicians can use, and an API that can be used to connect our predictive system with other IT systems in the hospital. These systems can be seen as recommender systems; however, they do not necessarily generate valid prescriptions. For example, for a certain patient, a system can predict very high probabilities for all antibiotics in the dataset. Obviously, this patient should not take all antibiotics, but only one of them. Therefore, we need a human decision maker on top of our recommender system. In order to model this decision process, we used an architecture based on a Generative Adversarial Network (GAN). GANs are systems based on Neural Networks that make better generative models than regular Neural Networks. Thus we trained one GAN that works on top of a regular Neural Network and show how the quality of the prescriptions improves. We ran this experiment with a synthetic dataset that we created for this purpose. The architectures that we developed are specially designed for modelling medical data, but they can also be useful in other use cases. We ran experiments showing how to train them to model the readings of a sensor network and to build a movie recommendation engine.

    Predictive analytics framework for electronic health records with machine learning advancements : optimising hospital resources utilisation with predictive and epidemiological models

    The primary aim of this thesis was to investigate the feasibility and robustness of predictive machine-learning models for improving the utilisation of hospital resources with data-driven approaches, and for predicting hospitalisation using hospital quality assessment metrics such as length of stay. The length of stay predictions include validating the proposed methodological predictive framework on each hospital's electronic health records data source. In this thesis, we relied on electronic health records (EHRs) to drive a data-driven predictive inpatient length of stay (LOS) research framework that suits the most demanding hospital facilities in the context of hospital resource utilisation. The thesis focused on the viability of the methodological predictive length of stay approaches in dynamic and demanding healthcare facilities and hospital settings such as intensive care units and emergency departments. While the hospital length of stay predictions are an (internal) assessment of inpatient outcomes from the time of admission to discharge, the thesis also considered (external) factors outside hospital control, such as forecasting future hospitalisations from the spread of infectious communicable disease during pandemics. The internal and external splits are the thesis' main contributions. The thesis therefore evaluated public health measures during events of uncertainty (e.g. pandemics) and measured the effect of non-pharmaceutical interventions during outbreaks on future hospitalised cases. To the best of our knowledge, this approach is the first contribution in the literature to examine the effect of epidemiological curves using simulation models to project future hospitalisations, given their strong potential to affect the availability of hospital beds and to stress hospital workflow and workers. 
The main research commonality between chapters is the usefulness of ensemble learning models for LOS prediction in the context of hospital resource utilisation. Ensemble learning models aim for better predictive performance by combining several base models into a single predictive model. These predictive models explored the internal LOS for various chronic and acute conditions using data-driven approaches to determine the most accurate predicted outcomes. This ultimately helps professionals working in hospital settings to achieve the desired outcomes.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim for scientifically ascertaining the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested
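For readers unfamiliar with the traditional autoregressive approach that the abstract contrasts with classification methods, the following is a minimal sketch of a Granger-style F test in plain Python. It is an illustration under stated assumptions, not the article's procedure: the helper names (`solve`, `rss`, `granger_f`), the lag of 1 and the synthetic series are all hypothetical, and no climate data is involved.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rss(rows, target):
    """Residual sum of squares of the least-squares fit target ~ rows."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))

def granger_f(x, y, lag=1):
    """F statistic for 'x Granger-causes y' using `lag` past values of each."""
    steps = range(lag, len(y))
    target = [y[t] for t in steps]
    # Restricted model: intercept plus past values of y only.
    restricted = [[1.0] + [y[t - k] for k in range(1, lag + 1)] for t in steps]
    # Full model: the restricted regressors plus past values of x.
    full = [r + [x[t - k] for k in range(1, lag + 1)]
            for r, t in zip(restricted, steps)]
    rss_r, rss_f = rss(restricted, target), rss(full, target)
    dof = len(target) - 2 * lag - 1
    return ((rss_r - rss_f) / lag) / (rss_f / dof)

# Synthetic example: y depends on the previous value of x, not vice versa.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + random.gauss(0, 0.5) for t in range(1, 300)]
f_xy = granger_f(x, y)
f_yx = granger_f(y, x)
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.1f}")
```

The statistic compares how much the prediction of `y` improves once past values of `x` are added to an autoregressive model of `y`. A large value in one direction and a small one in the other is what the traditional test reads as a directed statistical relationship.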

    Language modelling for clinical natural language understanding and generation

    One of the long-standing objectives of Artificial Intelligence (AI) is to design and develop algorithms for social good, including tackling public health challenges. In the era of digitisation, with an unprecedented amount of healthcare data being captured in digital form, the analysis of healthcare data at scale can lead to better research into diseases, better monitoring of patient conditions and, more importantly, improved patient outcomes. However, many AI-based analytic algorithms rely solely on structured healthcare data such as bedside measurements and test results, which account for only 20% of all healthcare data, whereas the remaining 80% is unstructured, including textual data such as clinical notes and discharge summaries, and is still underexplored. Conventional Natural Language Processing (NLP) algorithms designed for clinical applications rely on shallow matching, templates and non-contextualised word embeddings, which lead to a limited understanding of contextual semantics. Though recent advances in NLP algorithms have demonstrated promising performance on a variety of NLP tasks in the general domain with contextualised language models, most of these generic NLP algorithms struggle with specific clinical NLP tasks that require biomedical knowledge and reasoning. In addition, there is limited research on generative NLP algorithms that generate clinical reports and summaries automatically by considering salient clinical information. This thesis aims to design and develop novel NLP algorithms, especially clinically driven contextualised language models, to understand textual healthcare data and generate clinical narratives that can potentially support clinicians, medical scientists and patients. The first contribution of this thesis focuses on capturing phenotypic information of patients from clinical notes, which is important for profiling the patient's situation and improving patient outcomes. 
The thesis proposes a novel self-supervised language model, named Phenotypic Intelligence Extraction (PIE), to annotate phenotypes from clinical notes with the detection of contextual synonyms and the enhancement to reason with numerical values. The second contribution is to demonstrate the utility and benefits of using phenotypic features of patients in clinical use cases by predicting patient outcomes in Intensive Care Units (ICU) and identifying patients at risk of specific diseases with better accuracy and model interpretability. The third contribution is to propose generative models to generate clinical narratives to automate and accelerate the process of report writing and summarisation by clinicians. This thesis first proposes a novel summarisation language model named PEGASUS which surpasses or is on par with the state-of-the-art performance on 12 downstream datasets including biomedical literature from PubMed. PEGASUS is further extended to generate medical scientific documents from input tabular data.

    Hierarchical, informed and robust machine learning for surgical tool management

    This thesis focuses on the development of a computer vision and deep learning based system for the intelligent management of surgical tools. The work accomplished included the development of a new dataset, creation of state of the art techniques to cope with volume, variety and vision problems, and designing or adapting algorithms to address specific surgical tool recognition issues. The system was trained to cope with a wide variety of tools, with very subtle differences in shapes, and was designed to work with high volumes, as well as varying illuminations and backgrounds. Methodology that was adopted in this thesis included the creation of a surgical tool image dataset and development of a surgical tool attribute matrix or knowledge-base. This was significant because there are no large scale publicly available surgical tool datasets, nor are there established annotations or datasets of textual descriptions of surgical tools that can be used for machine learning. The work resulted in the development of a new hierarchical architecture for multi-level predictions at surgical speciality, pack, set and tool level. Additional work evaluated the use of synthetic data to improve robustness of the CNN, and the infusion of knowledge to improve predictive performance

    Pacific Symposium on Biocomputing 2023

    The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference. PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.