
    Semantically Enhanced Dynamic Bayesian Network for Detecting Sepsis Mortality Risk in ICU Patients with Infection

    Although timely sepsis diagnosis and prompt interventions in Intensive Care Unit (ICU) patients are associated with reduced mortality, early clinical recognition is frequently impeded by non-specific signs of infection and failure to detect signs of sepsis-induced organ dysfunction in a constellation of dynamically changing physiological data. The goal of this work is to identify patients at risk of life-threatening sepsis using a data-centered, machine learning-driven approach. We derive a dynamic Bayesian network (DBN) for predicting mortality risk, guided by a customized sepsis knowledgebase, and compare its predictive accuracy with that of the Sepsis-related Organ Failure Assessment (SOFA) score, the Quick SOFA (qSOFA) score, the Simplified Acute Physiological Score (SAPS-II) and the Modified Early Warning Score (MEWS) tools. A customized sepsis ontology was used to derive the DBN node structure and semantically characterize temporal features derived from both structured physiological data and unstructured clinical notes. We assessed the DBN's performance in predicting mortality risk and compared it with the other models using Receiver Operating Characteristic (ROC) curves, area under the ROC curve (AUROC), calibration curves, and risk distributions. The derived dataset consists of 24,506 ICU stays from 19,623 patients with evidence of suspected infection, with 2,829 patients deceased at discharge. The DBN achieved an AUROC of 0.91, outperforming the SOFA (0.843), qSOFA (0.66), MEWS (0.73), and SAPS-II (0.77) scoring tools. Continuous Net Reclassification Index and Integrated Discrimination Improvement analyses supported the superiority of the DBN. Compared with conventional rule-based risk scoring tools, the sepsis knowledgebase-driven DBN algorithm offers improved performance for predicting mortality of infected patients in ICUs.
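The AUROC figures quoted above can be read as the probability that a randomly chosen deceased patient receives a higher risk score than a randomly chosen survivor. A minimal sketch of that pairwise definition follows; the function name `auroc` and the toy scores are hypothetical and are not taken from the paper or its data.

```python
def auroc(scores, labels):
    """Pairwise AUROC: chance that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical risk scores; label 1 = deceased at discharge, 0 = survived.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)  # 8 of the 9 pairs are ordered correctly
```

The quadratic loop is fine for illustration; on a dataset of the size reported above, a rank-based formulation would be the more practical choice.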

    EigenEvent: An Algorithm for Event Detection from Complex Data Streams in Syndromic Surveillance

    Syndromic surveillance systems continuously monitor multiple pre-diagnostic daily streams of indicators from different regions with the aim of early detection of disease outbreaks. The main objective of these systems is to detect outbreaks hours or days before clinical and laboratory confirmation. The data generated by these systems is usually multivariate and seasonal, with spatial and temporal dimensions. The algorithm What's Strange About Recent Events (WSARE) is the state-of-the-art method for such problems. It exhaustively searches for contrast sets in the multivariate data and signals an alarm when it finds statistically significant rules. This bottom-up approach has a much lower detection delay than existing top-down approaches. However, WSARE is very sensitive to small-scale changes and consequently comes with a relatively high rate of false alarms. We propose a new approach called EigenEvent that is neither fully top-down nor bottom-up. Instead of a top-down or bottom-up search, this method tracks changes in the data correlation structure via eigenspace techniques. This new methodology enables us to detect both overall changes (via eigenvalues) and dimension-level changes (via eigenvectors). Experimental results on one hundred sets of benchmark data reveal that EigenEvent has better overall performance than the state of the art, in particular in terms of the false alarm rate. Comment: To appear in Intelligent Data Analysis Journal, vol. 19(3), 201
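The idea of watching the correlation structure through its eigenvalues and eigenvectors can be illustrated with a small sketch. This is not the EigenEvent algorithm itself, whose details are not given in the abstract. It only compares the leading eigenpair of the correlation matrix in a baseline window against a recent window. All names (`correlation_matrix`, `leading_eigenpair`, `eigen_change`) and the toy streams are assumptions, and plain power iteration stands in for a full eigendecomposition.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation of the columns of `rows` (one row per time step).

    Assumes every column has non-zero variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(d)]
    return [
        [sum(c[i] * c[j] for c in centered) / (norms[i] * norms[j]) for j in range(d)]
        for i in range(d)
    ]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector via power iteration."""
    d = len(matrix)
    # Slightly uneven start vector so it is unlikely to be orthogonal
    # to the leading eigenvector.
    v = [1.0 + 0.1 * i for i in range(d)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space")
        v = [x / norm for x in w]
    product = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(vi * pi for vi, pi in zip(v, product))  # Rayleigh quotient
    return eigenvalue, v


def eigen_change(baseline_rows, recent_rows):
    """Return (eigenvalue shift, direction shift) between two windows.

    The direction shift is 1 - |cos(angle)| between leading eigenvectors.
    """
    base_val, base_vec = leading_eigenpair(correlation_matrix(baseline_rows))
    new_val, new_vec = leading_eigenpair(correlation_matrix(recent_rows))
    cosine = abs(sum(a * b for a, b in zip(base_vec, new_vec)))
    return new_val - base_val, 1.0 - cosine


# Three hypothetical indicator streams over six days.
baseline = list(zip([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [3, 1, 2, 3, 1, 2]))
# In the recent window all three streams rise together.
recent = list(zip([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [5, 6, 7, 8, 9, 10]))
value_shift, direction_shift = eigen_change(baseline, recent)
```

In this toy case the leading eigenvalue grows because the third stream starts moving with the other two, and the eigenvector rotation shows which stream changed its behaviour. Turning those two numbers into alarms would need thresholds that the abstract does not specify.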

    EMR-based medical knowledge representation and inference via Markov random fields and distributed representation learning

    Objective: Electronic medical records (EMRs) contain medical knowledge that can be used for clinical decision support (CDS). Our objective is a general system that can extract and represent the knowledge contained in EMRs to support three CDS tasks, given a patient's condition: test recommendation, initial diagnosis, and treatment plan recommendation. Methods: We extracted four kinds of medical entities from records and constructed an EMR-based medical knowledge network (EMKN), in which nodes are entities and edges reflect their co-occurrence in a single record. Three bipartite subgraphs (bi-graphs) were extracted from the EMKN, one to support each task. One part of the bi-graph was the given condition (e.g., symptoms), and the other was the condition to be inferred (e.g., diseases). Each bi-graph was regarded as a Markov random field (MRF) to support the inference. Three lazy energy functions and one parameter-based energy function were proposed, as well as two knowledge representation learning-based energy functions, which can provide a distributed representation of medical entities. Three measures were used for performance evaluation. Results: On the initial diagnosis task, 80.11% of the test records identified at least one correct disease among the top 10 candidates. Test and treatment recommendation results were 87.88% and 92.55%, respectively. Together, these results indicate that the proposed system outperformed the baseline methods. The distributed representation of medical entities does reflect similarity relationships at the knowledge level. Conclusion: Combining EMKN and MRF is an effective approach for general medical knowledge representation and inference. Different tasks, however, require designing their energy functions individually.
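A rough illustration of the bi-graph idea: count how often each symptom and disease appear in the same record, then rank candidate diseases for a new set of symptoms by their summed edge weights. The summed count is a deliberately simple stand-in and is not one of the paper's energy functions. The function names and the toy records are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_cooccurrence(records):
    """Edge weights of a symptom-disease bi-graph.

    Each record is a (symptoms, diseases) pair; an edge weight is the number
    of records in which the symptom and the disease appear together.
    """
    weights = defaultdict(int)
    for symptoms, diseases in records:
        for symptom, disease in product(set(symptoms), set(diseases)):
            weights[(symptom, disease)] += 1
    return weights


def rank_diseases(weights, observed_symptoms, top_k=10):
    """Rank diseases by total co-occurrence with the observed symptoms."""
    scores = defaultdict(int)
    for (symptom, disease), weight in weights.items():
        if symptom in observed_symptoms:
            scores[disease] += weight
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


# Hypothetical toy records, not real clinical data.
toy_records = [
    ({"fever", "cough"}, {"influenza"}),
    ({"fever", "rash"}, {"measles"}),
    ({"cough"}, {"influenza"}),
    ({"fever", "cough"}, {"pneumonia"}),
]
toy_weights = build_cooccurrence(toy_records)
candidates = rank_diseases(toy_weights, {"fever", "cough"})
```

The `top_k=10` default mirrors the top-10 candidate list used in the reported evaluation; everything else about the scoring is an assumption.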

    A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop

    The goal of Machine Learning is to automatically learn from data, extract knowledge and make decisions without any human intervention. Such automatic (aML) approaches show impressive success. Recent results even demonstrate that deep learning applied to the automatic classification of skin lesions is on par with the performance of dermatologists, and even outperforms the average. As human perception is inherently limited, such approaches can discover patterns, e.g. that two objects are similar, in arbitrarily high-dimensional spaces, which no human is able to do. Humans can deal only with limited amounts of data, whilst big data is beneficial for aML; however, in health informatics, we are often confronted with a small number of data sets, where aML suffers from insufficient training samples and many problems are computationally hard. Here, interactive machine learning (iML) may be of help, where a human-in-the-loop contributes to reducing the complexity of NP-hard problems. A further motivation for iML is that standard black-box approaches lack transparency and hence do not foster trust in and acceptance of ML among end-users. Rising legal and privacy concerns, e.g. with the new European General Data Protection Regulation, make black-box approaches difficult to use, because they are often unable to explain why a decision has been made. In this paper, we present some experiments to demonstrate the effectiveness of the human-in-the-loop approach, particularly in opening the black box to a glass box and thus enabling a human to interact directly with a learning algorithm. We selected the Ant Colony Optimization framework and applied it to the Traveling Salesman Problem, which is a good example due to its relevance for health informatics, e.g. for the study of protein folding. Fundamental ML research may also benefit from studies of how humans extract so much from so little data. Comment: 26 pages, 5 figures

    Early Stage Influenza Detection from Twitter

    Influenza is an acute respiratory illness that occurs virtually every year and results in substantial disease, death and expense. Detection of influenza in its earliest stage would facilitate timely action that could reduce the spread of the illness. Existing systems such as CDC and EISS, which try to collect diagnosis data, are almost entirely manual, resulting in delays of about two weeks for clinical data acquisition. Twitter, a popular microblogging service, is a promising source for early-stage flu detection due to its real-time nature. For example, when the flu breaks out, people who get it may post related tweets, which enable prompt detection of the outbreak. In this paper, we investigate the real-time flu detection problem on Twitter data by proposing the Flu Markov Network (Flu-MN): a spatio-temporal unsupervised Bayesian algorithm based on a four-phase Markov Network that tries to identify a flu outbreak at the earliest stage. We test our model on real Twitter datasets from the United States along with baselines in multiple applications, such as real-time flu outbreak detection, future epidemic phase prediction, or Influenza-like illness (ILI) physician visits. Experimental results show the robustness and effectiveness of our approach. We have built a real-time flu reporting system based on the proposed approach, and we are hopeful that it will help government or health organizations identify flu outbreaks and take timely action to decrease unnecessary mortality.

    FWDA: a Fast Wishart Discriminant Analysis with its Application to Electronic Health Records Data Classification

    Linear Discriminant Analysis (LDA) on Electronic Health Records (EHR) data is widely used for early detection of diseases. Classical LDA for EHR data classification, however, suffers from two handicaps: the ill-posed estimation of LDA parameters (e.g., the covariance matrix), and the "linear inseparability" of EHR data. To handle these two issues, we propose a novel classifier, FWDA (Fast Wishart Discriminant Analysis), that makes predictions in an ensemble way. Specifically, FWDA first approximates the distribution of inverse covariance matrices using a Wishart distribution estimated from the training data, then takes a weighted average of the classification results of multiple LDA classifiers parameterized by the sampled inverse covariance matrices via a Bayesian Voting scheme. The weights for voting are optimally updated to adapt to each new input, so as to enable nonlinear classification. Theoretical analysis indicates that FWDA possesses a fast convergence rate and robust performance on high-dimensional data. Extensive experiments on a large-scale EHR dataset show that our approach outperforms state-of-the-art algorithms by a large margin.

    Time Series Imputation

    Multivariate time series analysis is a very active topic in the research community, and many machine learning techniques are used to extract information from this type of data. However, in real-world problems data has missing values, which may hinder the application of machine learning techniques. In this paper we focus on the task of imputation of time series. Many imputation methods for time series are based on regression methods. Unfortunately, these methods perform poorly when the variables are categorical. To address this case, we propose a new imputation method based on Expectation Maximization over dynamic Bayesian networks. The approach is assessed with synthetic and real data, and it outperforms several state-of-the-art methods. Comment: Master paper, draft to be submitted

    Evaluation of Predictive Data Mining Algorithms in Erythemato-Squamous Disease Diagnosis

    A lot of time is spent searching for the best-performing data mining algorithms for clinical diagnosis. This study set out to identify the best-performing predictive data mining algorithms for the diagnosis of Erythemato-squamous diseases. The study used Naive Bayes, Multilayer Perceptron and J48 decision tree induction to build predictive data mining models on 366 instances of Erythemato-squamous disease data. In addition, 10-fold cross-validation and a set of performance metrics were used to evaluate the baseline predictive performance of the classifiers. The comparative analysis shows that Naive Bayes performed best with an accuracy of 97.4%, Multilayer Perceptron came second with an accuracy of 96.6%, and J48 performed worst with an accuracy of 93.5%. The evaluation of these classifiers on clinical datasets gave an insight into the predictive ability of different data mining algorithms applicable in clinical diagnosis, especially in the diagnosis of Erythemato-squamous diseases. Comment: 10 pages, 3 figures, 2 tables
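The 10-fold cross-validation protocol mentioned above can be sketched in a few lines: split the instances into ten folds, hold each fold out once, train on the remaining nine, and pool the held-out predictions into one accuracy figure. The sketch below is an assumption-laden illustration, not the study's setup: folds are assigned by index modulo `k` rather than by shuffling or stratifying, and a nearest-centroid classifier stands in for the Naive Bayes, Multilayer Perceptron and J48 models. All function names and the toy data are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    for fold in range(k):
        test_idx = [i for i in range(n_samples) if i % k == fold]
        train_idx = [i for i in range(n_samples) if i % k != fold]
        yield train_idx, test_idx


def nearest_centroid_fit_predict(train_X, train_y, test_X):
    """Stand-in classifier: assign each test point to the closest class mean."""
    sums, counts = {}, {}
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: [v / counts[label] for v in s] for label, s in sums.items()}

    def predict(x):
        return min(
            centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
        )

    return [predict(x) for x in test_X]


def cross_val_accuracy(X, y, fit_predict, k=10):
    """Pooled accuracy over k held-out folds."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(X), k):
        predictions = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in test_idx],
        )
        correct += sum(p == y[i] for p, i in zip(predictions, test_idx))
    return correct / len(X)


# Hypothetical, well-separated two-class toy data (20 instances).
toy_X = [[i % 3, i % 2] for i in range(10)] + [[10 + i % 3, 10 + i % 2] for i in range(10)]
toy_y = [0] * 10 + [1] * 10
toy_accuracy = cross_val_accuracy(toy_X, toy_y, nearest_centroid_fit_predict, k=10)
```

Any classifier with the same `fit_predict(train_X, train_y, test_X)` shape could be dropped in, which is what makes this kind of protocol convenient for comparing several algorithms on equal terms.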

    A New Approach to Adaptive Signal Processing

    A unified linear algebraic approach to adaptive signal processing (ASP) is presented. Starting from just Ax=b, key ASP algorithms are derived in a simple, systematic, and integrated manner without requiring any background knowledge of the field. Algorithms covered are Steepest Descent, LMS, Normalized LMS, Kaczmarz, Affine Projection, RLS, Kalman filter, and MMSE/Least Squares Wiener filters. By following this approach, readers will discover a synthesis; they will learn that one and only one equation is involved in all these algorithms. They will also learn that this one equation forms the basis of more advanced algorithms like reduced rank adaptive filters, extended Kalman filter, particle filters, multigrid methods, preconditioning methods, Krylov subspace methods and conjugate gradients. This will enable them to enter many sophisticated realms of modern research and development. Eventually, this one equation will become their passport not only to ASP but also to many highly specialized areas of computational science and engineering.
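As one concrete instance of working from Ax=b, the Kaczmarz method repeatedly projects the current estimate onto the hyperplane defined by a single row of the system. The sketch below is a generic textbook version, not the paper's derivation; the function name, the `step` relaxation parameter and the toy system are assumptions.

```python
def kaczmarz(A, b, sweeps=200, step=1.0):
    """Solve a consistent system Ax = b by cyclic row projections.

    Each update moves x toward the hyperplane a_i . x = b_i:
        x <- x + step * (b_i - a_i . x) / ||a_i||^2 * a_i
    With step = 1 this is an exact projection onto that hyperplane.
    """
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, target in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information
            error = target - sum(a * xi for a, xi in zip(row, x))
            x = [xi + step * error * a / norm_sq for xi, a in zip(x, row)]
    return x


# Toy system: 2x + y = 5 and x + 3y = 10, whose solution is x = 1, y = 3.
toy_A = [[2.0, 1.0], [1.0, 3.0]]
toy_b = [5.0, 10.0]
toy_solution = kaczmarz(toy_A, toy_b)
```

The same row update, applied to incoming data vectors with a step size below one, has the form of the Normalized LMS update, which gives a flavour of how one equation can underlie several of the listed algorithms.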

    Simulation-Based Inference for Global Health Decisions

    The COVID-19 pandemic has highlighted the importance of in-silico epidemiological modelling in predicting the dynamics of infectious diseases to inform health policy and decision makers about suitable prevention and containment strategies. Work in this setting involves solving challenging inference and control problems in individual-based models of ever-increasing complexity. Here we discuss recent breakthroughs in machine learning, specifically in simulation-based inference, and explore its potential as a novel avenue for model calibration to support the design and evaluation of public health interventions. To further stimulate research, we are developing software interfaces that turn two cornerstone COVID-19 and malaria epidemiology models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria (https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling efficient, interpretable Bayesian inference within those simulators.