
    Combining Free Text and Structured Electronic Medical Record Entries to Detect Acute Respiratory Infections

    The electronic medical record (EMR) contains a rich source of information that could be harnessed for epidemic surveillance. We asked if structured EMR data could be coupled with computerized processing of free-text clinical entries to enhance detection of acute respiratory infections (ARI).

    A manual review of EMR records related to 15,377 outpatient visits uncovered 280 reference cases of ARI. We used logistic regression with backward elimination to determine which among candidate structured EMR parameters (diagnostic codes, vital signs and orders for tests, imaging and medications) contributed to the detection of those reference cases. We also developed a computerized free-text search to identify clinical notes documenting at least two non-negated ARI symptoms. We then used heuristics to build case-detection algorithms that best combined the retained structured EMR parameters with the results of the text analysis.

    An adjusted grouping of diagnostic codes identified reference ARI patients with a sensitivity of 79%, a specificity of 96% and a positive predictive value (PPV) of 32%. Of the 21 additional structured clinical parameters considered, two contributed significantly to ARI detection: new prescriptions for cough remedies and elevations in body temperature to at least 38°C. Together with the diagnostic codes, these parameters increased detection sensitivity to 87%, but specificity and PPV declined to 95% and 25%, respectively. Adding text analysis increased sensitivity to 99%, but PPV dropped further to 14%. Algorithms that required satisfying both a query of structured EMR parameters and the text analysis disclosed PPVs of 52-68% and retained sensitivities of 69-73%.

    Structured EMR parameters and free-text analyses can be combined into algorithms that detect ARI cases with new levels of sensitivity or precision. These results highlight potential paths by which repurposed EMR information could facilitate the discovery of epidemics before they cause mass casualties.
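
    As a rough illustration of the case-detection logic described in this abstract, the sketch below combines a structured EMR query (a diagnostic-code grouping, new cough-remedy prescriptions, temperature of at least 38°C) with a free-text rule requiring at least two non-negated ARI symptoms. The field names, code grouping, symptom list and negation handling are illustrative assumptions, not the study's actual definitions.

```python
# Minimal sketch of the combined case-detection logic described above.
# Field names, the ICD code grouping, and the symptom/negation lists are
# illustrative assumptions, not the study's actual definitions.

ARI_DX_CODES = {"460", "465.9", "466.0", "486"}       # hypothetical grouping
ARI_SYMPTOMS = ["cough", "sore throat", "rhinorrhea", "fever", "myalgia"]
NEGATIONS = ["no ", "denies ", "without ", "negative for "]

def structured_query(visit):
    """True if any of the retained structured EMR parameters fires."""
    return (
        visit["dx_code"] in ARI_DX_CODES
        or visit["new_cough_remedy_rx"]
        or visit["temperature_c"] >= 38.0
    )

def text_query(note, min_symptoms=2):
    """True if the note documents at least two non-negated ARI symptoms."""
    note = note.lower()
    count = 0
    for symptom in ARI_SYMPTOMS:
        idx = note.find(symptom)
        if idx == -1:
            continue
        window = note[max(0, idx - 20):idx]   # crude pre-term negation check
        if not any(neg in window for neg in NEGATIONS):
            count += 1
    return count >= min_symptoms

def detect_ari(visit):
    """High-PPV variant: require both the structured and the text query."""
    return structured_query(visit) and text_query(visit["note"])

visit = {"dx_code": "466.0", "new_cough_remedy_rx": False,
         "temperature_c": 38.4,
         "note": "Pt with cough and sore throat, no dyspnea."}
print(detect_ari(visit))  # True
```

    Requiring both queries to fire corresponds to the high-PPV algorithms reported above; replacing the `and` in `detect_ari` with `or` trades precision for sensitivity.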

    UNION INTERSECTION TEST IN INTERPRETING SIGNAL FROM MULTIVARIATE CONTROL CHART

    Statistical Process Control (SPC) has been an important discipline in quality control since it was pioneered by Walter A. Shewhart in the 1920s. Control charting is one of the important tools in SPC and has received wide attention from researchers as well as practitioners. The complexity and impracticality of monitoring several univariate control charts for a multivariate process have led many practitioners to use a multivariate control chart instead. Its usage gives better control of the overall Type I error, and the interdependency among variables is retained. Unfortunately, a multivariate control chart is not able to pinpoint the responsible variable(s) once an out-of-control (OOC) signal is triggered. Many diagnostic methods have been proposed to overcome this problem, but all of them have their own limitations and drawbacks. The applicability of a diagnostic method to only a limited number of variables, lack of physical interpretation, the complexity of the computation procedure and lack of location invariance are among the factors that have inhibited the implementation of multivariate charts. A lack of comparative studies of the various diagnostic methods also makes it difficult for practitioners to choose an appropriate diagnostic method. This study highlights some problems that might arise in a comparison of diagnostic methods and makes suggestions to overcome them, hence making the results of a comparative study more relevant and reliable. The effects of several factors, such as the size of the deviation in a mean vector, the combination of various sizes of shifts in a mean vector and the inter-correlation among the variables, on the performance of diagnostic methods are studied, and a summary of the suitability of certain diagnostic methods for certain situations is given. This study presents a new comparison involving two diagnostic methods adapted from the methods proposed by Doganaksoy, Faltin and Tucker (1991) and Maravelakis et al. (2000). A problem related to the usage of eigenvectors with similar eigenvalues is revealed in this study, and suggestions from previous studies regarding this matter are presented. Due to the lack of multivariate approaches to interpreting a multivariate control chart signal, this study proposes a new method which embraces the principles of the Union Intersection Test (UIT) in diagnosing an OOC signal. A thorough discussion of the UIT principle, the hypotheses, the test statistic and the application of the union intersection technique to the diagnosis problem is presented. An extension of the first comparison study, which includes the proposed method, is carried out. The performance of the new diagnostic method is studied and its strengths and weaknesses are discussed. A simplified version of the new method, involving application of spectral decomposition, is also proposed. By using this simplified approach, the common practice of considering multiple types of covariance matrices in a comparison study of diagnostic methods can be avoided to some extent. This study concludes with a few suggestions for potential further work.
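
    As a concrete sketch of the UIT principle discussed here: Hotelling's T² can be written as the maximum over all linear combinations a of the squared standardised univariate statistics, and the maximising direction is proportional to S⁻¹(x − μ₀), whose large components point at suspect variables. The diagnosis rule below is an illustrative reading of that principle, not the dissertation's exact method.

```python
# Sketch of a UIT-style diagnosis of an out-of-control signal on a
# Hotelling T^2 chart. Ranking variables by the maximising direction of
# the union-intersection form is an illustrative reading of the UIT
# principle, not the dissertation's exact method.
import numpy as np

def hotelling_t2(x, mu0, S):
    """T^2 = (x - mu0)' S^{-1} (x - mu0) for a single observation x."""
    d = x - mu0
    return float(d @ np.linalg.solve(S, d))

def uit_direction(x, mu0, S):
    """UIT: T^2 = max_a [a'(x - mu0)]^2 / (a' S a), attained at
    a proportional to S^{-1}(x - mu0). Large |a_i| flag suspect variables."""
    a = np.linalg.solve(S, x - mu0)
    return a / np.linalg.norm(a)

mu0 = np.zeros(3)
S = np.array([[1.0, 0.7, 0.3],
              [0.7, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
x = np.array([0.2, 2.8, 0.4])          # shift injected in variable 2

t2 = hotelling_t2(x, mu0, S)
a = uit_direction(x, mu0, S)
print(f"T^2 = {t2:.2f}")
print("suspect ranking:", np.argsort(-np.abs(a)))  # variable index 1 first
```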

    On Practical Machine Learning and Data Analysis

    This thesis discusses and addresses some of the difficulties associated with practical machine learning and data analysis. Introducing data-driven methods in, e.g., industrial and business applications can lead to large gains in productivity and efficiency, but the cost and complexity are often overwhelming. Creating machine learning applications in practice often involves a large amount of manual labour, which often needs to be performed by an analyst experienced in machine learning but without significant experience with the application area. We will here discuss some of the hurdles faced in a typical analysis project and suggest measures and methods to simplify the process. One of the most important issues when applying machine learning methods to complex data, such as industrial applications, is that the processes generating the data are modelled in an appropriate way. Relevant aspects have to be formalised and represented in a way that allows us to perform our calculations in an efficient manner. We present a statistical modelling framework, Hierarchical Graph Mixtures, based on a combination of graphical models and mixture models. It allows us to create consistent, expressive statistical models that simplify the modelling of complex systems. Using a Bayesian approach, we allow for encoding of prior knowledge and make the models applicable in situations where relatively little data are available. Detecting structures in data, such as clusters and dependency structure, is very important both for understanding an application area and for specifying the structure of, e.g., a hierarchical graph mixture. We will discuss how this structure can be extracted for sequential data. By using the inherent dependency structure of sequential data, we construct an information-theoretical measure of correlation that does not suffer from the problems most common correlation measures have with this type of data. In many diagnosis situations it is desirable to perform a classification in an iterative and interactive manner. The matter is often complicated by very limited amounts of knowledge and examples when a new system to be diagnosed is initially brought into use. We describe how to create an incremental classification system based on a statistical model that is trained from empirical data, and show how the limited available background information can still be used initially for a functioning diagnosis system. To minimise the effort with which results are achieved within data analysis projects, we need to address not only the models used, but also the methodology and applications that can help simplify the process. We present a methodology for data preparation and a software library intended for rapid analysis, prototyping, and deployment. Finally, we study a few example applications, presenting tasks within classification, prediction and anomaly detection. The examples include demand prediction for supply chain management, approximating complex simulators for increased speed in parameter optimisation, and fraud detection and classification within a media-on-demand system.
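
    A minimal sketch of an information-theoretic dependency measure, of the general kind discussed above, is given below: a plug-in estimate of mutual information between two discretised sequences, which picks up nonlinear dependence that a linear correlation coefficient would miss. The binning scheme and the plain plug-in estimator are assumptions; the thesis's measure additionally exploits the sequential structure of the data.

```python
# Plug-in mutual information between two discretised sequences, as a toy
# stand-in for the information-theoretic correlation measure discussed
# above. Binning and estimator choice are assumptions for illustration.
import numpy as np
from collections import Counter

def mutual_information(x, y, bins=8):
    """Plug-in estimate of I(X;Y) in bits from two equal-length sequences."""
    xb = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    yb = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    n = len(xb)
    pxy = Counter(zip(xb, yb))
    px, py = Counter(xb), Counter(yb)
    mi = 0.0
    for (i, j), c in pxy.items():
        p = c / n
        mi += p * np.log2(p * n * n / (px[i] * py[j]))  # p / (p_x p_y)
    return mi

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.sin(x) + 0.1 * rng.normal(size=5000)   # nonlinear dependence
print(f"MI(x, y)     = {mutual_information(x, y):.2f} bits")
print(f"MI(x, noise) = {mutual_information(x, rng.normal(size=5000)):.2f} bits")
```

    Applied to lagged copies of a sequence, the same estimator yields a dependency profile over time shifts, which is one way sequential structure can be brought in.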

    Syndromic surveillance: reports from a national conference, 2004

    Overview, Policy, and Systems -- Federal Role in Early Detection Preparedness Systems -- BioSense: Implementation of a National Early Event Detection and Situational Awareness System -- Guidelines for Constructing a Statewide Hospital Syndromic Surveillance Network
    Data Sources -- Implementation of Laboratory Order Data in BioSense Early Event Detection and Situation Awareness System -- Use of Medicaid Prescription Data for Syndromic Surveillance – New York -- Poison Control Center–Based Syndromic Surveillance for Foodborne Illness -- Monitoring Over-The-Counter Medication Sales for Early Detection of Disease Outbreaks – New York City -- Experimental Surveillance Using Data on Sales of Over-the-Counter Medications – Japan, November 2003–April 2004
    Analytic Methods -- Public Health Monitoring Tools for Multiple Data Streams -- Use of Multiple Data Streams to Conduct Bayesian Biologic Surveillance -- Space-Time Clusters with Flexible Shapes -- INFERNO: A System for Early Outbreak Detection and Signature Forecasting -- High-Fidelity Injection Detectability Experiments: a Tool for Evaluating Syndromic Surveillance Systems -- Linked Analysis for Definition of Nurse Advice Line Syndrome Groups, and Comparison to Encounters
    Simulation and Other Evaluation Approaches -- Simulation for Assessing Statistical Methods of Biologic Terrorism Surveillance -- An Evaluation Model for Syndromic Surveillance: Assessing the Performance of a Temporal Algorithm -- Evaluation of Syndromic Surveillance Based on National Health Service Direct Derived Data – England and Wales -- Initial Evaluation of the Early Aberration Reporting System – Florida
    Practice and Experience -- Deciphering Data Anomalies in BioSense -- Syndromic Surveillance on the Epidemiologist's Desktop: Making Sense of Much Data -- Connecting Health Departments and Providers: Syndromic Surveillance's Last Mile -- Comparison of Syndromic Surveillance and a Sentinel Provider System in Detecting an Influenza Outbreak – Denver, Colorado, 2003 -- Ambulatory-Care Diagnoses as Potential Indicators of Outbreaks of Gastrointestinal Illness – Minnesota -- Emergency Department Visits for Concern Regarding Anthrax – New Jersey, 2001 -- Hospital Admissions Syndromic Surveillance – Connecticut, October 2001–June 2004 -- Three Years of Emergency Department Gastrointestinal Syndromic Surveillance in New York City: What Have We Found?
    "August 26, 2005." Papers from the National Syndromic Surveillance Conference, sponsored by the Centers for Disease Control and Prevention, the Tufts Health Care Institute, and the Alfred P. Sloan Foundation, held Nov. 3-4, 2004, in Boston, MA.
    "Public health surveillance continues to broaden in scope and intensity. Public health professionals responsible for conducting such surveillance must keep pace with evolving methodologies, models, business rules, policies, roles, and procedures. The third annual Syndromic Surveillance Conference was held in Boston, Massachusetts, during November 3-4, 2004. The conference was attended by 440 persons representing the public health, academic, and private-sector communities from 10 countries and provided a forum for scientific discourse and interaction regarding multiple aspects of public health surveillance." - p. 3
    Also available via the World Wide Web.

    RANK-BASED TEMPO-SPATIAL CLUSTERING: A FRAMEWORK FOR RAPID OUTBREAK DETECTION USING SINGLE OR MULTIPLE DATA STREAMS

    In recent decades, algorithms for disease outbreak detection have become a central interest of public health practitioners, aiming to identify and localize an outbreak as early as possible so that a further public health response can be mounted before a pandemic develops. Today's increased threat of biological warfare and terrorism provides an even stronger impetus to develop methods for outbreak detection based on symptoms as well as definitive laboratory diagnoses. In this dissertation work, I explore the problems of rapid disease outbreak detection using both spatial and temporal information. I develop a framework of non-parameterized algorithms which search for patterns of disease outbreak in spatial sub-regions of the monitored region within a certain period. Compared to existing spatial or tempo-spatial algorithms, the algorithms in this framework provide a methodology for fast searching of either univariate or multivariate data sets. It first measures which study area is more likely to have an outbreak occurring given the baseline data and currently observed data. It then applies a greedy searching mechanism to look for clusters with high posterior probabilities, using the risk measurement for each unit area as a heuristic. I also explore the performance of the proposed algorithms. From the perspective of predictive modeling, I adopt a Gamma-Poisson (GP) model to compute the probability of having an outbreak in each cluster when analyzing univariate data. I build a multinomial generalized Dirichlet (MGD) model to identify outbreak clusters from multivariate data, which include the OTC data streams collected by the national retail data monitor (NRDM) and the ED data streams collected by the RODS system. Key contributions of this dissertation include 1) it introduces a rank-based tempo-spatial clustering algorithm, RSC, by utilizing greedy searching and a Bayesian GP model for disease outbreak detection, with comparable detection timeliness, cluster positive predictive value (PPV) and improved running time; 2) it proposes a multivariate extension of RSC (MRSC) which applies the MGD model. The evaluation demonstrated the advantage that the MGD model can effectively suppress the false alarms caused by elevated signals that are not disease-relevant and occur in all the monitored data streams.
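
    The Gamma-Poisson scoring step can be sketched as follows: counts in a candidate cluster are modelled as Poisson with rate q times the baseline, a Gamma prior is placed on the relative risk q, and the cluster is scored by the marginal-likelihood ratio of an elevated-risk prior against a prior concentrated at q = 1. The prior parameters and region data below are illustrative, and the greedy search is reduced to a simple ranking of seed regions for brevity.

```python
# Sketch of a Gamma-Poisson cluster score for outbreak detection.
# Prior parameters and region data are illustrative assumptions.
import math

def log_marginal(count, baseline, alpha, beta):
    """log of the Poisson-Gamma marginal likelihood (negative binomial);
    terms depending only on count and baseline are omitted because they
    cancel in the ratio below."""
    return (alpha * math.log(beta)
            - (alpha + count) * math.log(beta + baseline)
            + math.lgamma(alpha + count) - math.lgamma(alpha))

def cluster_score(count, baseline):
    """Log Bayes factor: elevated-risk prior (E[q] = 2) vs. a prior
    concentrated near q = 1 (no outbreak)."""
    h1 = log_marginal(count, baseline, alpha=2.0, beta=1.0)    # E[q] = 2
    h0 = log_marginal(count, baseline, alpha=50.0, beta=50.0)  # q ~ 1
    return h1 - h0

# Greedy search reduced to ranking: grow clusters from the
# highest-scoring unit areas, as in the rank-based heuristic above.
regions = {"A": (14, 6.0), "B": (7, 6.5), "C": (30, 9.0)}  # (count, baseline)
ranked = sorted(regions, key=lambda r: cluster_score(*regions[r]), reverse=True)
print("seed order:", ranked)   # region C first: 30 cases vs. 9 expected
```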

    Imaging Based Prediction of Pathology in Adult Diffuse Glioma with Applications to Therapy and Prognosis

    The overall aggressiveness of a glioma is measured by histologic and molecular analysis of tissue samples. However, the well-known spatial heterogeneity of gliomas limits the ability of clinicians to use that information to make spatially specific treatment decisions. Magnetic resonance imaging (MRI) is used to visualize and assess the tumor, but the exact degree to which MRI correlates with the actual underlying tissue characteristics is not known. In this work, we derive quantitative relationships between imaging and underlying pathology. These relations increase the value of MRI by allowing it to be a better surrogate for underlying pathology, and they allow evaluation of the underlying biological heterogeneity via imaging. This provides an approach to answering questions about how tissue heterogeneity can affect prognosis. We estimated the local pathology within tumors using imaging data and stereotactically precise biopsy samples from an ongoing clinical imaging trial. From these data, we trained a random forest model to reliably predict tumor grade, proliferation, cellularity, and vascularity, representing tumor aggressiveness. We then made voxel-wise predictions to map the tumor heterogeneity and identify high-grade disease. Next, we used the previously trained models on a cohort of 1,850 glioma patients who had previously undergone surgical resection. High contrast enhancement, proliferation, vascularity, and cellularity were associated with worse prognosis even after controlling for clinical factors. Patients who had a substantial reduction in cellularity between preoperative and postoperative imaging (i.e., due to resection) also showed improved survival. We developed a clinically implementable model for predicting pathology and prognosis after surgery based on imaging. Results from imaging-pathology correlations enhance our understanding of disease extent within glioma patients, and the relationship between residual estimated pathology and outcome helps refine our knowledge of the interaction of tumor heterogeneity and prognosis.
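
    A toy sketch of the imaging-to-pathology step described above: fit a random forest on biopsy-matched pairs of imaging features and a pathology score, then apply it voxel-wise to map heterogeneity. The feature names and all data below are synthetic stand-ins, not the trial's actual MRI sequences or pathology labels.

```python
# Illustrative sketch: random forest trained on biopsy-matched
# (imaging features -> pathology score) pairs, applied voxel-wise.
# Features and data are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Each biopsy sample: per-voxel MRI features at the sampled site,
# e.g. [post-contrast T1 enhancement, FLAIR signal, ADC] (assumed names).
X_biopsy = rng.normal(size=(200, 3))
cellularity = (0.6 * X_biopsy[:, 0] - 0.3 * X_biopsy[:, 2]
               + 0.1 * rng.normal(size=200))     # synthetic ground truth

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_biopsy, cellularity)

# Voxel-wise prediction over a whole (flattened) tumour volume yields a
# heterogeneity map that can flag likely high-grade sub-regions.
X_voxels = rng.normal(size=(10_000, 3))
cell_map = model.predict(X_voxels)
print("predicted cellularity range:",
      cell_map.min().round(2), "to", cell_map.max().round(2))
```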

    Time-Series Embedded Feature Selection Using Deep Learning: Data Mining Electronic Health Records for Novel Biomarkers

    As health information technologies continue to advance, the routine collection and digitisation of patient health records in the form of electronic health records present an ideal opportunity for data mining and exploratory analysis of biomarkers and risk factors indicative of a potentially diverse domain of patient outcomes. Patient records have continually become more widely available through various initiatives enabling open access whilst maintaining critical patient privacy. In spite of such progress, health records remain not widely adopted within the current clinical statistical analysis domain due to challenging issues derived from such “big data”.

    Deep learning based temporal modelling approaches present an ideal solution to health record challenges through automated self-optimisation of representation learning, able to manageably compose the high-dimensional domain of patient records into data representations able to model complex data associations. Such representations can serve to condense and reduce dimensionality to emphasise feature sparsity and importance through novel embedded feature selection approaches. Accordingly, application to patient records enables complex modelling and analysis of the full domain of clinical features to select biomarkers of predictive relevance.

    Firstly, we propose a novel entropy-regularised neural network ensemble able to highlight risk factors associated with the hospitalisation risk of individuals with dementia. Its application reduced a large domain of unique medical events to a small set of relevant risk factors that maintained hospitalisation discrimination.

    Following on, we continue our work on ensemble architectures with a novel cascading LSTM ensemble to predict severe sepsis onset in critical patients in an ICU critical care centre. We demonstrate state-of-the-art performance, outperforming that reported in the current related literature.

    Finally, we propose a novel embedded feature selection application dubbed 1D convolution feature selection using sparsity regularisation. This methodology was evaluated on both the dementia and sepsis prediction objectives to highlight model capability and generalisability. We further report a selection of potential biomarkers for the aforementioned case-study objectives, highlighting clinical relevance and potential novelty value for future clinical analysis.

    Accordingly, we demonstrate the effective capability of embedded feature selection approaches through the application of temporal deep learning architectures to the discovery of effective biomarkers across a variety of challenging clinical applications.
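
    One plausible reading of the "1D convolution feature selection using sparsity regularisation" idea is sketched below in PyTorch: a learnable per-feature gate sits in front of a 1D convolution over time, and an L1 penalty on the gate drives the weights of uninformative features towards zero, leaving the survivors as candidate biomarkers. The architecture and hyperparameters are assumptions for illustration, not the thesis's implementation.

```python
# Embedded feature selection via a sparse per-feature gate in front of a
# 1D convolution over time; an illustrative reading, not the thesis code.
import torch
import torch.nn as nn

class GatedConv1dSelector(nn.Module):
    def __init__(self, n_features, n_hidden=32, n_classes=2):
        super().__init__()
        self.gate = nn.Parameter(torch.ones(n_features))  # one weight per feature
        self.conv = nn.Conv1d(n_features, n_hidden, kernel_size=3, padding=1)
        self.head = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                 # x: (batch, n_features, time)
        x = x * self.gate.view(1, -1, 1)  # soft-select input features
        h = torch.relu(self.conv(x)).mean(dim=2)   # pool over time
        return self.head(h)

    def l1_penalty(self):
        return self.gate.abs().sum()      # drives unused gates towards zero

model = GatedConv1dSelector(n_features=40)
x = torch.randn(8, 40, 24)               # e.g. 24 hourly EHR measurements
y = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y) + 1e-3 * model.l1_penalty()
loss.backward()
# After training, features whose |gate| stays near zero are deselected;
# the survivors are candidate biomarkers.
print(model.gate.abs().topk(5).indices)
```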

    Doctor of Philosophy

    Public health surveillance systems are crucial for the timely detection of and response to public health threats. Since the terrorist attacks of September 11, 2001, and the release of anthrax in the following month, there has been a heightened interest in public health surveillance. The years immediately following these attacks were met with increased awareness and funding from the federal government, which has significantly strengthened the United States' surveillance capabilities; however, despite these improvements, there are substantial challenges faced by today's public health surveillance systems. Problems with the current surveillance systems include: a) lack of leveraging unstructured public health data for surveillance purposes; and b) lack of information integration and the ability to leverage resources, applications or other surveillance efforts due to systems being built on a centralized model. This research addresses these problems by focusing on the development and evaluation of new informatics methods to improve public health surveillance. To address the problems above, we first identified a current public health surveillance workflow which is affected by the problems described and has the opportunity for enhancement through current informatics techniques. The 122 Mortality Surveillance for Pneumonia and Influenza was chosen as the primary use case for this dissertation work. The second step involved demonstrating the feasibility of using unstructured public health data, in this case death certificates. For this we created and evaluated a pipeline, composed of a detection rule and a natural language processor, for the coding of death certificates and the identification of pneumonia and influenza cases. The second problem was addressed by presenting the rationale for creating a federated model by leveraging grid technology concepts and tools for the sharing and epidemiological analyses of public health data. As a case study of this approach, a secured virtual organization was created where users are able to access two grid data services, using death certificates from the Utah Department of Health, and two analytical grid services, MetaMap and R. A scientific workflow was created using the published services to replicate the mortality surveillance workflow. To validate these approaches, and provide proofs of concept, a series of real-world scenarios were conducted.
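
    As a toy stand-in for the death-certificate pipeline: the dissertation couples a detection rule with a full NLP engine (MetaMap), whereas the sketch below substitutes a simple keyword-and-negation rule, just to show the shape of the pneumonia-and-influenza coding step. The term and negation lists are illustrative.

```python
# Toy stand-in for the death-certificate coding step: flag certificates
# whose cause-of-death text contains a non-negated pneumonia/influenza
# term. The real pipeline uses a detection rule plus MetaMap.
import re

PNI_TERMS = re.compile(r"\b(pneumonia|influenza|flu)\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|not|without|ruled out)\b", re.IGNORECASE)

def is_pni_case(cause_of_death: str) -> bool:
    """True if any clause mentions a P&I term with no negation cue."""
    for clause in re.split(r"[.;]", cause_of_death):
        if PNI_TERMS.search(clause) and not NEGATION.search(clause):
            return True
    return False

certificates = [
    "Acute respiratory failure due to influenza A",
    "Cardiac arrest; pneumonia ruled out",
    "Metastatic lung carcinoma",
]
for text in certificates:
    print(is_pni_case(text), "-", text)
# True  - non-negated influenza mention
# False - pneumonia is negated in its clause
# False - no P&I term
```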