7 research outputs found

    Retrospective time series analysis of veterinary laboratory data: Preparing a historical baseline for cluster detection in syndromic surveillance

    The practice of disease surveillance has shifted over the last two decades towards systems capable of early disease detection. Modern biosurveillance systems use different sources of pre-diagnostic data, such as patients' chief complaints at emergency department visits or laboratory test orders. These data sources can provide more rapid detection than traditional surveillance based on case confirmation, but they are less specific; their use therefore poses challenges related to background noise and to unlabelled temporal aberrations in historical data. The overall goal of this study was to carry out a retrospective analysis of three years of laboratory test submissions to the Animal Health Laboratory in the province of Ontario, Canada, in order to prepare the data for use in syndromic surveillance. Daily cases were grouped into syndromes, and counts for each syndrome were monitored daily when the median daily count exceeded one case, and weekly otherwise. Poisson regression accounting for day-of-week and month captured the day-of-week effect with minimal influence from temporal aberrations. Applying Poisson regression iteratively, removing at each step the data points above the predicted 95th percentile of daily counts, removed these aberrations in the absence of labelled outbreaks while preserving the day-of-week effect present in the original data. The result was a set of time series representing the baseline patterns of the past three years, free of temporal aberrations. The final method thus removed temporal aberrations while keeping the original explainable effects in the data; it did not need an aberration-free training period, adjusted only minimally to the aberrations present in the raw data, and did not require labelled outbreaks. Moreover, it was readily applicable to the weekly data by substituting moving 95th percentiles for Poisson regression.
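
The iterative cleaning step described above can be sketched as follows. The abstract specifies only day-of-week and month effects and the 95th-percentile cut-off, so everything else here is an assumption: the main-effects Poisson model is fitted by iterative proportional fitting (which reaches the same maximum-likelihood fit as a Poisson GLM with only categorical main effects) so that the sketch needs only the standard library; a removed point stays removed; and removed points are replaced by their expected counts in the baseline. The function names are hypothetical.

```python
import math
from datetime import date, timedelta


def poisson_q(mu, p=0.95):
    """Smallest k with P(X <= k) >= p for X ~ Poisson(mu); adequate for modest daily means."""
    k = 0
    pmf = cdf = math.exp(-mu)
    while cdf < p:
        k += 1
        pmf *= mu / k
        cdf += pmf
    return k


def fit_dow_month(days, counts, keep, sweeps=100):
    """Expected counts under log(mu) = a + b[weekday] + c[month], fitted on the
    points with keep=True by iterative proportional fitting (equivalent to the
    Poisson maximum-likelihood fit for categorical main effects)."""
    mu = [1.0] * len(days)
    groupings = (lambda d: d.weekday(), lambda d: d.month)
    for _ in range(sweeps):
        for key in groupings:
            obs, fit = {}, {}
            for d, y, m, k in zip(days, counts, mu, keep):
                if k:
                    g = key(d)
                    obs[g] = obs.get(g, 0) + y
                    fit[g] = fit.get(g, 0.0) + m
            # Rescale each group so its fitted total matches its observed total.
            mu = [m * obs[key(d)] / fit[key(d)] if fit.get(key(d)) else m
                  for d, m in zip(days, mu)]
    return mu


def iterative_baseline(days, counts, p=0.95, max_rounds=10):
    """Refit, drop points above the predicted p-quantile, repeat until nothing changes."""
    keep = [True] * len(days)
    for _ in range(max_rounds):
        mu = fit_dow_month(days, counts, keep)
        new_keep = [k and y <= poisson_q(m, p) for k, y, m in zip(keep, counts, mu)]
        if new_keep == keep:
            break
        keep = new_keep
    # Assumption: removed points are replaced by their expected counts.
    baseline = [y if k else m for y, m, k in zip(counts, mu, keep)]
    return baseline, mu, keep


# Usage: synthetic weekday/weekend pattern with one spike on day 100.
days = [date(2021, 1, 4) + timedelta(days=i) for i in range(728)]
counts = [10 if d.weekday() < 5 else 2 for d in days]
counts[100] = 60
baseline, mu, keep = iterative_baseline(days, counts)
print([i for i, k in enumerate(keep) if not k])  # → [100]
```

The spike is dropped in the first round; the refit on the remaining points then recovers the weekday/weekend pattern exactly, so the second round removes nothing and the loop stops.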

    SPEECH TO CHART: SPEECH RECOGNITION AND NATURAL LANGUAGE PROCESSING FOR DENTAL CHARTING

    Typically, when using practice management systems (PMSs), dentists enter data by having an assistant act as a transcriptionist. This prevents dentists from interacting directly with the PMS. Speech recognition interfaces could solve this problem, but the existing speech interfaces of PMSs are cumbersome and poorly designed. In dentistry, there is a desire and need for a usable natural language interface for clinical data entry. Objectives. (1) To evaluate the efficiency, effectiveness, and user satisfaction of the speech interfaces of four dental PMSs; (2) to develop and evaluate a speech-to-chart prototype for charting naturally spoken dental exams. Methods. We evaluated the speech interfaces of four leading PMSs. We manually reviewed the capabilities of each system and then had 18 dental students chart 18 findings via speech in each system. We measured time, errors, and user satisfaction. Next, we developed and evaluated a speech-to-chart prototype with the following components: a speech recognizer; a post-processor for error correction; an NLP application (ONYX); and a graphical chart generator. We evaluated the accuracy of the speech recognizer and the post-processor, and then performed a summative evaluation of the entire system. Our prototype charted 12 hard tissue exams, which we compared with reference standard exams charted by two dentists. Results. Of the four systems, only two allowed both hard tissue and periodontal charting via speech. All interfaces required specific commands directly comparable to using a mouse. The average time to chart the nine hard tissue findings was 2:48 (min:s), and the average time to chart the nine periodontal findings was 2:06. There was an average of 7.5 errors per exam. We created a speech-to-chart prototype that supports natural dictation with no structured commands. On manually transcribed exams, the system achieved an average accuracy of 80%. The average time to chart a single hard tissue finding with the prototype was 7.3 seconds. An improved discourse processor will greatly enhance the prototype's accuracy. Conclusions. The speech interfaces of existing PMSs are cumbersome, require specific speech commands, and make several errors per exam. We successfully created a speech-to-chart prototype that charts hard tissue findings from naturally spoken dental exams.
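
The post-processing and extraction stages of the pipeline described above can be illustrated with a toy sketch that starts from recognizer output text. It does not use ONYX or any speech API: the correction table `CORRECTIONS`, the vocabulary `CONDITIONS`, and the regular expression that segments a transcript by tooth number are hypothetical stand-ins, and "accuracy" is taken here, as an assumption, to mean the fraction of reference findings that the system charted.

```python
import re

# Hypothetical post-processor table: frequent recognizer errors -> intended dental terms.
CORRECTIONS = {"carries": "caries", "crowns": "crown", "feeling": "filling"}
# Hypothetical finding vocabulary; a real system would draw on a dental terminology.
CONDITIONS = ("caries", "crown", "filling", "fracture", "missing")
# A finding segment runs from "tooth N" up to the next "tooth N" or the end of the text.
FINDING = re.compile(r"\btooth (\d{1,2})\b(.*?)(?=\btooth \d|$)")


def post_process(transcript):
    """Lower-case the recognizer output and fix known misrecognitions word by word."""
    text = transcript.lower()
    for wrong, right in CORRECTIONS.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text


def extract_findings(text):
    """Return the set of (tooth number, condition) pairs mentioned in the text."""
    findings = set()
    for tooth, segment in FINDING.findall(text):
        for condition in CONDITIONS:
            if re.search(rf"\b{condition}\b", segment):
                findings.add((int(tooth), condition))
    return findings


def accuracy(charted, reference):
    """Fraction of reference findings that were charted (assumed definition)."""
    return len(charted & reference) / len(reference) if reference else 1.0


# Usage
spoken = "Tooth 3 has carries on the occlusal. Tooth 14 has a crown and tooth 19 feeling"
charted = extract_findings(post_process(spoken))
reference = {(3, "caries"), (14, "crown"), (19, "filling"), (30, "fracture")}
print(sorted(charted))
print(accuracy(charted, reference))  # 3 of 4 reference findings → 0.75
```

Keeping correction separate from extraction mirrors the prototype's division into a post-processor and an NLP stage, so either can be evaluated on its own.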

    Anomaly Detection in Time Series: Theoretical and Practical Improvements for Disease Outbreak Detection

    The automatic collection and increasing availability of health data provide new opportunities for monitoring this information. By monitoring pre-diagnostic data sources, such as over-the-counter cough medicine sales or emergency room chief complaints of cough, it may be possible to detect disease outbreaks earlier than traditional laboratory confirmation allows. This research is particularly important for a modern, highly connected society, where the onset of a disease outbreak can be swift and deadly, whether caused by a naturally occurring global pandemic such as swine flu or by a targeted act of bioterrorism. In this dissertation, we first describe the problem and the current state of research in disease outbreak detection, and then present four main contributions to the field. First, we formalize a framework for analyzing health series data and detecting anomalies: a forecasting method predicts the next day's value, the forecast is subtracted from the observed value to create residuals, and a detection algorithm is applied to the residuals. The formalized framework makes explicit the link between the accuracy of the forecasting method and the performance of the detector, and it can be used to quantify and analyze the performance of a variety of heuristic methods. Second, we describe improvements to the forecasting of health data series: using weather as a predictor, cross-series covariates, and ensemble forecasting each improve forecasts of health data. Third, we describe improvements to detection, including the use of multivariate statistics for anomaly detection and additional day-of-week preprocessing to aid detection. Most significantly, we provide a new method, based on the CuScore, for optimizing detection when the impact of the disease outbreak is known. This method can provide an optimal detector either for rapid detection or for the probability of detection within a given timeframe. Finally, we describe a method for improved comparison of detection methods. We provide tools to evaluate how well a simulated data set captures the characteristics of the authentic series, and we introduce time-lag heatmaps, a new way of visualizing daily detection rates or of comparing two methods more informatively.
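
The three-step framework (forecast, subtract, detect on residuals) can be sketched in a few lines. The forecaster (the mean of the same weekday over the previous four weeks) and the detector (a standard one-sided CUSUM with reference value `k` and threshold `h`, standardized by the residual SD over a training window) are illustrative choices, not the dissertation's methods; in particular, this is not the CuScore-based detector described above. All parameter values are assumptions.

```python
import statistics


def seasonal_forecast(series, t, weeks=4):
    """Forecast day t as the mean of the same weekday over the previous `weeks` weeks."""
    return statistics.fmean(series[t - 7 * w] for w in range(1, weeks + 1))


def residuals(series, weeks=4):
    """Observed minus forecast, starting once `weeks` full weeks of history exist."""
    start = 7 * weeks
    return [series[t] - seasonal_forecast(series, t, weeks) for t in range(start, len(series))]


def cusum_alarms(resid, train=28, k=0.5, h=4.0):
    """One-sided (upper) CUSUM on residuals standardized by the SD of the first
    `train` residuals; returns the residual indices where the statistic exceeds h."""
    sd = statistics.stdev(resid[:train]) or 1.0
    s, alarms = 0.0, []
    for i in range(train, len(resid)):
        s = max(0.0, s + resid[i] / sd - k)
        if s > h:
            alarms.append(i)
            s = 0.0  # restart after signalling
    return alarms


# Usage: weekly pattern plus small deterministic noise, +15 cases/day on days 100-106.
base = [20, 22, 21, 23, 20, 8, 7]
series = [base[t % 7] + (t * 37 % 5) - 2 + (15 if 100 <= t <= 106 else 0) for t in range(140)]
alarms = cusum_alarms(residuals(series))
print([i + 28 for i in alarms])  # residual index + 28 = day index → [100, 101, 102, 103, 104, 105, 106]
```

Because the detector only ever sees residuals, the forecaster or the detector can be swapped independently, which is the property the framework uses to relate forecast accuracy to detection performance.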

    Transfer Learning for Bayesian Case Detection Systems

    In the era of big biomedical data, large and varied datasets are being produced worldwide. If we could combine these data more effectively, we might develop a deeper understanding of biomedical problems and their solutions. Unlike traditional machine learning techniques, transfer learning techniques explicitly model differences among data origins to provide a smooth transfer of knowledge. Most techniques focus on transferring data, while more recent techniques have begun to explore transferring models. Model-transfer techniques are especially appealing in biomedicine because they involve fewer privacy risks. Unfortunately, most model-transfer techniques cannot handle heterogeneous scenarios, in which models differ in the features they contain; such scenarios are common with biomedical data. This dissertation develops an innovative transfer learning framework for sharing both data and models under a variety of conditions, while allowing the inclusion of features that are unique to, and informative about, the target context. I used both synthetic and real-world datasets to test two hypotheses: 1) a transfer learning model learned from source knowledge and target data classifies in the target context better than a target model learned solely from target data; 2) a transfer learning model classifies in the target context better than a source model. I conducted a comprehensive analysis of the conditions under which these two hypotheses hold and, more generally, of the factors that affect the effectiveness of transfer learning, providing empirical guidance about when and what to share. My research enables knowledge sharing under heterogeneous scenarios and provides an approach for understanding transfer learning performance in terms of differences in features, distributions, and sample sizes between source and target. The model-transfer algorithm can be viewed as a new Bayesian network learning algorithm with a flexible representation of prior knowledge. Concretely, this work shows the potential of transfer learning to assist in the rapid development of a case detection system for an emerging, unknown disease. More generally, to my knowledge, this research is the first investigation of model-based transfer learning in biomedicine under heterogeneous scenarios.
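
The dissertation's algorithm learns Bayesian networks and is not reproduced here. As a much simpler illustration of model transfer with heterogeneous features, the sketch below uses a naive Bayes case detector (a very simple Bayesian network). A source model's conditional probabilities serve as Beta priors with an equivalent sample size `ess`, and features that exist only in the target data get a uniform Beta(1, 1) prior. The function names, `ess`, and the toy records are hypothetical.

```python
from collections import Counter


def fit_naive_bayes(records, labels, features, source=None, ess=10.0):
    """Estimate P(label) and P(feature=1 | label) for binary features.
    If `source` ({(feature, label): prob}) covers a feature, that probability is
    used as a Beta prior with equivalent sample size `ess`; features unique to
    the target fall back to a uniform Beta(1, 1) prior."""
    source = source or {}
    class_counts = Counter(labels)
    prior = {c: n / len(labels) for c, n in class_counts.items()}
    cond = {}
    for f in features:
        for c, n_c in class_counts.items():
            n_fc = sum(1 for r, y in zip(records, labels) if y == c and r.get(f))
            if (f, c) in source:
                a, b = ess * source[(f, c)], ess * (1 - source[(f, c)])
            else:
                a = b = 1.0
            cond[(f, c)] = (n_fc + a) / (n_c + a + b)
    return prior, cond


def posterior(record, prior, cond, features):
    """Normalized class probabilities for one record under the naive Bayes model."""
    scores = {}
    for c, p in prior.items():
        for f in features:
            q = cond[(f, c)]
            p *= q if record.get(f) else 1 - q
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Usage: "rash" exists only in the target data, so it gets no source prior.
source = {("fever", "flu"): 0.9, ("fever", "other"): 0.1,
          ("cough", "flu"): 0.7, ("cough", "other"): 0.2}
records = [{"fever": 1, "cough": 1, "rash": 0}, {"fever": 1, "cough": 0, "rash": 1},
           {"fever": 0, "cough": 0, "rash": 0}, {"fever": 0, "cough": 1, "rash": 0}]
labels = ["flu", "flu", "other", "other"]
features = ["fever", "cough", "rash"]
prior, cond = fit_naive_bayes(records, labels, features, source)
print(round(cond[("fever", "flu")], 3))  # (2 + 9) / (2 + 10) → 0.917
print(posterior({"fever": 1, "cough": 1, "rash": 0}, prior, cond, features))
```

With little target data the estimates stay close to the source model; as target counts grow, they dominate the prior, which is one simple way to trade off source knowledge against target evidence.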

    Syndromic surveillance: made in Europe

    Syndrome classification through a retrospective analysis of porcine submissions to a regional animal health laboratory

    In response to the global threats of emerging infectious diseases and bioterrorism events, public health surveillance developed analytical methods that cluster early health indicators from multiple data sources into “syndromes” for rapid and efficient disease detection. Syndromic surveillance has become well established in public health, using many different health indicators from multiple sources. In animal health, including syndromic surveillance methods has improved the timeliness and efficiency of disease detection in early warning surveillance systems. Animal health syndromic surveillance improves disease detection by analyzing pre-diagnostic data collected for other purposes, from sources such as laboratories, veterinary clinics, abattoirs, farms and pharmacies. However, these data are inherently less disease-specific than those used in traditional surveillance, and analyses are required to ensure that syndromes represent significant diseases as accurately as possible. Syndrome classification is an analytical process that identifies, collates and validates pre-diagnostic indicators within a data source into accurate and viable syndromes. The goals of this thesis were: a) to review surveillance systems and methods in order to understand the scale, complexity and validity of different syndromic surveillance approaches; b) to describe and evaluate six years of swine laboratory submission data to Veterinary Diagnostic Services (VDS) in the province of Manitoba, Canada, for the purpose of syndromic surveillance; and c) to identify and validate the most appropriate syndromes from pre-diagnostic data within the submitted swine cases. An initial systematic review of public health syndromic surveillance identified 81 studies meeting the inclusion criteria. The variety and frequency of populations under surveillance, information sources, pre-diagnostic indicators, syndromes and reported values were recorded. The predominant methods for syndrome classification, temporal and spatial analysis, and aberration detection were also described. A total of 21,665 swine laboratory submissions from January 2003 to March 2009, including 4,726 pathology cases, were evaluated. The frequencies and distributions of the predominant pre-diagnostic indicators (test requests and specimen types) were described. The most common pathology diagnoses and organ system involvement were reported for the pathology submissions. For syndrome validation, a Multiple Correspondence Analysis was conducted to cluster the multiple pathology diagnoses per case into four diagnostic groups based on organ systems: Respiratory, Multisystemic, Gastrointestinal and “Other”. Syndrome classification was then performed, first using agglomerative hierarchical clustering to classify syndromes from 30 test requests and 34 specimen types. For validation, the syndromes were used as predictor variables in a multinomial logistic regression model applied to training and test data sets. The overall model sensitivity, specificity and predictive values for each organ system outcome were estimated. The individual syndromes were compared using relative risk ratios and marginal effects. Five syndromes were identified as having a significantly stronger predictive association with one organ system group than with the other three: Respiratory, GI, Reproductive, Joint and PCV (specific to porcine circovirus associated disease). The methods in this thesis identified a simplified analytical approach to syndrome classification of laboratory test requests and specimen types within swine submissions. Alternative algorithms for syndrome grouping, establishment of temporal baselines and exploration of automated aberration detection were identified as areas for future research.
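
The agglomerative clustering step can be illustrated with a small sketch. The abstract does not state the distance measure, the linkage, or how test requests and specimen types were represented, so all three are assumptions here. Each variable is represented by the set of submission IDs in which it appears, distance is 1 minus the Jaccard similarity, clusters are joined by average linkage, and merging stops when the closest pair is at least `threshold` apart. The variable names in the example are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two sets (0 for two empty sets)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_variables(occurrence, threshold=0.5):
    """Average-linkage agglomerative clustering of variables, where `occurrence`
    maps each variable (e.g. a test request) to the set of submissions it appears in."""
    clusters = [frozenset([v]) for v in sorted(occurrence)]

    def linkage(c1, c2):
        d = [jaccard_distance(occurrence[x], occurrence[y]) for x in c1 for y in c2]
        return sum(d) / len(d)

    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) >= threshold:
            break  # closest pair is too far apart: stop merging
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [set(c) for c in clusters]


# Usage with hypothetical test requests and specimen types.
occurrence = {
    "PRRSV PCR": {1, 2, 3, 4}, "lung": {1, 2, 3, 4, 5},
    "feces": {6, 7, 8}, "E. coli culture": {6, 7, 8, 9},
    "PCV2 PCR": {10},
}
for cluster in cluster_variables(occurrence):
    print(sorted(cluster))
```

Variables that tend to be requested or submitted together end up in the same candidate syndrome. In the thesis, such groupings were then validated against pathology-based organ system groups with multinomial logistic regression.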