    Data Quality and Completeness in a Web Stroke Registry as the Basis for Data and Process Mining

    Electronic health records often contain missing values and errors that jeopardize their effective exploitation. We illustrate the re-engineering process needed to improve the data quality of a web-based, multicentric stroke registry by proposing knowledge-based data entry support that helps users interpret data items consistently and prevents and detects treacherous errors. The re-engineering also improves stroke unit coordination and networking through ancillary tools for monitoring patient enrollments, calculating stroke care indicators, analyzing compliance with clinical practice guidelines, and entering stroke unit profiles. Finally, we report on some statistics, such as the calculation of indicators for assessing the quality of stroke care, data mining for knowledge discovery, and process mining for comparing different processes of care delivery. The most important results of the re-engineering are an improved user experience with data entry and a definitely better data quality that guarantees the reliability of data analyses.

    Data mining information from electronic health records produced high yield and accuracy for current smoking status

    OBJECTIVES: Researchers are increasingly using routine clinical data for care evaluations and feedback to patients and clinicians. The quality of these evaluations depends on the quality and completeness of the input data. STUDY DESIGN AND SETTING: We assessed the performance of an electronic health record (EHR)-based data mining algorithm, using the example of smoking status in a cardiovascular population. As a reference standard, we used the questionnaire from the Utrecht Cardiovascular Cohort (UCC). To assess diagnostic accuracy, we calculated sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). RESULTS: We analyzed 1,661 patients included in the UCC to January 18, 2019. Of those, 14% (n = 238) had missing information on smoking status in the UCC questionnaire. Data mining provided information on smoking status in 99% of the 1,661 participants. Diagnostic accuracy for current smoking was sensitivity 88%, specificity 92%, NPV 98%, and PPV 63%. Of the false positives, 85% reported they had quit smoking at the time of the UCC. CONCLUSION: Data mining showed great potential in retrieving information on smoking (a near-complete yield). Its diagnostic performance is good for negative smoking statuses. The implications of misclassification with data mining depend on the application of the data.
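The four diagnostic accuracy measures named in this abstract follow directly from a 2x2 confusion matrix. The sketch below is a minimal illustration only: the class name, function name, and the counts are hypothetical placeholders, not the study's data, and the abstract does not describe how its figures were computed in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of algorithm output versus the reference standard."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def diagnostic_accuracy(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as fractions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts chosen for illustration; NOT taken from the study.
counts = ConfusionCounts(tp=80, fp=40, fn=20, tn=860)
metrics = diagnostic_accuracy(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With a low prevalence of the positive class, as in these made-up counts, PPV can be much lower than sensitivity and specificity, which is the same pattern the abstract reports.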

    Comparison of low-cost handheld LiDAR-based SLAM systems for mapping underground tunnels

    The use of mobile mapping technologies (MMT) has become increasingly popular across various applications such as forestry, cultural heritage, mining, and civil engineering. While Simultaneous Localization and Mapping (SLAM) algorithms have greatly improved in recent years with regard to accuracy, robustness, and cooperativity, it is important to understand the limitations and strengths of each metrological measurement method to ensure the provision of 3D data of appropriate quality for the selected application. In this study, we perform a comparative analysis of three LiDAR-based handheld mobile mapping systems with survey-grade reference point clouds in a challenging test area of a partially collapsed underground tunnel. We investigate various aspects of 3D data quality, including accuracy and completeness, and present an improved method for 3D data completeness assessment aimed at evaluating SLAM-derived point clouds. The results demonstrate unique and diverse strengths and shortcomings of the tested mapping systems, which provide valuable guidelines for selecting an appropriate system for subterranean applications.
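The abstract does not describe its improved completeness method, so the sketch below shows only a generic baseline under stated assumptions: completeness is taken as the fraction of voxels occupied by the reference cloud that are also occupied by the tested cloud. The function names, the voxel size, and the sample coordinates are all hypothetical.

```python
import math


def voxel_keys(points, voxel_size):
    """Map each (x, y, z) point to the integer index of its voxel."""
    return {
        tuple(math.floor(coord / voxel_size) for coord in point)
        for point in points
    }


def completeness(reference, test, voxel_size):
    """Fraction of reference-occupied voxels that the test cloud also occupies."""
    ref_voxels = voxel_keys(reference, voxel_size)
    if not ref_voxels:
        raise ValueError("reference cloud is empty")
    test_voxels = voxel_keys(test, voxel_size)
    return len(ref_voxels & test_voxels) / len(ref_voxels)


# Hypothetical toy clouds (metres); the test cloud misses the last voxel.
reference_cloud = [(0.1, 0.1, 0.1), (1.1, 0.1, 0.1), (2.1, 0.1, 0.1), (3.1, 0.1, 0.1)]
test_cloud = [(0.2, 0.2, 0.2), (1.3, 0.4, 0.2), (2.4, 0.3, 0.1)]

score = completeness(reference_cloud, test_cloud, voxel_size=1.0)
print(f"completeness: {score:.2f}")
```

The result depends strongly on the chosen voxel size, and this baseline assumes both clouds are already registered in the same coordinate frame.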

    Building Data-Driven Pathways From Routinely Collected Hospital Data: A Case Study on Prostate Cancer

    Background: Routinely collected data in hospitals is complex, typically heterogeneous, and scattered across multiple Hospital Information Systems (HIS). This big data, created as a byproduct of health care activities, has the potential to provide a better understanding of diseases, unearth hidden patterns, and improve services and cost. The extent and uses of such data rely on its quality, which is not consistently checked, nor fully understood. Nevertheless, using routine data for the construction of data-driven clinical pathways, describing processes and trends, is a key topic receiving increasing attention in the literature. Traditional algorithms do not cope well with unstructured processes or data, and do not produce clinically meaningful visualizations. Supporting systems that provide additional information, context, and quality assurance inspection are needed. Objective: The objective of the study is to explore how routine hospital data can be used to develop data-driven pathways that describe the journeys that patients take through care, and their potential uses in biomedical research; it proposes a framework for the construction, quality assessment, and visualization of patient pathways for clinical studies and decision support using a case study on prostate cancer. Methods: Data pertaining to prostate cancer patients were extracted from a large UK hospital from eight different HIS, validated, and complemented with information from the local cancer registry. Data-driven pathways were built for each of the 1904 patients and an expert knowledge base, containing rules on the prostate cancer biomarker, was used to assess the completeness and utility of the pathways for a specific clinical study. Software components were built to provide meaningful visualizations for the constructed pathways. 
    Results: The proposed framework and pathway formalism enable the summarization, visualization, and querying of complex patient-centric clinical information, as well as the computation of quality indicators and dimensions. A novel graphical representation of the pathways allows the synthesis of such information. Conclusions: Clinical pathways built from routinely collected hospital data can unearth information about patients and diseases that may otherwise be unavailable or overlooked in hospitals. Data-driven clinical pathways allow for heterogeneous data (ie, semistructured and unstructured data) to be collated over a unified data model and for data quality dimensions to be assessed. This work has enabled further research on prostate cancer and its biomarkers, and on the development and application of methods to mine, compare, analyze, and visualize pathways constructed from routine data. This is an important development for the reuse of big data in hospitals.

    KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles

    We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework for topical keyphrase generation and ranking. By shifting from the unigram-centric traditional methods of unsupervised keyphrase extraction to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. We construct a topical keyphrase ranking function which implements the four criteria that represent high-quality topical keyphrases (coverage, purity, phraseness, and completeness). The effectiveness of our approach is demonstrated on two collections of content-representative titles in the domains of Computer Science and Physics.

    Data on coding association rules from an inpatient administrative health data coded by International classification of disease - 10th revision (ICD-10) codes

    Data presented in this article relate to the research article entitled “Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data” (Peng et al. [1], in preparation). We provide a set of ICD-10 coding association rules for the age group of 55 to 65. The rules were extracted from inpatient administrative health data at five acute care hospitals in Alberta, Canada, using association rule mining. Thresholds of support and confidence for the association rule mining process were set at 0.19% and 50% respectively. The data set contains 426 rules, of which 86 are not nested. Data are provided in the supplementary material. The presented coding association rules provide a reference for future research on the use of association rule mining for data quality assessment.
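To make the support and confidence thresholds concrete, the sketch below mines one-to-one rules (code A implies code B) from a handful of records. It is a simplified illustration, not the authors' pipeline: the function name is hypothetical, the records are invented, and the ICD-10 codes serve only as placeholder labels. Only the threshold values (0.19% support, 50% confidence) come from the abstract.

```python
from itertools import permutations


def pairwise_rules(records, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules A -> B.

    support    = share of records containing both A and B
    confidence = share of records containing A that also contain B
    """
    n = len(records)
    item_count = {}
    pair_count = {}
    for codes in records:
        codes = set(codes)
        for code in codes:
            item_count[code] = item_count.get(code, 0) + 1
        # Ordered pairs, so A -> B and B -> A are counted separately.
        for a, b in permutations(sorted(codes), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), both in pair_count.items():
        support = both / n
        confidence = both / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Invented records for illustration; NOT from the Alberta data set.
records = [{"E11", "I10"}, {"E11", "I10"}, {"I10"}, {"J44"}]

# Thresholds as stated in the abstract: 0.19% support, 50% confidence.
rules = pairwise_rules(records, min_support=0.0019, min_confidence=0.5)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules with more than one code on either side, and the nesting relation mentioned in the abstract, would need a full frequent-itemset algorithm such as Apriori, which this sketch leaves out.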

    APHRODITE: an Anomaly-based Architecture for False Positive Reduction

    We present APHRODITE, an architecture designed to reduce false positives in network intrusion detection systems (NIDS). APHRODITE works by detecting anomalies in the output traffic and correlating them with the alerts raised by the NIDS working on the input traffic. Benchmarks show a substantial reduction of false positives and that APHRODITE remains effective even after a "quick setup", i.e. in the realistic case in which it has not been "trained" and set up optimally