8 research outputs found

    Master of Science

    Data quality has become a significant issue in healthcare as large preexisting databases are integrated to provide greater depth for research and process improvement. Large-scale data integration exposes and compounds data quality issues latent in source systems. Although the problems related to data quality in transactional databases have been identified and are well addressed, the application of data quality constraints to large-scale data repositories has not been, and it requires novel applications of traditional concepts and methodologies. Despite an abundance of data quality theory, tools, and software, no consensual technique is available to guide developers in identifying data integrity issues and applying data quality rules in warehouse-type applications. Data quality measures are frequently developed on an ad hoc basis, or methods designed to assure data quality in transactional systems are loosely applied to analytic data stores. These measures are inadequate to address the complex data quality issues in large, integrated data repositories, particularly in the healthcare domain with its heterogeneous source systems. This study derives a taxonomy of data quality rules from relational database theory. It describes the development and implementation of data quality rules in the Analytic Health Repository at Intermountain Healthcare and situates those rules in the taxonomy. Further, it identifies areas in which more rigorous data quality measures should be explored. This comparison demonstrates the superiority of a structured approach to data quality rule identification.
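    The taxonomy described above is grounded in relational database theory. As a rough illustration of how such rule classes translate into executable checks, the sketch below applies three classic constraint types (domain, entity integrity, referential integrity) to toy warehouse tables; the tables and column names are invented for the example and are not from the thesis.

```python
# Hypothetical sketch: three rule classes drawn from relational theory
# (domain, entity-integrity, referential-integrity) applied to a
# warehouse-style patient table. All names and data are illustrative.
import pandas as pd

patients = pd.DataFrame({
    "patient_id":  [1, 2, 2, 4],
    "sex":         ["F", "M", "X", "F"],
    "facility_id": [10, 10, 99, 20],
})
facilities = pd.DataFrame({"facility_id": [10, 20]})

# Domain rule: values must come from an allowed set.
domain_violations = patients[~patients["sex"].isin({"F", "M", "U"})]

# Entity-integrity rule: the key must be unique.
key_violations = patients[patients["patient_id"].duplicated(keep=False)]

# Referential-integrity rule: foreign keys must resolve to the parent table.
ref_violations = patients[~patients["facility_id"].isin(facilities["facility_id"])]

for name, df in [("domain", domain_violations),
                 ("entity", key_violations),
                 ("referential", ref_violations)]:
    print(f"{name} rule violations: {len(df)}")
```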

    Applying business intelligence concepts to Medicaid claim fraud detection

    U.S. governmental agencies are striving to do more with less. Controlling the costs of delivering healthcare services such as Medicaid is especially critical at a time of increasing program enrollment and decreasing state budgets. Fraud is estimated to steal up to ten percent of the taxpayer dollars used to fund governmentally supported healthcare, making it critical for government authorities to find cost-effective methods to detect fraudulent transactions. This paper explores the use of a business intelligence system relying on statistical methods to detect fraud in one state's existing Medicaid claim payment data. This study shows that Medicaid claim transactions collected for payment purposes can be reformatted and analyzed to detect fraud and provide input for decision makers charged with making the best use of available funding. The results illustrate the efficacy of using unsupervised statistical methods to detect fraud in healthcare-related data.
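    As a hedged illustration of the unsupervised screening the abstract describes, the sketch below flags anomalous provider billing profiles with an isolation forest; the features and data are fabricated for the example, and the paper's actual statistical methods may differ.

```python
# Minimal sketch of unsupervised outlier screening on provider-level
# claim aggregates. Feature choices are assumptions, not the paper's.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Columns: claims per month, mean paid amount, distinct procedure codes
providers = rng.normal(loc=[120, 85.0, 30], scale=[20, 10.0, 5], size=(500, 3))
providers[:5] *= 3  # inject a few extreme billing profiles

model = IsolationForest(contamination=0.01, random_state=0)
flags = model.fit_predict(providers)   # -1 marks outliers
suspects = np.where(flags == -1)[0]
print("providers flagged for review:", suspects)
```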

    Supporting reuse of EHR data in healthcare organizations: The CARED research infrastructure framework

    Healthcare organizations have in recent years started assembling their Electronic Health Record (EHR) data in data repositories to unlock their value using data analysis techniques. There are, however, a number of technical, organizational, and ethical challenges to consider when reusing EHR data; infrastructure technology consisting of appropriate software and hardware components can address these. In a case study at the University Medical Center Utrecht (UMCU) in the Netherlands, we identified nine requirements for a modern technical infrastructure for reusing EHR data: (1) integrate data sources, (2) preprocess data, (3) store data, (4) support collaboration and documentation, (5) support various software and tooling packages, (6) enhance repeatability, (7) enhance privacy and security, (8) automate data processing, and (9) support analysis applications. We propose the CApable Reuse of EHR Data (CARED) framework for infrastructure that addresses these requirements; it consists of five consecutive data processing layers and a control layer that governs the data processing. We then evaluate the framework with respect to the requirements, and finally describe its successful implementation in the Psychiatry Department of the UMCU along with three analysis cases. Our CARED research infrastructure framework can support healthcare organizations that aim to successfully reuse their EHR data.
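    The sketch below is a loose, illustrative rendering of the layered idea only: consecutive processing stages coordinated by a control layer. The stage names and their mapping to the numbered requirements are assumptions made for the example, not the CARED framework's actual design.

```python
# Illustrative sketch of consecutive processing layers governed by a
# control layer. Layer names and logging are assumptions, not CARED's.
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("cared-sketch")

def integrate(raw):    return {"merged": raw}          # (1) integrate sources
def preprocess(data):  return {**data, "clean": True}  # (2) preprocess
def store(data):       return data                     # (3) store (stubbed)
def deidentify(data):  return {**data, "pii": None}    # (7) privacy & security
def analyze(data):     return len(data)                # (9) analysis

PIPELINE = [integrate, preprocess, store, deidentify, analyze]

def control_layer(raw):
    """Run each layer in order and record what ran, in the spirit of
    supporting repeatability (6) and documentation (4)."""
    state = raw
    for layer in PIPELINE:
        log.info("running layer: %s", layer.__name__)
        state = layer(state)
    return state

print(control_layer([{"ehr_record": 1}]))
```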

    Examining the Transitional Impact of ICD-10 on Healthcare Fraud Detection

    On October 1st, 2015, the tenth revision of the International Classification of Diseases (ICD-10) will be mandatorily implemented in the United States. Although this medical classification system will allow healthcare professionals to code with greater accuracy, specificity, and detail, these codes will have a significant impact on the character of healthcare insurance claims. While the overall benefit of ICD-10 throughout the healthcare industry is unquestionable, some experts believe healthcare fraud detection and prevention could experience an initial drop in performance due to the implementation of ICD-10. We aim to quantitatively test the validity of this concern regarding an adverse transitional impact. This project explores how predictive fraud detection systems developed using ICD-9 claims data will initially react to the introduction of ICD-10. We have developed a basic fraud detection system incorporating both unsupervised and supervised learning methods in order to examine the potential fraudulence of both ICD-9 and ICD-10 claims in a predictive environment. Using this system, we are able to analyze the ability and performance of statistical methods trained using ICD-9 data to properly identify fraudulent ICD-10 claims. This research makes contributions to the domains of medical coding, healthcare informatics, and fraud detection.
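    A toy sketch of the experimental setup the abstract describes is shown below: a supervised model trained on ICD-9-coded claims is asked to score ICD-10-coded claims after back-mapping the codes. The miniature code map, features, and labels are invented for the sketch; a real system would use the CMS General Equivalence Mappings (GEMs) and far richer claim features.

```python
# Toy illustration: train on ICD-9 claims, score ICD-10 claims after
# back-mapping codes. The "gem" dict and labels are fabricated examples.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train = [({"dx": "250.00"}, 0), ({"dx": "V58.69"}, 1),
         ({"dx": "250.00"}, 0), ({"dx": "V58.69"}, 1)]
gem = {"E11.9": "250.00", "Z79.899": "V58.69"}  # ICD-10 -> ICD-9 (toy map)

vec = DictVectorizer()
X = vec.fit_transform([features for features, _ in train])
clf = LogisticRegression().fit(X, [label for _, label in train])

icd10_claims = [{"dx": "E11.9"}, {"dx": "Z79.899"}]
backmapped = [{"dx": gem[claim["dx"]]} for claim in icd10_claims]
print(clf.predict_proba(vec.transform(backmapped))[:, 1])  # fraud scores
```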

    People Talking and AI Listening: How Stigmatizing Language in EHR Notes Affect AI Performance

    Electronic health records (EHRs) serve as an essential data source for the envisioned artificial intelligence (AI)-driven transformation in healthcare. However, clinician biases reflected in EHR notes can lead to AI models inheriting and amplifying these biases, perpetuating health disparities. This study investigates the impact of stigmatizing language (SL) in EHR notes on mortality prediction using a Transformer-based deep learning model and explainable AI (XAI) techniques. Our findings demonstrate that SL written by clinicians adversely affects AI performance, particularly for Black patients, highlighting SL as a source of racial disparity in AI model development. To explore an operationally efficient way to mitigate SL's impact, we investigate patterns in the generation of SL through a clinicians' collaborative network, identifying central clinicians as having a stronger impact on racial disparity in the AI model. We find that removing SL written by central clinicians is a more efficient bias reduction strategy than eliminating all SL in the entire corpus of data. This study provides actionable insights for responsible AI development and contributes to understanding clinician behavior and EHR note writing in healthcare.
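    As a rough sketch of the mitigation strategy, the code below ranks clinicians by degree centrality in a small collaboration network and removes SL-flagged notes written only by the most central ones. The graph, notes, SL flags, and the choice of centrality measure are all assumptions for illustration, not the paper's actual pipeline.

```python
# Hedged sketch: targeted removal of SL notes by network-central authors,
# rather than removing every SL note. All data here are invented.
import networkx as nx

G = nx.Graph([("dr_a", "dr_b"), ("dr_a", "dr_c"),
              ("dr_a", "dr_d"), ("dr_b", "dr_c")])
notes = [
    {"author": "dr_a", "text": "patient is non-compliant", "has_sl": True},
    {"author": "dr_d", "text": "patient declined medication", "has_sl": False},
    {"author": "dr_b", "text": "claims pain is 10/10", "has_sl": True},
]

centrality = nx.degree_centrality(G)
central = {c for c, _ in sorted(centrality.items(),
                                key=lambda kv: -kv[1])[:1]}  # top-1 here

kept = [n for n in notes if not (n["has_sl"] and n["author"] in central)]
print([n["author"] for n in kept])  # dr_a's SL note removed; dr_b's kept
```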

    Practical approaches to mining of clinical datasets: from frameworks to novel feature selection

    Research has investigated clinical data that embed numerous complexities and uncertainties in the form of missing values, class imbalance, and high dimensionality. The research in this thesis was motivated by the challenge of minimising these problems while maximising classification performance and selecting a significant subset of variables. This led to the proposal of a data mining framework and a feature selection method. The proposed framework, called the Handling Clinical Data Framework (HCDF), has a simple algorithmic structure and makes use of a modified form of existing frameworks to address a variety of data issues. The assessment of data mining techniques reveals that missing-value imputation and resampling data for class balancing can improve classification performance. Next, the proposed feature selection method, feature selection by projection onto principal components (FS-PPC), is introduced; it draws on ideas from both feature extraction and feature selection to select a significant subset of features from the data. The method selects features that have high correlation with the principal component by applying symmetrical uncertainty (SU), while irrelevant and redundant features are removed using mutual information (MI). This provides confidence that the selected subset of features will yield realistic results with less time and effort. FS-PPC retains classification performance and meaningful features while excluding redundant ones. The proposed methods have been applied to the analysis of real clinical data and their effectiveness has been assessed. The results show that the proposed methods are able to minimise the clinical data problems while maximising classification performance.
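    A rough sketch of the FS-PPC idea as stated in the abstract: score each feature by its symmetrical uncertainty with the first principal component after discretizing both. The binning scheme and data are assumptions; the thesis's exact formulation may differ.

```python
# Sketch: symmetrical uncertainty (SU) between each feature and the
# first principal component, on quantile-binned values. Illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mutual_info_score

def entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def symmetrical_uncertainty(a, b):
    mi = mutual_info_score(a, b) / np.log(2)  # nats -> bits
    return 2 * mi / (entropy(a) + entropy(b))

def quartile_bins(v):
    return np.digitize(v, np.quantile(v, [0.25, 0.5, 0.75]))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
X[:, 0] = X[:, 1] * 2 + rng.normal(scale=0.1, size=200)  # correlated pair

pc1 = PCA(n_components=1).fit_transform(X).ravel()
scores = [symmetrical_uncertainty(quartile_bins(X[:, j]), quartile_bins(pc1))
          for j in range(X.shape[1])]
print("SU with PC1 per feature:", np.round(scores, 3))
```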