Development and validation of data quality rules in administrative health data using association rule mining
Introduction
Data quality assessment is a challenge for researchers using coded administrative health data. Our previous study demonstrated the potential of association rule mining to assess data quality. The objective of this study is to develop and validate a set of coding association rules for data quality assessment.
Objectives and Approach
We used the Canadian reabstracted hospital discharge abstract data (DAD), with clinical diagnoses coded in International Classification of Diseases, 10th Revision, Canada (ICD-10-CA) codes, for rule development. The DAD data were divided into 5 age groups. Association rule mining was conducted on the reabstracted DAD in each age group to extract ICD-10 coding association rules at the three- and four-digit levels. Rule strength was assessed using support and confidence. The rules will be reviewed by a panel of 5 physicians and 2 coding specialists, who will assess their appropriateness from clinical and coding perspectives using a modified Delphi rating.
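As an illustration of the two rule-strength measures named above, the following is a minimal sketch, not the authors' pipeline. It assumes each discharge record is a list of ICD-10 code strings, truncates codes to three or four characters, and scores single-code rules A -> B, where support is the share of records containing both codes and confidence is that count divided by the number of records containing A. The function names and the four toy records are invented for the example.

```python
from collections import Counter
from itertools import permutations


def truncate(code, digits):
    """Keep the first `digits` characters of an ICD-10 code, ignoring the dot."""
    return code.replace(".", "")[:digits]


def pairwise_rules(records, digits=3, min_support=0.0, min_confidence=0.0):
    """Return {(antecedent, consequent): (support, confidence)} for one-to-one rules."""
    n = len(records)
    item_counts = Counter()  # number of records containing each code
    pair_counts = Counter()  # number of records containing each ordered pair
    for codes in records:
        items = {truncate(c, digits) for c in codes}
        item_counts.update(items)
        pair_counts.update(permutations(sorted(items), 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n                 # P(A and B)
        confidence = count / item_counts[a]  # P(B | A)
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


# Toy records (hypothetical, for illustration only).
records = [
    ["E11.9", "I10"],
    ["E11.2", "I10", "N18.3"],
    ["I10"],
    ["J44.1"],
]
rules = pairwise_rules(records, digits=3)
```

Calling `pairwise_rules(records, digits=4)` on the same data treats E11.9 and E11.2 as distinct codes, which is one reason a four-digit level tends to yield rules with lower support than a three-digit level.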
Results
In total, 975 rules at the three-digit level and 822 rules at the four-digit level were learned from the data. Half of the rules were in the ≥65 age group, and no rules were found in the 5 to 19 age group. The interquartile range of rule confidence was 0.112 to 0.425 at the three-digit level and 0.073 to 0.222 at the four-digit level. Two-thirds of the rules involved diagnosis codes related to endocrine and metabolic disorders and to diseases of the circulatory, respiratory and genitourinary systems. The panel review will be conducted in early April, and the final set of rules will be available before the conference.
Conclusion/Implications
This study developed a set of validated ICD-10 coding association rules and created a useful tool for cost-effectively assessing data quality in routinely collected administrative health data.
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent advances in technology, large volumes of medical data are being collected. These data contain valuable information, and data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining, focusing mainly on the application of data mining to skin diseases. The literature is categorized by data mining technique, and the utility of each methodology is highlighted. Association mining is generally suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology, where it is one of the most important methods for diagnosing erythemato-squamous diseases. Methods used for this purpose include neural networks, genetic algorithms and fuzzy classification. Clustering is a useful method in medical image mining. Clustering techniques aim to find structure in the given data by identifying similarities between data items according to their characteristics, and clustering has some applications in dermatology. Besides introducing the different mining methods, we investigate some of the challenges that arise in mining skin data.
An Intelligent Data Mining System to Detect Health Care Fraud
The chapter begins with an overview of the types of healthcare fraud. Next, there is a brief discussion of issues with the current fraud detection approaches. The chapter then develops information technology based approaches and illustrates how these technologies can improve current practice. Finally, there is a summary of the major findings and the implications for healthcare practice
Doctor of Philosophy
With the growing national dissemination of the electronic health record (EHR), there are expectations that the public will benefit from biomedical research and discovery enabled by electronic health data. Clinical data are needed for many diseases and conditions to meet the demands of rapidly advancing genomic and proteomic research. Many biomedical research advancements require rapid access to clinical data as well as broad population coverage. A fundamental issue in the secondary use of clinical data for scientific research is the identification of study cohorts of individuals with a disease or medical condition of interest. The problem addressed in this work is the need for generalized, efficient methods to identify cohorts in the EHR for use in biomedical research. To approach this problem, an associative classification framework was designed with the goal of accurate and rapid identification of cases for biomedical research: (1) a set of exemplars for a given medical condition is presented to the framework, (2) a predictive rule set composed of EHR attributes is generated by the framework, and (3) the rule set is applied to the EHR to identify additional patients who may have the specified condition. Based on this functionality, the approach was termed the ‘cohort amplification’ framework. The development and evaluation of the cohort amplification framework are the subject of this dissertation. An overview of the framework design is presented. Improvements to some standard associative classification methods are described and validated. A qualitative evaluation of predictive rules to identify diabetes cases and a study of the accuracy of identification of asthma cases in the EHR using framework-generated prediction rules are reported. The framework produced accurate and reliable rules to identify diabetes and asthma cases in the EHR and contributed to methods for identification of biomedical research cohorts.
Hybrid model using logit and nonparametric methods for predicting micro-entity failure
Following calls in the bankruptcy literature, this paper develops a parsimonious hybrid bankruptcy model by combining parametric and non-parametric approaches. To this end, the variables with the highest predictive power to detect bankruptcy are selected using logistic regression (LR). Subsequently, alternative non-parametric methods (Multilayer Perceptron, Rough Set, and Classification-Regression Trees) are applied, in turn, to firms classified as either “bankrupt” or “not bankrupt”. Our findings show that hybrid models, particularly those combining LR and Multilayer Perceptron, offer better accuracy and interpretability and converge faster than each method implemented in isolation. Moreover, the authors demonstrate that the introduction of non-financial and macroeconomic variables complements financial ratios for bankruptcy prediction.
Exploring the Interrelationship of Risk Factors for Supporting eHealth Knowledge-Based System
In developing countries, such as those in Africa, the physician-to-population ratio is below the World Health Organization (WHO) minimum recommendation. Because of limited resources, healthcare services in these settings fall short on equitable access to health services, sustainable health financing, and quality of service provision. Efficient and effective teaching, alerting, and recommendation systems are required to support healthcare service activities. To alleviate these issues, a competitive eHealth knowledge-based system (KBS) could bring substantial benefit. In this study, the Apriori technique is applied to a malaria dataset to explore the degree of association among risk factors. The output of the data mining (i.e., the interrelationship of risk factors) is then integrated with knowledge-based reasoning. Nearest neighbor retrieval algorithms (for retrieval) and a voting method (for the reuse task) are used to design and deliver a personalized knowledge-based system.
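To make the Apriori step concrete, here is a minimal sketch of level-wise frequent-itemset mining, assuming each patient record is a set of risk-factor labels. It is a generic textbook-style version, not the system described in the abstract; the function name `apriori`, the risk-factor labels, and the five toy records are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Toy risk-factor records (hypothetical, for illustration only).
records = [
    {"no_bed_net", "stagnant_water", "rainy_season"},
    {"no_bed_net", "stagnant_water"},
    {"no_bed_net", "rainy_season"},
    {"stagnant_water", "rainy_season"},
    {"no_bed_net", "stagnant_water", "rainy_season"},
]
frequent = apriori(records, min_support=0.5)
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is only counted against the data if all of its smaller subsets were already found frequent. Association rules between risk factors can then be derived from the returned itemsets by dividing the support of an itemset by the support of its antecedent.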
A study of unplanned 30-day hospital readmissions in the United States : early prediction and potentially modifiable risk factor identification
Unplanned hospital readmissions greatly impair patients' quality of life and have imposed a significant economic burden on American society. The pressure to reduce costs and improve healthcare quality has triggered the development of readmission reduction interventions. However, existing solutions focus on complementing inpatient care with enhanced care transition and post-discharge interventions, which are initiated near or after discharge when clinicians' impact on inpatient care is ending. Preventive intervention during hospitalization is an under-explored area, which holds the potential for reducing readmission risk. Nevertheless, it is challenging for clinicians to predict readmission risk at the early stage of inpatient care because little data is available. Existing readmission predictive models tend to incorporate variables whose values are only available near or after discharge. As a result, these models cannot be used for the early prediction of readmission. Another challenge is that there is no universal solution to reduce readmissions during hospitalization. Patients can be readmitted for any reason, and their heterogeneous social and clinical factors can further complicate the planning of interventions. The objective of this project was to improve the timeliness of readmission preventive intervention through a data-driven approach. A systematic review of the literature was performed to collect reported risk factors for unplanned 30-day hospital readmission. Using various predictive modeling and exploratory analysis methods, we have developed an early prediction model of readmission and have identified potentially modifiable readmission risk factors, which may be used to guide the development of readmission preventive interventions during hospitalization for different patients