42,038 research outputs found

    Using T3, an improved decision tree classifier, for mining stroke-related medical data

    Objectives: Medical data are a valuable resource from which novel and potentially useful knowledge can be discovered through data mining. Data mining can assist and support medical decision making and enhance clinical management and investigative research. The objective of this work is to propose a method for building accurate descriptive and predictive models based on classification of past medical data. We also aim to compare this method with other well-established data mining methods and identify its strengths and weaknesses. Method: We propose T3, a decision tree classifier that builds predictive models based on known classes and allows a certain amount of misclassification error in training in order to achieve better descriptive and predictive accuracy. We then experiment with a real medical data set on stroke, and various subsets of it, in order to identify strengths and weaknesses. We also compare its performance with that of a very successful and well-established decision tree classifier. Results: T3 demonstrated impressive performance when predicting unseen cases of stroke, with as little as 0.4% classification error, whereas the state-of-the-art decision tree classifier produced 33.6% classification error. Conclusions: This paper presents and evaluates T3, a classification algorithm that builds decision trees of depth at most three and achieves high accuracy whilst keeping the tree size reasonably small. T3 demonstrates strong descriptive and predictive power without compromising simplicity and clarity. We evaluate T3 based on real stroke register data and compare it with C4.5, a well-known classification algorithm, showing that T3 produce

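    Penerapan Algoritma C4.5 untuk Klasifikasi Data Rekam Medis Berdasarkan International Classification Diseases (ICD-10) [Application of the C4.5 Algorithm for Classifying Medical Record Data Based on the International Classification of Diseases (ICD-10)]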

    Medical record data are the ongoing record of a patient's medical history, yet such data are often merely accumulated and never examined to generate useful knowledge for the hospital. This study processes medical record data to classify the diseases that occur in sleeping sickness based on ICD-10. The method used is the C4.5 algorithm, with the international disease code as the target label attribute, covering 21 international disease groups from A00-B99 to Z00-Z99. The study yields a decision tree over the code values: C4.5 can represent 14 attribute values of the target disease code, with more than 66% of the data read. The study concludes that the C4.5 algorithm helps classify international disease codes based on ICD-10 and builds a decision tree that provides information on the diseases that occur most often at the hospital. Keywords: data mining, classification, C4.5, medical records, ICD-1
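
    C4.5 chooses each split attribute by gain ratio, that is, the information gain of the attribute divided by its split information. The sketch below illustrates that criterion on a handful of made-up records; the attribute values ("ward_a", "ward_b") and the tiny label set are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical records: one attribute and an ICD-10 group as the class label.
wards = ["ward_a", "ward_a", "ward_b", "ward_b"]
icd_groups = ["A00-B99", "A00-B99", "Z00-Z99", "Z00-Z99"]
print(gain_ratio(wards, icd_groups))
```

    An attribute that takes a single value for every record has zero split information, so the sketch returns 0.0 for it rather than dividing by zero.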

    Predictive Modelling Approach to Data-driven Computational Psychiatry

    This dissertation contributes novel predictive modelling approaches to data-driven computational psychiatry and offers alternative analysis frameworks to the standard statistical analyses in psychiatric research. In particular, this document advances research in medical data mining, especially psychiatry, via two phases. In the first phase, this document promotes research by proposing synergistic machine learning and statistical approaches for detecting patterns and developing predictive models in clinical psychiatry data to classify diseases, predict treatment outcomes or improve treatment selections. In particular, these data-driven approaches are built upon several machine learning techniques whose predictive models have been pre-processed, trained, optimised, post-processed and tested in novel computationally intensive frameworks. In the second phase, this document advances research in medical data mining by proposing several novel extensions in the area of data classification: a novel decision tree algorithm, which we call PIDT, based on parameterised impurities and statistical pruning approaches for building more accurate decision tree classifiers, and new ensemble-based classification methods. In particular, the experimental results show that predictive models built with the novel PIDT algorithm primarily led to better performance regarding accuracy and tree size than those built with traditional decision trees. The contributions of the proposed dissertation can be summarised as follows. Firstly, several statistical and machine learning algorithms, plus techniques to improve these algorithms, are explored. Secondly, prediction modelling and pattern detection approaches for first-episode psychosis associated with cannabis use are developed. Thirdly, a new computationally intensive machine learning framework for understanding the link between cannabis use and first-episode psychosis is introduced. Then, complementary and equally sophisticated prediction models for first-episode psychosis associated with cannabis use are developed using artificial neural networks and deep learning within the proposed novel computationally intensive framework. Lastly, an efficient novel decision tree algorithm (PIDT) based on novel parameterised impurities and statistical pruning approaches is proposed and tested with several medical datasets. These contributions can be used to guide future theory, experiment, and treatment development in medical data mining, especially psychiatry

    INFORMATION SUPPORT SYSTEM OF MEDICAL SYSTEM RESEARCH

    Background. Medical system research requires an information support system that implements data mining algorithms resulting in decision trees or IF-THEN rules. In addition, this system should be object-oriented and web-integrated. Objective. The aim of this study was to develop an information support system based on data mining algorithms applied to the system analysis method for medical system research. Methods. System analysis methods are used for qualitative analysis of mathematical models of diseases. Algorithms such as decision tree induction and the sequential covering algorithm are applied for data mining from a learning data set. Results. Given the complexity of the mathematical equations (nonlinear systems with delays), the scientific community requires new powerful methods of exact parameter identification and qualitative analysis. From the point of view of theoretical medicine, uncertainties arising in models of diseases require the development of treatment schemes that are effective, take into account toxicity constraints, enable better quality of life, and are cost-beneficial. A multivariate method of qualitative analysis of mathematical models can be used for classification of pathologic process forms. Conclusions. Complex qualitative behavior of disease models depending on parameters and controllers was observed in our investigation even without considering the probabilistic nature of the majority of quantities and parameters of information models. KEY WORDS: data mining, system analysis, medical research, decision makin

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent technological advances, large volumes of medical data are obtained. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Different methods are used in this area, such as Neural Networks, Genetic Algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges which exist in mining skin data

    Fuzzy rule-based system applied to risk estimation of cardiovascular patients

    Cardiovascular decision support is an area of increasing research interest. Ongoing collaborations between clinicians and computer scientists are looking at the application of knowledge discovery in databases to patient diagnosis based on clinical records. A fuzzy rule-based system for risk estimation of cardiovascular patients is proposed. It uses a group of fuzzy rules as a knowledge representation of data pertaining to cardiovascular patients. Several algorithms for the discovery of an easily readable and understandable group of fuzzy rules are formalized and analysed. The accuracy of risk estimation and the interpretability of fuzzy rules are discussed. Our study shows, in comparison to other algorithms used in knowledge discovery, that classification with a group of fuzzy rules is a useful technique for risk estimation of cardiovascular patients. © 2013 Old City Publishing, Inc
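
    To make the idea of risk estimation with a group of fuzzy rules concrete, here is a minimal sketch of one common way such a system can be evaluated: triangular membership functions, `min` as the fuzzy AND, and a weighted average of per-rule risk values. Everything in it is an assumption for illustration. The variables (`age`, `sbp`), the set boundaries, the rule base and the risk values are hypothetical and are not the rules or algorithms of the paper.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical rule base: each rule maps fuzzy conditions to a risk in 0..1.
# Conditions give (left, peak, right) of a triangular set for each variable.
RULES = [
    ({"age": (20, 40, 60), "sbp": (90, 115, 140)}, 0.1),    # younger, normal pressure
    ({"age": (20, 40, 60), "sbp": (130, 170, 210)}, 0.5),   # younger, high pressure
    ({"age": (50, 75, 100), "sbp": (130, 170, 210)}, 0.9),  # older, high pressure
]


def estimate_risk(patient):
    """Weighted average of rule risks; returns None if no rule fires."""
    weighted_sum = 0.0
    total_strength = 0.0
    for conditions, risk in RULES:
        # Firing strength: fuzzy AND (minimum) over all conditions of the rule.
        strength = min(triangular(patient[name], *params)
                       for name, params in conditions.items())
        weighted_sum += strength * risk
        total_strength += strength
    return weighted_sum / total_strength if total_strength > 0 else None


print(estimate_risk({"age": 55, "sbp": 150}))
```

    Each rule stays readable as a plain condition-and-outcome pair, which is the kind of interpretability the abstract emphasises. A real system would learn the rules and set boundaries from clinical records rather than fix them by hand.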