1,165 research outputs found

    Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

    Full text link
    This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.Comment: See http://www.jair.org/ for any accompanying file

    Inducing safer oblique trees without costs

    Get PDF
    Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming

    Integrating Learning from Examples into the Search for Diagnostic Policies

    Full text link
    This paper studies the problem of learning diagnostic policies from training examples. A diagnostic policy is a complete description of the decision-making actions of a diagnostician (i.e., tests followed by a diagnostic decision) for all possible combinations of test results. An optimal diagnostic policy is one that minimizes the expected total cost, which is the sum of measurement costs and misdiagnosis costs. In most diagnostic settings, there is a tradeoff between these two kinds of costs. This paper formalizes diagnostic decision making as a Markov Decision Process (MDP). The paper introduces a new family of systematic search algorithms based on the AO* algorithm to solve this MDP. To make AO* efficient, the paper describes an admissible heuristic that enables AO* to prune large parts of the search space. The paper also introduces several greedy algorithms including some improvements over previously-published methods. The paper then addresses the question of learning diagnostic policies from examples. When the probabilities of diseases and test results are computed from training data, there is a great danger of overfitting. To reduce overfitting, regularizers are integrated into the search algorithms. Finally, the paper compares the proposed methods on five benchmark diagnostic data sets. The studies show that in most cases the systematic search methods produce better diagnostic policies than the greedy methods. In addition, the studies show that for training sets of realistic size, the systematic search algorithms are practical on todays desktop computers

    An Advanced Conceptual Diagnostic Healthcare Framework for Diabetes and Cardiovascular Disorders

    Full text link
    The data mining along with emerging computing techniques have astonishingly influenced the healthcare industry. Researchers have used different Data Mining and Internet of Things (IoT) for enrooting a programmed solution for diabetes and heart patients. However, still, more advanced and united solution is needed that can offer a therapeutic opinion to individual diabetic and cardio patients. Therefore, here, a smart data mining and IoT (SMDIoT) based advanced healthcare system for proficient diabetes and cardiovascular diseases have been proposed. The hybridization of data mining and IoT with other emerging computing techniques is supposed to give an effective and economical solution to diabetes and cardio patients. SMDIoT hybridized the ideas of data mining, Internet of Things, chatbots, contextual entity search (CES), bio-sensors, semantic analysis and granular computing (GC). The bio-sensors of the proposed system assist in getting the current and precise status of the concerned patients so that in case of an emergency, the needful medical assistance can be provided. The novelty lies in the hybrid framework and the adequate support of chatbots, granular computing, context entity search and semantic analysis. The practical implementation of this system is very challenging and costly. However, it appears to be more operative and economical solution for diabetes and cardio patients.Comment: 11 PAGE

    Fuzzy rule-based system applied to risk estimation of cardiovascular patients

    Get PDF
    Cardiovascular decision support is one area of increasing research interest. On-going collaborations between clinicians and computer scientists are looking at the application of knowledge discovery in databases to the area of patient diagnosis, based on clinical records. A fuzzy rule-based system for risk estimation of cardiovascular patients is proposed. It uses a group of fuzzy rules as a knowledge representation about data pertaining to cardiovascular patients. Several algorithms for the discovery of an easily readable and understandable group of fuzzy rules are formalized and analysed. The accuracy of risk estimation and the interpretability of fuzzy rules are discussed. Our study shows, in comparison to other algorithms used in knowledge discovery, that classifcation with a group of fuzzy rules is a useful technique for risk estimation of cardiovascular patients. © 2013 Old City Publishing, Inc
    • …
    corecore