1,165 research outputs found
Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm
This paper introduces ICET, a new algorithm for cost-sensitive
classification. ICET uses a genetic algorithm to evolve a population of biases
for a decision tree induction algorithm. The fitness function of the genetic
algorithm is the average cost of classification when using the decision tree,
including both the costs of tests (features, measurements) and the costs of
classification errors. ICET is compared here with three other algorithms for
cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5,
which classifies without regard to cost. The five algorithms are evaluated
empirically on five real-world medical datasets. Three sets of experiments are
performed. The first set examines the baseline performance of the five
algorithms on the five datasets and establishes that ICET performs
significantly better than its competitors. The second set tests the robustness
of ICET under a variety of conditions and shows that ICET maintains its
advantage. The third set looks at ICET's search in bias space and discovers a
way to improve the search.Comment: See http://www.jair.org/ for any accompanying file
Inducing safer oblique trees without costs
Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the
distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification.
Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety.
This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming
Integrating Learning from Examples into the Search for Diagnostic Policies
This paper studies the problem of learning diagnostic policies from training
examples. A diagnostic policy is a complete description of the decision-making
actions of a diagnostician (i.e., tests followed by a diagnostic decision) for
all possible combinations of test results. An optimal diagnostic policy is one
that minimizes the expected total cost, which is the sum of measurement costs
and misdiagnosis costs. In most diagnostic settings, there is a tradeoff
between these two kinds of costs. This paper formalizes diagnostic decision
making as a Markov Decision Process (MDP). The paper introduces a new family of
systematic search algorithms based on the AO* algorithm to solve this MDP. To
make AO* efficient, the paper describes an admissible heuristic that enables
AO* to prune large parts of the search space. The paper also introduces several
greedy algorithms including some improvements over previously-published
methods. The paper then addresses the question of learning diagnostic policies
from examples. When the probabilities of diseases and test results are computed
from training data, there is a great danger of overfitting. To reduce
overfitting, regularizers are integrated into the search algorithms. Finally,
the paper compares the proposed methods on five benchmark diagnostic data sets.
The studies show that in most cases the systematic search methods produce
better diagnostic policies than the greedy methods. In addition, the studies
show that for training sets of realistic size, the systematic search algorithms
are practical on todays desktop computers
An Advanced Conceptual Diagnostic Healthcare Framework for Diabetes and Cardiovascular Disorders
The data mining along with emerging computing techniques have astonishingly
influenced the healthcare industry. Researchers have used different Data Mining
and Internet of Things (IoT) for enrooting a programmed solution for diabetes
and heart patients. However, still, more advanced and united solution is needed
that can offer a therapeutic opinion to individual diabetic and cardio
patients. Therefore, here, a smart data mining and IoT (SMDIoT) based advanced
healthcare system for proficient diabetes and cardiovascular diseases have been
proposed. The hybridization of data mining and IoT with other emerging
computing techniques is supposed to give an effective and economical solution
to diabetes and cardio patients. SMDIoT hybridized the ideas of data mining,
Internet of Things, chatbots, contextual entity search (CES), bio-sensors,
semantic analysis and granular computing (GC). The bio-sensors of the proposed
system assist in getting the current and precise status of the concerned
patients so that in case of an emergency, the needful medical assistance can be
provided. The novelty lies in the hybrid framework and the adequate support of
chatbots, granular computing, context entity search and semantic analysis. The
practical implementation of this system is very challenging and costly.
However, it appears to be more operative and economical solution for diabetes
and cardio patients.Comment: 11 PAGE
Fuzzy rule-based system applied to risk estimation of cardiovascular patients
Cardiovascular decision support is one area of increasing research interest. On-going collaborations between clinicians and computer scientists are looking at the application of knowledge discovery in databases to the area of patient diagnosis, based on clinical records. A fuzzy rule-based system for risk estimation of cardiovascular patients is proposed. It uses a group of fuzzy rules as a knowledge representation about data pertaining to cardiovascular patients. Several algorithms for the discovery of an easily readable and understandable group of fuzzy rules are formalized and analysed. The accuracy of risk estimation and the interpretability of fuzzy rules are discussed. Our study shows, in comparison to other algorithms used in knowledge discovery, that classifcation with a group of fuzzy rules is a useful technique for risk estimation of cardiovascular patients. © 2013 Old City Publishing, Inc
- …