58 research outputs found
Cost-Sensitive Decision Trees with Completion Time Requirements
In many classification tasks, managing cost and completion time is a primary concern. In this paper, we assume that the completion time for classifying an instance is determined by its class label, and that a late-penalty cost is incurred if the deadline is not met. This time requirement enriches the classification problem but poses a challenge to developing a solution algorithm. We propose an innovative approach to decision tree induction, which produces multiple candidate trees by allowing more than one splitting attribute at each node. The user can specify the maximum number of candidate trees to control the computational effort required to produce the final solution. In the tree-induction process, an allocation scheme is used to dynamically distribute the given number of candidate trees to splitting attributes according to their estimated contributions to cost reduction. The algorithm finds the final tree by backtracking. An extensive experiment shows that the algorithm outperforms the top-down heuristic and can effectively obtain optimal or near-optimal decision trees without excessive computation time.
Keywords: classification, decision tree, cost- and time-sensitive learning, late penalty
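The deadline requirement described in this abstract can be sketched as a simple per-instance cost model: a misclassification cost plus a late-penalty cost whenever the (label-determined) completion time exceeds the deadline. All names and numbers below are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of the abstract's cost model: the completion time of
# classifying an instance is determined by its predicted class label, and a
# late-penalty cost is added whenever that time exceeds the deadline.

def total_cost(misclassification_cost, completion_time, deadline, late_penalty):
    """Cost of one classified instance under a completion-time requirement."""
    cost = misclassification_cost
    if completion_time > deadline:
        cost += late_penalty  # deadline missed: incur the late-penalty cost
    return cost

# Example: even a correct prediction can be costly if it finishes late.
on_time = total_cost(misclassification_cost=0.0, completion_time=5,
                     deadline=10, late_penalty=50.0)
late = total_cost(misclassification_cost=0.0, completion_time=12,
                  deadline=10, late_penalty=50.0)
```

This is why the induction algorithm must weigh an attribute's effect on both misclassification cost and expected completion time when choosing splits.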
Fair Inputs and Fair Outputs: The Incompatibility of Fairness in Privacy and Accuracy
Fairness concerns about algorithmic decision-making systems have been mainly
focused on the outputs (e.g., the accuracy of a classifier across individuals
or groups). However, one may additionally be concerned with fairness in the
inputs. In this paper, we propose and formulate two properties regarding the
inputs of (features used by) a classifier. In particular, we claim that fair
privacy (whether individuals are all asked to reveal the same information) and
need-to-know (whether users are only asked for the minimal information required
for the task at hand) are desirable properties of a decision system. We explore
the interaction between these properties and fairness in the outputs (fair
prediction accuracy). We show that for an optimal classifier these three
properties are in general incompatible, and we explain what common properties
of data make them incompatible. Finally, we provide an algorithm to verify
whether the trade-off between the three properties exists in a given dataset,
and use the algorithm to show that this trade-off is common in real data.
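The two input-side properties the abstract proposes can be sketched as simple predicates: fair privacy asks whether every individual reveals the same feature set, and need-to-know asks whether each requested feature is actually necessary for the prediction. The function names, the toy predictor, and the data below are assumptions for illustration, not the paper's verification algorithm.

```python
# Illustrative sketch (not the paper's algorithm) of the two input-side
# fairness properties described in the abstract.

def fair_privacy(requested):
    """True iff every individual is asked to reveal the same set of features."""
    feature_sets = [frozenset(f) for f in requested.values()]
    return len(set(feature_sets)) <= 1

def need_to_know(features, predict, data):
    """True iff dropping any single feature changes some prediction."""
    baseline = [predict(row, features) for row in data]
    for f in features:
        reduced = features - {f}
        if [predict(row, reduced) for row in data] == baseline:
            return False  # feature f was never needed for any prediction
    return True

# Toy classifier and data: each feature is decisive for one of the rows.
data = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
predict = lambda row, feats: sum(row[f] for f in feats) > 0
```

The paper's incompatibility result says that, for an optimal classifier, properties like these generally cannot hold together with fair prediction accuracy across groups.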
Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent
cost-sensitive in nature, where the costs due to misclassification vary between
examples and not only within classes. However, standard classification methods
do not take these costs into account, and assume a constant cost of
misclassification errors. In previous works, methods that incorporate
financial costs into the training of different algorithms have been proposed,
with the example-dependent cost-sensitive decision tree algorithm
being the one that gives the highest savings. In this paper we propose a new
framework of ensembles of example-dependent cost-sensitive decision trees. The
framework consists of creating different example-dependent cost-sensitive
decision trees on random subsamples of the training set, and then combining
them using three different combination approaches. Moreover, we propose two new
cost-sensitive combination approaches: cost-sensitive weighted voting and
cost-sensitive stacking, the latter being based on the cost-sensitive logistic
regression method. Finally, using five different databases, from four
real-world applications: credit card fraud detection, churn modeling, credit
scoring and direct marketing, we evaluate the proposed method against
state-of-the-art example-dependent cost-sensitive techniques, namely,
cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision
trees. The results show that the proposed algorithms have better results for
all databases, in the sense of higher savings.
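The cost-sensitive weighted voting idea mentioned in this abstract can be sketched as follows: each base tree's vote is weighted by its cost savings on held-out data. The savings measure below (relative cost reduction versus an all-negative baseline) and the weighting scheme are simplified assumptions; the paper's exact formulation may differ.

```python
# Hedged sketch of cost-sensitive weighted voting for binary {0, 1} labels.
# cost_fp[i] / cost_fn[i] are the example-dependent false-positive and
# false-negative costs of instance i (illustrative, not the paper's notation).

def savings(y_true, y_pred, cost_fp, cost_fn):
    """Relative cost reduction versus predicting all-negative (simplified)."""
    base_cost = sum(cost_fn[i] for i, y in enumerate(y_true) if y == 1)
    cost = sum(cost_fp[i] if (p == 1 and y == 0) else
               cost_fn[i] if (p == 0 and y == 1) else 0.0
               for i, (y, p) in enumerate(zip(y_true, y_pred)))
    return 1.0 - cost / base_cost if base_cost else 0.0

def weighted_vote(predictions, weights):
    """Combine {0, 1} predictions of several trees by weighted majority."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= sum(weights) / 2 else 0
```

A tree that achieves high savings on validation data thus dominates the vote, which is what makes the combination cost-sensitive rather than a plain majority.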
Reaching a Consensus on Access Detection by a Decision System
Classification techniques based on Artificial Intelligence are computational tools that have been applied to intrusion detection systems (IDS) with encouraging results; they can solve problems related to information security efficiently. Intrusion detection involves processing huge amounts of information, and heuristic methodologies have therefore been proposed. In this paper, decision trees, Naive Bayes, and the supervised classifier system UCS are combined to improve the performance of a classifier. To validate the system, a scenario based on real data from the NSL-KDD99 dataset is used.
Depto. de Arquitectura de Computadores y Automática, Fac. de Informática. Funded by the Ministry of Higher Education, Science, Technology and Innovation (SENESCYT) of the Government of the Republic of Ecuador.
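A minimal sketch of a consensus rule for combining the detectors named in the abstract (decision tree, Naive Bayes, UCS): if all base classifiers agree, the shared label is returned; otherwise the decision falls back to a designated arbiter. This particular combination rule is an assumption for illustration, not necessarily the paper's scheme.

```python
# Toy consensus combiner for intrusion-detection labels ("normal"/"attack").

def consensus(predictions, arbiter_index=0):
    """Return the unanimous label, or the arbiter's vote on disagreement."""
    if len(set(predictions)) == 1:
        return predictions[0]
    return predictions[arbiter_index]

# Example: labels for one network connection from three base classifiers.
unanimous = consensus(["attack", "attack", "attack"])
split = consensus(["attack", "normal", "normal"])  # falls back to classifier 0
```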
Cost-Sensitive Decision Tree with Multiple Resource Constraints
Resource constraints are commonly found in classification tasks. For example, there could be a budget limit on implementation and a deadline for finishing the classification task. Applying the top-down approach for tree induction in this situation may have significant drawbacks. In particular, it is difficult, especially in an early stage of tree induction, to assess an attribute’s contribution to improving the total implementation cost and its impact on attribute selection in later stages because of the deadline constraint. To address this problem, we propose an innovative algorithm, namely, the Cost-Sensitive Associative Tree (CAT) algorithm. Essentially, the algorithm first extracts and retains association classification rules from the training data which satisfy the resource constraints, and then uses the rules to construct the final decision tree. The approach has advantages over the traditional top-down approach: first, only feasible classification rules are considered in the tree induction and, second, their costs and resource use are known. In contrast, in the top-down approach, this information is not available for selecting splitting attributes. The experimental results show that the CAT algorithm significantly outperforms the top-down approach and adapts very well to available resources.
Keywords: cost-sensitive learning, mining methods and algorithms, decision trees
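The rule-filtering step the CAT algorithm is described as using can be sketched simply: association classification rules are kept only if they satisfy every resource constraint, e.g. a cost budget and a completion-time deadline. The rule fields and constraint names below are illustrative assumptions.

```python
# Sketch of keeping only feasible association classification rules,
# the first phase the abstract attributes to the CAT algorithm.

def feasible_rules(rules, budget, deadline):
    """Keep only rules whose total attribute cost and time fit the constraints."""
    return [r for r in rules
            if r["cost"] <= budget and r["time"] <= deadline]

rules = [
    {"label": "approve", "cost": 10, "time": 3},
    {"label": "reject",  "cost": 40, "time": 1},   # exceeds the cost budget
    {"label": "approve", "cost": 15, "time": 9},   # misses the deadline
]
kept = feasible_rules(rules, budget=30, deadline=5)
```

Because only feasible rules survive this step, the tree built from them cannot violate the resource constraints, which is the stated advantage over top-down induction.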
Study of Pruning Techniques to Predict Efficient Business Decisions for a Shopping Mall
The shopping mall domain is a dynamic and unpredictable environment. Traditional techniques such as fundamental and technical analysis can provide investors with some tools for managing their shops and predicting their business growth. However, these techniques cannot discover all the possible relations underlying business growth, and thus there is a need for a different approach that will provide a deeper kind of analysis. Data mining can be used extensively in shopping malls and can help to increase business growth. Therefore, there is a need to find a suitable solution or algorithm to work with this kind of environment. We therefore study a few methods of pruning decision trees. Finally, we make use of the cost-based pruning method to obtain an objective evaluation of the tendency to over-prune or under-prune observed in each method.
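Cost-based pruning, the method the abstract settles on, can be sketched on a toy tree: a subtree is collapsed to a leaf whenever the leaf's estimated cost is no worse than the combined cost of the subtree's children. The dictionary-based tree representation and cost fields here are assumptions for illustration.

```python
# Minimal sketch of cost-based pruning on a toy tree. Each internal node
# carries "leaf_cost" (cost if collapsed to a leaf) and "children"; each
# leaf carries only its "cost".

def prune(node):
    """Recursively replace subtrees whose cost is not below a single leaf's."""
    if "children" not in node:
        return node                      # already a leaf
    node["children"] = [prune(c) for c in node["children"]]
    subtree_cost = sum(c["cost"] for c in node["children"])
    if node["leaf_cost"] <= subtree_cost:
        return {"cost": node["leaf_cost"]}   # prune: collapse to a leaf
    node["cost"] = subtree_cost              # keep subtree; record its cost
    return node
```

Comparing costs rather than raw error counts is what lets this criterion balance over-pruning against under-pruning in cost-sensitive settings.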
ConfidentCare: A Clinical Decision Support System for Personalized Breast Cancer Screening
Breast cancer screening policies attempt to achieve timely diagnosis by the
regular screening of apparently healthy women. Various clinical decisions are
needed to manage the screening process; those include: selecting the screening
tests for a woman to take, interpreting the test outcomes, and deciding whether
or not a woman should be referred to a diagnostic test. Such decisions are
currently guided by clinical practice guidelines (CPGs), which represent a
one-size-fits-all approach that is designed to work well on average for a
population, without guaranteeing that it will work well uniformly over that
population. Since the risks and benefits of screening are functions of each
patient's features, personalized screening policies that are tailored to the
features of individuals are needed in order to ensure that the right tests are
recommended to the right woman. In order to address this issue, we present
ConfidentCare: a computer-aided clinical decision support system that learns a
personalized screening policy from the electronic health record (EHR) data.
ConfidentCare operates by recognizing clusters of similar patients, and
learning the best screening policy to adopt for each cluster. A cluster of
patients is a set of patients with similar features (e.g. age, breast density,
family history, etc.), and the screening policy is a set of guidelines on what
actions to recommend for a woman given her features and screening test scores.
The ConfidentCare algorithm ensures that the policy adopted for every cluster of
patients satisfies a predefined accuracy requirement with a high level of
confidence. We show that our algorithm outperforms the current CPGs in terms of
cost-efficiency and false positive rates.
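ConfidentCare's decision structure, as described in the abstract, can be sketched as: match a new patient to the cluster of most similar patients, then apply that cluster's learned screening policy. The features, centroids, and policy strings below are made-up placeholders, not clinical recommendations or the system's actual learned policies.

```python
# Illustrative sketch of cluster lookup followed by per-cluster policy
# application, the two-stage structure the abstract describes.

def nearest_cluster(patient, centroids):
    """Index of the centroid closest to the patient's feature vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: dist(patient, centroids[i]))

centroids = [(45.0, 2.0), (62.0, 4.0)]    # e.g. (age, breast-density score)
policies = ["mammogram every 2 years", "annual mammogram + MRI"]

def recommend(patient):
    """Screening recommendation for a patient's (age, density) features."""
    return policies[nearest_cluster(patient, centroids)]
```

In the actual system, each cluster's policy is learned from EHR data subject to an accuracy requirement that must hold with high confidence, rather than fixed by hand as here.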