Search CORE

21,759 research outputs found

Cost-Sensitive Decision Tree with Multiple Resource Constraints

Author: Chia-Chi WU
Kwei TANG
Yen-Liang CHEN
Publication venue
Publication date
Field of study

Resource constraints are commonly found in classification tasks. For example, there could be a budget limit on implementation and a deadline for finishing the classification task. Applying the top-down approach for tree induction in this situation may have significant drawbacks. In particular, it is difficult, especially in an early stage of tree induction, to assess an attribute’s contribution to improving the total implementation cost and its impact on attribute selection in later stages because of the deadline constraint. To address this problem, we propose an innovative algorithm, namely, the Cost-Sensitive Associative Tree (CAT) algorithm. Essentially, the algorithm first extracts and retains association classification rules from the training data which satisfy resource constraints, and then uses the rules to construct the final decision tree. The approach has advantages over the traditional top-down approach, first because only feasible classification rules are considered in the tree induction and, second, because their costs and resource use are known. In contrast, in the top-down approach, the information is not available for selecting splitting attributes. The experiment results show that the CAT algorithm significantly outperforms the top-down approach and adapts very well to available resources.Cost-sensitive learning, mining methods and algorithms, decision trees

Research Papers in Economics

Cost-Sensitive Decision Trees with Completion Time Requirements

Author: Hung-Pin KAO
Jen TANG
Kwei TANG
Publication venue
Publication date
Field of study

In many classification tasks, managing costs and completion times are the main concerns. In this paper, we assume that the completion time for classifying an instance is determined by its class label, and that a late penalty cost is incurred if the deadline is not met. This time requirement enriches the classification problem but posts a challenge to developing a solution algorithm. We propose an innovative approach for the decision tree induction, which produces multiple candidate trees by allowing more than one splitting attribute at each node. The user can specify the maximum number of candidate trees to control the computational efforts required to produce the final solution. In the tree-induction process, an allocation scheme is used to dynamically distribute the given number of candidate trees to splitting attributes according to their estimated contributions to cost reduction. The algorithm finds the final tree by backtracking. An extensive experiment shows that the algorithm outperforms the top-down heuristic and can effectively obtain the optimal or near-optimal decision trees without an excessive computation time.classification, decision tree, cost and time sensitive learning, late penalty

Research Papers in Economics

Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-sum Objectives

Author: Chatterjee Krishnendu
Elgyütt Adrián
Novotný Petr
Rouillé Owen
Publication venue
Publication date: 01/01/2018
Field of study

Partially-observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework to model a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low probability events with extreme payoff can significantly affect the expectation, and thus the obtained policies are not necessarily risk-averse. An alternate approach is to optimize the probability that the payoff is above a certain threshold, which allows obtaining risk-averse policies, but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation ensuring that the payoff is above a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.Comment: Full version of a paper published at IJCAI/ECAI 201

arXiv.org e-Print Archive

Crossref

IST Austria: PubRep (Institute of Science and Technology)

Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

Author: Turney P. D.
Publication venue
Publication date: 01/01/1995
Field of study

This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.Comment: See http://www.jair.org/ for any accompanying file

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

A General Framework for Fair Regression

Author: Ali AbdulRahman Al
Fitzsimons Jack
Osborne Michael
Roberts Stephen
Publication venue: 'MDPI AG'
Publication date: 02/02/2019
Field of study

Fairness, through its many forms and definitions, has become an important issue facing the machine learning community. In this work, we consider how to incorporate group fairness constraints in kernel regression methods, applicable to Gaussian processes, support vector machines, neural network regression and decision tree regression. Further, we focus on examining the effect of incorporating these constraints in decision tree regression, with direct applications to random forests and boosted trees amongst other widespread popular inference techniques. We show that the order of complexity of memory and computation is preserved for such models and tightly bound the expected perturbations to the model in terms of the number of leaves of the trees. Importantly, the approach works on trained models and hence can be easily applied to models in current use and group labels are only required on training data.Comment: 8 pages, 4 figures, 2 pages reference

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Oxford University Research Archive

A methodology for the generation of efficient error detection mechanisms

Author: Anand Sarabjot Singh
Arif Saima
Jhumka Arshad
Leeke Matthew
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2011
Field of study

A dependable software system must contain error detection mechanisms and error recovery mechanisms. Software components for the detection of errors are typically designed based on a system specification or the experience of software engineers, with their efficiency typically being measured using fault injection and metrics such as coverage and latency. In this paper, we introduce a methodology for the design of highly efficient error detection mechanisms. The proposed methodology combines fault injection analysis and data mining techniques in order to generate predicates for efficient error detection mechanisms. The results presented demonstrate the viability of the methodology as an approach for the development of efficient error detection mechanisms, as the predicates generated yield a true positive rate of almost 100% and a false positive rate very close to 0% for the detection of failure-inducing states. The main advantage of the proposed methodology over current state-of-the-art approaches is that efficient detectors are obtained by design, rather than by using specification-based detector design or the experience of software engineers

Warwick Research Archives Portal Repository

Recommended from our members

A survey of induction algorithms for machine learning

Author: Porter Bruce W.
Publication venue: eScholarship, University of California
Publication date: 01/01/1983
Field of study

Central to all systems for machine learning from examples is an induction algorithm. The purpose of the algorithm is to generalize from a finite set of training examples a description consistent with the examples seen, and, hopefully, with the potentially infinite set of examples not seen. This paper surveys four machine learning induction algorithms. The knowledge representation schemes and a PDL description of algorithm control are emphasized. System characteristics that are peculiar to a domain of application are de-emphasized. Finally, a comparative summary of the learning algorithms is presented

eScholarship - University of California

ConfidentCare: A Clinical Decision Support System for Personalized Breast Cancer Screening

Author: Alaa Ahmed M.
Hsu William
Moon Kyeong H.
van der Schaar Mihaela
Publication venue
Publication date: 01/01/2016
Field of study

Breast cancer screening policies attempt to achieve timely diagnosis by the regular screening of apparently healthy women. Various clinical decisions are needed to manage the screening process; those include: selecting the screening tests for a woman to take, interpreting the test outcomes, and deciding whether or not a woman should be referred to a diagnostic test. Such decisions are currently guided by clinical practice guidelines (CPGs), which represent a one-size-fits-all approach that are designed to work well on average for a population, without guaranteeing that it will work well uniformly over that population. Since the risks and benefits of screening are functions of each patients features, personalized screening policies that are tailored to the features of individuals are needed in order to ensure that the right tests are recommended to the right woman. In order to address this issue, we present ConfidentCare: a computer-aided clinical decision support system that learns a personalized screening policy from the electronic health record (EHR) data. ConfidentCare operates by recognizing clusters of similar patients, and learning the best screening policy to adopt for each cluster. A cluster of patients is a set of patients with similar features (e.g. age, breast density, family history, etc.), and the screening policy is a set of guidelines on what actions to recommend for a woman given her features and screening test scores. ConfidentCare algorithm ensures that the policy adopted for every cluster of patients satisfies a predefined accuracy requirement with a high level of confidence. We show that our algorithm outperforms the current CPGs in terms of cost-efficiency and false positive rates

arXiv.org e-Print Archive

Crossref

eScholarship - University of California