
    Ant Colony Optimization with Three Stages for Independent Test Cost Attribute Reduction

    Minimal test cost attribute reduction is an important problem in cost-sensitive learning. Recently, heuristic algorithms including the information gain-based algorithm and the genetic algorithm have been designed for this problem. However, in many cases these algorithms cannot find the optimal solution. In this paper, we develop an ant colony optimization algorithm to tackle this problem. The attribute set is represented as a graph, with each vertex corresponding to an attribute and the weight of each edge to the pheromone. Our algorithm contains three stages, namely, the addition stage, the deletion stage, and the filtration stage. In the addition stage, each ant starts from the initial position and traverses edges probabilistically until the stopping criterion is satisfied. The pheromone of the traveled path is also updated in this process. In the deletion stage, each ant deletes redundant attributes. Two strategies, called the centralized deletion strategy and the distributed deletion strategy, are proposed. Finally, in the filtration stage, the ant with minimal test cost is selected to construct the reduct. Experimental results on UCI datasets indicate that the algorithm is significantly better than the information gain-based one. It also outperforms the genetic algorithm on the medium-sized Mushroom dataset.
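
The three stages described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the decision table `ROWS`, the test costs `COSTS`, the per-attribute pheromone (the abstract places pheromone on edges), the selection heuristic, the update rule, and the single generic deletion pass (neither the centralized nor the distributed strategy) are all assumptions made for illustration.

```python
import random

# Hypothetical toy decision table: (attribute values, class label).
ROWS = [
    ((0, 0, 0, 0), "n"),
    ((1, 0, 1, 1), "y"),
    ((1, 1, 0, 0), "y"),
    ((0, 1, 1, 0), "n"),
]
COSTS = [10.0, 2.0, 3.0, 1.0]  # hypothetical test cost of each attribute


def is_consistent(attrs):
    """True if rows that agree on `attrs` always share the same class label."""
    seen = {}
    for values, label in ROWS:
        key = tuple(values[a] for a in sorted(attrs))
        if seen.setdefault(key, label) != label:
            return False
    return True


def addition_stage(pheromone, rng):
    """An ant adds attributes probabilistically until the subset is consistent."""
    chosen, remaining = [], list(range(len(COSTS)))
    while not is_consistent(chosen):
        # Assumed heuristic: favour high pheromone and low test cost.
        weights = [pheromone[a] / COSTS[a] for a in remaining]
        pick = rng.choices(remaining, weights=weights)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def deletion_stage(attrs):
    """Drop redundant attributes, trying the most expensive ones first."""
    kept = list(attrs)
    for a in sorted(attrs, key=lambda a: -COSTS[a]):
        trial = [b for b in kept if b != a]
        if is_consistent(trial):
            kept = trial
    return kept


def aco_reduct(n_ants=10, n_rounds=5, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = [1.0] * len(COSTS)
    best = None
    for _ in range(n_rounds):
        for _ in range(n_ants):
            reduct = deletion_stage(addition_stage(pheromone, rng))
            cost = sum(COSTS[a] for a in reduct)
            # Filtration stage: keep the cheapest reduct found by any ant.
            if best is None or cost < best[1]:
                best = (sorted(reduct), cost)
            # Assumed update rule: evaporate, then reinforce cheap solutions.
            pheromone = [(1 - evaporation) * p for p in pheromone]
            for a in reduct:
                pheromone[a] += 1.0 / cost
    return best
```

On this toy table the only reducts are `{0}` (cost 10) and `{1, 2}` (cost 5), so `aco_reduct()` returns one of those two together with its total test cost.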

    A survey of cost-sensitive decision tree induction algorithms

    The past decade has seen significant interest in the problem of inducing decision trees that take account of the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.

    Cost-Sensitive Decision Trees with Completion Time Requirements

    In many classification tasks, managing costs and completion times is the main concern. In this paper, we assume that the completion time for classifying an instance is determined by its class label, and that a late penalty cost is incurred if the deadline is not met. This time requirement enriches the classification problem but poses a challenge to developing a solution algorithm. We propose an innovative approach to decision tree induction, which produces multiple candidate trees by allowing more than one splitting attribute at each node. The user can specify the maximum number of candidate trees to control the computational effort required to produce the final solution. In the tree-induction process, an allocation scheme is used to dynamically distribute the given number of candidate trees to splitting attributes according to their estimated contributions to cost reduction. The algorithm finds the final tree by backtracking. An extensive experiment shows that the algorithm outperforms the top-down heuristic and can effectively obtain optimal or near-optimal decision trees without excessive computation time.
    Keywords: classification, decision tree, cost and time sensitive learning, late penalty
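
One way to picture the allocation scheme is as a proportional split of the candidate-tree budget across splitting attributes. The sketch below is an assumption, not the paper's scheme: the function name `allocate_candidates`, the largest-remainder rounding, and the example attribute names and contribution values are all hypothetical.

```python
def allocate_candidates(budget, contributions):
    """Split `budget` candidate trees across attributes in proportion to
    their estimated cost-reduction contributions (largest-remainder rounding)."""
    positive = {a: c for a, c in contributions.items() if c > 0}
    if budget <= 0 or not positive:
        return {}
    total = sum(positive.values())
    quotas = {a: budget * c / total for a, c in positive.items()}
    alloc = {a: int(q) for a, q in quotas.items()}
    leftover = budget - sum(alloc.values())
    # Hand the remaining trees to the attributes with the largest fractional parts.
    by_remainder = sorted(positive, key=lambda a: quotas[a] - alloc[a], reverse=True)
    for a in by_remainder[:leftover]:
        alloc[a] += 1
    return {a: n for a, n in alloc.items() if n > 0}


# Hypothetical estimated contributions to cost reduction at one node.
example = allocate_candidates(4, {"age": 5.0, "income": 3.0, "region": 2.0, "id": 0.0})
```

With a budget of one candidate tree, this sketch gives the whole budget to the attribute with the highest estimated contribution, which matches the ordinary top-down heuristic of a single split per node.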

    Identifying Taste Variation in Choice Models

    Among the many attractive features of the mixed logit model is its ability to take account of taste variation among decision-makers by allowing coefficients to follow pre-specified distributions (usually normal or lognormal). Whilst accounting for heterogeneity in the population, simple applications of the technique fail to identify valuable information on differences in behaviour between market segments. This information is likely to be of use to those involved in policy and investment analysis, product design and marketing. The ‘standard’ approach to overcome this problem when working with the mixed logit model is to identify segments prior to modelling and either specify a set of constant coefficients for each market segment together with an additional error term to ‘mop-up’ any residual variation, or allow separate distributions for each market segment. An alternative approach is to adapt an exciting new methodology that offers the ability to estimate reliable individual-specific parameters (Revelt and Train, 1999). This approach is documented in Section 3 and involves three key stages:
    • First, use maximum simulated likelihood to estimate distributions of tastes across the population.
    • Next, examine individuals’ choices to arrive at estimates of their parameters, conditional on known distributions across the population (including accounting for uncertainty in the population estimates). This process again involves the use of maximum simulated likelihood.
    • Finally, identify differences in behaviour between market segments by regressing individual ‘part-worths’ against the characteristics of the decision-maker or attributes of the choice alternatives.
    In the first instance the technique is validated under ‘controlled’ circumstances on a simulated data set with known taste distributions. This simulation involves a binary choice situation in which the alternatives are described in terms of time and cost. The choices of a group of decision-makers are simulated, each with a value of time drawn from a known distribution. The resulting choices are then analysed and individual values recovered with a surprisingly high degree of precision. The findings of this validation are set out in Section 4. Following a successful validation of the technique on simulated data, the methodology is applied to data from two stated preference experiments in which 326 respondents were asked to choose between alternate motor vehicle specifications defined by purchase price, running costs, engine size, emissions and safety features. The results of this analysis are set out in Section 5 and are compared to the findings of previously calibrated models that identified significant differences in tastes across market segments.
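
The second stage, estimating an individual's parameter conditional on a known population distribution, can be sketched for the binary time/cost setting as a likelihood-weighted average over population draws. This is a simplified illustration and may differ from the paper's procedure: the lognormal population distribution, the cost coefficient fixed at -1, the choice tasks, and the function names are all assumptions, and uncertainty in the population estimates is ignored.

```python
import math
import random


def choice_prob(vot, time_saving, extra_cost, chose_fast):
    """Binary logit probability of the observed choice for a given value of time."""
    utility_gap = vot * time_saving - extra_cost  # fast option minus slow option
    p_fast = 1.0 / (1.0 + math.exp(-utility_gap))
    return p_fast if chose_fast else 1.0 - p_fast


def conditional_vot(choices, draws):
    """Likelihood-weighted mean value of time for one person over population draws."""
    weights = []
    for vot in draws:
        like = 1.0
        for time_saving, extra_cost, chose_fast in choices:
            like *= choice_prob(vot, time_saving, extra_cost, chose_fast)
        weights.append(like)
    return sum(w * v for w, v in zip(weights, draws)) / sum(weights)


rng = random.Random(0)
# Assumed population distribution of value of time: lognormal(0, 0.5).
draws = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
# Hypothetical choice tasks: (time saving, extra cost of the fast option).
tasks = [(1.0, 0.5), (1.0, 1.0), (1.0, 1.5), (1.0, 2.0)]
always_fast = [(t, c, True) for t, c in tasks]
always_slow = [(t, c, False) for t, c in tasks]

population_mean = sum(draws) / len(draws)
vot_fast = conditional_vot(always_fast, draws)
vot_slow = conditional_vot(always_slow, draws)
```

A respondent who always pays for the faster option ends up with a conditional value of time above the population mean, and one who never does ends up below it. These individual estimates are what the third stage would regress on decision-maker characteristics.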

    Creating Fair Models of Atherosclerotic Cardiovascular Disease Risk

    Guidelines for the management of atherosclerotic cardiovascular disease (ASCVD) recommend the use of risk stratification models to identify patients most likely to benefit from cholesterol-lowering and other therapies. These models have differential performance across race and gender groups with inconsistent behavior across studies, potentially resulting in an inequitable distribution of beneficial therapy. In this work, we leverage adversarial learning and a large observational cohort extracted from electronic health records (EHRs) to develop a "fair" ASCVD risk prediction model with reduced variability in error rates across groups. We empirically demonstrate that our approach is capable of aligning the distribution of risk predictions conditioned on the outcome across several groups simultaneously for models built from high-dimensional EHR data. We also discuss the relevance of these results in the context of the empirical trade-off between fairness and model performance.
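
A simple diagnostic for how well risk predictions are aligned across groups, conditional on the outcome, is the between-group gap in mean predicted risk within each outcome class. The sketch below shows only such a check, not the adversarial training itself. The function name, the record layout, and the example values are hypothetical, and the abstract does not say which metric the authors use.

```python
from collections import defaultdict


def conditional_mean_gap(records):
    """records: iterable of (group, outcome, predicted_risk).

    Returns, for each outcome, the largest between-group difference
    in mean predicted risk."""
    cells = defaultdict(lambda: [0.0, 0])  # (outcome, group) -> [sum, count]
    for group, outcome, risk in records:
        cell = cells[(outcome, group)]
        cell[0] += risk
        cell[1] += 1
    means = defaultdict(list)
    for (outcome, _group), (total, n) in cells.items():
        means[outcome].append(total / n)
    return {outcome: max(m) - min(m) for outcome, m in means.items()}


# Hypothetical predictions for two groups, A and B.
records = [
    ("A", 1, 0.8), ("A", 1, 0.6), ("B", 1, 0.5),
    ("A", 0, 0.2), ("B", 0, 0.2), ("B", 0, 0.4),
]
gaps = conditional_mean_gap(records)
```

A gap near zero for every outcome indicates that the mean risk score, given the outcome, is similar across groups. Comparing full distributions rather than means would be a stricter check.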

    Cost-Sensitive Decision Tree with Multiple Resource Constraints

    Resource constraints are commonly found in classification tasks. For example, there could be a budget limit on implementation and a deadline for finishing the classification task. Applying the top-down approach for tree induction in this situation may have significant drawbacks. In particular, it is difficult, especially in an early stage of tree induction, to assess an attribute’s contribution to improving the total implementation cost and its impact on attribute selection in later stages because of the deadline constraint. To address this problem, we propose an innovative algorithm, namely, the Cost-Sensitive Associative Tree (CAT) algorithm. Essentially, the algorithm first extracts and retains association classification rules from the training data which satisfy resource constraints, and then uses the rules to construct the final decision tree. The approach has advantages over the traditional top-down approach, first because only feasible classification rules are considered in the tree induction and, second, because their costs and resource use are known. In contrast, in the top-down approach, this information is not available for selecting splitting attributes. The experimental results show that the CAT algorithm significantly outperforms the top-down approach and adapts very well to available resources.
    Keywords: cost-sensitive learning, mining methods and algorithms, decision trees
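
The first step described here, retaining only rules that satisfy the resource constraints, can be sketched as a feasibility filter. The `Rule` structure, its fields, and the example rules are hypothetical, and rule mining and tree construction are omitted. This is an illustration of the idea, not the CAT algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # e.g. (("fever", "yes"), ("age", "old"))
    label: str         # predicted class
    cost: float        # assumed total cost of the tests the rule needs
    time: float        # assumed time needed to evaluate the rule


def feasible_rules(rules, budget, deadline):
    """Keep only the rules whose cost and time fit the resource constraints."""
    return [r for r in rules if r.cost <= budget and r.time <= deadline]


# Hypothetical mined rules.
rules = [
    Rule((("fever", "yes"),), "flu", cost=1.0, time=2.0),
    Rule((("scan", "abnormal"),), "tumour", cost=50.0, time=3.0),
    Rule((("culture", "positive"),), "infection", cost=5.0, time=48.0),
]
kept = feasible_rules(rules, budget=10.0, deadline=24.0)
```

Because every retained rule carries its own cost and time, a later tree-building step can choose among rules using information that a top-down splitter does not have at the point of selecting an attribute.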