
    A branch & bound algorithm to determine optimal bivariate splits for oblique decision tree induction

    Univariate decision tree induction methods for multiclass classification problems, such as CART, C4.5 and ID3, remain very popular in machine learning because they are easy to interpret. However, as these trees consider only a single attribute per node, they often grow quite large, which lowers their explanatory value. Oblique decision tree building algorithms, which divide the feature space by multidimensional hyperplanes, often produce much smaller trees, but the individual splits are hard to interpret. Moreover, the effort of finding optimal oblique splits is so high that heuristics have to be applied to determine locally optimal solutions. In this work, we introduce an effective branch and bound procedure to determine globally optimal bivariate oblique splits for concave impurity measures. Decision trees based on these bivariate oblique splits remain fairly interpretable due to the restriction to two attributes per split. The resulting trees are significantly smaller and more accurate than their univariate counterparts because they adapt better to the underlying data and capture interactions of attribute pairs. Moreover, our evaluation shows that our algorithm even outperforms algorithms based on heuristically obtained multivariate oblique splits, despite the fact that we focus on two attributes only.
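To make the notion of a bivariate oblique split concrete, the following is a minimal sketch that scores splits of the form `a*x + b*y <= c` with the Gini impurity (one example of a concave impurity measure). It is not the branch and bound procedure from the abstract: it simply enumerates a coarse grid of directions, and the function names and the `n_angles` parameter are assumptions made for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(points, labels, a, b, c):
    """Weighted Gini impurity of the oblique split a*x + b*y <= c."""
    left = [lab for (x, y), lab in zip(points, labels) if a * x + b * y <= c]
    right = [lab for (x, y), lab in zip(points, labels) if a * x + b * y > c]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_bivariate_split(points, labels, n_angles=36):
    """Brute-force search over a grid of directions (illustrative only).

    For each direction, candidate thresholds are the midpoints between
    consecutive distinct projections of the data onto that direction.
    Returns (impurity, (a, b, c)) for the best split found.
    """
    best = (float("inf"), None)
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        a, b = math.cos(theta), math.sin(theta)
        projections = sorted({a * x + b * y for x, y in points})
        for lo, hi in zip(projections, projections[1:]):
            c = (lo + hi) / 2.0
            impurity = split_impurity(points, labels, a, b, c)
            if impurity < best[0]:
                best = (impurity, (a, b, c))
    return best


# Toy data: the classes are separated by a diagonal line,
# but by no single axis-parallel threshold.
points = [(0.0, 0.0), (0.2, 0.6), (0.6, 0.2), (1.0, 1.0), (0.5, 0.9), (0.9, 0.5)]
labels = [0, 0, 0, 1, 1, 1]
impurity, (a, b, c) = best_bivariate_split(points, labels)
print(impurity, a, b, c)
```

On this toy data, a univariate split on either attribute alone leaves some impurity, while an oblique split on the pair can separate the classes. A grid search like this gives no optimality guarantee, which is the gap the branch and bound procedure described above is meant to close.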

    An intelligent assistant for exploratory data analysis

    In this paper we present an account of the main features of SNOUT, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator reach an understanding of the important relationships in a data set rather than simply develop predictive models for selected variables. Brief descriptions of a number of novel techniques developed for use in SNOUT are presented. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction, and model selection using a genetic algorithm.

    A vine copula mixed effect model for trivariate meta-analysis of diagnostic test accuracy studies accounting for disease prevalence

    A bivariate copula mixed model has recently been proposed to synthesize diagnostic test accuracy studies, and it has been shown to be superior to the standard generalized linear mixed model in this context. Here, we use trivariate vine copulas to extend the bivariate meta-analysis of diagnostic test accuracy studies by accounting for disease prevalence. Our vine copula mixed model includes the trivariate generalized linear mixed model as a special case and can also operate on the original scale of sensitivity, specificity, and disease prevalence. Our general methodology is illustrated by re-analyzing the data of two published meta-analyses. Our study suggests that there can be an improvement over the trivariate generalized linear mixed model in fit to the data, and it makes the argument for moving to vine copula random effects models, especially because of their richness, including reflection asymmetric tail dependence, and their computational feasibility despite their three-dimensionality.

    Better safe than sorry: Risky function exploitation through safe optimization

    Exploration-exploitation of functions, that is, learning and optimizing a mapping between inputs and expected outputs, is ubiquitous in many real-world situations. These situations sometimes require us to avoid certain outcomes at all costs, for example because they are poisonous, harmful, or otherwise dangerous. We test participants' behavior in scenarios in which they have to find the optimum of a function while avoiding outputs below a certain threshold. In two experiments, we find that Safe-Optimization, a Gaussian Process-based exploration-exploitation algorithm, describes participants' behavior well, and that participants seem to care first about whether a point is safe and then try to pick the optimal point from all such safe points. This means that their trade-off between exploration and exploitation can be seen as an intelligent, approximate, and homeostasis-driven strategy. Comment: 6 pages, submitted to Cognitive Science Conference
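The two-step behavior described in this abstract (first filter for safety, then pick the best of the safe points) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Safe-Optimization algorithm itself: the posterior means and standard deviations are taken as given inputs rather than computed by a Gaussian Process, and the function name `safe_choice` and the confidence multiplier `beta` are hypothetical.

```python
def safe_choice(means, stds, threshold, beta=2.0):
    """Pick a candidate index with a safety-first rule.

    means, stds: assumed posterior mean and standard deviation per candidate.
    threshold:   outputs below this value must be avoided.
    beta:        hypothetical multiplier for the confidence bounds.
    Returns the index of the chosen candidate, or None if none is safe.
    """
    # Step 1: keep only candidates whose lower confidence bound
    # stays at or above the safety threshold.
    safe = [
        i for i, (m, s) in enumerate(zip(means, stds))
        if m - beta * s >= threshold
    ]
    if not safe:
        return None
    # Step 2: among the safe candidates, pick the one with the
    # highest upper confidence bound (an optimistic choice).
    return max(safe, key=lambda i: means[i] + beta * stds[i])


# Candidate 2 has the highest mean but is too uncertain to count as safe.
means = [1.0, 3.0, 5.0]
stds = [0.1, 0.5, 3.0]
print(safe_choice(means, stds, threshold=0.5))
```

Ranking the safe candidates by upper confidence bound is only one possible choice for the second step; ranking by posterior mean would give a purely exploitative variant of the same rule.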

    USING CONTINGENT VALUATION WITH RESPONDENT UNCERTAINTY TO ESTIMATE THE COSTS OF CLIMATE CHANGE PROGRAMS: AN APPLICATION TO CANADIAN LANDOWNERS

    Using a survey of western Canadian agricultural landowners, we examine the cost and viability of two distinct afforestation options for carbon-uptake purposes. Responses to two separate, but most likely related, willingness to accept compensation questions are elicited using the contingent valuation method. Respondents then select the level of certainty with which they believe their responses were given. This paper provides a framework for estimation of the bivariate model with certainty and a modification of the model to incorporate uncertainty based on Li and Mattson's approach to preference uncertainty. While highly preliminary results are given for the bivariate model with certainty, applications of both models will be presented at the 2003 AAEA Meetings. Keywords: Environmental Economics and Policy, Resource/Energy Economics and Policy