
    A survey of cost-sensitive decision tree induction algorithms

    The past decade has seen significant interest in the problem of inducing decision trees that take account of the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy and a historical timeline of how the field has developed, and should serve as a reference point for future research in this field.
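    To make the core idea concrete, here is a minimal sketch (our illustration, not any particular surveyed algorithm) of the expected-cost rule that many cost-sensitive trees build on: a leaf is labelled with the class that minimises expected misclassification cost rather than the majority class. The function name, cost matrix, and numbers below are hypothetical.

```python
import numpy as np

def min_expected_cost_label(class_probs, cost_matrix):
    """Pick the label minimising expected misclassification cost.

    class_probs : shape (k,), estimated P(true class = i) at a leaf.
    cost_matrix : shape (k, k), cost_matrix[i, j] = cost of predicting
                  class j when the true class is i (0 on the diagonal).
    """
    expected_cost = class_probs @ cost_matrix  # expected cost of each prediction j
    return int(np.argmin(expected_cost))

# Illustrative example: misclassifying class 1 is 10x worse than class 0.
probs = np.array([0.7, 0.3])                  # majority vote would say class 0 ...
costs = np.array([[0.0, 1.0],
                  [10.0, 0.0]])
print(min_expected_cost_label(probs, costs))  # ... but expected cost (0.7 vs 3.0) says 1
```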

    Technical note: Bias and the quantification of stability

    Research on bias in machine learning algorithms has generally been concerned with the impact of bias on predictive accuracy. We believe that there are other factors that should also play a role in the evaluation of bias. One such factor is the stability of the algorithm; in other words, the repeatability of the results. If we obtain two sets of data from the same phenomenon, with the same underlying probability distribution, then we would like our learning algorithm to induce approximately the same concepts from both sets of data. This paper introduces a method for quantifying stability, based on a measure of the agreement between concepts. We also discuss the relationships among stability, predictive accuracy, and bias.
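    As a rough, hypothetical illustration of agreement-based stability (the paper defines agreement over the induced concepts themselves; the learner, names, and data below are our assumptions), one can estimate it empirically by training the same learner on two independent samples and measuring how often the two models predict the same label on fresh data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def empirical_agreement(learner, X1, y1, X2, y2, X_ref):
    """Fraction of reference points on which models trained on two
    independent samples of the same phenomenon agree."""
    m1 = learner().fit(X1, y1)
    m2 = learner().fit(X2, y2)
    return float(np.mean(m1.predict(X_ref) == m2.predict(X_ref)))

# Two disjoint samples from one distribution, plus a held-out reference set.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=600) > 0).astype(int)
score = empirical_agreement(DecisionTreeClassifier,
                            X[:200], y[:200], X[200:400], y[200:400], X[400:])
print(f"agreement = {score:.2f}")  # 1.0 would be a perfectly stable learner
```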

    A System for Induction of Oblique Decision Trees

    This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
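    As a loose sketch of the idea (not OC1's actual procedure, which perturbs one hyperplane coefficient at a time with a dedicated one-dimensional optimiser), combining deterministic downhill moves on a hyperplane's coefficients with random restarts might look like this; all names and step sizes are illustrative:

```python
import numpy as np

def gini_of_split(X, y, w, b):
    """Weighted Gini impurity of the split defined by X @ w + b > 0."""
    side = X @ w + b > 0
    total = 0.0
    for mask in (side, ~side):
        if mask.sum() == 0:
            return np.inf                      # degenerate split: everything on one side
        p = np.bincount(y[mask]) / mask.sum()
        total += mask.mean() * (1.0 - np.sum(p ** 2))
    return total

def hill_climb_oblique_split(X, y, restarts=10, steps=200, seed=0):
    """Randomly restarted coordinate perturbation, loosely in the spirit of OC1."""
    rng = np.random.default_rng(seed)
    best_w, best_b, best = None, None, np.inf
    for _ in range(restarts):                  # randomization: fresh starting hyperplane
        w, b = rng.normal(size=X.shape[1]), 0.0
        score = gini_of_split(X, y, w, b)
        for _ in range(steps):
            j = rng.integers(X.shape[1] + 1)   # perturb one coefficient (or the bias)
            delta = rng.normal(scale=0.5)
            w2, b2 = w.copy(), b
            if j < X.shape[1]:
                w2[j] += delta
            else:
                b2 += delta
            s2 = gini_of_split(X, y, w2, b2)
            if s2 < score:                     # deterministic hill-climbing: accept only improvements
                w, b, score = w2, b2, s2
        if score < best:
            best_w, best_b, best = w, b, score
    return best_w, best_b, best
```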

    Practical feature subset selection for machine learning

    Machine learning algorithms automatically extract knowledge from machine-readable information. Unfortunately, their success is usually dependent on the quality of the data they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selection can result in enhanced performance, a reduced hypothesis search space, and, in some cases, reduced storage requirements. This paper describes a new feature selection algorithm that uses a correlation-based heuristic to determine the “goodness” of feature subsets, and evaluates its effectiveness with three common machine learning algorithms. Experiments using a number of standard machine learning data sets are presented. Feature subset selection gave significant improvement for all three algorithms.
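    The correlation-based heuristic described here rewards subsets whose features correlate strongly with the class but weakly with each other. Below is a minimal sketch of a merit function of that form, using Pearson correlation as a simple stand-in for the paper's correlation measure (an assumption), together with a greedy forward search over features:

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Correlation-based merit of a feature subset:
    merit = k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|),
    where r_cf is feature-class and r_ff feature-feature correlation."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for a, i in enumerate(subset) for j in subset[a + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def forward_select(X, y):
    """Greedy forward selection using the merit as the evaluation function."""
    remaining, chosen, best = list(range(X.shape[1])), [], -np.inf
    while remaining:
        score, j = max((cfs_merit(X, y, chosen + [j]), j) for j in remaining)
        if score <= best:                     # stop when no candidate improves the merit
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen
```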

    Fourier sparsity, spectral norm, and the Log-rank conjecture

    We study Boolean functions with sparse Fourier coefficients or small spectral norm, and show their applications to the Log-rank Conjecture for XOR functions f(x\oplus y) --- a fairly large class of functions including well-studied ones such as Equality and Hamming Distance. The rank of the communication matrix M_f for such functions is exactly the Fourier sparsity of f. Let d be the F2-degree of f and D^CC(f) stand for the deterministic communication complexity of f(x\oplus y). We show that:
    1. D^CC(f) = O(2^{d^2/2} \log^{d-2} ||\hat f||_1). In particular, the Log-rank conjecture holds for XOR functions with constant F2-degree.
    2. D^CC(f) = O(d ||\hat f||_1) = O(\sqrt{rank(M_f)} \log rank(M_f)).
    We obtain our results through a degree-reduction protocol based on a variant of polynomial rank, and actually conjecture that its communication cost is already \log^{O(1)} rank(M_f). The above bounds also hold for the parity decision tree complexity of f, a measure that is no less than the communication complexity (up to a factor of 2). Along the way we also show several structural results about Boolean functions with small F2-degree or small spectral norm, which could be of independent interest.
    For functions f with constant F2-degree: 1) f can be written as the summation of quasi-polynomially many indicator functions of subspaces with \pm signs, improving the previous doubly exponential upper bound by Green and Sanders; 2) being sparse in the Fourier domain is polynomially equivalent to having a small parity decision tree complexity; and 3) f depends only on polylog ||\hat f||_1 linear functions of the input variables.
    For functions f with small spectral norm: 1) there is an affine subspace of co-dimension O(||\hat f||_1) on which f is a constant; and 2) there is a parity decision tree of depth O(||\hat f||_1 \log ||\hat f||_0).
    Comment: v2: Corollary 31 of v1 removed because of a bug in the proof. (Other results not affected.)
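    For concreteness, the Fourier quantities above (sparsity ||\hat f||_0 and spectral norm ||\hat f||_1) can be computed exactly for small n from a \pm 1 truth table via a fast Walsh-Hadamard transform. A small illustrative sketch; the example function and names are ours:

```python
import numpy as np

def fourier_coefficients(truth_table):
    """Fourier coefficients of f: {0,1}^n -> {+1,-1}, given as a +-1 truth
    table of length 2^n, via an in-place fast Walsh-Hadamard transform."""
    coeffs = np.array(truth_table, dtype=float)
    h = 1
    while h < len(coeffs):
        for i in range(0, len(coeffs), 2 * h):
            a = coeffs[i:i + h].copy()
            b = coeffs[i + h:i + 2 * h].copy()
            coeffs[i:i + h] = a + b
            coeffs[i + h:i + 2 * h] = a - b
        h *= 2
    return coeffs / len(coeffs)   # hat f(S) = E_x[f(x) * chi_S(x)]

n = 3
xs = np.arange(2 ** n)
f = np.where(xs == 0, 1, -1)      # an arbitrary indicator-style function in +-1 form
f_hat = fourier_coefficients(f)
print("Fourier sparsity  ||f^||_0 =", np.count_nonzero(f_hat))
print("spectral norm     ||f^||_1 =", np.abs(f_hat).sum())
```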

    Inducing safer oblique trees without costs

    Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms, or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of the costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of the costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassification cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes, and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best-known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1), and an algorithm that utilizes robust linear programming.
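    As a rough, hypothetical sketch of the two steps described (a linear-discriminant direction to capture the oblique relationship, then a threshold shifted so that borderline cases fall on the unsafe side), assuming binary labels where class 1 is the high-cost class; the margin parameter and names are our assumptions:

```python
import numpy as np

def safe_lda_split(X, y, margin=1.0):
    """Fisher's linear discriminant direction, with the threshold shifted
    toward the safe class so borderline cases are treated as unsafe.

    y      : binary labels, where class 1 is the high-cost ("unsafe") class.
    margin : how many pooled standard deviations to shift the cut.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
          + 1e-6 * np.eye(X.shape[1]))        # within-class scatter, ridged for stability
    w = np.linalg.solve(Sw, mu1 - mu0)        # LDA direction, pointing toward class 1
    z0, z1 = X0 @ w, X1 @ w
    midpoint = (z0.mean() + z1.mean()) / 2.0
    pooled_sd = np.sqrt((z0.var() + z1.var()) / 2.0)
    threshold = midpoint - margin * pooled_sd # shift toward class 0: err on the side of safety
    return w, threshold                       # predict unsafe (1) if X @ w > threshold
```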