A survey of cost-sensitive decision tree induction algorithms
The past decade has seen significant interest in the problem of inducing decision trees that take account of the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, and provides a useful taxonomy and a historical timeline of how the field has developed; it should serve as a useful reference point for future research in this field.
Technical note: Bias and the quantification of stability
Research on bias in machine learning algorithms has generally been concerned with the
impact of bias on predictive accuracy. We believe that there are other factors that should
also play a role in the evaluation of bias. One such factor is the stability of the algorithm;
in other words, the repeatability of the results. If we obtain two sets of data from the same
phenomenon, with the same underlying probability distribution, then we would like our
learning algorithm to induce approximately the same concepts from both sets of data. This
paper introduces a method for quantifying stability, based on a measure of the agreement
between concepts. We also discuss the relationships among stability, predictive accuracy,
and bias.
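The idea of measuring agreement between concepts induced from two samples can be sketched as follows. This is only an illustration, not the paper's method: the one-dimensional threshold learner, the noisy data source `draw_sample`, and the uniform test points are all assumptions introduced here.

```python
import random

def learn_stump(sample):
    """Pick the threshold on a 1-D feature that minimises training error."""
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x >= t) != y for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def draw_sample(rng, n, noise=0.1):
    """Hypothetical data source: label is x >= 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = (x >= 0.5) != (rng.random() < noise)
        data.append((x, y))
    return data

def agreement(t1, t2, rng, n_points=10000):
    """Fraction of fresh, unlabeled points on which the two concepts agree."""
    same = 0
    for _ in range(n_points):
        x = rng.random()
        same += (x >= t1) == (x >= t2)
    return same / n_points

rng = random.Random(0)
# Two independent samples from the same underlying distribution.
t_a = learn_stump(draw_sample(rng, 200))
t_b = learn_stump(draw_sample(rng, 200))
stability = agreement(t_a, t_b, rng)
print(f"thresholds: {t_a:.3f}, {t_b:.3f}; agreement: {stability:.3f}")
```

Agreement is measured on unlabeled points, so it can be high even when both concepts are inaccurate, which is why stability is treated separately from predictive accuracy.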
A System for Induction of Oblique Decision Trees
This article describes a new system for induction of oblique decision trees.
This system, OC1, combines deterministic hill-climbing with two forms of
randomization to find a good oblique split (in the form of a hyperplane) at
each node of a decision tree. Oblique decision tree methods are tuned
especially for domains in which the attributes are numeric, although they can
be adapted to symbolic or mixed symbolic/numeric attributes. We present
extensive empirical studies, using both real and artificial data, that analyze
OC1's ability to construct oblique trees that are smaller and more accurate
than their axis-parallel counterparts. We also examine the benefits of
randomization for the construction of oblique decision trees. Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this article.
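A much simplified sketch of searching for an oblique split with hill-climbing plus randomization is given below. It is not OC1's actual procedure: the Gini impurity measure, the Gaussian perturbation of one randomly chosen coefficient, and the random restarts are assumptions chosen to keep the example short.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def split_impurity(params, X, y):
    """Weighted Gini impurity of the two sides of w.x + b = 0 (params = w + [b])."""
    w, b = params[:-1], params[-1]
    left, right = [], []
    for xi, yi in zip(X, y):
        side = sum(wj * xj for wj, xj in zip(w, xi)) + b
        (left if side <= 0 else right).append(yi)
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def find_oblique_split(X, y, rng, restarts=5, steps=300, scale=0.5):
    """Hill-climb on hyperplane coefficients from several random starting points."""
    dim = len(X[0])
    best_params, best_score = None, None
    for _ in range(restarts):
        # Randomization (assumed form): start from a random hyperplane.
        params = [rng.uniform(-1, 1) for _ in range(dim + 1)]
        score = split_impurity(params, X, y)
        for _ in range(steps):
            cand = list(params)
            # Perturb one randomly chosen coefficient.
            cand[rng.randrange(dim + 1)] += rng.gauss(0, scale)
            cand_score = split_impurity(cand, X, y)
            if cand_score < score:  # accept strict improvements only
                params, score = cand, cand_score
        if best_score is None or score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

rng = random.Random(0)
# Toy data whose true boundary x0 + x1 = 1 is oblique, not axis-parallel.
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [int(x0 + x1 > 1.0) for x0, x1 in X]
params, score = find_oblique_split(X, y, rng)
print("hyperplane (w0, w1, b):", params, "impurity:", score)
```

On data like this, a single axis-parallel test cannot separate the classes, whereas one hyperplane can, which is the motivation for smaller oblique trees.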
Practical feature subset selection for machine learning
Machine learning algorithms automatically extract knowledge from machine-readable information. Unfortunately, their success is usually dependent on the quality of the data that they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selection can result in enhanced performance, a reduced hypothesis search space, and, in some cases, reduced storage requirements. This paper describes a new feature selection algorithm that uses a correlation-based heuristic to determine the "goodness" of feature subsets, and evaluates its effectiveness with three common machine learning algorithms. Experiments using a number of standard machine learning data sets are presented. Feature subset selection gave significant improvement for all three algorithms.
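The abstract does not spell out the heuristic. As a hedged illustration, the sketch below scores a subset with the merit formula commonly associated with correlation-based feature selection, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Using absolute Pearson correlation as the correlation measure, and the toy data, are assumptions made here.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def merit(features, target):
    """Subset merit: rewards correlation with the target, penalises redundancy."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = list(combinations(features, 2))
    r_ff = sum(abs(pearson(f, g)) for f, g in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

target = [1, 2, 3, 4, 5, 6]
relevant = [1, 2, 3, 4, 5, 6]      # perfectly predictive feature
duplicate = list(relevant)          # redundant copy of the same feature
noise = [1, -1, 1, -1, 1, -1]       # weakly related feature

print(merit([relevant], target))             # single good feature
print(merit([relevant, duplicate], target))  # the copy adds nothing
print(merit([relevant, noise], target))      # the weak feature lowers the merit
```

Under this formula a redundant copy of a good feature leaves the merit unchanged and a weakly correlated feature reduces it, so a search over subsets is steered toward small, non-redundant sets.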
Fourier sparsity, spectral norm, and the Log-rank conjecture
We study Boolean functions with sparse Fourier coefficients or small spectral
norm, and show their applications to the Log-rank Conjecture for XOR functions
f(x\oplus y) --- a fairly large class of functions including well studied ones
such as Equality and Hamming Distance. The rank of the communication matrix M_f
for such functions is exactly the Fourier sparsity of f. Let d be the F2-degree
of f and D^CC(f) stand for the deterministic communication complexity for
f(x\oplus y). We show that 1. D^CC(f) = O(2^{d^2/2} log^{d-2} ||\hat f||_1). In
particular, the Log-rank conjecture holds for XOR functions with constant
F2-degree. 2. D^CC(f) = O(d ||\hat f||_1) = O(\sqrt{rank(M_f)} \log rank(M_f)).
We obtain our results through a degree-reduction protocol based on a variant of
polynomial rank, and actually conjecture that its communication cost is already
\log^{O(1)} rank(M_f). The above bounds also hold for the parity decision tree
complexity of f, a measure that is no less than the communication complexity
(up to a factor of 2).
Along the way we also show several structural results about Boolean functions
with small F2-degree or small spectral norm, which could be of independent
interest. For functions f with constant F2-degree: 1) f can be written as the
summation of quasi-polynomially many indicator functions of subspaces with
\pm-signs, improving the previous doubly exponential upper bound by Green and
Sanders; 2) being sparse in Fourier domain is polynomially equivalent to having
a small parity decision tree complexity; 3) f depends only on polylog||\hat
f||_1 linear functions of input variables. For functions f with small spectral
norm: 1) there is an affine subspace with co-dimension O(||\hat f||_1) on which
f is a constant; 2) there is a parity decision tree with depth O(||\hat f||_1
log ||\hat f||_0). Comment: v2: Corollary 31 of v1 removed because of a bug in the proof. (Other
results not affected.)
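The stated identity between the rank of the communication matrix M_f and the Fourier sparsity of f can be checked by brute force on small examples. The sketch below is an illustration only: the choice of n = 3, the ±1-valued encoding, and the two example functions are assumptions made here. Exact rational arithmetic avoids floating-point rank errors.

```python
from fractions import Fraction

def fourier(f, n):
    """Fourier coefficients \\hat f(s) = 2^{-n} * sum_x f(x) * (-1)^{s.x}."""
    size = 1 << n
    coeffs = []
    for s in range(size):
        total = sum(f(x) * (-1) ** bin(s & x).count("1") for x in range(size))
        coeffs.append(Fraction(total, size))
    return coeffs

def rank(matrix):
    """Rank over the rationals via Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def sparsity_and_rank(f, n):
    """Return (||\\hat f||_0, rank of M_f) where M_f[x][y] = f(x XOR y)."""
    coeffs = fourier(f, n)
    sparsity = sum(c != 0 for c in coeffs)
    matrix = [[f(x ^ y) for y in range(1 << n)] for x in range(1 << n)]
    return sparsity, rank(matrix)

def equality(z):
    """f(x XOR y) is the Equality function: -1 when x == y, +1 otherwise."""
    return -1 if z == 0 else 1

def parity_two_bits(z):
    """A single character: parity of the two low-order bits of z."""
    return (-1) ** bin(z & 0b011).count("1")

n = 3
spectral_norm = sum(abs(c) for c in fourier(equality, n))  # ||\hat f||_1
print(sparsity_and_rank(equality, n), spectral_norm)
print(sparsity_and_rank(parity_two_bits, n))
```

Equality has full Fourier support and a full-rank matrix, while a single parity has sparsity 1 and a rank-1 matrix, matching the identity in both extremes.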
Inducing safer oblique trees without costs
Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification.
Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety.
This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best-known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1), and an algorithm that utilizes robust linear programming.
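The abstract does not say what the "appropriate modification" is. One plausible reading, sketched below purely as an assumption, is to find a Fisher linear discriminant direction for a single split and then shift the decision threshold until no unsafe training example falls on the safe side. The two-attribute toy data and all function names are hypothetical.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    """2x2 scatter matrix of the points around mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(safe, unsafe):
    """w = Sw^{-1} (m_unsafe - m_safe), using the closed-form 2x2 inverse."""
    m_s, m_u = mean(safe), mean(unsafe)
    a, b = scatter(safe, m_s), scatter(unsafe, m_u)
    sw = [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [m_u[0] - m_s[0], m_u[1] - m_s[1]]
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    return w, m_s, m_u

def project(w, p):
    """Projection of point p onto direction w."""
    return w[0] * p[0] + w[1] * p[1]

# Hypothetical training data with two continuous attributes.
safe = [(1.0, 1.0), (1.5, 0.5), (2.0, 1.5), (0.5, 2.0), (2.5, 2.4)]
unsafe = [(3.0, 3.0), (3.5, 2.5), (4.0, 3.5), (2.5, 2.6), (3.0, 4.0)]

w, m_s, m_u = fisher_direction(safe, unsafe)
midpoint = (project(w, m_s) + project(w, m_u)) / 2
# Assumed safety modification: lower the threshold so that every unsafe
# training example is predicted unsafe, accepting more false alarms.
safe_threshold = min(midpoint, min(project(w, p) for p in unsafe))

def predict_unsafe(p, threshold):
    """Oblique test: unsafe when the projection reaches the threshold."""
    return project(w, p) >= threshold

missed = sum(not predict_unsafe(p, safe_threshold) for p in unsafe)
false_alarms = sum(predict_unsafe(p, safe_threshold) for p in safe)
print("missed unsafe:", missed, "false alarms:", false_alarms)
```

No misclassification costs are needed here: the only information used is which class is the more dangerous one to miss.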