A survey of cost-sensitive decision tree induction algorithms
The past decade has seen significant interest in the problem of inducing decision trees that take into account the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods as well as approaches that use genetic algorithms, anytime methods, boosting, and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy and a historical timeline of how the field has developed, and should serve as a reference point for future research in this field.
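As a rough illustration of the idea surveyed above, the sketch below scores a candidate decision-tree split by combining the expected misclassification cost of the two children with the cost of acquiring the splitting feature. The function name, the cost-matrix convention, and the additive combination are assumptions made for illustration; they do not correspond to any particular algorithm covered by the survey.

```python
import numpy as np

def cost_sensitive_split_score(y_left, y_right, feature_cost, cost_matrix):
    """Hypothetical split criterion: expected misclassification cost of the
    two children plus the cost of acquiring the splitting feature.
    y_left / y_right are integer-coded class labels."""
    def expected_cost(y):
        if len(y) == 0:
            return 0.0
        counts = np.bincount(y, minlength=cost_matrix.shape[0])
        probs = counts / counts.sum()
        # cost_matrix[i, j] = cost of predicting class j when the truth is i
        cost_per_prediction = probs @ cost_matrix
        # A cost-sensitive leaf predicts the class with minimal expected cost.
        return cost_per_prediction.min() * len(y)

    n = len(y_left) + len(y_right)
    misclassification = (expected_cost(y_left) + expected_cost(y_right)) / n
    return misclassification + feature_cost   # lower is better
```

A learner would compare this score across candidate features and thresholds, trading prediction quality against the price of acquiring each feature.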
Asymmetric Totally-corrective Boosting for Real-time Object Detection
Real-time object detection is one of the core problems in computer vision.
The cascade boosting framework proposed by Viola and Jones has become the
standard for this problem. In this framework, the learning goal for each node
is asymmetric, which is required to achieve a high detection rate and a
moderate false positive rate. We develop new boosting algorithms to address
this asymmetric learning problem. We show that our methods explicitly optimize
asymmetric loss objectives in a totally corrective fashion. The methods are
totally corrective in the sense that the coefficients of all selected weak
classifiers are updated at each iteration. In contrast, conventional boosting
like AdaBoost is stage-wise in that only the current weak classifier's
coefficient is updated. At the heart of the totally corrective boosting is the
column generation technique. Experiments on face detection show that our
methods outperform the state-of-the-art asymmetric boosting methods.
Comment: 14 pages, published in Asian Conf. Computer Vision 201
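To make the stage-wise versus totally-corrective distinction concrete, the sketch below refits the coefficients of all selected weak classifiers at each iteration under a simplified symmetric exponential loss, and contrasts it with the closed-form stage-wise AdaBoost update. This is only an illustration under those assumptions: the paper's actual method minimizes asymmetric objectives via column generation, and the function names here are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def totally_corrective_refit(H, y, w_init):
    """Re-optimize the coefficients of ALL selected weak classifiers at once.
    H: (n_samples, n_weak) matrix of weak-classifier outputs in {-1, +1};
    y: (n_samples,) labels in {-1, +1}. Simplified symmetric exponential
    loss, not the asymmetric objective used in the paper."""
    def loss(w):
        margins = y * (H @ w)
        return np.exp(-margins).mean()

    bounds = [(0.0, None)] * H.shape[1]        # keep coefficients non-negative
    result = minimize(loss, w_init, bounds=bounds, method="L-BFGS-B")
    return result.x

def stagewise_alpha(h_new, y, sample_weights):
    """Stage-wise AdaBoost, by contrast, fixes all earlier coefficients and
    computes only the newest weak classifier's weight, in closed form."""
    err = np.sum(sample_weights * (h_new != y)) / np.sum(sample_weights)
    return 0.5 * np.log((1.0 - err) / max(err, 1e-12))
```

In the totally-corrective variant, `w_init` would typically be the previous coefficient vector with a zero appended for the newly added weak classifier (column).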
ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks
Hash codes are efficient data representations for coping with the ever-growing
amounts of data. In this paper, we introduce a random forest semantic
hashing scheme that embeds tiny convolutional neural networks (CNN) into
shallow random forests, with near-optimal information-theoretic code
aggregation among trees. We start with a simple hashing scheme, where random
trees in a forest act as hashing functions by setting '1' for the visited tree
leaf and '0' for the rest. We show that traditional random forests fail to
generate hashes that preserve the underlying similarity between the trees,
rendering the random forests approach to hashing challenging. To address this,
we propose to first randomly group arriving classes at each tree split node
into two groups, obtaining a significantly simplified two-class classification
problem, which can be handled using a light-weight CNN weak learner. Such a
random class grouping scheme enables code uniqueness by forcing each class to
share its code with different classes in different trees. A non-conventional
low-rank loss is further adopted for the CNN weak learners to encourage code
consistency by minimizing intra-class variations and maximizing inter-class
distance for the two random class groups. Finally, we introduce an
information-theoretic approach for aggregating codes of individual trees into a
single hash code, producing a near-optimal unique hash for each class. The
proposed approach significantly outperforms state-of-the-art hashing methods
on image retrieval tasks over large-scale public datasets, and it performs at
the level of state-of-the-art image classification techniques while using a
more compact, efficient, and scalable representation. This work
proposes a principled and robust procedure to train and deploy in parallel an
ensemble of light-weight CNNs, instead of simply going deeper.
Comment: Accepted to ECCV 201
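A minimal sketch of the basic leaf-indicator hashing scheme described above, assuming scikit-learn's RandomForestClassifier as a stand-in forest: each tree contributes a binary block with a single '1' at the visited leaf. The CNN weak learners, random class grouping, low-rank loss, and information-theoretic code aggregation of ForestHash are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def leaf_one_hot_codes(forest: RandomForestClassifier, X):
    """Basic leaf-indicator hashing: for every tree, set '1' at the leaf a
    sample lands in and '0' elsewhere, then concatenate the per-tree blocks."""
    leaf_ids = forest.apply(X)                     # (n_samples, n_trees)
    blocks = []
    for t, tree in enumerate(forest.estimators_):
        n_nodes = tree.tree_.node_count            # columns cover all nodes
        block = np.zeros((X.shape[0], n_nodes), dtype=np.uint8)
        block[np.arange(X.shape[0]), leaf_ids[:, t]] = 1
        blocks.append(block)
    return np.hstack(blocks)                       # concatenated binary code
```

As the abstract notes, such codes from a plain random forest do not by themselves preserve similarity well, which is what motivates the grouping, loss, and aggregation machinery of the paper.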
On multi-class learning through the minimization of the confusion matrix norm
In imbalanced multi-class classification problems, the misclassification rate
as an error measure may not be a relevant choice. Several methods have been
developed in which the performance measure retains richer information than the
mere misclassification rate: misclassification costs, ROC-based information,
etc. Following this idea of dealing with alternate measures of performance, we
propose to address imbalanced classification problems by using a new measure to
be optimized: the norm of the confusion matrix. Indeed, recent results show
that using the norm of the confusion matrix as an error measure can be quite
interesting due to the fine-grained information contained in the matrix,
especially in the case of imbalanced classes. Our first contribution is to show
that optimizing a criterion based on the confusion matrix provides a common
background for cost-sensitive methods aimed at imbalanced class learning
problems. As our second contribution, we propose an extension of a recent
multi-class boosting method, namely AdaBoost.MM, to the imbalanced class
problem, by greedily minimizing the
empirical norm of the confusion matrix. A theoretical analysis of the
properties of the proposed method is presented, while experimental results
illustrate the behavior of the algorithm and show the relevance of the approach
compared to other methods.
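As an illustration of the error measure discussed above, the sketch below builds a row-normalized confusion matrix from hard predictions, zeroes its diagonal, and returns the spectral norm. The specific normalization and choice of norm here are assumptions for illustration; the paper defines the exact quantity that its AdaBoost.MM extension greedily minimizes.

```python
import numpy as np

def confusion_matrix_norm(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix with the diagonal removed, summarized
    by its operator (spectral) norm as a class-imbalance-aware error measure."""
    C = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1.0
    row_sums = C.sum(axis=1, keepdims=True)
    # Normalize each row by its class size so rare classes are not drowned out.
    C = np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)
    np.fill_diagonal(C, 0.0)                  # keep only misclassification rates
    return np.linalg.norm(C, ord=2)           # spectral norm; smaller is better
```

Because each row is normalized by its own class size, a high error rate on a small class inflates the norm just as much as the same rate on a large class, which is what makes the measure attractive for imbalanced problems.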