Asymmetric Pruning for Learning Cascade Detectors
Cascade classifiers are one of the most important contributions to real-time object detection. Nonetheless, many challenging problems arise when training cascade detectors. One common issue is that each node classifier is trained with a symmetric learning objective: a low overall misclassification error rate does not guarantee the actual node learning goal of a cascade, namely an extremely high detection rate with a moderate false positive rate. In this work, we present a new approach to training an effective node classifier in a cascade detector. The algorithm is based on two key observations: 1) redundant weak classifiers can be safely discarded; 2) the final detector should satisfy the asymmetric learning objective of the cascade architecture. To achieve this, we separate classifier training into two steps: finding a pool of discriminative weak classifiers/features, and training the final classifier by pruning the weak classifiers that contribute little to the asymmetric learning criterion (asymmetric classifier construction). This model reduction approach shortens training time while still meeting the pre-determined learning objective. Experimental results on both face and car data sets verify the effectiveness of the proposed algorithm; on the FDDB face data set, our approach achieves state-of-the-art performance.
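A minimal sketch of the two-step idea, in Python with scikit-learn: a boosted pool of stumps is trained first, then weak classifiers are greedily pruned against an asymmetric node criterion (false positive rate at a threshold that preserves a very high detection rate). The pool construction, the criterion, and the greedy pruning order are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch: prune a boosted pool of weak classifiers against an asymmetric
# node criterion (very high detection rate, moderate false positive rate).
# The pruning order and criterion here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def node_fpr(scores, y, min_detection=0.995):
    """False positive rate at the loosest threshold that still detects
    at least `min_detection` of the positives (the asymmetric node goal)."""
    pos = np.sort(scores[y == 1])
    thr = pos[int((1 - min_detection) * len(pos))]
    return np.mean(scores[y == 0] >= thr)

def prune_pool(weak_scores, alphas, y, keep):
    """Greedily drop the weak classifier whose removal hurts the
    asymmetric criterion least, until only `keep` remain."""
    active = list(range(len(alphas)))
    while len(active) > keep:
        def fpr_without(j):
            idx = [i for i in active if i != j]
            return node_fpr(weak_scores[:, idx] @ alphas[idx], y)
        active.remove(min(active, key=fpr_without))
    return active

# Step 1: a pool of discriminative weak classifiers (decision stumps).
X, y = np.random.randn(2000, 10), np.random.randint(0, 2, 2000)  # toy data
pool = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                          n_estimators=50).fit(X, y)
alphas = pool.estimator_weights_
weak_scores = np.stack([2 * e.predict(X) - 1 for e in pool.estimators_], axis=1)

# Step 2: keep only the weak classifiers that matter for the node goal.
kept = prune_pool(weak_scores, alphas, y, keep=20)
```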
RandomBoost: Simplified Multi-class Boosting through Randomization
We propose a novel boosting approach to multi-class classification problems, in which the classes are, in essence, distinguished by a set of random projection matrices. The approach uses random projections to avoid the proliferation of binary classifiers typically required for multi-class classification. The result is a multi-class classifier with a single vector-valued parameter, irrespective of the number of classes involved. Two variants of this approach are proposed: the first randomly projects the original data into new spaces, while the second randomly projects the outputs of learned weak classifiers. These methods are not only conceptually simple but also effective and easy to implement. A series of experiments on synthetic, machine learning, and visual recognition data sets demonstrates that the proposed methods compare favorably to existing multi-class boosting algorithms in terms of both convergence rate and classification accuracy.
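To make the "single vector-valued parameter" concrete, here is a sketch of the first variant: each class owns a fixed random projection, and one shared weight vector scores every class through its own projection. The softmax loss and plain gradient descent below are my assumptions, not the paper's boosting procedure.

```python
# Sketch of the first RandomBoost variant: each class k owns a fixed
# random projection P_k, and a single shared weight vector w scores
# class k as w . (P_k x). Loss and optimizer are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, k, p = 500, 20, 5, 15          # samples, input dim, classes, proj dim
X = rng.standard_normal((n, d))
y = rng.integers(0, k, n)            # toy labels
P = rng.standard_normal((k, p, d))   # one random projection per class

Z = np.einsum('kpd,nd->nkp', P, X)   # each sample projected once per class
w = np.zeros(p)                      # the single vector-valued parameter
for _ in range(200):
    scores = Z @ w                                    # (n, k)
    probs = np.exp(scores - scores.max(1, keepdims=True))
    probs /= probs.sum(1, keepdims=True)
    probs[np.arange(n), y] -= 1.0                     # softmax gradient
    w -= 0.1 * np.einsum('nk,nkp->p', probs, Z) / n

pred = (Z @ w).argmax(1)             # k-way prediction from one vector w
```

Note how the parameter count stays at p regardless of k; adding classes only adds fixed random matrices, not learned weights.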
Algorithm selection on data streams
We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment we calculate the characteristics of a small sample of a data stream and try to predict which classifier performs best on the entire stream; this yields promising results and interesting patterns. In a second experiment, we build a meta-classifier that predicts, based on measurable data characteristics in a window of the data stream, the best classifier for the next window. The results show that this meta-algorithm is very competitive with state-of-the-art ensembles such as OzaBag, OzaBoost and Leveraged Bagging. The results of all experiments are made publicly available in an online experiment database, for the purpose of verifiability, reproducibility and generalizability.
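The second experiment could be sketched as follows: meta-features of window t are paired with whichever base learner scored best on window t+1, and a meta-classifier is trained on those pairs. The meta-features and base learners below are illustrative choices, not the paper's exact set.

```python
# Sketch: window-based algorithm selection on a stream. The meta-features
# and base learners are illustrative stand-ins for the paper's set.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def meta_features(X, y):
    # simple data characteristics of one window
    return [X.mean(), X.std(), np.bincount(y, minlength=2).max() / len(y)]

base = {"nb": GaussianNB(), "tree": DecisionTreeClassifier(max_depth=5)}
rng = np.random.default_rng(1)
stream_X = rng.standard_normal((5000, 8))                     # toy stream
stream_y = (stream_X[:, 0] + rng.standard_normal(5000) > 0).astype(int)

windows = [(stream_X[i:i+500], stream_y[i:i+500]) for i in range(0, 5000, 500)]
M, best = [], []
for (Xt, yt), (Xn, yn) in zip(windows, windows[1:]):
    M.append(meta_features(Xt, yt))                           # window t
    scores = {n: c.fit(Xt, yt).score(Xn, yn) for n, c in base.items()}
    best.append(max(scores, key=scores.get))                  # best on t+1

meta = RandomForestClassifier().fit(M, best)   # the meta-classifier
```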
Learned versus Hand-Designed Feature Representations for 3d Agglomeration
For image recognition and labeling tasks, recent results suggest that machine learning methods that rely on manually specified feature representations may be outperformed by methods that automatically derive feature representations based on the data. Yet for problems that involve analysis of 3d objects, such as mesh segmentation, shape retrieval, or neuron fragment agglomeration, there remains a strong reliance on hand-designed feature descriptors. In this paper, we evaluate a large set of hand-designed 3d feature descriptors alongside features learned from the raw data using both end-to-end and unsupervised learning techniques, in the context of agglomeration of 3d neuron fragments. By combining unsupervised learning techniques with a novel dynamic pooling scheme, we show how pure learning-based methods are for the first time competitive with hand-designed 3d shape descriptors. We investigate data augmentation strategies for dramatically increasing the size of the training set, and show how combining both learned and hand-designed features leads to the highest accuracy.
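The general idea behind dynamic pooling can be sketched as follows: a candidate merge of two fragments touches at a variable number of contact-site patches, and pooling collapses however many per-patch feature responses there are into one fixed-length descriptor. The filters and pooling statistics below are a hedged guess at the general mechanism, not the paper's exact scheme.

```python
# Sketch of dynamic pooling for 3D fragment agglomeration: variable
# numbers of contact patches become one fixed-length descriptor. The
# filters and pooling statistics are illustrative assumptions.
import numpy as np

def patch_responses(patches, filters):
    """Dot each flattened 3D patch with each learned filter."""
    return patches.reshape(len(patches), -1) @ filters.T  # (n_patches, n_filters)

def dynamic_pool(responses):
    """Fixed-length descriptor regardless of the number of patches."""
    return np.concatenate([responses.max(0), responses.mean(0)])

rng = np.random.default_rng(2)
filters = rng.standard_normal((32, 5 * 5 * 5))   # e.g. learned unsupervised

# two candidate merges with different numbers of contact patches
pair_a = rng.standard_normal((17, 5, 5, 5))
pair_b = rng.standard_normal((4, 5, 5, 5))
desc_a = dynamic_pool(patch_responses(pair_a, filters))  # shape (64,)
desc_b = dynamic_pool(patch_responses(pair_b, filters))  # same shape
```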
Generalized additive modelling with implicit variable selection by likelihood based boosting
The use of generalized additive models in statistical data analysis suffers from the restriction to few explanatory variables and from the problem of selecting smoothing parameters. Generalized additive model boosting circumvents these problems by means of stagewise fitting of weak learners. A fitting procedure is derived which works for all simple exponential family distributions, including binomial, Poisson and normal response variables. The procedure combines the selection of variables with the determination of the appropriate amount of smoothing. As weak learners, penalized regression splines and the newly introduced penalized stumps are considered. Estimates of standard deviations and stopping criteria, which are notorious problems in iterative procedures, are based on an approximate hat matrix. The method is shown to outperform common procedures for fitting generalized additive models; in particular, in high-dimensional settings it is the only method that works properly.
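A minimal sketch of componentwise boosting for a binomial response: each iteration fits one weak learner per covariate to the gradient of the log-likelihood and keeps only the best covariate, which is what makes the variable selection implicit. Plain stumps and a fixed step length stand in for the paper's penalized stumps and penalized regression splines.

```python
# Sketch of componentwise likelihood-based boosting for a binomial GAM.
# Plain stumps + fixed step are simplifications of the penalized learners.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.standard_normal((400, 10))
y = (np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.standard_normal(400) > 0).astype(float)

eta = np.zeros(400)                      # additive predictor
ensemble = []                            # (covariate index, stump) pairs
for _ in range(100):
    u = y - 1 / (1 + np.exp(-eta))       # negative gradient of binomial loss
    fits = [DecisionTreeRegressor(max_depth=1).fit(X[:, [j]], u)
            for j in range(X.shape[1])]
    sse = [((u - f.predict(X[:, [j]])) ** 2).sum() for j, f in enumerate(fits)]
    j = int(np.argmin(sse))              # implicit variable selection
    eta += 0.1 * fits[j].predict(X[:, [j]])
    ensemble.append((j, fits[j]))
```

Covariates that never win a round simply never enter the model, so selection and smoothing fall out of the same stagewise loop.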
An Effective Model Of Autism Spectrum Disorder Using Machine Learning
Autism spectrum disorder (ASD) is one of the most common disorders affecting the human nervous system, reducing a person's intelligence and comprehension. It is a group of conditions characterized by impaired social behavior and communication. It affects all age groups, including adults, adolescents, children, and the elderly, but its symptoms always appear in the early years of life. ASD data sets suffer from several problems, the most important of which are missing data, low quality, and extreme values; these problems complicate early diagnosis of ASD. Our goal in this research is to address these problems. The current authors propose a technical model that resolves all of the data issues. We used ensemble techniques including Bayesian Boosting, Classification by Regression, and Polynomial by Binomial Classification, as well as classification techniques including CHAID, Decision Stump, Decision Tree (Weight-Based), Gradient Boosted Trees, and ID3. We show that the proposed model resolves the data problems and obtains the highest accuracy, reaching 100%, as well as the highest F1 measure, also 100%, indicating that our approach outperforms comparable work.
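A cleaning-plus-ensemble pipeline of the kind described might look like the following sketch: impute missing values, clip extreme values, then combine several classifier families by voting. The specific imputer, clipping rule, and estimators are stand-ins of mine, not the paper's exact operators.

```python
# Sketch of the data-cleaning + ensemble idea; the concrete choices
# below (median imputation, clipping at 3 sigma, these three voters)
# are illustrative assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.ensemble import VotingClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

clip = FunctionTransformer(lambda Z: np.clip(Z, -3, 3))  # tame extreme values

model = make_pipeline(
    SimpleImputer(strategy="median"),     # handle missing data
    StandardScaler(),
    clip,
    VotingClassifier([
        ("gbt", GradientBoostingClassifier()),
        ("tree", DecisionTreeClassifier(max_depth=4)),
        ("lr", LogisticRegression(max_iter=1000)),
    ], voting="soft"),
)

# usage: model.fit(X_train, y_train); model.score(X_test, y_test)
```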
Adabook and Multibook: adaptive boosting with chance correction
There has been considerable interest in boosting and bagging, including the combination of the adaptive techniques of AdaBoost with the random selection-with-replacement techniques of Bagging. At the same time there has been a revisiting of the way we evaluate, with chance-corrected measures like Kappa, Informedness, Correlation or ROC AUC being advocated. This leads to the question of whether learning algorithms can do better by optimizing an appropriate chance-corrected measure. Indeed, it is possible for a weak learner to optimize Accuracy to the detriment of the more realistic chance-corrected measures, and when this happens the booster can give up too early. This phenomenon is known to occur with conventional Accuracy-based AdaBoost, and the MultiBoost algorithm was developed to overcome such problems using restart techniques based on bagging. This paper thus complements the theoretical work showing the necessity of using chance-corrected measures for evaluation with empirical work showing how the use of a chance-corrected measure can improve boosting. We show that the early-surrender problem occurs in MultiBoost too, in multiclass situations, so that the chance-corrected AdaBook and MultiBook can beat standard MultiBoost or AdaBoost, and we further identify which chance-corrected measures to use when.
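The core change can be sketched as rating and stopping the booster by a chance-corrected measure instead of weighted accuracy. Below, weighted Informedness (recall on positives plus recall on negatives, minus 1) replaces accuracy in an AdaBoost-style loop; the alpha formula reuses AdaBoost's form and is an illustrative assumption, not the exact AdaBook update.

```python
# Sketch of chance-corrected boosting: weak learners are rated by
# weighted Informedness rather than weighted accuracy, and the booster
# stops only when no weak learner beats chance (Informedness <= 0).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def weighted_informedness(y, pred, w):
    pos, neg = (y == 1), (y == -1)
    r_pos = w[pos & (pred == 1)].sum() / w[pos].sum()   # weighted recall on +
    r_neg = w[neg & (pred == -1)].sum() / w[neg].sum()  # weighted recall on -
    return r_pos + r_neg - 1                            # 0 means chance level

def adabook_fit(X, y, rounds=50):
    w = np.full(len(y), 1 / len(y))
    ensemble = []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        b = min(weighted_informedness(y, pred, w), 1 - 1e-12)  # guard b = 1
        if b <= 0:                        # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 + b) / (1 - b))
        w *= np.exp(-alpha * y * pred)    # usual multiplicative reweighting
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

rng = np.random.default_rng(4)
X = rng.standard_normal((300, 5))                       # toy data
y = np.where(X[:, 0] + 0.5 * rng.standard_normal(300) > 0, 1, -1)
ensemble = adabook_fit(X, y)
```

Because Informedness is zero for any chance-level learner regardless of class skew, a stump that merely exploits class imbalance earns alpha near zero instead of prematurely terminating the boosting run.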