1,156 research outputs found

    Asymmetric Pruning for Learning Cascade Detectors

    Full text link
    Cascade classifiers are one of the most important contributions to real-time object detection. Nonetheless, there are many challenging problems arising in training cascade detectors. One common issue is that the node classifier is trained with a symmetric classifier. Having a low misclassification error rate does not guarantee an optimal node learning goal in cascade classifiers, i.e., an extremely high detection rate with a moderate false positive rate. In this work, we present a new approach to train an effective node classifier in a cascade detector. The algorithm is based on two key observations: 1) Redundant weak classifiers can be safely discarded; 2) The final detector should satisfy the asymmetric learning objective of the cascade architecture. To achieve this, we separate the classifier training into two steps: finding a pool of discriminative weak classifiers/features and training the final classifier by pruning weak classifiers which contribute little to the asymmetric learning criterion (asymmetric classifier construction). Our model reduction approach helps accelerate the learning time while achieving the pre-determined learning objective. Experimental results on both face and car data sets verify the effectiveness of the proposed algorithm. On the FDDB face data sets, our approach achieves the state-of-the-art performance, which demonstrates the advantage of our approach.Comment: 14 page

    RandomBoost: Simplified Multi-class Boosting through Randomization

    Full text link
    We propose a novel boosting approach to multi-class classification problems, in which multiple classes are distinguished by a set of random projection matrices in essence. The approach uses random projections to alleviate the proliferation of binary classifiers typically required to perform multi-class classification. The result is a multi-class classifier with a single vector-valued parameter, irrespective of the number of classes involved. Two variants of this approach are proposed. The first method randomly projects the original data into new spaces, while the second method randomly projects the outputs of learned weak classifiers. These methods are not only conceptually simple but also effective and easy to implement. A series of experiments on synthetic, machine learning and visual recognition data sets demonstrate that our proposed methods compare favorably to existing multi-class boosting algorithms in terms of both the convergence rate and classification accuracy.Comment: 15 page

    Algorithm selection on data streams

    Get PDF
    We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment we calculate the characteristics of a small sample of a data stream, and try to predict which classifier performs best on the entire stream. This yields promising results and interesting patterns. In a second experiment, we build a meta-classifier that predicts, based on measurable data characteristics in a window of the data stream, the best classifier for the next window. The results show that this meta-algorithm is very competitive with state of the art ensembles, such as OzaBag, OzaBoost and Leveraged Bagging. The results of all experiments are made publicly available in an online experiment database, for the purpose of verifiability, reproducibility and generalizability

    Learned versus Hand-Designed Feature Representations for 3d Agglomeration

    Full text link
    For image recognition and labeling tasks, recent results suggest that machine learning methods that rely on manually specified feature representations may be outperformed by methods that automatically derive feature representations based on the data. Yet for problems that involve analysis of 3d objects, such as mesh segmentation, shape retrieval, or neuron fragment agglomeration, there remains a strong reliance on hand-designed feature descriptors. In this paper, we evaluate a large set of hand-designed 3d feature descriptors alongside features learned from the raw data using both end-to-end and unsupervised learning techniques, in the context of agglomeration of 3d neuron fragments. By combining unsupervised learning techniques with a novel dynamic pooling scheme, we show how pure learning-based methods are for the first time competitive with hand-designed 3d shape descriptors. We investigate data augmentation strategies for dramatically increasing the size of the training set, and show how combining both learned and hand-designed features leads to the highest accuracy

    Generalized additive modelling with implicit variable selection by likelihood based boosting

    Get PDF
    The use of generalized additive models in statistical data analysis suffers from the restriction to few explanatory variables and the problems of selection of smoothing parameters. Generalized additive model boosting circumvents these problems by means of stagewise fitting of weak learners. A fitting procedure is derived which works for all simple exponential family distributions, including binomial, Poisson and normal response variables. The procedure combines the selection of variables and the determination of the appropriate amount of smoothing. As weak learners penalized regression splines and the newly introduced penalized stumps are considered. Estimates of standard deviations and stopping criteria which are notorious problems in iterative procedures are based on an approximate hat matrix. The method is shown to outperform common procedures for the fitting of generalized additive models. In particular in high dimensional settings it is the only method that works properly

    An Effective Model Of Autism Spectrum Disorder Using Machine Learning

    Get PDF
    Autism spectrum disorder (ASD) is one of the most common diseases that affect human nerves and cause a decrease in the intelligence and comprehension of the person. This disease is a group of various disorders that are characterized by poor social behavior and communication. It affects all age groups, including adults, adolescents, children, and the elderly, but the symptoms of this disease always appear in their early years. ASD suffer from problems, the most important of which are data loss, low quality, and extreme values. This makes the process of diagnosing the ASD early. Our goals in this research is to solve the ASD problems. The cussent authors proposed a technical model that solves all data problems. We used ensemble techniques that include Bayesian Boosting, Classification by Regression, Polynomial by Binominal Classification. We also used classification techniques that include CHAID, Decision Stump, Decision Tree (Weight-Based), Gradient Boosted Trees, ID3. It is proven that the proposed model solves data problems, and has obtained the highest search accuracy that has reached 100% as well as we have obtained the highest f1 measurement that has reached 100%. This proves that our work is superior to its peers

    Adabook and Multibook: adaptive boosting with chance correction

    Get PDF
    There has been considerable interest in boosting and bagging, including the combination of the adaptive techniques of AdaBoost with the random selection with replacement techniques of Bagging. At the same time there has been a revisiting of the way we evaluate, with chance-corrected measures like Kappa, Informedness, Correlation or ROC AUC being advocated. This leads to the question of whether learning algorithms can do better by optimizing an appropriate chance corrected measure. Indeed, it is possible for a weak learner to optimize Accuracy to the detriment of the more reaslistic chance-corrected measures, and when this happens the booster can give up too early. This phenomenon is known to occur with conventional Accuracy-based AdaBoost, and the MultiBoost algorithm has been developed to overcome such problems using restart techniques based on bagging. This paper thus complements the theoretical work showing the necessity of using chance-corrected measures for evaluation, with empirical work showing how use of a chance-corrected measure can improve boosting. We show that the early surrender problem occurs in MultiBoost too, in multiclass situations, so that chance-corrected AdaBook and Multibook can beat standard Multiboost or AdaBoost, and we further identify which chance-corrected measures to use when
    corecore