6,658 research outputs found
On PAC-Bayesian Bounds for Random Forests
Existing guarantees in terms of rigorous upper bounds on the generalization
error for the original random forest algorithm, one of the most frequently used
machine learning methods, are unsatisfying. We discuss and evaluate various
PAC-Bayesian approaches to derive such bounds. The bounds do not require
additional hold-out data, because the out-of-bag samples from the bagging in
the training process can be exploited. A random forest predicts by taking a
majority vote of an ensemble of decision trees. The first approach is to bound
the error of the vote by twice the error of the corresponding Gibbs classifier
(classifying with a single member of the ensemble selected at random). However,
this approach does not take into account the effect of averaging out of errors
of individual classifiers when taking the majority vote. This effect provides a
significant boost in performance when the errors are independent or negatively
correlated, but when the correlations are strong the advantage from taking the
majority vote is small. The second approach based on PAC-Bayesian C-bounds
takes dependencies between ensemble members into account, but it requires
estimating correlations between the errors of the individual classifiers. When
the correlations are high or the estimation is poor, the bounds degrade. In our
experiments, we compute generalization bounds for random forests on various
benchmark data sets. Because the individual decision trees already perform
well, their predictions are highly correlated and the C-bounds do not lead to
satisfactory results. For the same reason, the bounds based on the analysis of
Gibbs classifiers are typically superior and often reasonably tight. Bounds
based on a validation set coming at the cost of a smaller training set gave
better performance guarantees, but worse performance in most experiments
Reducing the Effects of Detrimental Instances
Not all instances in a data set are equally beneficial for inducing a model
of the data. Some instances (such as outliers or noise) can be detrimental.
However, at least initially, the instances in a data set are generally
considered equally in machine learning algorithms. Many current approaches for
handling noisy and detrimental instances make a binary decision about whether
an instance is detrimental or not. In this paper, we 1) extend this paradigm by
weighting the instances on a continuous scale and 2) present a methodology for
measuring how detrimental an instance may be for inducing a model of the data.
We call our method of identifying and weighting detrimental instances reduced
detrimental instance learning (RDIL). We examine RIDL on a set of 54 data sets
and 5 learning algorithms and compare RIDL with other weighting and filtering
approaches. RDIL is especially useful for learning algorithms where every
instance can affect the classification boundary and the training instances are
considered individually, such as multilayer perceptrons trained with
backpropagation (MLPs). Our results also suggest that a more accurate estimate
of which instances are detrimental can have a significant positive impact for
handling them.Comment: 6 pages, 5 tables, 2 figures. arXiv admin note: substantial text
overlap with arXiv:1403.189
Localized Regression
The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to the case of few variables. This restriction does not hold if localization is combined with a reduction of dimension. In particular it is shown that localization yields powerful classifiers even in higher dimensions if localization is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen dataÂĄadaptively. In an extended simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition we demonstrate that automatic choice of localization, predictor selection and penalty parameters based on cross validation is working well. Finally the method is applied to real data sets and its real world performance is compared to alternative procedures
Studies of Boosted Decision Trees for MiniBooNE Particle Identification
Boosted decision trees are applied to particle identification in the
MiniBooNE experiment operated at Fermi National Accelerator Laboratory
(Fermilab) for neutrino oscillations. Numerous attempts are made to tune the
boosted decision trees, to compare performance of various boosting algorithms,
and to select input variables for optimal performance.Comment: 28 pages, 22 figures, submitted to Nucl. Inst & Meth.
Object Recognition from very few Training Examples for Enhancing Bicycle Maps
In recent years, data-driven methods have shown great success for extracting
information about the infrastructure in urban areas. These algorithms are
usually trained on large datasets consisting of thousands or millions of
labeled training examples. While large datasets have been published regarding
cars, for cyclists very few labeled data is available although appearance,
point of view, and positioning of even relevant objects differ. Unfortunately,
labeling data is costly and requires a huge amount of work. In this paper, we
thus address the problem of learning with very few labels. The aim is to
recognize particular traffic signs in crowdsourced data to collect information
which is of interest to cyclists. We propose a system for object recognition
that is trained with only 15 examples per class on average. To achieve this, we
combine the advantages of convolutional neural networks and random forests to
learn a patch-wise classifier. In the next step, we map the random forest to a
neural network and transform the classifier to a fully convolutional network.
Thereby, the processing of full images is significantly accelerated and
bounding boxes can be predicted. Finally, we integrate data of the Global
Positioning System (GPS) to localize the predictions on the map. In comparison
to Faster R-CNN and other networks for object recognition or algorithms for
transfer learning, we considerably reduce the required amount of labeled data.
We demonstrate good performance on the recognition of traffic signs for
cyclists as well as their localization in maps.Comment: Submitted to IV 2018. This research was supported by German Research
Foundation DFG within Priority Research Programme 1894 "Volunteered
Geographic Information: Interpretation, Visualization and Social Computing
Group-level Emotion Recognition using Transfer Learning from Face Identification
In this paper, we describe our algorithmic approach, which was used for
submissions in the fifth Emotion Recognition in the Wild (EmotiW 2017)
group-level emotion recognition sub-challenge. We extracted feature vectors of
detected faces using the Convolutional Neural Network trained for face
identification task, rather than traditional pre-training on emotion
recognition problems. In the final pipeline an ensemble of Random Forest
classifiers was learned to predict emotion score using available training set.
In case when the faces have not been detected, one member of our ensemble
extracts features from the whole image. During our experimental study, the
proposed approach showed the lowest error rate when compared to other explored
techniques. In particular, we achieved 75.4% accuracy on the validation data,
which is 20% higher than the handcrafted feature-based baseline. The source
code using Keras framework is publicly available.Comment: 5 pages, 3 figures, accepted for publication at ICMI17 (EmotiW Grand
Challenge
- âŠ