Applying Rule Ensembles to the Search for Super-Symmetry at the Large Hadron Collider
In this note we give an example application of a recently presented
predictive learning method called Rule Ensembles. The application we present is
the search for super-symmetric particles at the Large Hadron Collider. In
particular, we consider the problem of separating the background coming from
top quark production from the signal of super-symmetric particles. The method
is based on an expansion of base learners, each learner being a rule, i.e. a
combination of cuts in the variable space describing signal and background.
These rules are generated from an ensemble of decision trees. One of the
results of the method is a set of rules (cuts) ordered according to their
importance, which gives useful tools for diagnosis of the model. We also
compare the method to a number of other multivariate methods, in particular
Artificial Neural Networks, the likelihood method and the recently presented
boosted decision tree method. We find better performance of Rule Ensembles in
all cases. For example, for a given significance, the amount of data needed to
claim SUSY discovery could be reduced by 15% using Rule Ensembles as compared
to using a likelihood method.
Comment: 24 pages, 7 figures, replaced to match version accepted for
publication in JHE
Bagging ensemble selection for regression
Bagging ensemble selection (BES) is a relatively new ensemble learning strategy. The strategy can be seen as an ensemble of the ensemble selection from libraries of models (ES) strategy. Previous experimental results on binary classification problems have shown that, using random trees as base classifiers, BES-OOB (the most successful variant of BES) is competitive with (and in many cases superior to) other ensemble learning strategies, for instance the original ES algorithm, stacking with linear regression, random forests or boosting. Motivated by the promising results in classification, this paper examines the predictive performance of the BES-OOB strategy for regression problems. Our results show that the BES-OOB strategy outperforms Stochastic Gradient Boosting and Bagging when using regression trees as the base learners. Our results also suggest that the advantage of using a diverse model library becomes clear when the model library size is relatively large. We also present encouraging results indicating that the non-negative least squares algorithm is a viable approach for pruning an ensemble of ensembles
Kernel density classification and boosting: an L2 analysis
Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is “boosting”, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research
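The classification rule this abstract starts from, assigning a point according to the sign of the difference between two class density estimates, can be sketched as follows. This is a minimal illustration assuming a one-dimensional Gaussian kernel; the function names, sample values, and bandwidths are hypothetical, and the boosting step proposed in the paper is not shown.

```python
import math
from typing import Sequence


def gaussian_kde(x: float, sample: Sequence[float], bandwidth: float) -> float:
    """Gaussian kernel density estimate at x (assumed kernel choice)."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm


def classify(x: float, sample_a: Sequence[float], sample_b: Sequence[float],
             h_a: float, h_b: float, prior_a: float = 0.5) -> str:
    """Assign x to class "A" when the prior-weighted density difference is positive."""
    diff = (prior_a * gaussian_kde(x, sample_a, h_a)
            - (1.0 - prior_a) * gaussian_kde(x, sample_b, h_b))
    return "A" if diff > 0 else "B"


# Hypothetical training samples for the two classes.
class_a = [-1.2, -0.8, -1.0, -0.5]
class_b = [0.9, 1.1, 1.4, 0.7]

label_left = classify(-1.0, class_a, class_b, h_a=0.5, h_b=0.5)
label_right = classify(1.0, class_a, class_b, h_a=0.5, h_b=0.5)
```

Each class gets its own bandwidth (`h_a`, `h_b`) because the abstract reports that the optimal smoothing parameter for one group depends on the sample size of the other.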
Separating Signal From Background Using Ensembles of Rules
Machine learning has emerged as an important tool for separating signal events from associated background in high energy particle physics experiments. This paper describes a new machine learning method based on ensembles of rules. Each rule consists of a conjunction of a small number of simple statements ("cuts") concerning the values of individual input variables. These rule ensembles produce predictive accuracy comparable to the best methods. However, their principal advantage lies in interpretation. Because of its simple form, each rule is easy to understand, as is its influence on the predictive model. Similarly, the degree of relevance of each of the respective input variables can be assessed. Graphical representations are presented that can be used to ascertain the dependence of the model jointly on the variables used for prediction
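A rule in the sense described above, a conjunction of a few cuts on individual variables, might be sketched as follows. The variable names (`missing_et`, `n_jets`), thresholds, and coefficients are hypothetical illustrations rather than values from the paper, and the score is assumed to be an offset plus a weighted sum of the rules that fire; fitting the rules and weights from decision trees is not shown.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, float]
Cut = Callable[[Event], bool]
Rule = Callable[[Event], int]


def make_rule(cuts: List[Cut]) -> Rule:
    """A rule is a conjunction of cuts: it returns 1 only if every cut passes."""
    def rule(event: Event) -> int:
        return int(all(cut(event) for cut in cuts))
    return rule


def ensemble_score(event: Event, offset: float,
                   weighted_rules: List[Tuple[float, Rule]]) -> float:
    """Offset plus the sum of the weights of all rules that fire on the event."""
    return offset + sum(weight * rule(event) for weight, rule in weighted_rules)


# Hypothetical rules; larger scores are read as more signal-like.
rule_a = make_rule([lambda e: e["missing_et"] > 100.0, lambda e: e["n_jets"] >= 4])
rule_b = make_rule([lambda e: e["missing_et"] <= 50.0])
weighted_rules = [(1.5, rule_a), (-0.8, rule_b)]

signal_like = {"missing_et": 150.0, "n_jets": 5}
background_like = {"missing_et": 30.0, "n_jets": 2}

score_signal = ensemble_score(signal_like, 0.0, weighted_rules)
score_background = ensemble_score(background_like, 0.0, weighted_rules)
```

Because each rule contributes a single weight, sorting `weighted_rules` by absolute weight gives a crude view of which cuts matter most, which is the interpretability argument the abstract makes.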
Immediate reward reinforcement learning for clustering and topology preserving mappings
We extend a reinforcement learning algorithm which has previously been shown to cluster data. Our extension involves creating an underlying latent space with some pre-defined structure which enables us to create a topology preserving mapping. We investigate different forms of the reward function, all of which are created with the intent of merging local and global information, thus avoiding one of the major difficulties with methods such as K-means, namely convergence to local optima that depend on the initial values of the parameters. We also show that the method is quite general and can be used with the recently developed method of stochastic weight reinforcement learning [14]
Cost-sensitive Bayesian network learning using sampling
A significant advance in recent years has been the development of cost-sensitive decision tree learners, recognising that real world classification problems need to take account of costs of misclassification and not just focus on accuracy. The literature contains well over 50 cost-sensitive decision tree induction algorithms, each with varying performance profiles. Obtaining good Bayesian networks can be challenging and hence several algorithms have been proposed for learning their structure and parameters from data. However, most of these algorithms focus on learning Bayesian networks that aim to maximise the accuracy of classifications. Hence an obvious question that arises is whether it is possible to develop cost-sensitive Bayesian networks and whether they would perform better than cost-sensitive decision trees for minimising classification cost. This paper explores this question by developing a new Bayesian network learning algorithm based on changing the data distribution to reflect the costs of misclassification. The proposed method is explored by conducting experiments on over 20 data sets. The results show that this approach produces good results in comparison to more complex cost-sensitive decision tree algorithms
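One common way to change the data distribution to reflect misclassification costs is to resample the training set with probability proportional to the cost of misclassifying each example's class. The sketch below illustrates that general idea only; whether the paper uses this exact scheme is an assumption, and the function name, cost values, and labels are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def cost_proportional_resample(examples: Sequence, labels: Sequence[str],
                               costs: Dict[str, float], n: int,
                               seed: int = 0) -> Tuple[List, List[str]]:
    """Draw n examples with replacement, weighting each by its class cost."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    weights = [costs[label] for label in labels]
    picked = rng.choices(range(len(labels)), weights=weights, k=n)
    return [examples[i] for i in picked], [labels[i] for i in picked]


# Hypothetical balanced data where misclassifying "pos" is nine times as costly.
examples = list(range(10))
labels = ["pos"] * 5 + ["neg"] * 5
costs = {"pos": 9.0, "neg": 1.0}

resampled_x, resampled_y = cost_proportional_resample(examples, labels, costs, n=1000)
class_counts = Counter(resampled_y)
```

A standard accuracy-maximising learner trained on `resampled_x` and `resampled_y` then sees the costly class more often, which pushes its decisions toward lower expected cost.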
Building Detection from Mobile Imagery Using Informative SIFT Descriptors
We propose reliable outdoor object detection on mobile phone imagery from off-the-shelf devices. With the goal to provide both robust object detection and reduction of computational complexity for situated interpretation of urban imagery, we propose to apply the 'Informative Descriptor Approach' on SIFT features (i-SIFT descriptors). We learn an attentive matching of i-SIFT keypoints, resulting in a significant improvement of state-of-the-art SIFT descriptor based keypoint matching. In the offline learning stage, firstly, standard SIFT responses are evaluated using an information theoretic quality criterion with respect to object semantics, rejecting features with insufficient conditional entropy measure, producing both sparse and discriminative object representations. Secondly, we learn a decision tree from the training data set that maps SIFT descriptors to entropy values. The key advantages of informative SIFT (i-SIFT) to standard SIFT encoding are argued from observations on performance complexity, and demonstrated in a typical outdoor mobile vision experiment on the MPG-20 reference database.
Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors
Penalized regression is an attractive framework for variable selection
problems. Often, variables possess a grouping structure, and the relevant
selection problem is that of selecting groups, not individual variables. The
group lasso has been proposed as a way of extending the ideas of the lasso to
the problem of group selection. Nonconvex penalties such as SCAD and MCP have
been proposed and shown to have several advantages over the lasso; these
penalties may also be extended to the group selection problem, giving rise to
group SCAD and group MCP methods. Here, we describe algorithms for fitting
these models stably and efficiently. In addition, we present simulation results
and real data examples comparing and contrasting the statistical properties of
these methods
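For the group lasso case, the update that selects or drops a whole group at once is the group soft-thresholding operator, which shrinks the group's coefficient vector toward zero and sets it exactly to zero when its Euclidean norm is at most the penalty. The sketch below shows only that operator, with a hypothetical function name and example values; the corresponding group SCAD and group MCP updates and the full descent loop are not shown.

```python
import math
from typing import List


def group_soft_threshold(z: List[float], lam: float) -> List[float]:
    """Shrink the vector z toward zero by lam in Euclidean norm.

    Returns the zero vector when ||z|| <= lam, so the whole group is dropped.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    return [scale * v for v in z]


# Hypothetical group coefficients: one group survives shrinkage, one is removed.
kept_group = group_soft_threshold([3.0, 4.0], lam=1.0)      # norm 5, scaled by 0.8
dropped_group = group_soft_threshold([0.3, 0.4], lam=1.0)   # norm 0.5, set to zero
```

Acting on the norm of the whole group, rather than on each coefficient separately, is what makes the selection operate at the group level.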
Classifiers Based on Two-Layered Learning
In this paper we present an exemplary classifier (classification algorithm) based on two-layered learning. In the first layer of learning a collection of classifiers is induced from a part of original training data set. In the second layer classifiers are induced using patterns extracted from already constructed classifiers on the basis of their performance on the remaining part of training data. We report results of experiments performed on the following data sets, well known from literature: diabetes, heart disease, australian credit (see [5]) and lymphography (see [4]). We compare the standard rough set method used to induce classifiers (see [1] for more details), based on minimal consistent decision rules (see [6]), with the classifier based on two-layered learning.
A Study of Machine Learning Techniques for Daily Solar Energy Forecasting using Numerical Weather Models
Proceedings of: 8th International Symposium on Intelligent Distributed Computing (IDC'2014). Madrid, September 3-5, 2014. Forecasting solar energy is becoming an important issue in the context of renewable energy sources, and Machine Learning Algorithms play an important role in this field. The prediction of solar energy can be addressed as a time series prediction problem using historical data. Also, solar energy forecasting can be derived from numerical weather prediction models (NWP). Our interest is focused on the latter approach. We focus on the problem of predicting solar energy from NWP computed from GEFS, the Global Ensemble Forecast System, which predicts meteorological variables for points in a grid. In this context, it can be useful to know how prediction accuracy improves depending on the number of grid nodes used as input for the machine learning techniques. However, using the variables from a large number of grid nodes can result in many attributes which might degrade the generalization performance of the learning algorithms. In this paper both issues are studied using data supplied by Kaggle for the State of Oklahoma, comparing Support Vector Machines and Gradient Boosted Regression. Also, three different feature selection methods have been tested: Linear Correlation, the ReliefF algorithm, and a new method based on local information analysis.