601 research outputs found
Learning to Order Things
There are many applications in which it is desirable to order rather than
classify instances. Here we consider the problem of learning how to order
instances given feedback in the form of preference judgments, i.e., statements
to the effect that one instance should be ranked ahead of another. We outline a
two-stage approach in which one first learns by conventional means a binary
preference function indicating whether it is advisable to rank one instance
before another. Here we consider an on-line algorithm for learning preference
functions that is based on Freund and Schapire's 'Hedge' algorithm. In the
second stage, new instances are ordered so as to maximize agreement with the
learned preference function. We show that the problem of finding the ordering
that agrees best with a learned preference function is NP-complete.
Nevertheless, we describe simple greedy algorithms that are guaranteed to find
a good approximation. Finally, we show how metasearch can be formulated as an
ordering problem, and present experimental results on learning a combination of
'search experts', each of which is a domain-specific query expansion strategy
for a web search engine
Learning Multi-label Alternating Decision Trees from Texts and Data
International audienceMulti-label decision procedures are the target of the supervised learning algorithm we propose in this paper. Multi-label decision procedures map examples to a finite set of labels. Our learning algorithm extends Schapire and Singer?s Adaboost.MH and produces sets of rules that can be viewed as trees like Alternating Decision Trees (invented by Freund and Mason). Experiments show that we take advantage of both performance and readability using boosting techniques as well as tree representations of large set of rules. Moreover, a key feature of our algorithm is the ability to handle heterogenous input data: discrete and continuous values and text data. Keywords boosting - alternating decision trees - text mining - multi-label problem
A demand-driven approach for a multi-agent system in Supply Chain Management
This paper presents the architecture of a multi-agent decision support system for Supply Chain Management (SCM) which has been designed to compete in the TAC SCM game. The behaviour of the system is demand-driven and the agents plan, predict, and react dynamically to changes in the market. The main strength of the system lies in the ability of the Demand agent to predict customer winning bid prices - the highest prices the agent can offer customers and still obtain their orders. This paper investigates the effect of the ability to predict customer order prices on the overall performance of the system. Four strategies are proposed and compared for predicting such prices. The experimental results reveal which strategies are better and show that there is a correlation between the accuracy of the models' predictions and the overall system performance: the more accurate the prediction of customer order prices, the higher the profit. © 2010 Springer-Verlag Berlin Heidelberg
PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers
The aim of this paper is to generalize the PAC-Bayesian theorems proved by
Catoni in the classification setting to more general problems of statistical
inference. We show how to control the deviations of the risk of randomized
estimators. A particular attention is paid to randomized estimators drawn in a
small neighborhood of classical estimators, whose study leads to control the
risk of the latter. These results allow to bound the risk of very general
estimation procedures, as well as to perform model selection
A multivariate approach to heavy flavour tagging with cascade training
This paper compares the performance of artificial neural networks and boosted
decision trees, with and without cascade training, for tagging b-jets in a
collider experiment. It is shown, using a Monte Carlo simulation of events, that for a b-tagging efficiency of 50%, the light jet
rejection power given by boosted decision trees without cascade training is
about 55% higher than that given by artificial neural networks. The cascade
training technique can improve the performance of boosted decision trees and
artificial neural networks at this b-tagging efficiency level by about 35% and
80% respectively. We conclude that the cascade trained boosted decision trees
method is the most promising technique for tagging heavy flavours at collider
experiments.Comment: 14 pages, 12 figures, revised versio
Preceding rule induction with instance reduction methods
A new prepruning technique for rule induction is presented which applies instance reduction before rule induction. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms (Edited Nearest Neighbour, AllKnn and DROP5) are compared. Each one is used to reduce the size of the training set, prior to inducing a set of rules using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (comprised of AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2, without adversely affecting the predictive performance. The hybrid achieves the highest average predictive accuracy
A survey of cost-sensitive decision tree induction algorithms
The past decade has seen a significant interest on the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50 algorithms including approaches that are direct adaptations of accuracy based methods, use genetic algorithms, use anytime methods and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy, a historical timeline of how the field has developed and should provide a useful reference point for future research in this field
- …