3 research outputs found

A cost-sensitive decision tree learning algorithm based on a multi-armed bandit framework

Author: Auer, Esmeir, Gabillon, Murthy, Robbins, Sunil Vadera, Susan Lomax, Tan, Turney
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/07/2017

This paper develops a new algorithm for inducing cost-sensitive decision trees, inspired by the multi-armed bandit problem, in which a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game theory proposes a solution to this multi-armed bandit problem through a process of exploration and exploitation in which reward is maximized. This paper uses these concepts to develop a new algorithm by viewing the rewards as a reduction in costs and applying the exploration and exploitation techniques so that a compromise can be found between decisions based on accuracy and decisions based on costs. The algorithm employs the notion of lever pulls in the multi-armed bandit game to select attributes during decision tree induction, using a look-ahead methodology to explore potential attributes and exploit the attributes that maximize the reward. The new algorithm is evaluated on fifteen datasets and compared with six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The results show that the new multi-armed bandit-based algorithm can produce more cost-effective trees without compromising accuracy. The paper also includes a critical appraisal of the limitations of the new algorithm and proposes avenues for further research.
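
The exploration and exploitation trade-off described in this abstract can be illustrated with a generic bandit rule. The sketch below uses UCB1, a standard index policy, and is not the paper's algorithm: the abstract does not say which rule the authors use or how look-ahead over attributes is implemented. The function name `ucb1`, the `pull` callback and the payout probabilities are all assumptions made for illustration, with rewards assumed to lie in [0, 1].

```python
import math
import random


def ucb1(pull, n_arms, n_rounds):
    """Generic UCB1 loop (illustrative; not the method from the paper).

    `pull(arm)` is a hypothetical callback returning a reward in [0, 1].
    Returns per-arm pull counts and per-arm total rewards.
    """
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Exploration: pull every arm once before using the index.
            arm = t - 1
        else:
            # Exploitation plus an exploration bonus that shrinks
            # as an arm accumulates pulls.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals


# Hypothetical slot machines with made-up payout probabilities.
rng = random.Random(0)
payout = [0.2, 0.5, 0.8]
counts, totals = ucb1(
    lambda a: 1.0 if rng.random() < payout[a] else 0.0, len(payout), 2000
)
print(counts)
```

In a cost-sensitive setting, one could define the reward as a normalized reduction in cost rather than a payout, which is the reinterpretation the abstract describes.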

University of Salford Institutional Repository

Crossref

Learning from interaction: models and applications

Author: Glowacka D
Publication venue: UCL (University College London)
Publication date: 28/10/2012

A large proportion of Machine Learning (ML) research focuses on designing algorithms that require minimal input from the human. However, ML algorithms are now widely used in various areas of engineering to design and build systems that interact with the human user and thus need to “learn” from this interaction. In this work, we concentrate on algorithms that learn from user interaction. A significant part of the dissertation is devoted to learning in the bandit setting. We propose a general framework for handling dependencies across arms, based on the new assumption that the mean-reward function is drawn from a Gaussian Process. Additionally, we propose an alternative method for arm selection using Thompson sampling, and we apply the new algorithms to a grammar learning problem. In the remainder of the dissertation, we consider content-based image retrieval in the case when the user is unable to specify the required content through tags or other image properties, so the system must extract information from the user through limited feedback. We present a novel Bayesian approach that uses latent random variables to model the system's imperfect knowledge about the user's expected response to the images. An important aspect of the algorithm is the incorporation of an explicit exploration-exploitation strategy in the image sampling process. A second aspect of our algorithm is the way in which its knowledge of the target image is updated given user feedback. We considered several algorithms for this update: variational Bayes, Gibbs sampling and a simple uniform update. We show in experiments that the simple uniform update performs best. The reason is that, unlike the uniform update, both variational Bayes and Gibbs sampling tend to focus aggressively on a small set of images.
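
Thompson sampling, mentioned in this abstract as an arm-selection method, can be sketched in its simplest form. The example below assumes independent arms with binary rewards and a Beta(1, 1) prior on each arm; it does not model the Gaussian Process dependencies across arms that the dissertation proposes. The function name `thompson_bernoulli` and the `pull` callback are hypothetical.

```python
import random


def thompson_bernoulli(pull, n_arms, n_rounds, rng):
    """Beta-Bernoulli Thompson sampling (simplified illustration).

    `pull(arm)` is a hypothetical callback returning True on success.
    Arms are treated as independent, unlike the GP-based framework
    described in the abstract.
    """
    successes = [1] * n_arms  # Beta(1, 1) prior: one pseudo-success
    failures = [1] * n_arms   # and one pseudo-failure per arm
    counts = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior,
        # then play the arm whose draw is largest.
        draws = [
            rng.betavariate(successes[a], failures[a]) for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=draws.__getitem__)
        if pull(arm):
            successes[arm] += 1
        else:
            failures[arm] += 1
        counts[arm] += 1
    return counts


# Hypothetical arms with made-up success probabilities.
rng = random.Random(1)
rates = [0.3, 0.6]
ts_counts = thompson_bernoulli(
    lambda a: rng.random() < rates[a], len(rates), 1000, rng
)
print(ts_counts)
```

Exploration here comes from posterior uncertainty: an arm with few observations has a wide posterior, so it is still occasionally drawn as the best.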

UCL Discovery