Evaluation of performance: multi-armed bandit vs. contextual bandit
- Publication date
- Publisher
Abstract
Master of ScienceDepartment of Computer ScienceWilliam H. HsuThis work compares two methods, the multi-armed bandit (MAB) and contextual multi-armed bandit (CMAB), for action recommendation in a sequential decision making domain. It empirically evaluates their effectiveness on a customer relationship management task. The goal of this project is to experiment using [epsilon]-greedy and random selection strategies to characterize the exploration vs. exploitation tradeoff , which manifests when trying to increase or maximize profit while gaining new information regarding the process. The first method under observation, the multi-armed bandit (MAB), is simpler to compute and scales better to larger amounts of data; it has a wide range of applicability, including website optimization, clinical trials, adaptive routing, and stock trading. The contextual multi-armed bandit (CMAB) is an advanced version of the multi-armed bandit which takes into consideration the user’s past usage patterns, especially historical features of the user’s search history; its training data incorporates this context, resulting in a model that is more accurate but also requires a lot of user data which incurs privacy liabilities, an adverse property. This study measures the difference in outcome if the MAB or CMAB have access to user data and assesses, for a real-world application domain, whether this trade-off is significant and worthwhile in the bigger prospective