68,610 research outputs found
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Bayesian model-based reinforcement learning is a formally elegant approach to
learning optimal behaviour under model uncertainty, trading off exploration and
exploitation in an ideal way. Unfortunately, finding the resulting
Bayes-optimal policies is notoriously taxing, since the search space becomes
enormous. In this paper we introduce a tractable, sample-based method for
approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our
approach outperformed prior Bayesian model-based RL algorithms by a significant
margin on several well-known benchmark problems -- because it avoids expensive
applications of Bayes rule within the search tree by lazily sampling models
from the current beliefs. We illustrate the advantages of our approach by
showing it working in an infinite state space domain which is qualitatively out
of reach of almost all previous work in Bayesian exploration.Comment: 14 pages, 7 figures, includes supplementary material. Advances in
Neural Information Processing Systems (NIPS) 201
Computational Geometry Column 42
A compendium of thirty previously published open problems in computational
geometry is presented.Comment: 7 pages; 72 reference
Forecasting Popularity of Videos using Social Media
This paper presents a systematic online prediction method (Social-Forecast)
that is capable to accurately forecast the popularity of videos promoted by
social media. Social-Forecast explicitly considers the dynamically changing and
evolving propagation patterns of videos in social media when making popularity
forecasts, thereby being situation and context aware. Social-Forecast aims to
maximize the forecast reward, which is defined as a tradeoff between the
popularity prediction accuracy and the timeliness with which a prediction is
issued. The forecasting is performed online and requires no training phase or a
priori knowledge. We analytically bound the prediction performance loss of
Social-Forecast as compared to that obtained by an omniscient oracle and prove
that the bound is sublinear in the number of video arrivals, thereby
guaranteeing its short-term performance as well as its asymptotic convergence
to the optimal performance. In addition, we conduct extensive experiments using
real-world data traces collected from the videos shared in RenRen, one of the
largest online social networks in China. These experiments show that our
proposed method outperforms existing view-based approaches for popularity
prediction (which are not context-aware) by more than 30% in terms of
prediction rewards
- …