Posterior Sampling for Large Scale Reinforcement Learning
We propose a practical non-episodic PSRL algorithm that, unlike recent
state-of-the-art PSRL algorithms, uses a deterministic, model-independent
episode switching schedule. Our algorithm, termed deterministic schedule PSRL
(DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove
a Bayesian regret bound under mild assumptions. Our result is more generally
applicable to multiple parameters and continuous state-action problems. We
compare our algorithm with state-of-the-art PSRL algorithms on standard
discrete and continuous problems from the literature. Finally, we show how the
assumptions of our algorithm satisfy a sensible parametrization for a large
class of problems in sequential recommendations.
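
The idea of a deterministic, model-independent switching schedule can be
illustrated with a minimal sketch. The abstract does not state the schedule
itself, so the doubling episode lengths below are an assumption, and the
two-armed Bernoulli bandit with Beta posteriors is a toy stand-in for the
general setting, not the paper's implementation. The names switch_times and
run_posterior_sampling are hypothetical.

```python
import random


def switch_times(horizon, first_length=1):
    """Return the time steps at which a new episode starts.

    Assumption: episode lengths double (1, 2, 4, ...). The schedule depends
    only on the clock, never on the model or on the observed data.
    """
    times, start, length = [], 0, first_length
    while start < horizon:
        times.append(start)
        start += length
        length *= 2
    return times


def run_posterior_sampling(true_means, horizon, seed=0):
    """Toy posterior sampling on a Bernoulli bandit (illustrative only).

    A new parameter sample is drawn from the posterior only at the
    deterministic switch times; between switches the greedy choice for the
    last sample is followed while the posterior keeps being updated.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta(1, 1) prior on each arm's mean
    failures = [1] * n_arms
    switches = set(switch_times(horizon))
    pulls = [0] * n_arms
    arm = 0
    for t in range(horizon):
        if t in switches:
            # Resample the model from the posterior and act greedily on it.
            sampled = [rng.betavariate(successes[a], failures[a])
                       for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: sampled[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(switch_times(20))
    print(run_posterior_sampling([0.2, 0.8], horizon=1000))
```

Because the switch times are fixed in advance, the number of posterior samples
drawn grows only logarithmically with the horizon under this assumed doubling
schedule.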
Reinforcement Learning for Strategic Recommendations
Strategic recommendations (SR) refer to the problem where an intelligent
agent observes the sequential behaviors and activities of users and decides
when and how to interact with them to optimize some long-term objectives, both
for the user and the business. These systems are in their infancy in the
industry and in need of practical solutions to some fundamental research
challenges. At Adobe Research, we have been implementing such systems for
various use-cases, including points of interest recommendations, tutorial
recommendations, next step guidance in multi-media editing software, and ad
recommendation for optimizing lifetime value. There are many research
challenges when building these systems, such as modeling the sequential
behavior of users, deciding when to intervene and offer recommendations without
annoying the user, evaluating policies offline with high confidence, safe
deployment, non-stationarity, building systems from passive data that do not
contain past recommendations, resource-constrained optimization in multi-user
systems, scaling to large and dynamic action spaces, and handling and
incorporating human cognitive biases. In this paper, we cover various use-cases
and research challenges we solved to make these systems practical.