With recent technological advances that facilitate real-time
monitoring and data collection, "just-in-time" interventions can be delivered
via mobile devices to achieve both real-time and long-term management and
control. Reinforcement learning formalizes such mobile interventions as a
sequence of decision rules and assigns treatment arms based on the user's
status at each decision point. In practice, applications involve a large
number of decision points that extend beyond the time horizon of the currently
collected data. This corresponds to reinforcement learning in the
infinite-horizon setting, which is considerably more challenging. This article provides a selective
overview of statistical methodologies on this topic. We discuss their
modeling frameworks, generalizability, and interpretability, and provide some
use-case examples. Future research directions are discussed at the end.