A/B testing, or online experimentation, is a standard business strategy for comparing
a new product with an old one in the pharmaceutical, technological, and traditional
industries. Major challenges arise in online experiments where there is only
one unit that receives a sequence of treatments over time. In those
experiments, the treatment at a given time impacts the current outcome as well as
future outcomes. The aim of this paper is to introduce a reinforcement learning
framework for carrying out A/B testing while characterizing the long-term
treatment effects. Our proposed testing procedure allows for sequential
monitoring and online updating, so it is generally applicable to a variety of
treatment designs in different industries. In addition, we systematically
investigate the theoretical properties (e.g., asymptotic distribution and
power) of our testing procedure. Finally, we apply our framework to both
synthetic datasets and a real-world data example obtained from a ride-sharing
company to illustrate its usefulness.