"Clipping" (a.k.a. importance weight truncation) is a widely used
variance-reduction technique for counterfactual off-policy estimators. Like
other variance-reduction techniques, clipping reduces variance at the cost of
increased bias. However, unlike other techniques, the bias introduced by
clipping is always downward (assuming non-negative rewards), so the clipped
estimate is, in expectation, a lower bound on the true expected reward. In this work we propose a simple
extension, called double clipping, which aims to compensate for this
downward bias and thus reduce the overall bias, while maintaining the variance
reduction properties of the original estimator.

Comment: Presented at the CONSEQUENCES '23 workshop at the RecSys 2023 conference in
Singapore
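
As an illustration of the downward bias of standard (single) clipping described above, the following is a minimal sketch of a clipped inverse propensity scoring (IPS) estimator on a toy three-action bandit. The policies, reward means, clipping threshold, and function names are all hypothetical choices made for this example. Only the standard clipped estimator is shown.

```python
import random


def clipped_ips(rewards, target_probs, logging_probs, clip):
    """Clipped IPS: average of min(w_i, clip) * r_i with w_i = pi(a_i) / pi0(a_i)."""
    total = 0.0
    for r, p, p0 in zip(rewards, target_probs, logging_probs):
        weight = p / p0
        total += min(weight, clip) * r
    return total / len(rewards)


# Hypothetical toy problem (assumption: context-free bandit, Bernoulli rewards).
logging_policy = [0.8, 0.15, 0.05]  # pi0(a), the policy that collected the data
target_policy = [0.1, 0.3, 0.6]     # pi(a), the policy being evaluated
mean_reward = [0.2, 0.5, 0.9]       # E[r | a], non-negative by construction

# True expected reward of the target policy.
true_value = sum(p * r for p, r in zip(target_policy, mean_reward))


def expected_clipped_ips(clip):
    """Exact expectation of the clipped IPS estimator under the logging policy."""
    return sum(
        p0 * min(p / p0, clip) * r
        for p, p0, r in zip(target_policy, logging_policy, mean_reward)
    )


# Simulate logged data to compare clipped and unclipped estimates on one sample.
rng = random.Random(0)
n = 10_000
actions = rng.choices(range(3), weights=logging_policy, k=n)
rewards = [1.0 if rng.random() < mean_reward[a] else 0.0 for a in actions]
pi = [target_policy[a] for a in actions]
pi0 = [logging_policy[a] for a in actions]

unclipped_estimate = clipped_ips(rewards, pi, pi0, clip=float("inf"))
clipped_estimate = clipped_ips(rewards, pi, pi0, clip=5.0)

print(f"true value:              {true_value:.3f}")
print(f"expected clipped (M=5):  {expected_clipped_ips(5.0):.3f}")
print(f"unclipped IPS estimate:  {unclipped_estimate:.3f}")
print(f"clipped IPS estimate:    {clipped_estimate:.3f}")
```

Because every reward is non-negative and each clipped weight is at most the original weight, the clipped estimate can never exceed the unclipped one on the same sample, and its expectation can never exceed the true value. With an infinite threshold the estimator reduces to plain IPS, which is unbiased here.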