Counterfactual evaluation can estimate click-through rate (CTR) differences
between ranking systems based on historical interaction data, while mitigating
the effect of position bias and item-selection bias. We introduce the novel
Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for
logging data so that the counterfactual estimate has minimal variance. As
minimizing variance leads to faster convergence, LogOpt increases the
data efficiency of counterfactual estimation. LogOpt turns the counterfactual
approach, which is indifferent to the logging policy, into an online
approach in which the algorithm decides which rankings to display. We prove that,
as an online evaluation method, LogOpt is unbiased w.r.t. position and
item-selection bias, unlike existing interleaving methods. Furthermore, we
perform large-scale experiments by simulating comparisons between thousands of
rankers. Our results show that while interleaving methods make systematic
errors, LogOpt is as efficient as interleaving without being biased.

Comment: ICTIR 202
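
To illustrate why the choice of logging policy matters, the following is a minimal sketch, not LogOpt itself and not necessarily the estimator used in the paper. It assumes a simple position-based click model (a click on a document occurs with probability examination-at-rank times relevance) and an inverse-propensity-style estimator of the CTR difference between two rankers. The values in REL and THETA, the helper names, and the two example logging policies are all hypothetical. The sketch computes the estimator's exact mean and variance by enumeration: under these assumptions the mean equals the true CTR difference for either logging policy, while the variance depends on which policy logged the data.

```python
from itertools import permutations

# Hypothetical toy values: document relevances and per-rank examination probabilities.
REL = [0.9, 0.5, 0.1]
THETA = [1.0, 0.5, 0.25]


def exposure(policy):
    """Expected examination probability of each document under a policy.

    A policy is a dict mapping a ranking (tuple of document ids) to its probability.
    """
    rho = [0.0] * len(REL)
    for ranking, prob in policy.items():
        for rank, doc in enumerate(ranking):
            rho[doc] += prob * THETA[rank]
    return rho


def true_ctr(policy):
    """Expected number of clicks per displayed ranking under the assumed click model."""
    return sum(e * r for e, r in zip(exposure(policy), REL))


def estimator_moments(logging, policy_a, policy_b):
    """Exact mean and variance of a single-impression IPS-style estimate of
    CTR(A) - CTR(B), with data logged under `logging`.

    Assumes clicks are independent across documents given the displayed ranking,
    and that every document has non-zero exposure under the logging policy.
    """
    rho = exposure(logging)
    w_a, w_b = exposure(policy_a), exposure(policy_b)
    weights = [(a - b) / r for a, b, r in zip(w_a, w_b, rho)]

    mean = 0.0
    second_moment = 0.0
    for ranking, prob in logging.items():
        cond_mean = 0.0
        cond_var = 0.0
        for rank, doc in enumerate(ranking):
            p_click = THETA[rank] * REL[doc]
            cond_mean += weights[doc] * p_click
            cond_var += weights[doc] ** 2 * p_click * (1.0 - p_click)
        mean += prob * cond_mean
        second_moment += prob * (cond_var + cond_mean ** 2)
    return mean, second_moment - mean ** 2


# Two deterministic rankers to compare.
ranker_a = {(0, 1, 2): 1.0}
ranker_b = {(2, 1, 0): 1.0}

# Two candidate logging policies: mix the two rankers, or show all permutations uniformly.
log_mix = {(0, 1, 2): 0.5, (2, 1, 0): 0.5}
all_perms = list(permutations(range(len(REL))))
log_uniform = {perm: 1.0 / len(all_perms) for perm in all_perms}

true_diff = true_ctr(ranker_a) - true_ctr(ranker_b)
mean_mix, var_mix = estimator_moments(log_mix, ranker_a, ranker_b)
mean_uni, var_uni = estimator_moments(log_uniform, ranker_a, ranker_b)

print("true difference:", true_diff)
print("mixed logging:   mean", mean_mix, "variance", var_mix)
print("uniform logging: mean", mean_uni, "variance", var_uni)
```

In this toy setting both logging policies yield an unbiased estimate but with different variances, which is the quantity a logging-policy optimizer would aim to minimize.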