In this paper, we study both multi-armed and contextual bandit problems in
censored environments. Our goal is to estimate the performance loss due to
censorship in the context of classical algorithms designed for uncensored
environments. Our main contributions include the introduction of a broad class
of censorship models and their analysis in terms of the effective dimension of
the problem -- a natural measure of its underlying statistical complexity and
main driver of the regret bound. In particular, the effective dimension allows
us to maintain the structure of the original problem at first order, while
embedding it in a bigger space, and thus naturally leads to results analogous
to uncensored settings. Our analysis involves a continuous generalization of
the Elliptical Potential Inequality, which we believe is of independent
interest. We also discover an interesting property of decision-making under
censorship: a transient phase during which initial misspecification of
censorship is self-corrected at an extra cost, followed by a stationary phase
that reflects the inherent slowdown of learning governed by the effective
dimension. Our results are useful for applications of sequential
decision-making models where the feedback received depends on strategic
uncertainty (e.g., agents' willingness to follow a recommendation) and/or
random uncertainty (e.g., loss or delay in arrival of information).Comment: 45 pages, 5 figures, NeurIPS 202