We consider the partial observability model for multi-armed bandits,
introduced by Mannor and Shamir. Our main result is a characterization of
regret in the directed observability model in terms of the dominating and
independence numbers of the observability graph. We also show that in the
undirected case, the learner can achieve optimal regret without even accessing
the observability graph before selecting an action. Both results are shown
using variants of the Exp3 algorithm operating on the observability graph in a
time-efficient manner