Hanabi is a cooperative game that brings the problem of modeling other
players to the forefront. In this game, coordinated groups of players can
leverage pre-established conventions to great effect, but playing in an ad-hoc
setting requires agents to adapt to their partners' strategies with no previous
coordination. Evaluating an agent in this setting requires a diverse population
of potential partners, but so far, the behavioral diversity of agents has not
been considered in a systematic way. This paper proposes Quality Diversity
algorithms as a promising way to generate diverse populations for this
purpose, and uses MAP-Elites to generate a population of diverse Hanabi
agents. We also postulate that agents can benefit from a diverse population
during training and implement a simple "meta-strategy" for adapting to an
agent's perceived behavioral niche. We show that this meta-strategy can outperform
generalist strategies, even outside the population it was trained with, provided
its partner's behavioral niche is correctly inferred. In practice, however, a
partner's behavior depends on and interferes with the meta-agent's own behavior,
suggesting an avenue for future research in characterizing another agent's
behavior during gameplay.

Comment: arXiv admin note: text overlap with arXiv:1907.0384