Which equilibria will arise in signaling games depends on how the receiver
interprets deviations from the path of play. We develop a micro-foundation for
these off-path beliefs, and an associated equilibrium refinement, in a model
where equilibrium arises through non-equilibrium learning by populations of
patient and long-lived senders and receivers. In our model, young senders are
uncertain about the prevailing distribution of play, so they rationally send
out-of-equilibrium signals as experiments to learn about the behavior of the
population of receivers. Differences in the payoff functions of the types of
senders generate different incentives for these experiments. Using the Gittins
index (Gittins, 1979), we characterize which sender types use each signal more
often, leading to a constraint on the receiver's off-path beliefs based on
"type compatibility" and hence a learning-based equilibrium selection.
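As a rough illustration of why the Gittins index captures a type's incentive to experiment, the sketch below computes the index of a single signal in a deliberately simplified setting. This is not the model analyzed here. It assumes that receivers answer the signal with a "favorable" response with an unknown probability p, that the sender holds a Beta(a, b) prior over p, and that the sender earns r1 after a favorable response and r0 otherwise. The index is approximated by the standard calibration (retirement) method: bisect on a per-period retirement payoff until the sender is indifferent between retiring and continuing to experiment, with dynamic programming truncated at a finite horizon. The function name `gittins_index` and all parameters are hypothetical.

```python
from functools import lru_cache


def gittins_index(a, b, r0, r1, delta, horizon=80, tol=1e-6):
    """Approximate Gittins index of a signal whose response is favorable with
    unknown probability p ~ Beta(a, b). Payoff is r1 if favorable, r0 otherwise.
    Simplified Beta-Bernoulli illustration, not the paper's general model."""

    def continuing_beats_retiring(lam):
        retire = lam / (1 - delta)  # value of taking lam forever

        @lru_cache(maxsize=None)
        def value(s, f):
            # s, f: favorable / unfavorable responses observed so far
            p = (a + s) / (a + b + s + f)  # posterior mean of p
            flow = r0 + (r1 - r0) * p      # expected payoff this period
            if s + f >= horizon:
                # Truncation: stop learning and keep the better of the two options.
                return max(retire, flow / (1 - delta))
            cont = flow + delta * (p * value(s + 1, f) + (1 - p) * value(s, f + 1))
            return max(retire, cont)

        # Value of sending the signal at least once more, then acting optimally.
        p0 = a / (a + b)
        flow0 = r0 + (r1 - r0) * p0
        cont0 = flow0 + delta * (p0 * value(1, 0) + (1 - p0) * value(0, 1))
        return cont0 > retire

    # The index lies between the worst and best per-period payoff.
    lo, hi = min(r0, r1), max(r0, r1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if continuing_beats_retiring(mid):
            lo = mid  # experimenting still preferred: index is higher
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Two hypothetical sender types share a Beta(1, 1) prior about receivers'
    # response to a signal but value the two responses differently.
    for name, (r0, r1) in {"type_A": (0.0, 1.0), "type_B": (0.6, 1.0)}.items():
        print(name, round(gittins_index(1, 1, r0, r1, delta=0.9), 4))
```

In this toy version the index exceeds the myopic expected payoff of the signal, reflecting the option value of learning, and it grows with patience `delta`. Because it depends on each type's payoffs from the receiver's possible responses, types with different payoff functions attach different values to experimenting with the same signal; whether a type actually experiments depends on how this index compares with what it expects from its other signals.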