This paper analyzes strategic choice in p-beauty contests. First, we show that guessing the expected target value (accounting for one's own weight) is not generally a best reply, even in games with n>2 players, and that iterated best response sequences are not unique even after perfect/cautious refinement. This implies that standard formulations of ``level-k'' models are neither exactly nor uniquely rationalizable by belief systems based on iterated best response. Second, we show that modeling iterated reasoning exactly worsens the fit considerably and reveals that equilibrium types dominate the populations. We also show that ``levels of reasoning'' cannot be measured, regardless of the underlying model. Third, we consider a ``nested logit'' model in which players choose their level. It dispenses with belief systems between players and is rationalized by a random utility model. Besides being internally consistent, nested logit equilibrium fits better than three variants of the level-k model in standard data sets.