In this paper we examine how people assign, interpret, negotiate and repair the frame of reference (FoR) in online text-based dialogues discussing spatial scenes in English and Swedish. We describe our corpus and data collection, which involves a coordination experiment in which dyadic dialogue participants have to identify differences between their pictures of a visual scene. Because their perspectives on the scene differ, they must coordinate their FoRs in order to complete the task. Results show that participants do not align on a global FoR, but tend to align locally, for sub-portions (or particular conversational games) of the dialogue. This has implications for how dialogue systems should approach problems of FoR assignment, and for what clarification strategies they should implement.