Large corpora of task-based and open-domain conversational dialogues are
hugely valuable in the field of data-driven dialogue systems. Crowdsourcing
platforms, such as Amazon Mechanical Turk, have been an effective means of
collecting such large amounts of data. However, difficulties arise when
task-based dialogues require expert domain knowledge or rapid access to
domain-relevant information, such as databases for tourism. These difficulties will become
even more prevalent as dialogue systems become increasingly ambitious,
expanding into tasks with high levels of complexity that require collaboration
and forward planning, such as in our domain of emergency response. In this
paper, we propose CRWIZ: a framework for collecting real-time Wizard of Oz
dialogues through crowdsourcing for collaborative, complex tasks. This
framework uses semi-guided dialogue to avoid interactions that breach
procedures and processes known only to experts, while enabling the capture of a
wide variety of interactions. The framework is available at
https://github.com/JChiyah/crwiz

Comment: 10 pages, 5 figures. To Appear in LREC 202