Search typically relies on keyword queries, but these are often semantically
ambiguous. We propose to overcome this by offering users natural language
questions, based on their keyword queries, to disambiguate their intent. This
keyword-to-question task may be addressed using neural machine translation
techniques. Neural translation models, however, require massive amounts of
training data (keyword-question pairs), which is unavailable for this task. The
main idea of this paper is to generate large amounts of synthetic training data
from a small seed set of hand-labeled keyword-question pairs. Since natural
language questions are available in large quantities, we develop models to
automatically generate the corresponding keyword queries. Further, we introduce
various filtering mechanisms to ensure that the synthetic training data is of high
quality. We demonstrate the feasibility of our approach using both automatic
and manual evaluation. This is an extended version of the article published
with the same title in the Proceedings of ICTIR'18.
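
As a rough illustration of the data-generation idea described above, the following minimal sketch turns natural language questions into synthetic keyword queries and filters out low-quality pairs. It is not the models or filters used in the paper: stopword removal stands in for a learned question-to-keyword model, and the names `STOPWORDS`, `question_to_keywords`, `keep_pair`, and `build_synthetic_pairs`, as well as the thresholds, are hypothetical choices made only for this example.

```python
import re

# Hypothetical stopword list; a naive stand-in for a learned
# question-to-keyword model (an assumption, not the paper's method).
STOPWORDS = {
    "what", "who", "when", "where", "why", "how", "which",
    "is", "are", "was", "were", "do", "does", "did",
    "the", "a", "an", "of", "in", "to", "for", "on", "i", "can",
}


def question_to_keywords(question: str) -> str:
    """Derive a keyword query by lowercasing and dropping stopwords."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


def keep_pair(keywords: str, question: str,
              min_terms: int = 2, max_ratio: float = 0.8) -> bool:
    """Illustrative filter: keep pairs whose keyword query has at least
    min_terms terms and is clearly shorter than the question."""
    n_kw = len(keywords.split())
    n_q = len(question.split())
    return n_kw >= min_terms and n_kw <= max_ratio * n_q


def build_synthetic_pairs(questions):
    """Return (keyword query, question) pairs that pass the filter."""
    pairs = []
    for q in questions:
        kw = question_to_keywords(q)
        if keep_pair(kw, q):
            pairs.append((kw, q))
    return pairs


if __name__ == "__main__":
    for kw, q in build_synthetic_pairs(
        ["What is the capital of France?", "Why?"]
    ):
        print(f"{kw!r} -> {q!r}")
```

The resulting pairs would serve as source (keyword query) and target (question) sequences for training a translation model. In this sketch the question "Why?" is dropped because no keywords remain after stopword removal.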