We propose a general method for automated word puzzle generation. Unlike
previous approaches in this novel field, the presented method does not rely on
highly structured datasets obtained with substantial human annotation effort: it
only needs an unstructured and unannotated corpus (i.e., document collection)
as input. The method builds upon two additional pillars: (i) a topic model,
which induces a topic dictionary from the input corpus (examples include
latent semantic analysis, group-structured dictionaries or latent Dirichlet
allocation), and (ii) a semantic similarity measure of word pairs. Our method
can (i) automatically generate a large number of proper word puzzles of
different types, including odd one out, choose the related word, and separate
the topics; (ii) easily create domain-specific puzzles by replacing the corpus
component; and (iii) automatically generate puzzles with parameterizable
levels of difficulty suitable for,
e.g., beginners or intermediate learners.

Comment: 4 pages
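
To make the two pillars concrete, the following minimal Python sketch shows one way an odd-one-out puzzle could be assembled from a topic dictionary and a word-pair similarity measure. It is an illustration under stated assumptions, not the authors' algorithm: the function name `odd_one_out`, the thresholds `min_sim` and `max_odd_sim`, the toy two-dimensional word vectors, and the use of cosine similarity are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def odd_one_out(topic_words, candidates, vectors, k=3, min_sim=0.8, max_odd_sim=0.5):
    """Return (consistent_words, odd_word), or None if no puzzle qualifies.

    Hypothetical procedure: pick k words from one topic that are pairwise
    similar (>= min_sim), then pick an odd word whose similarity to every
    word in that group is at most max_odd_sim.
    """
    def sim(a, b):
        return cosine(vectors[a], vectors[b])

    for group in combinations(topic_words, k):
        # The consistent set must be mutually similar.
        if any(sim(a, b) < min_sim for a, b in combinations(group, 2)):
            continue
        for odd in candidates:
            if odd in group:
                continue
            # The odd word must be sufficiently dissimilar to the whole group.
            if all(sim(odd, w) <= max_odd_sim for w in group):
                return list(group), odd
    return None


# Toy stand-ins (assumptions): a real system would take the topic dictionary
# from a topic model and the similarities from a corpus-based measure.
vectors = {
    "cat": (0.9, 0.1), "dog": (0.8, 0.2), "horse": (0.85, 0.15),
    "bond": (0.1, 0.9), "stock": (0.2, 0.8),
}
topics = {"animals": ["cat", "dog", "horse"], "finance": ["bond", "stock"]}

puzzle = odd_one_out(topics["animals"], topics["finance"], vectors)
print(puzzle)
```

In this sketch, `max_odd_sim` acts as a rough difficulty knob: a larger value admits odd words that lie closer to the consistent group, which would presumably make the puzzle harder, while a very small value may yield no puzzle at all.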