Thousands of complex natural language questions are submitted to community
question answering websites on a daily basis, making them one of the most
important information sources today. However, submitted questions are often
unclear and cannot be answered until expert community members ask
clarification questions. This study is the first to investigate
the complex task of classifying a question as clear or unclear, i.e., whether it
requires further clarification. We construct a novel dataset and propose a
classification approach that is based on the notion of similar questions. This
approach is compared to state-of-the-art text classification baselines. Our
main finding is that the similar questions approach is a viable alternative
that can be used as a stepping stone towards the development of supportive user
interfaces for question formulation.

Comment: Proceedings of the 41st European Conference on Information Retrieval
(ECIR '19), 201