In conversational question answering (CQA), the task of question
rewriting (QR) in context aims to rewrite a context-dependent question into an
equivalent self-contained question that gives the same answer. In this paper,
we are interested in the robustness of a QR system to questions varying in
rewriting hardness or difficulty. Since no existing data classify questions
by their rewriting hardness, we first propose a heuristic method to
automatically classify questions into subsets of varying hardness, by measuring
the discrepancy between a question and its rewrite. To find out what makes
questions hard or easy for rewriting, we then conduct a human evaluation to
annotate the rewriting hardness of questions. Finally, to enhance the
robustness of QR systems to questions of varying hardness, we propose a novel
learning framework for QR that first trains a QR model independently on each
subset of questions at a given hardness level, then combines these QR
models into one joint model for inference. Experimental results on two datasets
show that our framework improves the overall performance over the
baselines.
Comment: ACL'22, main, long paper
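The abstract does not specify how the discrepancy between a question and its rewrite is measured. As a minimal sketch, assuming a token-overlap measure (1 minus unigram F1) and hand-picked thresholds, the heuristic classification into hardness subsets could look like the following. The names `discrepancy` and `split_by_hardness`, the thresholds 0.2 and 0.5, and the three-way easy/medium/hard split are all hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def discrepancy(question: str, rewrite: str) -> float:
    """Hypothetical discrepancy score: 1 minus unigram F1 between the
    context-dependent question and its self-contained rewrite.
    0.0 means identical token bags; 1.0 means no shared tokens."""
    q = question.lower().split()
    r = rewrite.lower().split()
    if not q or not r:
        return 1.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(q) & Counter(r)).values())
    if overlap == 0:
        return 1.0
    precision = overlap / len(q)
    recall = overlap / len(r)
    f1 = 2 * precision * recall / (precision + recall)
    return 1.0 - f1


def split_by_hardness(pairs, thresholds=(0.2, 0.5)):
    """Bucket (question, rewrite) pairs into subsets of increasing hardness.
    The thresholds are illustrative assumptions."""
    low, high = thresholds
    subsets = {"easy": [], "medium": [], "hard": []}
    for question, rewrite in pairs:
        d = discrepancy(question, rewrite)
        if d < low:
            subsets["easy"].append((question, rewrite))
        elif d < high:
            subsets["medium"].append((question, rewrite))
        else:
            subsets["hard"].append((question, rewrite))
    return subsets


if __name__ == "__main__":
    pairs = [
        ("who directed titanic ?", "who directed titanic ?"),
        ("when was he born ?", "when was james cameron born ?"),
        ("what else ?", "what other films did james cameron direct besides titanic ?"),
    ]
    for name, subset in split_by_hardness(pairs).items():
        print(name, len(subset))
```

In this toy example, the unchanged question has discrepancy 0 (easy), resolving one pronoun gives 3/11 (medium), and expanding an elliptical "what else ?" gives 9/13 (hard). Each resulting subset would then serve as the training data for one hardness-specific QR model.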