Comparative reasoning is a process of comparing objects, concepts, or
entities to draw conclusions, which constitutes a fundamental cognitive
ability. In this paper, we propose a novel framework for pre-training language
models to enhance their comparative reasoning abilities over text. While
there have been approaches for NLP tasks that require comparative reasoning,
they suffer from costly manual data labeling and limited generalizability
across tasks. Our approach introduces a scalable method for collecting data
for text-based entity comparison, which leverages both structured and
unstructured data. Moreover, we present a framework for pre-training language
models via three novel objectives on comparative reasoning. Evaluation on
downstream tasks including comparative question answering, question generation,
and summarization shows that our pre-training framework significantly improves
the comparative reasoning abilities of language models, especially under
low-resource conditions. This work also releases the first integrated benchmark
for comparative reasoning.

Comment: EMNLP 2023 - Camera Ready. Typos fixe