Comparative knowledge (e.g., steel is stronger and heavier than styrofoam) is
an essential component of our world knowledge, yet understudied in prior
literature. In this paper, we study the task of comparative knowledge
acquisition, motivated by the dramatic improvements in the capabilities of
extreme-scale language models like GPT-4, which have fueled efforts towards
harvesting their knowledge into knowledge bases. While acquisition of such
comparative knowledge is far easier from models like GPT-4 than from their
considerably smaller and weaker counterparts such as GPT-2, not even the most
powerful models are exempt from making errors. We thus ask: to what extent are
models at different scales able to generate valid and diverse comparative
knowledge?
We introduce NeuroComparatives, a novel framework for comparative knowledge
distillation that overgenerates candidate comparisons from language models such
as GPT variants and Llama and then applies stringent filtering to the generated
knowledge. Our framework
acquires comparative knowledge between everyday objects, producing a corpus of
up to 8.8M comparisons over 1.74M entity pairs, which is 10X larger and 30% more
diverse than existing resources. Moreover, human evaluations show that
NeuroComparatives outperform existing resources (up to 32% absolute
improvement). We also demonstrate the utility of our distilled
NeuroComparatives on three downstream tasks. Our results show that
neuro-symbolic manipulation of smaller models offers complementary benefits to
the currently dominant practice of prompting extreme-scale language models for
knowledge distillation.