Relations such as "is influenced by", "is known for" or "is a competitor of"
are inherently graded: we can rank entity pairs based on how well they satisfy
these relations, but it is hard to draw a line between those pairs that satisfy
them and those that do not. Such graded relations play a central role in many
applications, yet they are typically not covered by existing Knowledge Graphs.
In this paper, we consider the possibility of using Large Language Models
(LLMs) to fill this gap. To this end, we introduce a new benchmark, in which
entity pairs have to be ranked according to how much they satisfy a given
graded relation. The task is formulated as a few-shot ranking problem, where
models only have access to a description of the relation and five prototypical
instances. We use the proposed benchmark to evaluate state-of-the-art relation
embedding strategies as well as several recent LLMs, covering both publicly
available LLMs and closed models such as GPT-4. Overall, we find a strong
correlation between model size and performance, with smaller Language Models
struggling to outperform a naive baseline. The results of the largest Flan-T5
and OPT models are remarkably strong, although a clear gap with human
performance remains