Though impressive performance has been achieved in specific visual realms
(e.g., faces, dogs, and places), an omni-vision representation that generalizes to
many natural visual domains is highly desirable. However, existing benchmarks are
biased and inefficient for evaluating such an omni-vision representation: they
either include only a few specific realms, or cover most realms at the expense
of subsuming numerous datasets with extensive realm overlap. In this paper, we
propose the Omni-Realm Benchmark (OmniBenchmark). It
includes 21 realm-wise datasets with 7,372 concepts and 1,074,346 images.
Without semantic overlap, these datasets cover most visual realms both
comprehensively and efficiently. In addition, we propose a new
supervised contrastive learning framework, namely Relational Contrastive
learning (ReCo), for a better omni-vision representation. Beyond pulling two
instances of the same concept closer, as the typical supervised contrastive
learning framework does, ReCo also pulls two instances from the same semantic
realm closer, encoding the semantic relation between concepts and thereby
facilitating omni-vision representation learning.
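To make the relational objective concrete, below is a minimal PyTorch-style sketch of a loss in the spirit of ReCo, not the authors' implementation: it keeps the standard supervised contrastive term for same-concept pairs and adds a down-weighted term for pairs that only share a semantic realm. The function name `reco_style_loss` and the `realm_weight` hyper-parameter are illustrative assumptions.

```python
# Hedged sketch of a ReCo-style loss; NOT the authors' released code.
import torch
import torch.nn.functional as F


def reco_style_loss(features, concept_labels, realm_labels,
                    temperature=0.1, realm_weight=0.5):
    """Supervised contrastive loss that also pulls same-realm pairs closer.

    features:       (N, D) L2-normalized embeddings
    concept_labels: (N,)   fine-grained concept ids
    realm_labels:   (N,)   coarse semantic-realm ids
    """
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)

    # Pairwise similarities with self-comparisons masked out.
    sim = features @ features.t() / temperature
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)  # avoid 0 * (-inf) below

    same_concept = (concept_labels[:, None] == concept_labels[None, :]) & ~self_mask
    same_realm = (realm_labels[:, None] == realm_labels[None, :]) & ~self_mask

    # Full weight for same-concept positives (standard supervised contrastive
    # learning); a reduced weight for pairs that only share a realm -- the
    # "relational" part.
    weight = same_concept.float() + realm_weight * (same_realm & ~same_concept).float()
    has_pos = weight.sum(dim=1) > 0
    weight = weight / weight.sum(dim=1, keepdim=True).clamp(min=1e-8)

    # Weighted cross-entropy over positives, averaged over anchors that have
    # at least one positive.
    loss = -(weight * log_prob).sum(dim=1)
    return loss[has_pos].mean()


# Toy usage: 8 samples, 4 concepts grouped into 2 realms.
feats = F.normalize(torch.randn(8, 128), dim=1)
concepts = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
realms = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
print(reco_style_loss(feats, concepts, realms))
```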
We benchmark ReCo and other advances in omni-vision representation studies
that differ in architecture (from CNNs to transformers) and in learning
paradigm (from supervised learning to self-supervised learning) on
OmniBenchmark. We demonstrate the superiority of ReCo over other supervised
contrastive learning methods and reveal multiple practical observations to
facilitate future research.

Comment: In ECCV 2022; the project page is at
https://zhangyuanhan-ai.github.io/OmniBenchmar