Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks

Abstract

Abstract is not available.
