Experimental study of multimodal representations for Frame Identification - How to find the right multimodal representations for this task?

Abstract

Frame Identification (FrameId) is the first step in FrameNet Semantic Role Labeling, in which the correct frame is assigned to the predicate of a sentence. An automatic FrameId system takes the sentence and the predicate as input and predicts the correct frame. Current state-of-the-art FrameId systems are based on pretrained distributed word representations. For a wide range of tasks, multimodal approaches are reported to be superior to unimodal approaches when textual embeddings are enriched with information from other modalities, for instance images. To the best of our knowledge, multimodal approaches have not yet been investigated for FrameId, and we think they deserve investigation given the success of pretrained multimodal representations as input representations for other tasks. We want to find out whether representations that are grounded in images can help to improve the performance of our FrameId system. We report on our preliminary investigations with pretrained multimodal embeddings for FrameId.
