Uni- and Multimodal and Structured Representations for Modeling Frame Semantics

Abstract

Language is the most complex kind of shared knowledge evolved by humankind and it is the foundation of communication between humans. At the same time, one of the most challenging problems in Artificial Intelligence is to grasp the meaning conveyed by language. Humans use language to communicate knowledge and information about the world and to exchange their thoughts. In order to understand the meaning of words in a sentence, single words are interpreted in the context of the sentence and of the situation together with a large background of commonsense knowledge and experience in the world. The research field of Natural Language Processing aims at automatically understanding language as humans do naturally. In this thesis, the overall challenge of understanding meaning in language by capturing world knowledge is examined from the two branches of (a) knowledge about situations and actions as expressed in texts and (b) structured relational knowledge as stored in knowledge bases. Both branches can be studied with different kinds of vector representations, so-called embeddings, for operationalizing different aspects of knowledge: textual, structured, and visual or multimodal embeddings. This poses the challenge of determining the suitability of different embeddings for automatic language understanding with respect to the two branches. To approach these challenges, we choose to closely rely upon the lexical-semantic knowledge base FrameNet. It addresses both branches of capturing world knowledge whilst taking into account the linguistic theory of frame semantics which orients on human language understanding. FrameNet provides frames, which are categories for knowledge of meaning, and frame-to-frame relations, which are structured meta-knowledge of interactions between frames. These frames and relations are central to the tasks of Frame Identification and Frame-to-Frame Relation Prediction. Concerning branch (a), the task of Frame Identification was introduced to advance the understanding of context knowledge about situations, actions and participants. The task is to label predicates with frames in order to identify the meaning of the predicate in the context of the sentence. We use textual embeddings to model the semantics of words in the sentential context and develop a state-of-the-art system for Frame Identification. Our Frame Identification system can be used to automatically annotate frames on English or German texts. Furthermore, in our multimodal approach to Frame Identification, we combine textual embeddings for words with visual embeddings for entities depicted on images. We find that visual information is especially useful in difficult settings with rare frames. To further advance the performance of the multimodal approach, we suggest to develop embeddings for verbs specifically that incorporate multimodal information. Concerning branch (b), we introduce the task of Frame-to-Frame Relation Prediction to advance the understanding of relational knowledge of interactions between frames. The task is to label connections between frames with relations in order to complete the meta-knowledge stored in FrameNet. We train textual and structured embeddings for frames and explore the limitations of textual frame embeddings with respect to recovering relations between frames. Moreover, we contrast textual frame embeddings versus structured frame embeddings and develop the first system for Frame-to-Frame Relation Prediction. We find that textual and structured frame embeddings differ with respect to predicting relations; thus when applied as features in the context of further tasks, they can provide different kinds of frame knowledge. Our structured prediction system can be used to generate recommendations for annotations with relations. To further advance the performance of Frame-to-Frame Relation Prediction and also of the induction of new frames and relations, we suggest to develop approaches that incorporate visual information. The two kinds of frame knowledge from both branches, our Frame Identification system and our pre-trained frame embeddings, are combined in an extrinsic evaluation in the context of higher-level applications. Across these applications, we see a trend that frame knowledge is particularly beneficial in ambiguous and short sentences. Taken together, in this thesis, we approach semantic language understanding from the two branches of knowledge about situations and actions and structured relational knowledge and investigate different embeddings for textual, structured and multimodal language understanding

    Similar works