19 research outputs found

    Towards a Dutch FrameNet lexicon and parser using the data-to-text method

    Get PDF
    Our presentation introduces the Dutch FrameNet project, whose major outcomes will be a FrameNet-based lexicon and semantic parser for Dutch. This project implements the ‘data-to-text’ method (Vossen et al., LREC 2018), which involves collecting structured data about specific types of real-world events, and then linking this to texts referring to these events. By contrast, earlier FrameNet projects started from text corpora without assumptions about the events they describe. As a consequence, these projects cover a wide variety of events and situations (‘frames’), but have a limited number of annotated examples for every frame. By starting from structured domains, we avoid this sparsity problem, facilitating both machine learning and qualitative analyses on texts in the domains we annotate. Moreover, the data-to-text approach allows us to study the three-way relationship between texts, structured data, and frames, highlighting how real-world events are ‘framed’ in texts. We will discuss the implications of using the data-to-text method for the design and theoretical framework of the Dutch FrameNet and for automatic parsing. First of all, a major departure from traditional frame semantics is that we can use structured data to enrich and inform our frame analyses. For example, certain frames have a strong conceptual link to specific events (e.g., a text cannot describe a murder event without evoking the Killing frame), but texts describing these events may evoke these frames in an implicit way (e.g., a murder described without explicitly using words like ‘kill’), which would lead these events to be missed by traditional FrameNet annotations. Moreover, we will investigate how texts refer to the structured data and how to model this in a useful way for annotators. We theorize that variation in descriptions of the real world is driven by pragmatic requirements (e.g., Gricean maxims; Weigand, 1998) and shared event knowledge. For instance, the sentence ‘Feyenoord hit the goal twice’ implies that Feyenoord scored two points, but this conclusion requires knowledge of Feyenoord and what football matches are like. We will present both an analysis of the influence of world knowledge and pragmatic factors on variation in lexical reference, and ways to model this variation in order to annotate references within and between texts concerning the same event. Automatic frame semantic parsing will adopt a multilingual approach: the data-to-text approach makes it relatively easy to gather a corpus of texts in different languages describing the same events. We aim to use techniques such as cross-lingual annotation projection (Evang & Bos, COLING 2016) to adapt existing parsers and resources developed for English to Dutch, our primary target language, but also to Italian, which will help us make FrameNet and semantic parsers based on it more language-independent. Our parsers will be integrated into the Parallel Meaning Bank project (Abzianidze et al., EACL 2017)

    A Crowdsourced Frame Disambiguation Corpus with Ambiguity

    Full text link
    We present a resource for the task of FrameNet semantic frame disambiguation of over 5,000 word-sentence pairs from the Wikipedia corpus. The annotations were collected using a novel crowdsourcing approach with multiple workers per sentence to capture inter-annotator disagreement. In contrast to the typical approach of attributing the best single frame to each word, we provide a list of frames with disagreement-based scores that express the confidence with which each frame applies to the word. This is based on the idea that inter-annotator disagreement is at least partly caused by ambiguity that is inherent to the text and frames. We have found many examples where the semantics of individual frames overlap sufficiently to make them acceptable alternatives for interpreting a sentence. We have argued that ignoring this ambiguity creates an overly arbitrary target for training and evaluating natural language processing systems - if humans cannot agree, why would we expect the correct answer from a machine to be any different? To process this data we also utilized an expanded lemma-set provided by the Framester system, which merges FN with WordNet to enhance coverage. Our dataset includes annotations of 1,000 sentence-word pairs whose lemmas are not part of FN. Finally we present metrics for evaluating frame disambiguation systems that account for ambiguity.Comment: Accepted to NAACL-HLT201

    Backpropagating through Structured Argmax using a SPIGOT

    Full text link
    We introduce the structured projection of intermediate gradients optimization technique (SPIGOT), a new method for backpropagating through neural networks that include hard-decision structured predictions (e.g., parsing) in intermediate layers. SPIGOT requires no marginal inference, unlike structured attention networks (Kim et al., 2017) and some reinforcement learning-inspired solutions (Yogatama et al., 2017). Like so-called straight-through estimators (Hinton, 2012), SPIGOT defines gradient-like quantities associated with intermediate nondifferentiable operations, allowing backpropagation before and after them; SPIGOT's proxy aims to ensure that, after a parameter update, the intermediate structure will remain well-formed. We experiment on two structured NLP pipelines: syntactic-then-semantic dependency parsing, and semantic parsing followed by sentiment classification. We show that training with SPIGOT leads to a larger improvement on the downstream task than a modularly-trained pipeline, the straight-through estimator, and structured attention, reaching a new state of the art on semantic dependency parsing.Comment: ACL 201
    corecore