99 research outputs found

Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell

Author: Fokkens Antske
Sommerauer Pia
Publication date: 01/01/2018

This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full vector comparison, are captured by embeddings. Properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant for the way entities interact (e.g. dangerous) are captured, while perceptual information (e.g. colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.

Comment: Accepted to the EMNLP workshop "Analyzing and interpreting neural networks for NLP".

arXiv.org e-Print Archive

VU Research Portal

Crossref

Perturbations and Subpopulations for Testing Robustness in Token-Based Argument Unit Recognition

Author: Beinborn Lisa
Fokkens Antske
Kamp Jonathan
Publication date: 29/09/2022

VU Research Portal

Spring Cleaning and Grammar Compression: Two Techniques for Detection of Redundancy in HPSG Grammars

Author: Bender Emily M.
Fokkens Antske
Zhang Yi
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

Dealing with Abbreviations in the Slovenian Biographical Lexicon

Author: Daza Angel
Erjavec Tomaž
Fokkens Antske
Publication date: 01/01/2022

Abbreviations present a significant challenge for NLP systems because they cause tokenization and out-of-vocabulary errors. They can also make the text less readable, especially in reference printed books, where they are extensively used. Abbreviations are especially problematic in low-resource settings, where systems are less robust to begin with. In this paper, we propose a new method for addressing the problems caused by a high density of domain-specific abbreviations in a text. We apply this method to the case of a Slovenian biographical lexicon and evaluate it on a newly developed gold-standard dataset of 51 Slovenian biographies. Our abbreviation identification method performs significantly better than commonly used ad-hoc solutions, especially at identifying unseen abbreviations. We also propose and present the results of a method for expanding the identified abbreviations in context.

Comment: To be presented at The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022).

arXiv.org e-Print Archive

VU Research Portal

Dynamic Top-k Estimation Consolidates Disagreement between Feature Attribution Methods

Author: Beinborn Lisa
Fokkens Antske
Kamp Jonathan
Publication date: 09/10/2023

Feature attribution scores are used for explaining the prediction of a text classifier to users by highlighting a k number of tokens. In this work, we propose a way to determine the number of optimal k tokens that should be displayed from sequential properties of the attribution scores. Our approach is dynamic across sentences, method-agnostic, and deals with sentence length bias. We compare agreement between multiple methods and humans on an NLI task, using fixed k and dynamic k. We find that perturbation-based methods and Vanilla Gradient exhibit highest agreement on most method-method and method-human agreement metrics with a static k. Their advantage over other methods disappears with dynamic ks, which mainly improve Integrated Gradient and GradientXInput. To our knowledge, this is the first evidence that sequential properties of attribution scores are informative for consolidating attribution signals for human interpretation.

VU Research Portal

Finding Stories in 1,784,532 Events: Scaling Up Computational Models of Narrative

Author: Fokkens Antske
van Erp Marieke
Vossen Piek
Publication venue: OASIcs - OpenAccess Series in Informatics. 2014 Workshop on Computational Models of Narrative
Publication date: 01/01/2014

Information professionals face the challenge of making sense of an ever-increasing amount of information. Storylines can provide a useful way to present relevant information because they reveal explanatory relations between events. In this position paper, we present and discuss the four main challenges that make it difficult to get to these stories and our first ideas on how to start resolving them.

VU Research Portal

Dagstuhl Research Online Publication Server

Large-scale Cross-lingual Language Resources for Referencing and Framing

Author: Fokkens Antske
Ilievski Filip
Minnema Gosse
Postma Marten
Remijnse Levi
Vossen Piek
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/05/2020

In this article, we lay out the basic ideas and principles of the project Framing Situations in the Dutch Language. We provide our first results of data acquisition, together with the first data release. We introduce the notion of cross-lingual referential corpora. These corpora consist of texts that make reference to exactly the same incidents. The referential grounding allows us to analyze the framing of these incidents in different languages and across different texts. During the project, we will use the automatically generated data to study linguistic framing as a phenomenon and to build framing resources such as lexicons and corpora. We expect to capture larger variation in framing compared to traditional approaches for building such resources. Our first data release, which contains structured data about a large number of incidents and reference texts, can be found at http://dutchframenet.nl/data-releases/

VU Research Portal

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Biographical Data in a Digital World 2022 (BD 2022) Workshop

Author: Daza Angel
Fokkens Antske
Hadden Richard
Hyvönen Eero
Koho Mikko
Wandl-Vogt Eveline
Publication venue: The Alliance of Digital Humanities Organizations (ADHO)
Publication date: 01/07/2022

Helsingin yliopiston digitaalinen arkisto

A larger-scale evaluation resource of terms and their shift direction for diachronic lexical semantics

Author: Aggelen A.E. (Astrid) van
Fokkens A. (Antske)
Hollink L. (Laura)
Ossenbruggen J.R. (Jacco) van
Publication date: 30/09/2019

Determining how words have changed their meaning is an important topic in Natural Language Processing. However, evaluations of methods to characterise such change have been limited to small, handcrafted resources. We introduce an English evaluation set which is larger, more varied, and more realistic than seen to date, with terms derived from a historical thesaurus. Moreover, the dataset is unique in that it represents change as a shift from the term of interest to a WordNet synset. Using the synset lemmas, we can use this set to evaluate (standard) methods that detect change between word pairs, as well as (adapted) methods that detect the change between a term and a sense overall. We show that performance on the new data set is much lower than earlier reported findings, setting a new standard.

CWI's Institutional Repository