    Constructive Type-Logical Supertagging with Self-Attention Networks

    We propose a novel application of self-attention networks to grammar induction. We present an attention-based supertagger for a refined type-logical grammar, trained to construct types inductively. In addition to achieving high overall type accuracy, our model learns the syntax of the grammar's type system along with its denotational semantics. This lifts the closed-world assumption commonly made by lexicalized-grammar supertaggers, greatly enhancing the model's generalization potential. This is evidenced both by its adequate accuracy on sparse word types and by its ability to correctly construct complex types never seen during training, which, to the best of our knowledge, had not previously been accomplished.
    Comment: REPL4NLP 4, ACL 201
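    The contrast the abstract draws can be sketched in a toy example. This is a hypothetical illustration, not the paper's implementation: the function names, the tiny symbol vocabulary, and the stand-in decoder are all invented. A closed-world supertagger classifies over the fixed set of types observed in training, whereas a constructive decoder emits type symbols one at a time and can therefore assemble types it has never seen.

    ```python
    # Hypothetical sketch: closed-world supertag classification vs.
    # constructive, symbol-by-symbol type decoding. All names are illustrative.

    # Closed world: the tagger can only ever emit a type seen in training.
    TRAIN_TYPES = {"np", "s", "np\\s", "(np\\s)/np"}

    def closed_world_tag(scores):
        # scores: dict mapping each known type to a model score
        return max(scores, key=scores.get)  # unseen types are unreachable

    # Constructive: the tagger emits one type symbol per step, so complex
    # types absent from the training data can still be built.
    SYMBOLS = ["np", "s", "\\", "/", "(", ")", "<eos>"]

    def constructive_tag(step):
        # step: callable returning the next predicted symbol given the
        # prefix so far (a stand-in for the self-attention decoder)
        out = []
        while (sym := step(out)) != "<eos>":
            out.append(sym)
        return "".join(out)

    # A scripted "decoder" spelling out a type not in TRAIN_TYPES:
    script = iter(["(", "np", "\\", "s", ")", "/", "s", "<eos>"])
    print(constructive_tag(lambda prefix: next(script)))  # prints (np\s)/s
    ```

    The design point is that the output space changes from a finite label set to the (infinite) language generated by the type system's syntax, which is what allows generalization to sparse and unseen types.
    
    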

    A Benchmark of Rule-Based and Neural Coreference Resolution in Dutch Novels and News

    We evaluate a rule-based (Lee et al., 2013) and a neural (Lee et al., 2018) coreference system on Dutch datasets from two domains: literary novels and news/Wikipedia text. The results provide insight into the relative strengths of data-driven and knowledge-driven systems, as well as the influence of domain, document length, and annotation schemes. The neural system performs best on news/Wikipedia text, while the rule-based system performs best on literature. The neural system shows weaknesses with limited training data and long documents, while the rule-based system is affected by annotation differences. The code and models used in this paper are available at https://github.com/andreasvc/crac2020
    Comment: Accepted for CRAC 2020 @ COLIN

    DuELME: a Dutch electronic lexicon of multiword expressions


    Syntactic Annotation of Large Corpora in STEVIN

    The construction of a 500-million-word reference corpus of written Dutch has been identified as one of the priorities in the Dutch/Flemish STEVIN programme. For part of this corpus, manually corrected syntactic annotations will be provided. The paper presents the background of the syntactic annotation efforts, the Alpino parser, which is used as an important tool for constructing the syntactic annotations, as well as a number of other annotation tools and guidelines. For the full STEVIN corpus, automatically derived syntactic annotations will be provided in a later phase of the programme. A number of arguments are provided suggesting that such a resource can be very useful for applications in information extraction, ontology building, lexical acquisition, machine translation and corpus linguistics.

    1. Background
    The Dutch Language Corpus Initiative (D-Coi) is one of the projects funded within the current STEVIN programme. The construction of a 500-million-word reference corpus of written Dutch has been identified as one of the priorities in the programme. In D-Coi, a 50-million-word pilot corpus is being compiled, parts of which will be enriched with (verified) linguistic annotations. In particular, syntactic annotations