Search CORE

40 research outputs found

Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

Author: Peter Spyns; Jan Odijk
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2020
Field of study: Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

Directory of Open Access Books (DOAB)

BERTje: A Dutch BERT Model

Author: Bisazza Arianna; Caselli Tommaso; de Vries Wietse; Nissim Malvina; van Noord Gertjan; van Cranenburgh Andreas
Publication date: 19/12/2019

University of Groningen

Generating, Refining and Using Sentiment Lexicons

Author: Ackermans P.; de Rijke M.; Geleijnse G.; Jijkoun V.; Laan F.; Weerkamp W.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/11/2012

Crossref

Springer - Publisher Connector

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Querying large treebanks: benchmarking GrETEL indexing

Author: Augustinus Liesbeth; Vandeghinste Vincent; Vanroy Bram
Publication date: 01/01/2017

The amount of data that is available for research grows rapidly, yet technology to efficiently interpret and excavate these data lags behind. For instance, when using large treebanks for linguistic research, the speed of a query leaves much to be desired. GrETEL Indexing, or GrInding, tackles this issue. The idea behind GrInding is to make the search space as small as possible before actually starting the treebank search, by pre-processing the treebank at hand. We recursively divide the treebank into smaller parts, called subtree-banks, which are then converted into database files. All subtree-banks are organized according to their linguistic dependency pattern, and labeled as such. Additionally, general patterns are linked to more specific ones. By doing so, we create millions of databases, and given a linguistic structure we know in which databases that structure can occur, leading to a significant efficiency boost. We present the results of a benchmark experiment, testing the effect of the GrInding procedure on the SoNaR-500 treebank.
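
The indexing idea in this abstract can be illustrated with a minimal sketch. It is not the GrInding implementation: the toy treebank, the function names `build_index` and `candidate_sentences`, and the choice to describe a node only by the set of its children's dependency relations are all assumptions made for illustration. The labels `su`, `hd`, `obj1`, `det` and `mod` are used here as example relation names.

```python
from collections import defaultdict

# Hypothetical toy treebank: each sentence is a list of nodes, and each node
# is reduced to the dependency relations of its children (an assumption;
# a real treebank stores full syntactic trees).
TOY_TREEBANK = {
    "s1": [("su", "hd", "obj1")],
    "s2": [("su", "hd"), ("det", "hd")],
    "s3": [("su", "hd", "obj1", "mod")],
}


def build_index(treebank):
    """Group sentence ids into one 'database' per dependency pattern.

    A pattern is the set of child relations of a node. Using a set ignores
    repeated relations, which is a simplification of this sketch.
    """
    databases = defaultdict(set)
    for sentence_id, nodes in treebank.items():
        for relations in nodes:
            databases[frozenset(relations)].add(sentence_id)
    return dict(databases)


def candidate_sentences(databases, query_relations):
    """Return ids of sentences that can contain the queried structure.

    A general pattern is treated as linked to every more specific pattern
    that contains all of its relations, so only those databases are read.
    """
    query = frozenset(query_relations)
    hits = set()
    for pattern, sentence_ids in databases.items():
        if query <= pattern:  # the query pattern itself or a more specific one
            hits |= sentence_ids
    return hits


if __name__ == "__main__":
    index = build_index(TOY_TREEBANK)
    print(sorted(candidate_sentences(index, ["su", "hd", "obj1"])))
```

In this sketch the full search would then run only on the returned candidates, which is where the reduction of the search space would come from.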

Ghent University Academic Bibliography

BERTje: A Dutch BERT Model

Author: Bisazza Arianna; Caselli Tommaso; de Vries Wietse; Nissim Malvina; van Noord Gertjan; van Cranenburgh Andreas
Publication date: 19/12/2019

ARTS repository - University of Groningen

BERTje: A Dutch BERT Model

Author: Bisazza Arianna; Caselli Tommaso; de Vries Wietse; Nissim Malvina; van Noord Gertjan; van Cranenburgh Andreas
Publication date: 19/12/2019

The transformer-based pre-trained language model BERT has helped to improve state-of-the-art performance on many natural language processing (NLP) tasks. Using the same architecture and parameters, we developed and evaluated a monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT model, which includes Dutch but is only based on Wikipedia text, BERTje is based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks (part-of-speech tagging, named-entity recognition, semantic role labeling, and sentiment analysis). Our pre-trained Dutch BERT model is made available at https://github.com/wietsedv/bertje

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

CLARIN’s Support for Research into the Acquisition of Lexical Properties

Author: Fišer Darja; Odijk Jan; Witt Andreas
Publication date: 12/10/2022

Odijk (2011) sketched a research question on the acquisition of lexical properties of words, and illustrated it with some concrete examples, in particular with respect to the lexical properties of the Dutch synonyms heel, erg, and zeer (all meaning ‘very’). This work also indicated what the CLARIN infrastructure should offer to make it possible to address this research question. In this contribution I sketch to what extent the CLARIN infrastructure has achieved these requirements and desiderata. The resulting picture is mixed: (1) some have been implemented; (2) some have not been implemented and are still highly desirable; (3) some have not been implemented but turned out to be not so urgent; (4) new requirements and desiderata have arisen in the last 10 years, only some of which have been implemented. In this way, I evaluate the development of the CLARIN infrastructure (mainly its Netherlands part) over the past 10 years, and sketch the requirements and desiderata for the CLARIN infrastructure to address this research question for the next 10 years.

Utrecht University Repository

Finding Dutch multiword expressions

Author: Baarda Tijmen C.; Bonfil Ben; Kroon Martin; Odijk Jan; Spoel Sheean
Publication date: 16/10/2023

We present MWE-Finder, which enables a user to search for occurrences of multiword expressions (MWEs) in large Dutch text corpora. Components of many MWEs in Dutch can occur in multiple forms, need not be adjacent, and can occur in multiple orders (such MWEs are called flexible). Searching for occurrences of such flexible MWEs is difficult and cannot be done reliably with most search applications. What is needed is a search engine that takes into account the grammatical configuration of the MWE. MWE-Finder is therefore embedded in GrETEL, a treebank search application for Dutch. A user can enter an example of a MWE in a specific canonical form, after which the system searches for sentences in which the MWE occurs, using queries generated automatically from the canonical form. The MWE can also be selected from a list of more than 11k canonical forms for Dutch MWEs that MWE-Finder offers. We will show that MWE-Finder also offers facilities to find examples with unexpected modifiers or determiners on components of the MW

Utrecht University Repository