The merits of Universal Language Model Fine-tuning for Small Datasets -- a case with Dutch book reviews
We evaluated the effectiveness of using language models, that were
pre-trained in one domain, as the basis for a classification model in another
domain: Dutch book reviews. Pre-trained language models have opened up new
possibilities for classification tasks with limited labelled data, because
representations can be learned in an unsupervised fashion. In our experiments we
have studied the effects of training set size (100-1600 items) on the
prediction accuracy of a ULMFiT classifier, based on a language model that we
pre-trained on the Dutch Wikipedia. We also compared ULMFiT to Support Vector
Machines, which are traditionally considered suitable for small collections. We
found that ULMFiT outperforms SVM for all training set sizes and that
satisfactory results (~90%) can be achieved using training sets that can be
manually annotated within a few hours. We release both our new benchmark
collection of Dutch book reviews for sentiment classification and the
pre-trained Dutch language model to the community.
Comment: 5 pages, 2 figures
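To make the comparison concrete, the sketch below shows a TF-IDF plus linear SVM baseline of the kind the abstract refers to, trained on a small labelled subset. The file name, column names, and split size are illustrative assumptions, not the released Dutch book review benchmark itself.

```python
# Illustrative SVM baseline for small-data sentiment classification.
# The CSV path, column names, and training-set size are assumptions for the sketch.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

df = pd.read_csv("dutch_book_reviews.csv")  # assumed columns: "text", "label"
train_texts, test_texts, y_train, y_test = train_test_split(
    df["text"], df["label"], train_size=1600, random_state=42)

# TF-IDF features + linear SVM: a common strong baseline for small collections.
svm = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), LinearSVC())
svm.fit(train_texts, y_train)
print("SVM accuracy:", svm.score(test_texts, y_test))
```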
BERTje: A Dutch BERT Model
The transformer-based pre-trained language model BERT has helped to improve
state-of-the-art performance on many natural language processing (NLP) tasks.
Using the same architecture and parameters, we developed and evaluated a
monolingual Dutch BERT model called BERTje. Compared to the multilingual BERT
model, which includes Dutch but is only based on Wikipedia text, BERTje is
based on a large and diverse dataset of 2.4 billion tokens. BERTje consistently
outperforms the equally-sized multilingual BERT model on downstream NLP tasks
(part-of-speech tagging, named-entity recognition, semantic role labeling, and
sentiment analysis). Our pre-trained Dutch BERT model is made available at
https://github.com/wietsedv/bertje
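As a usage sketch (not taken from the paper), the model can be loaded through the Hugging Face transformers library. The checkpoint identifier below is the one the BERTje repository points to; treat it as an assumption if it has since changed.

```python
# Minimal sketch: load BERTje for a downstream task via Hugging Face transformers.
# The checkpoint id "GroNLP/bert-base-dutch-cased" is assumed here; check
# https://github.com/wietsedv/bertje for the current identifier.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "GroNLP/bert-base-dutch-cased", num_labels=2)  # e.g. binary sentiment

inputs = tokenizer("Wat een prachtig boek!", return_tensors="pt")
outputs = model(**inputs)  # logits for the two classes
print(outputs.logits)
```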
DUMB: A Benchmark for Smart Evaluation of Dutch Models
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a
diverse set of datasets for low-, medium- and high-resource tasks. The total
set of nine tasks includes four tasks that were previously not available in
Dutch. Instead of relying on a mean score across tasks, we propose Relative
Error Reduction (RER), which compares the DUMB performance of language models
to a strong baseline that can be referred to in the future, even when assessing
different sets of language models. Through a comparison of 14 pre-trained
language models (mono- and multi-lingual, of varying sizes), we assess the
internal consistency of the benchmark tasks, as well as the factors that likely
enable high performance. Our results indicate that current Dutch monolingual
models under-perform and suggest training larger Dutch models with other
architectures and pre-training objectives. At present, the highest performance
is achieved by DeBERTaV3 (large), XLM-R (large) and mDeBERTaV3 (base). In
addition to highlighting best strategies for training larger Dutch models, DUMB
will foster further research on Dutch. A public leaderboard is available at
https://dumbench.nl.
Comment: EMNLP 2023 camera-ready
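The Relative Error Reduction idea can be made explicit with a small sketch: it measures how much of the baseline's remaining error a model removes. The function and example scores below are illustrative assumptions of that general formula, not DUMB leaderboard numbers.

```python
# Illustrative computation of Relative Error Reduction (RER): the fraction of the
# baseline's error that a model eliminates. Scores are percentages in [0, 100];
# the example values are made up, not DUMB leaderboard results.
def relative_error_reduction(model_score: float, baseline_score: float) -> float:
    baseline_error = 100.0 - baseline_score
    model_error = 100.0 - model_score
    return 100.0 * (baseline_error - model_error) / baseline_error

print(relative_error_reduction(model_score=92.0, baseline_score=90.0))  # 20.0
```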