Validation of language-agnostic models for discourse marker detection
Using language models to detect or predict the presence of linguistic phenomena in text has become a mainstream research topic. With the rise of generative models, experiments using deep learning and transformer models attract intense interest. Aspects such as the precision of predictions, portability to other languages or phenomena, and scale have been central to the research community. Discourse markers, as a linguistic phenomenon, perform important functions, such as signposting, signalling, and rephrasing, that facilitate the organization of discourse. Our paper addresses discourse marker detection, a complex task because the phenomenon is manifested by expressions that can occur as content words in some contexts and as discourse markers in others (for example, English "well" is an adverb in "she sings well" but a discourse marker in "well, I disagree"). We adopted a language-agnostic model trained on English to predict the presence of discourse markers in texts in 8 other languages unseen by the model, with the goal of evaluating how well the model performs on languages with different structural and lexical properties. We report on the process of evaluating and validating the model's performance across European Portuguese, Hebrew, German, Polish, Romanian, Bulgarian, Macedonian, and Lithuanian, and on the results of this validation. This research is a key step towards multilingual language processing …
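The abstract does not say which metrics were used to validate the model. Purely as an illustration of per-language validation, the sketch below assumes binary token-level labels (1 = discourse marker, 0 = not) and computes precision, recall, and F1 separately for each language. The function names `prf1` and `evaluate_per_language`, the label scheme, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed label scheme (hypothetical): 1 = token is a discourse marker, 0 = it is not.


def prf1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (discourse marker)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    # Define each score as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate_per_language(
    data: Dict[str, Tuple[List[int], List[int]]]
) -> Dict[str, Tuple[float, float, float]]:
    """Score each language on its own (gold, predicted) label lists."""
    return {lang: prf1(gold, pred) for lang, (gold, pred) in data.items()}


if __name__ == "__main__":
    # Toy labels for illustration only; these are NOT results from the paper.
    toy = {
        "de": ([1, 0, 0, 1, 0], [1, 0, 1, 1, 0]),
        "pl": ([0, 1, 0, 0, 1], [0, 1, 0, 0, 0]),
    }
    for lang, (p, r, f) in evaluate_per_language(toy).items():
        print(f"{lang}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Scoring each language separately, rather than pooling all tokens, keeps a language with many annotated examples from masking weaker performance on languages whose structural or lexical properties differ more from English.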