Crowdsourcing Piedmontese to Test LLMs on Non-Standard Orthography

Vico, Gianluca; Libovický, Jindřich

oai:lindat.mff.cuni.cz:11372/LRT-6086

Authors: Gianluca Vico, Jindřich Libovický
Publication date: 8 February 2026
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (ÚFAL)

Abstract

This dataset provides test data for machine translation and topic classification in Piedmontese. It is based on FLORES+ (NLLB Team et al., 2024) and on SIB-200, "A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects" (Adelani et al., EACL 2024).

Last updated on 24/02/2026

This paper was published in LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
