Crowdsourcing Piedmontese to Test LLMs on Non-Standard Orthography

Abstract

This dataset contains data for testing machine translation and topic classification in Piedmontese. It is based on FLORES+ (NLLB Team et al., 2024) and SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects (Adelani et al., EACL 2024)

Similar works

Full text

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.