SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German
Swiss German is a dialect continuum whose natively acquired dialects
significantly differ from the formal variety of the language. These dialects
are mostly used for verbal communication and do not have standard orthography.
This has led to a lack of annotated datasets, rendering the use of many NLP
methods infeasible. In this paper, we introduce the first annotated parallel
corpus of spoken Swiss German across 8 major dialects, plus a Standard German
reference. Our goal has been to create and to make available a basic dataset
for employing data-driven NLP applications in Swiss German. We present our data
collection procedure in detail and validate the quality of our corpus by
conducting experiments with recent neural models for speech synthesis.
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign
We present the results and findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects. The campaign was organized as part of the fifth edition of the VarDial workshop, collocated with COLING’2018. This year, the campaign comprised five shared tasks: two task re-runs, Arabic Dialect Identification (ADI) and German Dialect Identification (GDI), and three new tasks, Morphosyntactic Tagging of Tweets (MTT), Discriminating between Dutch and Flemish in Subtitles (DFS), and Indo-Aryan Language Identification (ILI). A total of 24 teams submitted runs across the five shared tasks and contributed 22 system description papers, which were included in the VarDial workshop proceedings and are referred to in this report.