Venetan to English machine translation: issues and possible solutions

Abstract

In this paper we describe a prototype of a Venetan-to-English translation system developed under the STILVEN project, funded by the Regional Authority of the Veneto Region in Italy. The approach is statistical, with preprocessing steps applied at both training and translation time (orthographic normalization, and POS tagging to exploit factored models). These steps are needed mainly to overcome two problems: the scarcity of Venetan resources (our Venetan-English corpus contains only 13,000 sentences, amounting to 128,000 Venetan tokens excluding punctuation) and the diasystemic nature of Venetan, which is an ensemble of varieties rather than a single dialect. We present in detail the problems posed by Venetan, our proposed solutions, their implementation, and the results obtained so far.
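The preprocessing chain described in the abstract (orthographic normalization, then POS tagging to produce factored input) can be sketched as follows. This is a minimal illustration under stated assumptions, not the STILVEN implementation: the variant table `VARIANT_TO_CANONICAL` and the lexicon `TOY_LEXICON` are hypothetical stand-ins for the project's normalization rules and POS tagger, and the output uses the Moses-style factored format, in which the factors of each token are joined by `|`.

```python
import re

# Hypothetical spelling-variant table mapping variant forms to one canonical
# spelling. Illustrative only; these are NOT the project's actual rules.
VARIANT_TO_CANONICAL = {
    "ze": "xe",
}

# Hypothetical toy POS lexicon standing in for a real POS tagger.
TOY_LEXICON = {
    "el": "DET",
    "can": "NOUN",
    "xe": "VERB",
}


def tokenize(sentence: str) -> list[str]:
    # Lowercase, then split into word tokens and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def normalize(tokens: list[str]) -> list[str]:
    # Replace each known variant with its canonical spelling.
    return [VARIANT_TO_CANONICAL.get(t, t) for t in tokens]


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    # Lexicon lookup; unknown tokens get the placeholder tag "UNK".
    return [(t, TOY_LEXICON.get(t, "UNK")) for t in tokens]


def to_factored(tagged: list[tuple[str, str]]) -> str:
    # Factored representation: surface|POS tokens separated by spaces.
    return " ".join(f"{word}|{pos}" for word, pos in tagged)


def preprocess(sentence: str) -> str:
    # The same chain would run on training data and on input at translation time.
    return to_factored(tag(normalize(tokenize(sentence))))


if __name__ == "__main__":
    print(preprocess("El can ze bel."))
```

Running it on the example sentence prints `el|DET can|NOUN xe|VERB bel|UNK .|UNK`. Applying the identical normalization at training and translation time is what lets spelling variants from different Venetan varieties share one set of statistics in a small corpus.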

Similar works