Venetan to English machine translation: issues and possible solutions
In this paper we describe a prototype of a Venetan-to-English
translation system developed under the STILVEN project, funded by the regional
authorities of the Veneto Region in Italy. The general approach is
statistical, with some preprocessing operations at both training and
translation time (orthographic normalization, and POS tagging to make
use of factored models). These operations are needed especially to overcome two
main problems: the scarcity of Venetan resources (our Venetan-English
corpus comprises only 13,000 sentences, amounting to 128,000 Venetan
tokens excluding punctuation) and the diasystemic nature of Venetan,
which represents an ensemble of varieties rather than a single
dialect. We present in detail the problems related to Venetan, our
ideas for solving them, their implementation, and the results obtained so
far.
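The two preprocessing steps named above (orthographic normalization, then POS tagging for factored models) can be sketched as follows. This is a minimal illustration under stated assumptions, not the STILVEN implementation: the normalization rules in `CHAR_RULES` and `WORD_VARIANTS` are hypothetical examples of spelling variation, the POS tags are assumed to come from an external tagger, and the `surface|POS` output follows the factor format used by the Moses toolkit (the abstract does not name a toolkit).

```python
# Illustrative sketch only: the rules below are hypothetical examples of
# spelling variation, not the STILVEN project's actual normalization rules.

# Character-level variants: the "evanescent l" is written with different
# letters in different Venetan writing conventions.
CHAR_RULES = [("ł", "l"), ("ƚ", "l")]

# Whole-word variants mapped to one canonical spelling (hypothetical list).
WORD_VARIANTS = {"ze": "xe", "zé": "xe", "xé": "xe"}


def normalize(token: str) -> str:
    """Lowercase a token and map spelling variants onto one canonical form."""
    t = token.lower()
    for src, dst in CHAR_RULES:
        t = t.replace(src, dst)
    return WORD_VARIANTS.get(t, t)


def to_factored(tokens: list[str], tags: list[str]) -> str:
    """Emit a sentence as space-separated surface|POS factors.

    The POS tags are assumed to come from an external tagger.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return " ".join(f"{normalize(w)}|{p}" for w, p in zip(tokens, tags))


print(to_factored(["Ła", "casa", "ze", "granda"], ["DET", "NOUN", "VERB", "ADJ"]))
# prints: la|DET casa|NOUN xe|VERB granda|ADJ
```

Normalizing before tagging means that variants from different writing traditions are counted as one type, which matters most when the corpus is as small as the one described here.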
Arabic Morphology Parsing Revisited
In this paper we propose a new approach to the description of Arabic
morphology using 2-tape finite state transducers, based on a particular and
systematic use of the operation of composition in a way that allows for
incremental substitutions of concatenated lexical morpheme specifications with
their surface realization for non-concatenative processes (the case of Arabic
templatic interdigitation and non-templatic circumfixation).
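The idea of incremental substitution through composition can be illustrated with a toy model. In this sketch, assumed for illustration and not taken from the paper, each "transducer" is a Python function mapping a string to the set of its outputs (a finite relation), and `compose` chains them left to right. A lexical specification such as `ktb+CaCaC` (root, boundary `+`, template) is rewritten one step at a time: each `fill_slot` step moves the next root consonant into the next `C` slot, and a final step accepts only fully interdigitated strings and erases the boundary. The string encoding and function names are hypothetical; a real system would compile such steps into 2-tape finite-state transducers.

```python
from typing import Callable, Set

# A relation maps one input string to the set of its possible outputs.
Relation = Callable[[str], Set[str]]


def compose(*rels: Relation) -> Relation:
    """Compose relations left to right: each output feeds the next relation."""
    def composed(s: str) -> Set[str]:
        current = {s}
        for rel in rels:
            current = {out for x in current for out in rel(x)}
        return current
    return composed


def fill_slot(s: str) -> Set[str]:
    """Move the first remaining root consonant into the first C slot."""
    root, sep, template = s.partition("+")
    if not sep or not root or "C" not in template:
        return set()  # relation undefined on this input
    return {root[1:] + "+" + template.replace("C", root[0], 1)}


def drop_boundary(s: str) -> Set[str]:
    """Accept only strings whose root is fully consumed; erase the boundary."""
    root, sep, template = s.partition("+")
    if sep and not root and "C" not in template:
        return {template}
    return set()


# Three radicals -> three slot-filling substitutions, then boundary erasure.
triliteral = compose(fill_slot, fill_slot, fill_slot, drop_boundary)

print(triliteral("ktb+CaCaC"))    # {'katab'}
print(triliteral("ktb+maCCuuC"))  # {'maktuub'}
print(triliteral("kt+CaCaC"))     # set(): root and template do not fit
```

Each step only has to remember a single consonant while copying the rest of the string, which is the kind of bounded rewrite a finite-state transducer can perform; composing the steps yields the non-concatenative mapping without any extension to the formalism.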
Sarrif – The Elegant Arabic Morphology Parser
In this paper we present Sarrif, our Arabic Morphology Parser, featuring a novel approach to the description of Arabic morphology with 2-tape finite state transducers, based on a particular and systematic use of the operation of composition in a way that allows for incremental substitutions of concatenated lexical morpheme specifications with their surface realization for non-concatenative processes (the case of Arabic templatic interdigitation and non-templatic circumfixation). We argue that:
1. the method of incremental substitutions through compositions allows for an elegant description of all main morphological processes present in natural languages including non-concatenative ones in strict finite-state terms, without the need to resort to extensions of any sort;
2. our approach allows for the most logical encoding of every kind of dependency, including traditional long-distance ones (mutual exclusiveness), circumfixations and idiosyncratic root and pattern combinations;
3. a smart usage of composition such as ours allows for the creation of a single system that can easily be adapted to fulfil the duties of both a stemmer (or lexicon development tool) and a full-fledged lexical transducer.
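Points 2 and 3 can be illustrated with the same toy model of relations as string-to-set functions (redefined here so the block stands alone). A long-distance constraint, here the mutual exclusiveness of the definite article al- and the nunation ending -un, is encoded as a filter: an identity relation that rejects forbidden combinations. The same filter is then shared by two cascades, one producing surface forms and one acting as a stemmer. The three-slot `prefix+stem+suffix` encoding and all function names are hypothetical; this is a sketch of the general technique, not Sarrif's actual lexicon or rules.

```python
from typing import Callable, Set

Relation = Callable[[str], Set[str]]


def compose(*rels: Relation) -> Relation:
    """Compose relations left to right: each output feeds the next relation."""
    def composed(s: str) -> Set[str]:
        current = {s}
        for rel in rels:
            current = {out for x in current for out in rel(x)}
        return current
    return composed


def article_excludes_nunation(s: str) -> Set[str]:
    """Filter (identity on accepted strings): al- and -un cannot co-occur."""
    parts = s.split("+")
    if parts[0] == "al" and parts[-1] == "un":
        return set()
    return {s}


def erase_boundaries(s: str) -> Set[str]:
    """Surface realization: concatenate the morphemes."""
    return {s.replace("+", "")}


def keep_stem(s: str) -> Set[str]:
    """Stemmer output: keep only the middle slot of prefix+stem+suffix."""
    parts = s.split("+")
    return {parts[1]} if len(parts) == 3 else set()


# One shared constraint component, two different back ends.
lexical_transducer = compose(article_excludes_nunation, erase_boundaries)
stemmer = compose(article_excludes_nunation, keep_stem)

print(lexical_transducer("al+kitaab+u"))   # {'alkitaabu'}
print(lexical_transducer("+kitaab+un"))    # {'kitaabun'}
print(lexical_transducer("al+kitaab+un"))  # set(): constraint violated
print(stemmer("al+kitaab+u"))              # {'kitaab'}
```

Because the constraint lives in its own component, it is stated once and reused by both cascades; only the final substitution step differs between the stemmer and the full lexical transducer.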
Addicter 2.0
Addicter stands for Automatic Detection and DIsplay of Common Translation ERrors. It is a set of tools (mostly scripts written in Perl) that help with error analysis for machine translation. The second version contains a greatly improved viewer and many new modules for the automatic detection and classification of errors.
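To give a concrete sense of what automatic error classification involves, here is a deliberately simplified sketch in Python. It is not Addicter's algorithm (Addicter itself is written mostly in Perl and its modules are not reproduced here): it compares a hypothesis and a reference as bags of words, treats exact matches as correct, pairs leftover words that share a lemma as "wrong form" errors, and reports the rest as "missing" or "extra". The category names and the `lemma` function (here a hypothetical dictionary lookup) are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List


def classify_errors(hyp: List[str], ref: List[str],
                    lemma: Callable[[str], str] = lambda w: w) -> Dict[str, list]:
    """Toy bag-of-words error classification (not Addicter's actual method)."""
    hyp_left, ref_left = Counter(hyp), Counter(ref)
    matched = hyp_left & ref_left  # exact matches are treated as correct
    hyp_left -= matched
    ref_left -= matched

    # Pair leftover words that share a lemma: right word, wrong form.
    wrong_form = []
    for r in sorted(ref_left.elements()):
        for h in sorted(hyp_left):
            if hyp_left[h] > 0 and lemma(h) == lemma(r):
                wrong_form.append((h, r))
                hyp_left[h] -= 1
                ref_left[r] -= 1
                break

    return {
        "wrong_form": wrong_form,
        "missing": sorted(ref_left.elements()),  # in reference only
        "extra": sorted(hyp_left.elements()),    # in hypothesis only
    }


# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"cats": "cat"}
result = classify_errors("the cat sat on a mat".split(),
                         "the cats sat on the mat".split(),
                         lemma=lambda w: LEMMAS.get(w, w))
print(result)
# {'wrong_form': [('cat', 'cats')], 'missing': ['the'], 'extra': ['a']}
```

A bag-of-words comparison ignores word order, so it cannot detect reordering errors; a tool that aligns hypothesis and reference words can report misplaced words as a separate class.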