
5 research outputs found

Venetan to English machine translation: issues and possible solutions

Author: DELMONTE R.
JABER Suhel
TONELLI Sara
Publication venue: University of Copenhagen, Special Issue
Publication date: 01/01/2011

In this paper we describe a prototype of a Venetan to English translation system developed under the STILVEN project, financed by the Regional Authorities of the Veneto Region in Italy. The general approach is statistical, with some preprocessing operations at both training and translation time (orthographic normalization and POS tagging to make use of factored models). These are needed especially to overcome two main problems: the scarcity of Venetan resources (our Venetan-English corpus is made up of only 13,000 sentences, amounting to 128,000 Venetan tokens excluding punctuation) and the diasystemic nature of Venetan, which is an ensemble of varieties rather than a single dialect. We will present in detail the problems related to Venetan, our ideas to solve them, their implementation and the results obtained so far.

Archivio Ricerca Ca'Foscari

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Arabic Morphology Parsing Revisited

Author: DELMONTE R.
JABER Suhel
Publication venue: © Springer-Verlag Berlin Heidelberg
Publication date: 01/01/2008

In this paper we propose a new approach to the description of Arabic morphology using 2-tape finite-state transducers, based on a particular and systematic use of the operation of composition in a way that allows for incremental substitutions of concatenated lexical morpheme specifications with their surface realization for non-concatenative processes (the case of Arabic templatic interdigitation and non-templatic circumfixation).

Archivio Ricerca Ca'Foscari

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Sarrif – The Elegant Arabic Morphology Parser

Author: DELMONTE R.
JABER Suhel
Publication venue: The MEDAR Consortium
Publication date: 01/01/2009

In this paper we present Sarrif, our Arabic Morphology Parser, featuring a novel approach to the description of Arabic morphology with 2-tape finite-state transducers, based on a particular and systematic use of the operation of composition in a way that allows for incremental substitutions of concatenated lexical morpheme specifications with their surface realization for non-concatenative processes (the case of Arabic templatic interdigitation and non-templatic circumfixation). We argue that: 1. the method of incremental substitutions through compositions allows for an elegant description of all main morphological processes present in natural languages, including non-concatenative ones, in strict finite-state terms, without the need to resort to extensions of any sort; 2. our approach allows for the most logical encoding of every kind of dependency, including traditional long-distance ones (mutual exclusiveness), circumfixations and idiosyncratic root and pattern combinations; 3. a smart usage of composition such as ours allows for the creation of a single system that can be easily accommodated to fulfil the duties of both a stemmer (or lexicon development tool) and a full-fledged lexical transducer.

Archivio Ricerca Ca'Foscari

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Addicter 2.0

Author: Berka Jan
Bisazza Arianna
Bojar Ondřej
Fishel Mark
Hunsicker Sabine
Jaber Suhel
Popel Martin
Popović Maja
Zeman Daniel
Publication date: 01/01/2011

Addicter stands for Automatic Detection and DIsplay of Common Translation ERrors. It is a set of tools (mostly scripts written in Perl) that help with error analysis for machine translation. The second version contains a greatly improved viewer and many new modules for automatic detection and classification of errors.

Biblio at Institute of Formal and Applied Linguistics