Methods for Amharic part-of-speech tagging

Author: Argaw Atelach Alemu
Asker Lars
Gambäck Björn
Olsson Fredrik
Publication date: 01/01/2009

The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for English, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy approach, while HMM-based and SVM-based taggers got comparable results.

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

African language technology: the data-driven perspective

Author: De Pauw Guy
de Schryver Gilles-Maurice
Publication venue: European Academy of Applied and Social Sciences (EURAASS)
Publication date: 01/01/2009

Resource-light Bantu part-of-speech tagging

Author: De Pauw Guy
de Schryver Gilles-Maurice
van de Loo Janneke
Publication venue: European Language Resources Association
Publication date: 01/01/2012

State-of-the-art software to support intelligent lexicography

Author: de Schryver Gilles-Maurice
Publication venue: 中国社会科学 = China Social Sciences Publishing House
Publication date: 01/01/2010

The SAWA corpus: a parallel corpus English-Swahili

Author: De Pauw Guy
de Schryver Gilles-Maurice
Wagacha Peter W
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2009

Research in data-driven methods for Machine Translation has greatly benefited from the increasing availability of parallel corpora. Processing the same text in two different languages yields useful information on how words and phrases are translated from a source language into a target language. To investigate this, a parallel corpus is typically aligned by linking linguistic tokens in the source language to the corresponding units in the target language. An aligned parallel corpus therefore facilitates the automatic development of a machine translation system and can also bootstrap annotation through projection. In this paper, we describe data collection and annotation efforts and preliminary experimental results with a parallel corpus English-Swahili.

CiteSeerX