Creating language resources for under-resourced languages: methodologies, and experiments with Arabic

Author: A Roberts
A Schalley
Chris Fox
D Radev
D Wang
E Benmamoun
E Lloret
F Diehl
G Giannakopoulos
H Luhn
I Foster
I Hmeidi
J Yeh
K Dukes
L Abouenour
L Al-Sulaiti
M Baroni
M Diab
M Fattah
M Outahajala
M Poesio
M Sawalha
Mahmoud El-Haj
Udo Kruschwitz
W Banzhaf
Y Benajiba
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.

University of Essex Research Repository

University of Regensburg Publication Server

Crossref

Lancaster E-Prints

Improving Arabic Texts Morphological Disambiguation using Possibilistic Classifier (NLDB 2014)

Author: A.A. Al-Echikh
B. Haouari
D. Dubois
D.J. Dubois
I. Bounhas
J. Hajic
K. Jbara
M. Bounhas
M. Georgescul
M. Outahajala
R. Ayed
R. Quinlan
S. Alkuhlani
V. Vapnik
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Morphological ambiguity is an important problem that has been studied through different approaches. In this paper, we investigate several classification methods for disambiguating the morphological features of non-vocalized Arabic texts. A possibilistic approach is improved and proposed to handle imperfect training and test datasets. We introduce a data transformation method to convert the imperfect dataset to a perfect one. We compare the disambiguation results of the classification approaches with those given by the possibilistic classifier when dealing with imperfect data.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail