
    A word sense disambiguation corpus for Urdu

    The aim of word sense disambiguation (WSD) is to correctly identify the meaning of a word in context. All natural languages exhibit word sense ambiguities, and these are often hard to resolve automatically. Consequently, WSD is considered an important problem in natural language processing (NLP). Standard evaluation resources are needed to develop, evaluate and compare WSD methods. A range of initiatives have led to the development of benchmark WSD corpora for a wide range of languages from various language families. However, there is a lack of benchmark WSD corpora for South Asian languages, including Urdu, despite there being over 300 million Urdu speakers and a large amount of Urdu digital text available online. To address that gap, this study describes a novel benchmark corpus for the Urdu Lexical Sample WSD task. The corpus contains 50 target words (30 nouns, 11 adjectives, and 9 verbs). A standard, manually crafted dictionary called Urdu Lughat is used as the sense inventory. Four baseline WSD approaches were applied to the corpus; the best performance was obtained using a simple Bag of Words approach. To encourage NLP research on the Urdu language, the corpus is freely available to the research community.
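    The Bag of Words baseline mentioned above can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: it uses English tokens for readability (the corpus itself is Urdu), invented sense labels, and simple word-overlap scoring between a test context and per-sense bags built from labelled training examples.

    ```python
    from collections import Counter

    def bag_of_words(text):
        # Lowercase whitespace tokenisation; a real system would tokenise Urdu properly.
        return Counter(text.lower().split())

    def train(labelled_examples):
        # labelled_examples: list of (context, sense) pairs for one target word.
        # Accumulate one bag of words per sense.
        profiles = {}
        for context, sense in labelled_examples:
            profiles.setdefault(sense, Counter()).update(bag_of_words(context))
        return profiles

    def disambiguate(context, profiles):
        # Choose the sense whose training bag overlaps most with the test context.
        bag = bag_of_words(context)
        def overlap(profile):
            return sum(min(count, profile[word]) for word, count in bag.items())
        return max(profiles, key=lambda sense: overlap(profiles[sense]))

    # Hypothetical English example for the ambiguous word "bank":
    examples = [
        ("deposit money at the bank", "bank/finance"),
        ("the bank approved the loan", "bank/finance"),
        ("fishing on the river bank", "bank/river"),
        ("the muddy bank of the stream", "bank/river"),
    ]
    profiles = train(examples)
    print(disambiguate("she withdrew money from the bank", profiles))  # → bank/finance
    ```

    Published baselines of this kind typically add stop-word removal and weighting (e.g. TF-IDF) or a proper classifier over the bag-of-words features; the overlap score here is only the simplest workable variant.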

    Investigating the universality of a semantic web-upper ontology in the context of the African languages

    Ontologies are foundational to the Semantic Web, and upper ontologies provide semantic integration across it. Multilingualism has been shown to be a key challenge to the development of the Semantic Web, and is a particular challenge to the universality requirement of upper ontologies. Universality implies a qualitative mapping from lexical ontologies, such as WordNet, to an upper ontology, such as SUMO. Are a given natural language family's core concepts currently included in an existing, accepted upper ontology? Does SUMO preserve ontological non-bias with respect to the multilingual challenge, particularly in the context of the African languages? The approach of developing WordNets mapped to shared core concepts in non-Indo-European language families has highlighted these challenges, and this is examined in a unique new context: the Southern African languages. This is achieved through a new mapping from African language core concepts to SUMO. It is shown that SUMO has no significant natural language ontology bias.