
177 research outputs found

Automatically generating a sentiment lexicon for the Malay language

Author: Mohammad Darwich
Nazlia Omar
Shahrul Azman Mohd Noah
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/06/2016

This paper proposes an automated sentiment lexicon generation model designed specifically for the Malay language. Lexicon-based Sentiment Analysis (SA) models rely on a sentiment lexicon, a linguistic resource that holds a priori information about the sentiment properties of words. Because such a lexicon is indispensable for SA tasks, a large body of research has focused on sentiment lexicon generation algorithms. This is not the case for low-resource languages such as Malay, for which research in this area is lacking, and this gap motivates the algorithm proposed here. WordNet Bahasa was first mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive and negative terms was then automatically expanded by recursively adding terms linked via WordNet's synonymy and antonymy semantic relations. The underlying intuition is that the sentiment properties of terms newly added via these relations are preserved. A supervised classifier was employed for the word-polarity tagging task, with textual representations of the expanded seed set as features. Evaluation of the model against the General Inquirer lexicon as a benchmark demonstrates that it performs with reasonable accuracy. This paper aims to provide a foundation for further research in this area for the Malay language.
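
The seed-expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expand_seeds`, the `max_depth` cut-off, and the tiny hand-written synonym and antonym tables are all assumptions made for illustration, standing in for the mapped WordNet Bahasa network. The sketch also assumes that synonyms inherit a term's polarity and antonyms receive the opposite one, uses an iterative breadth-first traversal in place of recursion, and leaves out the supervised classification step entirely.

```python
from collections import deque


def expand_seeds(seeds, synonyms, antonyms, max_depth=2):
    """Expand a seed lexicon over synonym/antonym links.

    seeds: dict mapping word -> +1 (positive) or -1 (negative).
    synonyms, antonyms: dicts mapping word -> list of related words
    (hypothetical stand-ins for WordNet relations).
    max_depth: how many relation hops away from a seed to follow.
    """
    polarity = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue
        # Assumption: a synonym keeps the polarity of the source word.
        for related in synonyms.get(word, ()):
            if related not in polarity:
                polarity[related] = polarity[word]
                queue.append((related, depth + 1))
        # Assumption: an antonym takes the opposite polarity.
        for related in antonyms.get(word, ()):
            if related not in polarity:
                polarity[related] = -polarity[word]
                queue.append((related, depth + 1))
    return polarity


# Toy relation tables, written by hand for illustration only.
toy_synonyms = {"baik": ["bagus"], "buruk": ["jahat"]}
toy_antonyms = {"baik": ["buruk"]}

lexicon = expand_seeds({"baik": 1}, toy_synonyms, toy_antonyms)
print(lexicon)
```

A word keeps the first polarity assigned to it, so conflicting paths through the network are resolved in favour of the shortest one. That is a simplification chosen here, not something the abstract specifies.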

Directory of Open Access Journals

UKM Journal Article Repository

Gujarati Word Sense Disambiguation using Genetic Algorithm

Author: Zankhana B. Vaishnav
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/06/2017

Genetic algorithms (GAs) have been widely investigated for solving hard optimization problems, including word sense disambiguation (WSD). WSD is the problem of determining which sense of a polysemous word is used in a given context. Several approaches have been investigated for WSD in English, French, German and some Indo-Aryan languages such as Hindi, Marathi and Malayalam. However, research on WSD in the Gujarati language is relatively limited. This paper proposes an approach to Gujarati WSD based on a genetic algorithm. It is a knowledge-based approach in which the Indo-Aryan WordNet for Gujarati serves as the lexical database for WSD.

International Journal on Recent and Innovation Trends in Computing and Communication

Introducing the Arabic WordNet project

Author: Alkhalifa M.
Black W.
Fellbaum C.
Vossen P.J.T.M.
Publication venue: Amsterdam: Vrije Universiteit
Publication date: 01/01/2006

VU Research Portal

Introducing the Arabic WordNet project

Author: Alkhalifa M.
Black W.
Elkateb S.
Fellbaum C.
Pease A.
Rodriguez H.
Vossen P.
Publication date: 01/01/2006

VU Research Portal

Automatic Creation of Lexical Resources for an Interlingua-based System

Author: Bekios Juan
Boguslavsky Igor
Cardenosa Jesus
Gallardo Carolina
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2008

The Universal Networking Language (UNL) is an interlingua designed to be the base of several natural language processing systems that aim to support multilinguality on the internet. One of the main components of the language is the dictionary of Universal Words (UWs), which links the vocabularies of the different languages involved in the project. As in any NLP system, the coverage and accuracy of its lexical resources are crucial for the development of the system. In this paper, the authors describe how a large-coverage UW dictionary was automatically created from an existing, well-known resource, the English WordNet. Other aspects, such as implementation details and the evaluation of the final UW set, are also described.

Bulgarian Digital Mathematics Library at IMI-BAS