Generating a Malay sentiment lexicon based on wordnet

Authors: Nazlia Omar, Nur Sharmini Alexander
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/06/2017

A sentiment lexicon is a vocabulary list consisting of positive and negative words. In opinion mining, the sentiment lexicon is one of the important resources for the text polarity classification task in a sentiment analysis model. Studies in Malay sentiment analysis are increasing as the volume of sentiment data on social media grows. Therefore, demand for a Malay sentiment lexicon is high. However, developing a Malay sentiment lexicon is difficult due to the scarcity of Malay language resources. Thus, various approaches and techniques are used to generate sentiment lexicons. The objective of this paper is to develop a Malay sentiment lexicon generation algorithm based on WordNet. In this study, the method maps WordNet Bahasa to the English WordNet to obtain the offset values of a seed set of sentiment words. The seed set is then used to generate synonym and antonym semantic relations in the English WordNet. The best result achieves 86.58% agreement with human annotators and a 91.31% F1-measure in word polarity classification. The result shows the effectiveness of the proposed algorithm for generating a Malay sentiment lexicon based on WordNet.
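The offset-based mapping described in this abstract can be pictured with a small sketch. It assumes that both wordnets index their synsets by a shared offset, and every offset and lemma below is an invented placeholder rather than real WordNet data; the function names are hypothetical and this is not the authors' implementation.

```python
# Hypothetical aligned entries: synset offset -> lemmas in each language.
# Offsets and lemmas are invented placeholders, not real WordNet data.
ENGLISH_WORDNET = {"0001-a": ["good", "fine"], "0002-a": ["bad", "poor"]}
WORDNET_BAHASA = {"0001-a": ["baik", "bagus"], "0002-a": ["buruk", "teruk"]}


def offsets_for(malay_word):
    """Return the offsets of every synset that lists the Malay word."""
    return sorted(off for off, lemmas in WORDNET_BAHASA.items() if malay_word in lemmas)


def english_lemmas_for(malay_word):
    """Follow the shared offsets from a Malay seed word to English lemmas."""
    lemmas = []
    for off in offsets_for(malay_word):
        lemmas.extend(ENGLISH_WORDNET.get(off, []))
    return lemmas


print(offsets_for("baik"))
print(english_lemmas_for("buruk"))
```

Once a seed word has an offset, synonym and antonym relations can be looked up on the English side and carried back to Malay through the same table.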

Directory of Open Access Journals

UKM Journal Article Repository

Transforming noun phrase structure form into heuristics and rules for detecting compound noun in Malay sentence

Authors: Ab. Rahman Suhaimi, Omar Nazlia
Publication date: 04/07/2012

This paper addresses the process of transforming noun phrase structure forms into a list of suitable heuristics for detecting compound nouns in Malay sentences. The heuristics are used to obtain the syntactic sentence structure needed to find compound noun word pairs in a Malay sentence. To obtain the list of rules, the noun phrase structure form must be created first, so that the possible word combinations that form a compound noun are known. The noun phrase structure forms are grouped into three noun categories: i) noun and noun, ii) noun and noun modifier, and iii) noun and non-noun modifier. In our research work, however, we focus on the noun and noun modifier category. The heuristic rules and noun phrase structure forms are important to understand because they help to clarify the concept of finding compound noun word pairs in a Malay sentence. The compound noun output will be used as input in our next research, named head modifier detector.
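As a rough illustration of turning a noun phrase structure into a rule, the sketch below flags a noun that is immediately followed by another noun as a candidate compound. The tag names, the single rule and the example sentence are assumptions made for illustration; the paper's actual rule list is not given in the abstract.

```python
def find_noun_modifier_pairs(tagged):
    """Return adjacent (noun, noun) word pairs from a POS-tagged sentence.

    `tagged` is a list of (word, tag) tuples; the tag "N" for nouns is a
    hypothetical label, not the tagset used in the paper.
    """
    pairs = []
    for (word1, tag1), (word2, tag2) in zip(tagged, tagged[1:]):
        if tag1 == "N" and tag2 == "N":
            pairs.append((word1, word2))
    return pairs


# "Dia menaiki kereta api ke bandar" ("He/she took the train to town")
sentence = [("Dia", "PRON"), ("menaiki", "V"), ("kereta", "N"),
            ("api", "N"), ("ke", "PREP"), ("bandar", "N")]
print(find_noun_modifier_pairs(sentence))
```

A real rule set would need further conditions, since not every adjacent noun pair is a compound.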

UUM Repository

Bitcoin price prediction based on the sentiment polarity of news articles and market data using an LSTM model

Authors: Chee Kean Chin, Nazlia Omar
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/06/2020

Bitcoin is a digital currency and investment instrument that has attracted worldwide attention in recent years. However, Bitcoin's unstable price has become a concern among Bitcoin users and investors. Bitcoin price prediction can help investors and users build effective strategies for investment or use. With the rapid growth of the Internet, online data, including news articles, can help in predicting the Bitcoin price. This study aims to examine the effect of news article sentiment on the Bitcoin price over a study period from September 2017 to August 2019. Accordingly, this study introduces sentiment analysis to understand the information in online news articles and uses it as an input feature for Bitcoin price prediction. There are two main phases in this study: sentiment analysis and price prediction. In sentiment analysis, sentiment is extracted using a lexicon-based method to understand the information in news articles related to the cryptocurrency market. Cryptocurrency is a type of digital payment and monetary system in which transactions are carried out in a decentralized manner, as peer-to-peer financial transactions that do not pass through a financial institution. In other words, Bitcoin does not depend on a third-party intermediary to process payments; it uses cryptographic proof in computers to process transactions, verify their validity and propagate them across the network (Nakamoto 2008). In price prediction, sentiment is used as an input feature and a Long Short-Term Memory (LSTM) model is used in the price prediction phase. With market data and news articles as samples, the results show that news article sentiment can reduce the error in Bitcoin price prediction.
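The two phases in this abstract, lexicon-based sentiment extraction and price prediction from sentiment plus market data, can be sketched up to the point where features are handed to a sequence model. The word lists, the scoring formula and the window layout below are all assumptions for illustration; the study's actual lexicon and LSTM configuration are not described in the abstract, and no model is trained here.

```python
# Hypothetical mini-lexicon; the study's real lexicon is not given.
POSITIVE = {"surge", "gain", "rally"}
NEGATIVE = {"crash", "loss", "ban"}


def polarity(text):
    """Score a text in [-1, 1] as (positive - negative) / matched words."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def make_windows(prices, sentiments, window=2):
    """Build (X, y) pairs: `window` days of [price, sentiment] -> next price."""
    X, y = [], []
    for i in range(len(prices) - window):
        X.append([[prices[j], sentiments[j]] for j in range(i, i + window)])
        y.append(prices[i + window])
    return X, y


headlines = ["Bitcoin prices surge after rally", "Exchange ban triggers crash"]
print([polarity(h) for h in headlines])
```

Each row of `X` has the shape (time steps, features) that sequence models such as an LSTM typically expect as input.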

UKM Journal Article Repository

Heuristics-based method for head and modifier detection in Malay sentences from the cultural heritage domain

Authors: Nazlia Omar, Suhaimi Ab Rahman
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/06/2017

Detecting the head and modifier in Malay sentences from the cultural heritage domain is difficult because their positions vary with sentence structure. Hence, language experts have discussed differing points of view on the theory and concept of head and modifier detection in a compound noun. Existing research is also limited, especially in computational linguistics. Research is therefore needed to identify appropriate methods for detecting the head and modifier that appear in Malay sentences from the cultural heritage domain. The aim of this study is to construct a list of heuristic rules for detecting the position of compound nouns in Malay sentences from the cultural heritage domain. Using 15 such heuristic rules, the position of the head and modifier within a compound noun can also be detected. Precision, recall and F1-score are used to measure the accuracy of the results. In the experiments, Sentence Structure of Malay Cultural Heritage Domain (SADWBM) has an F1-score of 80.4%, compared with 56% for Noun Phrase Structure (SFN). SADWBM therefore scores better than SFN, indicating that the approach used in this study is effective in resolving the identified problems.
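For readers unfamiliar with the evaluation measures named in this abstract, the sketch below computes precision, recall and F1-score for a set of detected items against a gold standard. The function name and the example sets are hypothetical; the numbers are unrelated to the reported results.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Four detected pairs, five gold pairs, three in common.
p, r, f1 = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "c", "e", "f"})
print(p, r, round(f1, 4))
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are reasonably high.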

Directory of Open Access Journals

UKM Journal Article Repository

Arabic nested noun compound extraction based on linguistic features and statistical measures

Authors: Nazlia Omar, Qasem Al-Tashi
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/05/2018

The extraction of Arabic nested noun compound is significant for several research areas such as sentiment analysis, text summarization, word categorization, grammar checker, and machine translation. Much research has studied the extraction of Arabic noun compound using linguistic approaches, statistical methods, or a hybrid of both. A wide range of the existing approaches concentrate on the extraction of the bi-gram or tri-gram noun compound. Nonetheless, extracting a 4-gram or 5-gram nested noun compound is a challenging task due to the morphological, orthographic, syntactic and semantic variations. Many features have an important effect on the efficiency of extracting a noun compound such as unit-hood, contextual information, and term-hood. Hence, there is a need to improve the effectiveness of the Arabic nested noun compound extraction. Thus, this paper proposes a hybrid linguistic approach and a statistical method with a view to enhance the extraction of the Arabic nested noun compound. A number of pre-processing phases are presented, including transformation, tokenization, and normalisation. The linguistic approaches that have been used in this study consist of a part-of-speech tagging and the named entities pattern, whereas the proposed statistical methods that have been used in this study consist of the NC-value, NTC-value, NLC-value, and the combination of these association measures. The proposed methods have demonstrated that the combined association measures have outperformed the NLC-value, NTC-value, and NC-value in terms of nested noun compound extraction by achieving 90%, 88%, 87%, and 81% for bigram, trigram, 4-gram, and 5-gram, respectively
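The linguistic filtering step in this abstract can be illustrated by enumerating nested noun n-grams from a POS-tagged token sequence. This is only a sketch of candidate generation: the "NOUN" tag, the placeholder tokens and the function name are assumptions, and the statistical ranking with NC-value, NTC-value and NLC-value is not reproduced here.

```python
def noun_ngrams(tagged, min_n=2, max_n=5):
    """List every contiguous all-noun n-gram with min_n <= n <= max_n.

    `tagged` is a list of (token, tag) tuples. Shorter candidates nested
    inside longer ones are kept, since nested compounds are the target.
    """
    candidates = []
    for start in range(len(tagged)):
        for n in range(min_n, max_n + 1):
            window = tagged[start:start + n]
            if len(window) < n:
                break  # ran past the end of the sentence
            if not all(tag == "NOUN" for _, tag in window):
                break  # longer windows from this start also contain a non-noun
            candidates.append(tuple(token for token, _ in window))
    return candidates


# Placeholder tokens stand in for Arabic words.
tokens = [("n1", "NOUN"), ("n2", "NOUN"), ("n3", "NOUN"), ("v1", "VERB"), ("n4", "NOUN")]
print(noun_ngrams(tokens))
```

Each candidate would then be scored by an association measure before being accepted as a compound.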

Crossref

UKM Journal Article Repository

A comparative study of the ensemble and base classifiers performance in Malay text categorization

Authors: Alshalabi Hamood Ali, Nazlia Omar, Sabrina Tiun
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/12/2017

Automatic text categorization (ATC) has attracted the attention of the research community over the last decade as it frees organizations from the need to organize documents manually. Ensemble techniques, which combine the results of a number of individually trained base classifiers, often achieve better classification performance than the base classifiers alone. This paper compares the effectiveness of ensembles with that of base classifiers for Malay text classification. Two feature selection methods, the Gini Index (GI) and Chi-square, are applied with the ensemble methods to examine Malay text classification, with the intention of efficiently integrating base classifier algorithms into a more accurate classification procedure. Two types of ensemble methods, namely the voting combination and the meta-classifier combination, are evaluated. A wide range of comparative experiments is conducted on a Malay dataset. The experiments reveal that the meta-classifier ensemble framework performed better than the best individual classifiers on the tested datasets.
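Of the two ensemble types mentioned, the voting combination is the simpler to sketch. The example below takes the label predictions of several base classifiers and returns the majority label per document. The labels and the tie-breaking behaviour (the label seen first among the tied ones wins) are assumptions of this sketch, not details from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` holds one list of labels per base classifier, all of the
    same length. Ties go to the tied label that appears first.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


base_outputs = [
    ["sports", "politics", "tech"],  # classifier 1
    ["sports", "tech", "tech"],      # classifier 2
    ["news", "politics", "tech"],    # classifier 3
]
print(majority_vote(base_outputs))
```

A meta-classifier combination would instead train a further model on the base classifiers' outputs rather than counting votes.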

Directory of Open Access Journals

UKM Journal Article Repository

Arabic Text Classification using K-Nearest Neighbour Algorithm

Authors: Nazlia Omar, Roiss Alhutaish
Publication date: 01/05/2020

Abstract: Many algorithms have been applied to the problem of Automatic Text Categorization (AT

CiteSeerX

Automatically generating a sentiment lexicon for the Malay language

Authors: Mohammad Darwich, Nazlia Omar, Shahrul Azman Mohd Noah
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/06/2016

This paper aims to propose an automated sentiment lexicon generation model specifically designed for the Malay language. Lexicon-based Sentiment Analysis (SA) models make use of a sentiment lexicon for SA tasks, which is a linguistic resource that comprises a priori information about the sentiment properties of words. A sentiment lexicon is an indispensable resource for SA tasks. This is evident in the emergence of a large volume of research focused on the development of sentiment lexicon generation algorithms. This is not the case for low-resource languages such as Malay, for which there is a lack of research focused on this particular area. This has brought up the motivation to propose a sentiment lexicon generation algorithm for this language. WordNet Bahasa was first mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive and negative terms was then automatically expanded by recursively adding terms linked via WordNet’s synonymy and antonymy semantic relations. The underlying intuition is that the sentiment properties of newly added terms via these relations are preserved. A supervised classifier was employed for the word-polarity tagging task, with textual representations of the expanded seed set as features. Evaluation of the model against the General Inquirer lexicon as a benchmark demonstrates that it performs with reasonable accuracy. This paper aims to provide a foundation for further research for the Malay language in this area
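The seed expansion step in this abstract can be sketched as a breadth-first traversal in which synonym links keep a term's polarity and antonym links flip it. The relation tables, the depth limit and the use of +1/-1 as polarity values are assumptions for illustration; an iterative queue is used in place of literal recursion, and the supervised classifier stage is not shown.

```python
from collections import deque

# Hypothetical toy relations; a real run would read them from the mapped wordnets.
SYNONYMS = {"baik": ["bagus"], "bagus": ["elok"], "buruk": ["teruk"]}
ANTONYMS = {"baik": ["buruk"]}


def expand(seeds, max_depth=3):
    """Grow a {term: polarity} lexicon from seed terms (+1 positive, -1 negative)."""
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue
        polarity = lexicon[word]
        neighbours = [(s, polarity) for s in SYNONYMS.get(word, [])]
        neighbours += [(a, -polarity) for a in ANTONYMS.get(word, [])]
        for term, term_polarity in neighbours:
            if term not in lexicon:  # the first label assigned to a term wins
                lexicon[term] = term_polarity
                queue.append((term, depth + 1))
    return lexicon


print(expand({"baik": 1}))
```

Limiting the depth is one way to curb polarity drift, since each extra hop makes it less certain that the sentiment is preserved.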

Directory of Open Access Journals

UKM Journal Article Repository