75 research outputs found
Generating a Malay sentiment lexicon based on wordnet
A sentiment lexicon is a vocabulary list consisting of positive and negative words. In opinion mining, the sentiment lexicon is an important resource for the text polarity classification task in a sentiment analysis model. Studies in Malay sentiment analysis are increasing as the volume of sentiment data on social media grows, so demand for a Malay sentiment lexicon is high. However, developing a Malay sentiment lexicon is difficult because Malay language resources are scarce, and various approaches and techniques have been used to generate such lexicons. The objective of this paper is to develop a Malay sentiment lexicon generation algorithm based on WordNet. The method maps WordNet Bahasa to the English WordNet to obtain the offset values of a seed set of sentiment words. The seed set is then expanded through the synonym and antonym semantic relations in the English WordNet. The best result achieves 86.58% agreement with human annotators and a 91.31% F1-measure in word polarity classification. The results show the effectiveness of the proposed algorithm for generating a Malay sentiment lexicon based on WordNet
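As a rough illustration of the mapping step described above, the sketch below links Malay lemmas to synsets through shared offsets and carries a seed polarity across one synonym link and one antonym link. The offsets, the lemma assignments, and the dictionaries `MALAY_TO_OFFSET`, `SYNONYMS`, and `ANTONYMS` are made-up toy data. They are not entries from WordNet Bahasa, and the function is not the paper's algorithm.

```python
# Toy data: every offset and link below is a hypothetical placeholder.
MALAY_TO_OFFSET = {
    "baik": "00000001-a",
    "bagus": "00000002-a",
    "buruk": "00000003-a",
}
SYNONYMS = {"00000001-a": ["00000002-a"]}  # synset offset -> similar synsets
ANTONYMS = {"00000001-a": ["00000003-a"]}  # synset offset -> opposite synsets


def label_from_seed(seed_word, seed_polarity):
    """Propagate one seed polarity to the synsets directly linked to it."""
    seed_offset = MALAY_TO_OFFSET[seed_word]
    polarity_by_offset = {seed_offset: seed_polarity}
    for offset in SYNONYMS.get(seed_offset, []):
        polarity_by_offset[offset] = seed_polarity   # synonyms keep the polarity
    for offset in ANTONYMS.get(seed_offset, []):
        polarity_by_offset[offset] = -seed_polarity  # antonyms flip the polarity
    # Map the labelled offsets back to Malay lemmas.
    return {
        word: polarity_by_offset[offset]
        for word, offset in MALAY_TO_OFFSET.items()
        if offset in polarity_by_offset
    }


lexicon = label_from_seed("baik", +1)
print(lexicon)
```

Keying the propagation on offsets rather than on words is what lets a relation stored in one language's WordNet label a lemma in another.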
Transforming noun phrase structure form into heuristics and rules for detecting compound noun in Malay sentence
This paper addresses the process of transforming the noun phrase structure form into a list of suitable heuristics for detecting compound nouns in Malay sentences. The heuristics are used to obtain the syntactic sentence structure for finding a compound noun word pair in a Malay sentence. To obtain the list of rules, the noun phrase structure form must be created first, so that the possible word combinations that form a compound noun are known. The noun phrase structure form is grouped into three noun categories: i) noun and noun, ii) noun and noun modifier, and iii) noun and non-noun modifier. This research focuses on the noun and noun modifier category. The heuristic rules and the noun phrase structure form are important to understand because they clarify the concept of finding compound noun word pairs in Malay sentences. The compound noun output will be used as an input to our next piece of research, a head modifier detector
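A tag-pattern heuristic of this general kind can be sketched as follows. The tag names (`NN`, `VB`, and so on), the single pattern in `PATTERNS` (two adjacent nouns), and the example sentence are assumptions chosen for illustration. They are not the rules or the tagset from the paper.

```python
# Hypothetical pattern set: each entry is a pair of adjacent POS tags
# that is treated as a compound noun candidate.
PATTERNS = {("NN", "NN")}


def find_compound_candidates(tagged_sentence):
    """Return adjacent word pairs whose POS tags match a known pattern."""
    candidates = []
    for (word1, tag1), (word2, tag2) in zip(tagged_sentence, tagged_sentence[1:]):
        if (tag1, tag2) in PATTERNS:
            candidates.append((word1, word2))
    return candidates


# "kereta api" (train) is a Malay compound noun; the tags are hand-assigned.
sentence = [
    ("Dia", "PRON"),
    ("membeli", "VB"),
    ("kereta", "NN"),
    ("api", "NN"),
    ("baharu", "ADJ"),
]
candidates = find_compound_candidates(sentence)
print(candidates)
```

Further noun phrase structure forms could be covered by adding tag pairs to `PATTERNS`, without changing the matching loop.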
Bitcoin price prediction based on the sentiment polarity of news articles and market data using an LSTM model
Bitcoin is a digital currency and investment instrument that has attracted worldwide attention in recent years. However, the instability of the Bitcoin price has become a concern among Bitcoin users and investors. Bitcoin price prediction can help investors and users build effective strategies for investment or use. With the rapid growth of the Internet, online data, including news articles, can help in predicting the Bitcoin price. This study aims to examine the effect of news article sentiment on the Bitcoin price over a study period from September 2017 to August 2019. Accordingly, the study introduces sentiment analysis to understand the information in online news articles and uses it as an input feature for Bitcoin price prediction. The study has two main phases: sentiment analysis and price prediction. In the sentiment analysis phase, sentiment is extracted using a lexicon-based method to understand the information in news articles related to the cryptocurrency market. Cryptocurrency is a type of digital payment and monetary system in which transactions are carried out in a decentralised manner, as peer-to-peer financial transactions that do not pass through a financial institution. In other words, Bitcoin does not depend on a third-party intermediary to process payments; it uses computational cryptographic proof to process transactions, verify their validity, and broadcast them across the network (Nakamoto 2008). In the price prediction phase, sentiment is used as an input feature and a Long Short-Term Memory (LSTM) model performs the prediction. With market data and news articles as samples, the results show that news article sentiment can reduce the error in Bitcoin price prediction
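The two phases can be illustrated with a small data-preparation sketch: a lexicon-based polarity score per headline, then sliding windows of (price, sentiment) rows of the shape a sequence model such as an LSTM would consume. The word lists, headlines, prices, and the `lookback` value are invented for illustration, and the model itself is left out. None of this is the study's lexicon or data.

```python
POSITIVE = {"surge", "gain", "rally"}  # hypothetical lexicon entries
NEGATIVE = {"crash", "ban", "loss"}


def polarity(text):
    """Score a text in [-1, 1] from counts of lexicon hits."""
    tokens = text.lower().split()
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def make_windows(prices, sentiments, lookback):
    """Pair each window of (price, sentiment) rows with the next price."""
    rows = list(zip(prices, sentiments))
    inputs, targets = [], []
    for start in range(len(rows) - lookback):
        inputs.append(rows[start:start + lookback])
        targets.append(prices[start + lookback])
    return inputs, targets


headlines = ["Bitcoin rally continues", "Exchange ban triggers crash", "Markets quiet"]
prices = [100.0, 110.0, 90.0]
sentiments = [polarity(headline) for headline in headlines]
X, y = make_windows(prices, sentiments, lookback=2)
print(sentiments, X, y)
```

Dropping the sentiment column from `rows` gives the market-data-only baseline against which the effect of the sentiment feature on prediction error would be compared.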
Heuristics-based method for head and modifier detection in Malay sentences from the cultural heritage domain
Detecting the head and modifier in Malay sentences from the cultural heritage domain is difficult because their positions vary with sentence structure. Language experts also hold differing views on the theory and concept of head and modifier detection in compound nouns, and existing research is limited, especially in computational linguistics. Research is therefore needed to identify appropriate methods for detecting the head and modifier in Malay sentences from the cultural heritage domain. The aim of this study is to construct a list of heuristic rules for detecting the position of compound nouns in Malay sentences from the cultural heritage domain. Using 15 heuristic rules, the positions of the head and modifier within a compound noun can also be detected. Precision, recall and F1-score are used to measure the accuracy of the results. In the experiments, the Sentence Structure of Malay Cultural Heritage Domain (SADWBM) achieves an F1-score of 80.4%, compared with 56% for the Noun Phrase Structure (SFN). These results indicate that the approach used in this study is effective in resolving the identified problems
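For reference, the three evaluation measures named above follow from counts of true positives, false positives, and false negatives. The counts in the example call are invented for illustration and are unrelated to the reported 80.4% and 56% scores.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 spurious, 2 missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(precision, recall, round(f1, 3))
```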
Arabic nested noun compound extraction based on linguistic features and statistical measures
The extraction of Arabic nested noun compound is significant for several research areas such
as sentiment analysis, text summarization, word categorization, grammar checker, and
machine translation. Much research has studied the extraction of Arabic noun compound
using linguistic approaches, statistical methods, or a hybrid of both. A wide range of the
existing approaches concentrate on the extraction of the bi-gram or tri-gram noun compound.
Nonetheless, extracting a 4-gram or 5-gram nested noun compound is a challenging task due
to the morphological, orthographic, syntactic and semantic variations. Many features have an
important effect on the efficiency of extracting a noun compound such as unit-hood,
contextual information, and term-hood. Hence, there is a need to improve the effectiveness of
the Arabic nested noun compound extraction. This paper therefore proposes a hybrid of a
linguistic approach and a statistical method to enhance the extraction of Arabic nested
noun compounds. Several pre-processing phases are presented, including transformation,
tokenization, and normalisation. The linguistic approach consists of part-of-speech
tagging and named entity patterns, whereas the statistical methods consist of the
NC-value, NTC-value, and NLC-value, and the combination of these association measures.
The results show that the combined association measures outperform the NLC-value,
NTC-value, and NC-value individually in nested noun compound extraction, achieving 90%,
88%, 87%, and 81% for bigram, trigram, 4-gram, and 5-gram, respectively
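The abstract does not define its measures. As background only, the NC-value in the term-extraction literature is built on the C-value termhood score, which discounts a candidate's frequency by the average frequency of the longer candidates that contain it. The sketch below is a simplified C-value over placeholder tokens with invented frequencies. It is not the paper's NC-value, NTC-value, or NLC-value, and it omits the context weighting that the NC-value adds.

```python
import math


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous subsequence of `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Simplified C-value for candidates given as token tuples -> frequency."""
    scores = {}
    for cand, f_cand in freq.items():
        containers = [b for b in freq if len(b) > len(cand) and contains(b, cand)]
        if containers:
            # Nested candidate: subtract the mean frequency of its containers.
            mean_nested = sum(freq[b] for b in containers) / len(containers)
            scores[cand] = math.log2(len(cand)) * (f_cand - mean_nested)
        else:
            scores[cand] = math.log2(len(cand)) * f_cand
    return scores


# Placeholder tokens and made-up counts for a bigram nested in longer n-grams.
freq = {
    ("w1", "w2"): 10,
    ("w1", "w2", "w3"): 4,
    ("w1", "w2", "w3", "w4"): 2,
}
scores = c_value(freq)
print(scores)
```

The length factor `log2(len(cand))` favours longer candidates, which is one reason a score of this family suits 4-gram and 5-gram extraction.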
A comparative study of the ensemble and base classifiers performance in Malay text categorization
Automatic text categorization (ATC) has attracted the attention of the research community over the last decade because it frees organizations from the need to organize documents manually. Ensemble techniques, which combine the results of a number of individually trained base classifiers, generally improve classification performance over the base classifiers. This paper compares the effectiveness of ensembles with that of base classifiers for Malay text classification. Two feature selection methods, the Gini Index (GI) and Chi-square, are applied together with the ensemble methods, with the aim of efficiently integrating base classifier algorithms into a more accurate classification procedure. Two types of ensemble methods, namely the voting combination and the meta-classifier combination, are evaluated. A wide range of comparative experiments is conducted on a Malay dataset. The experiments reveal that the meta-classifier ensemble framework performed better than the best individual classifiers on the tested datasets
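The voting combination can be sketched as a plain majority vote over the labels predicted by each base classifier. The classifier outputs and category names below are invented, and the paper's base classifiers and meta-classifier are not reproduced here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per document."""
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest count;
        # on a tie, the label seen first among the classifiers wins.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers on three documents.
clf_a = ["sukan", "politik", "sukan"]
clf_b = ["sukan", "sukan", "ekonomi"]
clf_c = ["politik", "politik", "sukan"]
final = majority_vote([clf_a, clf_b, clf_c])
print(final)
```

A meta-classifier combination differs in that a second-level learner is trained on the base classifiers' outputs instead of counting votes.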
Arabic Text Classification using K-Nearest Neighbour Algorithm
Abstract: Many algorithms have been implemented to the problem of Automatic Text Categorization (AT
Automatically generating a sentiment lexicon for the Malay language
This paper aims to propose an automated sentiment lexicon generation model specifically designed for the Malay
language. Lexicon-based Sentiment Analysis (SA) models make use of a sentiment lexicon for SA tasks, which is
a linguistic resource that comprises a priori information about the sentiment properties of words. A sentiment
lexicon is an indispensable resource for SA tasks. This is evident in the emergence of a large volume of research
focused on the development of sentiment lexicon generation algorithms. This is not the case for low-resource
languages such as Malay, for which there is a lack of research focused on this particular area. This has brought up
the motivation to propose a sentiment lexicon generation algorithm for this language. WordNet Bahasa was first
mapped onto the English WordNet to construct a multilingual word network. A seed set of prototypical positive
and negative terms was then automatically expanded by recursively adding terms linked via WordNet’s synonymy
and antonymy semantic relations. The underlying intuition is that the sentiment properties of newly added terms
via these relations are preserved. A supervised classifier was employed for the word-polarity tagging task, with
textual representations of the expanded seed set as features. Evaluation of the model against the General Inquirer
lexicon as a benchmark demonstrates that it performs with reasonable accuracy. This paper aims to provide a
foundation for further research for the Malay language in this area
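The recursive seed expansion can be sketched as a breadth-first traversal in which synonym links preserve polarity and antonym links invert it. The relation tables below are a toy stand-in for the multilingual word network, and the supervised classifier used for the final polarity tagging is omitted.

```python
from collections import deque

# Toy relation tables; real ones would come from the mapped WordNets.
SYNONYMS = {"good": ["fine"], "fine": ["nice"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "nice": ["nasty"]}


def expand(seeds):
    """Grow a {term: polarity} lexicon by following relations until no new terms appear."""
    lexicon = dict(seeds)
    queue = deque(seeds)  # terms whose neighbours still need visiting
    while queue:
        term = queue.popleft()
        polarity = lexicon[term]
        linked = [(t, polarity) for t in SYNONYMS.get(term, [])]
        linked += [(t, -polarity) for t in ANTONYMS.get(term, [])]
        for new_term, new_polarity in linked:
            if new_term not in lexicon:  # first label wins, so cycles terminate
                lexicon[new_term] = new_polarity
                queue.append(new_term)
    return lexicon


expanded = expand({"good": 1})
print(expanded)
```

Because the first label assigned to a term is kept, a term reachable by conflicting paths gets the polarity of its shortest path from a seed. That is a design choice of this sketch rather than something stated in the abstract.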
- …