14 research outputs found

    Enhanced text stemmer for standard and non-standard word patterns in Malay texts

    Text stemming is a useful language preprocessing tool in information retrieval, text classification and natural language processing. A text stemmer is a computer program that removes affixes, clitics and particles to obtain root words from derived words. Over the past few years, a few text stemmers have been developed for the Malay language, but they suffer from various stemming errors because of the complexity of Malay morphological rules. These text stemmers were developed to stem affixation words only, whereas the Malay language also contains reduplication and compounding words. Furthermore, none of them was developed for stemming social media texts, which contain non-standard derived words. Therefore, this research study aims to improve the capability of existing text stemmers to stem affixation, reduplication and compounding words while minimising possible stemming errors. It also aims to address text stemming for non-standard derived words on social media platforms by removing non-standard affixes, clitics and particles. The study adopts a multiple text stemming approach that uses an affix removal method and dictionary lookup in a specific order to correctly stem standard and non-standard affixation, reduplication and compounding words in standard texts and social media texts. The proposed text stemmer is evaluated against various text documents using a direct evaluation method, and text classification is used as an indirect evaluation method to validate its effectiveness. In general, the proposed enhanced text stemmer outperforms the baseline text stemmer.
The proposed enhanced text stemmer achieves an average stemming accuracy of 98.7% on standard texts and 73.7% on social media texts. In the text classification applications, it achieves an average accuracy of 85% for sports news classification and 75% for illicit content classification. By comparison, the baseline text stemmer achieves an average stemming accuracy of 63.5% on standard texts and is unable to stem non-standard derived words in social media texts. The baseline also performs worse in sports news classification and illicit content classification, with average accuracies of 78% and 63% respectively. In short, the experimental results suggest that the proposed enhanced text stemmer has promising stemming accuracy on both standard texts and social media texts, and that it also influences the performance of text classification applications.
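
The combination of affix removal and dictionary lookup in a fixed order can be illustrated with a minimal Python sketch. It is not the stemmer evaluated in the study: the root-word lexicon, the affix lists and the helper names (`ROOT_WORDS`, `strip_one`, `stem`) are all hypothetical and far smaller than any real Malay rule set.

```python
# Hypothetical toy lexicon and affix lists; the abstract does not publish its rules.
ROOT_WORDS = {"makan", "main", "baca", "buku", "ajar"}
PREFIXES = ("mem", "me", "ber", "di", "ter")
SUFFIXES = ("kan", "an", "i")
CLITICS_AND_PARTICLES = ("nya", "lah", "kah", "ku", "mu")


def strip_one(word, affixes, from_start):
    """Yield every candidate obtained by removing one listed affix from word."""
    for affix in affixes:
        if from_start and word.startswith(affix):
            yield word[len(affix):]
        elif not from_start and word.endswith(affix):
            yield word[:-len(affix)]


def stem(word):
    """Return the first candidate found in ROOT_WORDS, else the word unchanged."""
    word = word.lower()
    candidates = [word]
    # Assumed order: clitics and particles, then suffixes, then prefixes.
    for affixes, from_start in (
        (CLITICS_AND_PARTICLES, False),
        (SUFFIXES, False),
        (PREFIXES, True),
    ):
        # Dictionary lookup before each removal pass guards against over-stemming.
        for candidate in candidates:
            if candidate in ROOT_WORDS:
                return candidate
        candidates = candidates + [
            shorter
            for candidate in candidates
            for shorter in strip_one(candidate, affixes, from_start)
        ]
    # Final lookup after the last removal pass.
    for candidate in candidates:
        if candidate in ROOT_WORDS:
            return candidate
    return word


print(stem("dimakannya"))  # clitic "nya" and prefix "di" removed
print(stem("buku"))        # found in the lexicon before "ku" can be stripped
```

Looking a word up before stripping anything is what keeps a root such as "buku" from losing its final "ku", which happens to match a clitic.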

    Enhanced Affixation Word Stemmer with Stemming Error Reducer to Solve Affixation Stemming Errors

    A word stemming algorithm (or word stemmer) is an important preprocessing component in information retrieval and text categorization that reduces derived words to their respective root words. Most existing Malay word stemmers adopt a rule-based affix removal method and dictionary lookup to stem affixation words. Although many stemming approaches have been proposed in past research, existing Malay word stemmers still suffer from affixation stemming errors due to the complexity of Malay morphology. These stemming errors can be classified into over-stemming, under-stemming, unstemmed words, and special variations and exceptions. This paper examines the root causes of these stemming errors in existing Malay stemmers and presents an enhanced affixation word stemmer that aims to solve them. The experimental results indicate that the enhanced word stemmer is able to stem prefixation, suffixation, confixation and infixation words with better stemming accuracy by using an enhanced Rule Application Order and a Stemming Errors Reducer.
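
The error categories named in the abstract can be made concrete with a small Python sketch that labels a stemmer's output against a known root. The function name `classify_stemming` and the length-based rule for telling over-stemming from under-stemming are assumptions made for illustration, not the paper's definitions, and the "special variations and exceptions" category is not modelled.

```python
def classify_stemming(word, produced, expected_root):
    """Label one stemmer output (hypothetical, length-based heuristic)."""
    if produced == expected_root:
        return "correct"
    if produced == word:
        # The stemmer left the derived word untouched.
        return "unstemmed"
    if len(produced) < len(expected_root):
        # More characters were removed than the root allows.
        return "over-stemmed"
    if len(produced) > len(expected_root):
        # Some affix material is still attached.
        return "under-stemmed"
    return "other"


print(classify_stemming("makanan", "mak", "makan"))
print(classify_stemming("dimakannya", "dimakan", "makan"))
```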

    Design Process and Hydrodynamic Analysis of Underwater Remotely Operated Crawler

    An underwater Remotely Operated Crawler (ROC) is a type of underwater Remotely Operated Vehicle (ROV) that is able to operate underwater and even on land. What distinguishes the ROC from other underwater vehicles is that it allows underwater intervention while staying in direct contact with the seabed. A common issue faced by all underwater vehicles is the drag that occurs when they move underwater. It is important to reduce drag in order to increase the speed of the ROC with less power consumption. As such, the study of the hydrodynamics of the ROC is essential to its stability and maneuverability. SolidWorks software is used to design and analyse the ROC. The ROC is 100 mm high, 449.60 mm long and 297.60 mm wide, and its body, or chassis, is made of stainless steel. Based on the design and capability of the ROC, it is estimated that it can operate with less drag, withstand underwater forces and remain stable on the seabed.

    A Convolutional Neural Network model for Credit Card Fraud detection

    Online transactions through various e-commerce platforms are becoming more prevalent, and credit cards (CC) are widely used in these transactions. However, Credit Card Fraud (CCF) strategies continue to evolve with business transformation, causing customers and financial institutions to lose billions of dollars annually. Effective detection of fraudulent transactions among the large volume of normal transactions is therefore necessary. This study proposes a Convolutional Neural Network (CNN) model for credit card fraud detection that uses the Adaptive Synthetic (ADASYN) sampling technique to address the imbalanced dataset. The proposed model achieved an accuracy, precision and recall of 0.9982, 0.9965 and 0.9999 respectively, and these results are compared with those of other existing studies.
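
For readers unfamiliar with the three reported metrics, the following Python sketch shows how accuracy, precision and recall follow from confusion-matrix counts. The function name `classification_metrics` and the counts in the usage line are invented for illustration and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # Share of all transactions that were labelled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of transactions flagged as fraud that really were fraud.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real fraud cases that the model flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative counts only (fraud is the positive class).
print(classification_metrics(tp=90, fp=10, tn=880, fn=20))
```

On a heavily imbalanced dataset, accuracy alone can look high even when few fraud cases are caught, which is why precision and recall are reported alongside it.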

    Design consideration of Malay text stemmer using structured approach

    A word stemmer (or text stemmer) removes bound morphemes from derived words so that morphological variants are mapped to common base forms. It is usually used as a preprocessing tool in text classification, text mining and information retrieval tasks. The design of an effective text stemmer is therefore crucial for ensuring that the text stemming process maps morphological variants to the correct base forms. This paper investigates the design considerations for an effective text stemmer from the perspective of the Malay language. These design considerations are based on the challenges faced by previous researchers in stemming Malay texts. By adopting these considerations, a text stemmer is expected to address common stemming errors and to produce promising stemming accuracy.

    Content based fraudulent website detection using supervised machine learning techniques

    Fraudulent websites that pose as legitimate sources of information, goods, products and services are proliferating and have resulted in losses of billions of dollars. Because of the undesirable impacts of Internet fraud and scams, several studies and approaches have focused on identifying fraudulent websites, yet none has offered an efficient solution to suppress these fraudulent activities. This research proposes a fraudulent website detection model based on sentiment analysis of the textual content of a given website, natural language processing and supervised machine learning techniques. The proposed model consists of four primary phases: data acquisition, preprocessing, feature extraction and classification. A crawler is used to obtain data from the Internet, and the data is cleaned to remove non-discriminative noise and reshaped into the desired format. Meaningful and discriminative patterns are then extracted. Finally, the classification phase uses supervised machine learning techniques to construct the fraudulent website detection model. This research employs 10-fold stratified cross-validation to validate the performance of the proposed model. Experimental results show that the proposed model achieved a cross-validated accuracy of 97.67% and a false positive rate (FPR) of 3.49%, which satisfies the aim of this research.
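
The validation step can be sketched in plain Python as follows. This is only an illustration of stratified fold assignment and of the FPR formula; the helper names `stratified_folds` and `false_positive_rate` are hypothetical, samples are not shuffled, and the classifier itself is left out.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds so each class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the members of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def false_positive_rate(y_true, y_pred, positive=1):
    """FPR = FP / (FP + TN), computed over the truly negative samples."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Illustrative labels: 20 fraudulent (1) and 80 legitimate (0) websites.
labels = [1] * 20 + [0] * 80
folds = stratified_folds(labels, k=10)
print([sum(labels[i] for i in fold) for fold in folds])
```

Each fold serves once as the held-out test set while the remaining nine are used for training, and the reported figures are averaged over the ten runs.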

    Enhanced rules application order approach to stem reduplication words in Malay texts

    A word stemming algorithm is a natural language morphological process that reduces derived words to their respective root words. Because of its importance, many Malay word stemming algorithms have been developed in past years. However, previous researchers focused only on improving affixation word stemming with various stemming approaches, and no reduplication word stemming algorithm has been developed for the Malay language thus far. In Malay, affixation and reduplication words are derived words that have their own morphological rules, so using affixation word stemming to stem reduplication words is considered inappropriate. This paper therefore presents a reduplication word stemming algorithm that stems full, rhythmic and partial reduplication words to their respective root words. The proposed algorithm uses a Rules Application Order with a Stemming Errors Reducer to stem these reduplication words. Malay online newspaper articles were used to evaluate it. The experimental results showed that the proposed stemming algorithm is able to stem full, rhythmic, affixed and partial reduplication words with better stemming accuracy. Hence, future improvements to Malay word stemming algorithms should cover both affixation and reduplication word stemming.
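
The reduplication types mentioned above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the lexicon `ROOT_WORDS`, the function `stem_reduplication` and the simple pattern checks are assumptions chosen to show one example of each type.

```python
# Hypothetical toy lexicon used for dictionary lookup.
ROOT_WORDS = {"buku", "kuih", "laki", "main"}


def stem_reduplication(word):
    """Reduce a reduplicated Malay word to its root (toy rules only)."""
    word = word.lower()
    if "-" in word:
        first, second = word.split("-", 1)
        # Full reduplication: both halves are identical, e.g. "buku-buku".
        if first == second:
            return first
        # Rhythmic or affixed reduplication: the halves differ, so keep
        # whichever half is a known root, e.g. "kuih-muih", "bermain-main".
        for half in (first, second):
            if half in ROOT_WORDS:
                return half
        return word
    # Partial reduplication: first consonant plus "e" precedes the root,
    # e.g. "lelaki" from "laki".
    if (
        len(word) > 2
        and word[1] == "e"
        and word[0] == word[2]
        and word[2:] in ROOT_WORDS
    ):
        return word[2:]
    return word


for sample in ("buku-buku", "kuih-muih", "lelaki", "bermain-main"):
    print(sample, "->", stem_reduplication(sample))
```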

    Malware behavior profiling from unstructured data

    The recent emergence of new malware has posed a major threat, especially in the finance sector, where much online banking data has been stolen by adversaries. Information about a malware threat needs to be collected immediately after its outbreak, because early detection can save others from becoming victims. Unfortunately, there is a delay before new malware information reaches databases such as ExploitDB. A pre-emptive approach is needed to gather first-hand information about new malware as a preventive measure. One method is to extract information from open source data, such as online news, using Named Entity Recognition (NER). However, existing NER systems cannot accurately extract domain-specific entities from online news. The aim of this paper is to extract malware entities and their behaviour attributes using an extended version of NER with Hidden Markov Models (HMM) and Conditional Random Fields (CRF). A malware-annotated corpus is produced to support supervised learning of the named entity tagger. The results show that CRF performs slightly better than HMM. A few experiments are performed to optimize the performance of CRF in terms of feature extraction. Finally, the malware behaviour information is visualized on a dashboard that combines a few statistical graphs drawn with matplotlib. The purpose of visualizing the malware behaviour profile extracted from online news is to help cyber security experts better understand malware behaviour.

    Word stemming methods for the Malay language: a review

    A word stemmer is a preprocessing component that has been widely used in many artificial intelligence applications to reduce derived words to their respective root words. Many existing Malay word stemmers have been developed to stem affixation words using various word stemming methods. This paper describes the research trends of existing Malay word stemmers based on the morphological structures of the Malay language, general word stemming methods and the stemming methods adopted in existing word stemmers. It thus serves as a preliminary reference for improving existing Malay word stemmers.