Web cluster load balancing via genetic-fuzzy based algorithm

Author: Chin Wen Cheong
Lim Amy Hui Lan
Publication venue: UUM Press, Universiti Utara Malaysia
Publication date: 01/01/2007

A genetic-fuzzy based Generalized Dimension Exchange (GDE) method is proposed to uniformly distribute unprecedented Web cluster workload. Fuzzy set theory is used to capture the vagueness of the workload during the redistribution period. Based on experts’ subjective evaluations, a fuzzy inference system is established to aggregate the fuzzy Web performance metrics into a load-weight index that indicates each server’s workload intensity. Based on the load-weight index, the genetic-fuzzy algorithm is applied to redistribute the workload equally among the servers. Finally, a simulation of 20 load-weight indices in a 3-cube Web cluster topology is implemented to illustrate the functionality of the proposed method
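As a rough illustration of the dimension-exchange idea mentioned in the abstract, the sketch below runs one sweep of plain dimension exchange on a 3-cube (8 nodes). It is not the genetic-fuzzy method of the paper: the load values are made up, the function name `dimension_exchange` is hypothetical, and the exchange parameter `lam = 0.5` (simple averaging between neighbours) is an assumption.

```python
def dimension_exchange(loads, dims=3, lam=0.5):
    """One sweep of dimension exchange over a hypercube of 2**dims nodes.

    In each dimension d, every node pairs with the neighbour whose
    index differs in bit d, and the two move their loads toward each other.
    """
    loads = list(loads)
    if len(loads) != 2 ** dims:
        raise ValueError("need exactly one load value per hypercube node")
    for d in range(dims):
        for node in range(len(loads)):
            partner = node ^ (1 << d)  # neighbour across dimension d
            if node < partner:         # handle each pair once
                a, b = loads[node], loads[partner]
                loads[node] = (1 - lam) * a + lam * b
                loads[partner] = lam * a + (1 - lam) * b
    return loads


# Made-up load-weight indices for the 8 servers of a 3-cube (assumption).
indices = [0.9, 0.1, 0.4, 0.6, 0.2, 0.8, 0.5, 0.5]
balanced = dimension_exchange(indices)
print(balanced)
```

With `lam = 0.5`, each pairwise exchange is an average, so after sweeping all three dimensions every node holds the mean of the initial loads while the total load is preserved.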

UUM Repository

Predicting Factors that Affect East Asian Students’ Reading Proficiency in PISA

Author: Chua Fang-Fang
Lim Amy Hui-Lan
Low Adeline Hui-Min
Publication venue: Society of Visual Informatics
Publication date: 01/01/2023

Teachers, schools, and parents contribute to equipping students with essential knowledge and skills during their education years. When students are approaching the end of their education, they are randomly selected to participate in Program for International Student Assessment (PISA) to assess their reading proficiency. Existing work on analyzing PISA achievement results concentrates solely on identifying factors related to Parent or in combination with Student. Limited work has been proposed on how factors related to Teacher and School affect the students’ reading proficiency in PISA. This study focuses on identifying the factors related to Teacher and/or School that affect East Asian students’ reading proficiency in PISA. The PISA achievement results from East Asian students are chosen as the domain study because they are consistently the top performers in PISA in the past decade. Decision Tree (DT), Naïve Bayes (NB), K-Nearest Neighbors (KNN) and Random Forest (RF) are compared. Hamming score is used as the evaluation metric. The results indicate that RF produces the best predictive models with highest Hamming score of 0.8427. Based on the findings, School-related factors such as the number of school’s disciplinary cases, size of the school, the availability of computers with Internet facilities, the quality and educational qualifications of teachers have higher impact on the PISA achievement results. The identified factors can be used as a reference in assessing the current school’s teaching, learning environment, and organizing extra activities as part of intervention programs to cultivate reading habits and enhance reading abilities among students
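The abstract does not define the Hamming score it reports. One common definition, assumed here, is the per-sample ratio of correctly predicted labels to the union of true and predicted labels, averaged over samples; another convention is one minus the Hamming loss. The sketch below implements the first definition on made-up label sets, and the function name `hamming_score` is illustrative.

```python
def hamming_score(y_true, y_pred):
    """Mean over samples of |true & pred| / |true | pred| (assumed definition)."""
    scores = []
    for true_labels, pred_labels in zip(y_true, y_pred):
        t, p = set(true_labels), set(pred_labels)
        union = t | p
        # Two empty label sets count as a perfect match.
        scores.append(1.0 if not union else len(t & p) / len(union))
    return sum(scores) / len(scores)


# Made-up labels for three samples (assumption, not data from the study).
y_true = [{"A"}, {"B", "C"}, {"C"}]
y_pred = [{"A"}, {"B"}, {"A"}]
score = hamming_score(y_true, y_pred)
print(score)
```

Here the three samples score 1, 0.5, and 0, so the mean is 0.5.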

JOIV : International Journal on Informatics Visualization

SHDL@MMU Digital Repository

Design And Development Of GP-Based Data Mining Systems

Author: Lim Amy Hui Lan
Publication date: 01/01/2003

Initially, function learning using genetic programming (GP) is investigated through symbolic regression for data mining applications. Various kinds of functions are investigated, including function learning tasks as well as Boolean function learning. The objective of the initial investigation is to review how GP is applied to function learning tasks. The drawbacks of this method are identified. A hybrid GP technique based on genetic algorithm-program (GA-P) for function learning tasks with variables and constants is investigated. This hybrid GP technique is further expanded to a new hybrid GA/SA-P. The new hybrid GA/SA-P, combining genetic algorithms (GA) and simulated annealing (SA), is proposed for function learning tasks with numeric constants. The convergence behaviour will be compared with existing GP and genetic algorithm-program (GA-P). The application of GP is extended to discover interesting rules among data sets. Given a set of data and appropriate parameter settings, GP is used to discover a set of rules that describes the relationships that exist among the data. Finally, GP is investigated as a decision tree classifier for binary and multiclass classification problems. The simulation results will be compared with the C4.5 decision tree algorithm

SHDL@MMU Digital Repository

A new model for managerial decision making using workflow management

Author: Lim Amy Hui Lan
Publication date: 01/08/2012

Today’s dynamic organizations, which run a variety of businesses, require two main components to sustain themselves in the market. Firstly, an organization requires a reusable and customizable business performance framework to evaluate itself periodically. Secondly, it requires a way to evaluate and manage workflows, which serve as the backbone of any organization. This study focuses on designing a business performance framework, and on modeling and applying data mining to workflows to improve organizational performance. Four main problems are identified. Firstly, a comprehensive and reusable business performance measurement framework is required to measure an organization’s performance. Secondly, an improved workflow management system that supports the design phase of the business process management (BPM) lifecycle is required to produce effective and efficient workflow designs. Thirdly, existing evaluation methods in workflow management systems should be enhanced with online analytical processing (OLAP) capabilities to support decision-making. Fourthly, OLAP should be improved to support analysis of current workflows in the diagnosis phase of the BPM lifecycle

SHDL@MMU Digital Repository

Hybrid Deep Neural Networks for Industrial Text Scoring

Author: Goh Hui Ngo
Lim Amy Hui Lan
Nagappan Sidharrth
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

Academic scoring is mainly explored through the pedagogical fields of Automated Essay Scoring (AES) and Short Answer Scoring (SAS), but text scoring in other domains has received limited attention. This paper focuses on industrial text scoring, namely the processing and adherence checking of long annual reports based on regulatory requirements. To lay the foundations for non-academic scoring, a pioneering corpus of annual reports from companies is scraped and segmented into sections, and domain experts score relevant sections based on adherence. Subsequently, deep neural non-hierarchical attention-based LSTMs, hierarchical attention networks and longformer-based models are refined and evaluated. Since the longformer outperformed LSTM-based models, we embed it into a hybrid scoring framework that employs lexicon and named entity features, with rubric injection via word-level attention, culminating in Kappa scores of 0.9670 and 0.820 on our two corpora, respectively. Though scoring is fundamentally subjective, our proposed models show significant results when navigating thin rubric boundaries and handling adversarial responses. As our work proposes a novel industrial text scoring engine, we hope to validate our framework using more official documentation based on a broader range of regulatory practices

SHDL@MMU Digital Repository

Forum Text Processing and Summarization

Author: Goh Hui Ngo
Lim Amy Hui Lan
Mak Yen Wei
Publication venue: Politeknik Negeri Padang
Publication date: 01/01/2024

Frequently Asked Questions (FAQs) are extensively studied in general domains like the medical field, but such frameworks are lacking in domains such as software engineering and open-source communities. This research aims to bridge this gap by establishing the foundations of an automated FAQ Generation and Retrieval framework specifically tailored to the software engineering domain. The framework involves analyzing, ranking, performing sentiment analysis on, and summarizing content from open forums like Stack Overflow and GitHub issues. A corpus of Stack Overflow post data is collected to evaluate the proposed framework and the selected models. Integrating state-of-the-art string-matching, sentiment analysis, and summarization models with the proprietary ranking formula proposed in this paper forms a robust Automatic FAQ Generation and Retrieval framework to facilitate developers' work. String matching, sentiment analysis, and summarization models are evaluated, achieving F1 scores of 71.31%, 74.90%, and 53.4%, respectively. Given the subjective nature of evaluations in this context, a human review is used to further validate the effectiveness of the overall framework, with assessments made on relevancy, preferred ranking, and preferred summarization. Future work includes improving summarization models by incorporating text classification and summarizing them individually (Kou et al, 2023), as well as proposing feedback loop systems based on human reinforcement learning. Furthermore, efforts will be made to optimize the framework by utilizing knowledge graphs for dimension reduction, enabling it to handle larger corpora effectively

SHDL@MMU Digital Repository

Construction of Part of Speech Tagger for Malay Language: A Review

Author: Goh Hui Ngo
Lim Amy Hui Lan
Mohamad Ali Nurulhuda
Publication venue: IEEE
Publication date: 01/01/2023

Part-of-Speech (POS) tagging is one of the fundamental tasks in Natural Language Processing (NLP) for analyzing human languages. It is the process of identifying how words are used in a sentence by assigning the proper POS to each word. Thus far, most well-researched POS tagging is on European languages, which are considered rich-resource languages due to the unlimited linguistic resources such as research studies and large standard corpora. However, POS tagging is arduous for low-resource languages due to the limitation of linguistic resources. The Malay language is considered a low-resource language. Most POS tagging studies for the Malay language use rule-based and stochastic methods. However, exploration of Deep Learning (DL) for the Malay language is limited. Thus, studies with POS tagging methods that implement DL for other low-resource languages within South East Asia are included in this study. Hence, the aim of this study is to identify the state of the art, challenges, and future works of Malay POS taggers. This study provides a review of different methods, datasets, and performance measures used in POS tagging studies

SHDL@MMU Digital Repository

Detecting At-Risk and Withdrawal Students in STEM and Social Science Courses using Predictive and Association Rules Mining

Author: Goh Hui Ngo
Lim Amy Hui Lan
Suhaimi Muhd Syazwan Aqrimi
Publication venue: Beijing Jiaotong University, China
Publication date: 01/01/2022

This research aims to identify potential at-risk and withdrawal students to help these students in their studies. Interactions consisting of surfing behaviour in the Virtual Learning Environment (VLE) among two different groups of students, namely disabled and non-disabled students, for Social Science and STEM courses are analysed. Predictive analytics and association rule mining (ARM) analysis are performed. Predictive analytics is performed to predict students’ likelihood of withdrawing from their registered courses. Among the students who choose to pursue their registered courses, predictive analytics is also used to predict at-risk students. Six predictive algorithms, namely Decision Tree (DT), Logistic Regression (LR), Naive Bayes (NB), K Nearest Neighbour (KNN), Random Forest (RF), and Support Vector Machine (SVM), are compared. The FP-Growth algorithm is applied in ARM analysis. Predictive results show that DT is superior, with accuracy scores reaching 0.91. Most association rules are positively correlated, and they represent the set of commonly surfed pages by the potential at-risk and withdrawal students. The predictive results can help VLE developers to determine the possible algorithms to be used in the intelligent VLE to make accurate predictions based on students’ interactions in the VLE. The results from ARM analysis prove that FP-Growth can also be included in the intelligent VLE. The intelligent VLE can assist the relevant staff in an education institution to provide timely and personalized support to students who are struggling in their studies. This research contributes to precision education through learning analytics
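In association rule mining, a rule is usually called positively correlated when its lift exceeds 1. The sketch below computes support, confidence, and lift for a single rule by brute-force counting; it is not FP-Growth, and the page names, sessions, and the function name `rule_metrics` are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # contain antecedent
    n_c = sum(1 for t in transactions if c <= t)         # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # contain both
    if n == 0 or n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)  # > 1 indicates positive correlation
    return support, confidence, lift


# Hypothetical VLE sessions, each a set of visited page types (assumption).
sessions = [
    {"forum", "quiz"},
    {"forum", "quiz", "resource"},
    {"resource"},
    {"forum", "quiz"},
    {"homepage"},
]
support, confidence, lift = rule_metrics(sessions, {"forum"}, {"quiz"})
print(support, confidence, lift)
```

In this toy data every session that visits `forum` also visits `quiz`, so the rule has support 0.6, confidence 1.0, and a lift above 1.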

SHDL@MMU Digital Repository