Propositional Satisfiability Method in Rough Classification Modeling for Data Mining
The fundamental problem in data mining is whether all of the available information is
always necessary to represent an information system (IS). The goal of data mining is to
find rules that model the world sufficiently well. These rules consist of conditions over
attribute-value pairs, called the description, together with a classification by the decision
attribute. However, the set of all decision rules generated from all conditional attributes
can be too large and can contain many chaotic rules that are not appropriate for
classifying unseen objects. The search for the best rules must therefore be performed,
because it is not possible to determine the quality of all rules generated from the
information system. In the rough set approach to data mining, the set of interesting rules
is determined using the notion of a reduct. Rules are generated from reducts by binding
the condition attribute values of the object class from which the reduct originates to the
corresponding attributes. It is important for the reducts to be minimal in size. Minimal
reducts decrease the number of conditional attributes used to generate rules. Shorter
rules are expected to classify new cases more accurately because of their larger support
in the data, and in some sense the most stable and most frequently appearing reducts
give the best decision rules.
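As an illustration of the reduct notion, the following minimal Python sketch builds the discernibility sets of a toy decision table, finds a minimal reduct by brute force, and binds the reduct attributes of an object to form a rule. The table, attribute names, and values are invented for illustration and are not from the thesis.

```python
from itertools import combinations

# Toy decision table: each row is (conditional attribute values, decision).
# The attributes and values are illustrative only.
attrs = ["a", "b", "c"]
table = [
    ({"a": 1, "b": 0, "c": 1}, "yes"),
    ({"a": 1, "b": 1, "c": 0}, "no"),
    ({"a": 0, "b": 0, "c": 1}, "yes"),
    ({"a": 0, "b": 1, "c": 1}, "no"),
]

# Discernibility sets: for each pair of objects with different decisions,
# the attributes on which the two objects differ.
disc_sets = []
for (x, dx), (y, dy) in combinations(table, 2):
    if dx != dy:
        disc_sets.append({a for a in attrs if x[a] != y[a]})

def covers(subset):
    """A subset of attributes preserves discernibility if it intersects
    every discernibility set."""
    return all(subset & s for s in disc_sets)

# Brute-force search for a smallest covering subset (fine for small tables).
reduct = next(set(c) for k in range(1, len(attrs) + 1)
              for c in combinations(attrs, k) if covers(set(c)))

# Generate a rule by binding the reduct attributes of an object class
# to their values in that object.
obj, dec = table[0]
rule = " AND ".join(f"{a}={obj[a]}" for a in sorted(reduct)) + f" => {dec}"
print("minimal reduct:", reduct)   # {'b'} for this toy table
print("rule:", rule)               # b=0 => yes
```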
The main work of the thesis is the generation of a classification model with a smaller
number of rules, shorter rule length, and good accuracy. A propositional satisfiability
method for rough classification modeling is proposed in this thesis. Two models,
Standard Integer Programming (SIP) and Decision Related Integer Programming
(DRIP), were proposed to represent the minimal reduct computation problem. The
models involve a theoretical formalization of the discernibility relation of a decision
system (DS) as an Integer Programming (IP) model. The proposed models were
embedded within the default rules generation framework, yielding a new rough
classification method. An improved branch and bound strategy is proposed to solve the
SIP and DRIP models that prunes a certain amount of the search. The proposed strategy
uses a conflict analysis procedure to remove unnecessary attribute assignments and to
determine the branch level to which the search backtracks in a non-chronological
manner.
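To make the integer programming view concrete: finding a minimal reduct amounts to a 0-1 covering program, minimize Σ x_a subject to Σ_{a∈D} x_a ≥ 1 for every discernibility set D. The sketch below is a simplification reconstructed from the abstract, not the thesis's exact SIP/DRIP models; its infeasibility check is a lightweight stand-in for the conflict analysis described above.

```python
def min_reduct_bnb(attrs, disc_sets):
    """Branch and bound for: minimize sum(x_a) subject to
    sum(x_a for a in D) >= 1 for every discernibility set D, x_a in {0,1}."""
    best = list(attrs)  # trivial upper bound: choose every attribute

    def dfs(chosen, remaining, candidates):
        nonlocal best
        if len(chosen) >= len(best):   # bound: cannot beat the incumbent
            return
        if not remaining:              # every constraint is covered
            best = list(chosen)
            return
        if not candidates:
            return
        a, rest = candidates[0], candidates[1:]
        # Branch x_a = 1: select the attribute and drop the sets it covers.
        dfs(chosen + [a], [s for s in remaining if a not in s], rest)
        # Branch x_a = 0: prune immediately if some set would become
        # uncoverable (a simple stand-in for conflict analysis).
        if all(s & set(rest) for s in remaining):
            dfs(chosen, remaining, rest)

    dfs([], disc_sets, list(attrs))
    return best

# Discernibility sets from the toy table above.
print(min_reduct_bnb(["a", "b", "c"],
                     [{"b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"b"}]))
# -> ['b']
```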
Five data sets from the UCI machine learning repository and domain theories were used
in the experiments. The total number of rules generated for the best classification model
is recorded, where 30% of the data were used for training and 70% were kept as test
data. The classification accuracy, the number of rules, and the maximum rule length
obtained from the SIP/DRIP method were compared with other rough set methods such
as the Genetic Algorithm (GA), Johnson, Holte's 1R, Dynamic, and Exhaustive methods.
Four of the datasets were then chosen for further experiments. The improved search
strategy implemented non-chronological backtracking, which potentially prunes a large
portion of the search space. The experimental results showed that the proposed
SIP/DRIP method is a successful method for rough classification modeling. The
outstanding feature of this method is the reduced number of rules in all classification
models. SIP/DRIP generated shorter rules than the other methods on most datasets. The
proposed search strategy indicated that the best performance can be achieved at lower
levels, i.e., along shorter paths, of the search tree. The SIP/DRIP method also showed
promise against other commonly used classifiers such as neural networks and statistical
methods. This model is expected to represent the knowledge of the system efficiently.
Data Preprocessing: Case Study on Monthly Number of Visitors to Taiwan by Their Residence and Purpose
This paper explains in detail the preliminary data report for a dataset and how data pre-processing, mainly the data cleaning and data reduction processes, is applied to it. The dataset used is the monthly number of visitors to Taiwan by their residence and purpose. The dataset was obtained from Kaggle, scraped from the Taiwan Tourism Bureau. The surveys were carried out using foreign visitor data covering all foreign visitors who arrived directly in Taiwan through airports, ports, and land.
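A minimal pandas sketch of the cleaning and reduction steps the paper describes; the file name and column names (year, month, residence, purpose, visitors) are assumptions about the Kaggle dataset's layout, not its actual schema.

```python
import pandas as pd

# Hypothetical file and column names; the real dataset may differ.
df = pd.read_csv("taiwan_visitors.csv")

# --- Data cleaning ---
df = df.drop_duplicates()                        # remove repeated records
df["visitors"] = pd.to_numeric(df["visitors"], errors="coerce")
df = df.dropna(subset=["residence", "purpose"])  # drop rows missing keys
df["visitors"] = df["visitors"].fillna(df["visitors"].median())

# --- Data reduction ---
# Keep only the attributes needed for the analysis.
df = df[["year", "month", "residence", "purpose", "visitors"]]
# Aggregate to one row per (year, month, residence, purpose).
monthly = (df.groupby(["year", "month", "residence", "purpose"],
                      as_index=False)["visitors"].sum())
print(monthly.head())
```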
Indicator selection based on Rough Set Theory
A method for indicator selection is proposed in this paper. The method, which adopts the General Methodology and Design Research approach, consists of four steps: Problem Identification, Requirement Gathering, Indicator Extraction, and Evaluation. The Rough Set approach has been applied in the Indicator Extraction phase. This phase consists of six steps: Data Selection, Data Preprocessing, Discretization, Split Data, Reduction, and Classification. A dataset of 427 records has been used for experimentation. The dataset, which contains financial information from several companies, consists of 30 dependent indicators and one independent indicator. The selection of indicators is based on rough set theory, where sets of reducts are computed from the dataset. Based on the sets of reducts, indicators are ranked and selected according to a certain set of criteria; indicators are ranked by computing their frequencies in the reduct sets. The major contribution of this work is the extraction method for identifying a reduced set of indicators. The results obtained show competitive accuracies in classifying new cases, demonstrating that the quality of knowledge is maintained through the use of a reduced set of indicators.
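A minimal sketch of the frequency-based ranking step, assuming the reducts have already been computed; the indicator names and the selection criterion below are invented for illustration.

```python
from collections import Counter

# Hypothetical reduct sets computed from the financial dataset;
# indicator names are placeholders.
reducts = [
    {"roe", "debt_ratio", "current_ratio"},
    {"roe", "net_margin"},
    {"roe", "debt_ratio", "net_margin"},
    {"current_ratio", "net_margin"},
]

# Rank indicators by how often they appear across the reduct sets.
freq = Counter(ind for r in reducts for ind in r)
ranked = freq.most_common()
print(ranked)  # e.g. [('roe', 3), ('net_margin', 3), ...]

# Select indicators meeting a frequency criterion (here: at least half
# of the reducts; the paper's actual criteria may differ).
selected = [ind for ind, f in ranked if f >= len(reducts) / 2]
print(selected)
```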
Comparative Analysis of Data Mining Techniques for Malaysian Rainfall Prediction
Climate change prediction analyses the behaviour of weather over a specific period. Rainfall forecasting is a climate change task in which specific features such as humidity and wind are used to predict rainfall at specific locations. Rainfall prediction can be framed as a classification task in data mining. Different techniques lead to different performances depending on the rainfall data representation, including representations of long-term (monthly) patterns and short-term (daily) patterns. Selecting an appropriate technique for a specific rainfall duration is a challenging task. This study analyses multiple classifiers, namely Naïve Bayes, Support Vector Machine, Decision Tree, Neural Network, and Random Forest, for rainfall prediction using Malaysian data. The dataset was collected from multiple stations in Selangor, Malaysia. Several pre-processing tasks were applied in order to resolve missing values and eliminate noise. The experimental results show that, with small training data (10%) from 1581 instances, Random Forest correctly classified 1043 instances. This is the strength of the ensemble of trees in Random Forest, where a group of classifiers can jointly beat a single classifier.
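A minimal scikit-learn sketch of the comparison protocol, using synthetic data in place of the Selangor rainfall set; the 10% training split mirrors the experiment, but the feature count and any printed accuracies are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the rainfall data (1581 instances; features
# such as humidity and wind; binary rain / no-rain labels).
X, y = make_classification(n_samples=1581, n_features=8, random_state=0)

# Mirror the paper's split: only 10% of the data used for training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.1,
                                          random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {acc:.3f} ({int(acc * len(y_te))} of {len(y_te)} correct)")
```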
A comparative study of deep learning algorithms in univariate and multivariate forecasting of the Malaysian stock market
As part of the financial system, the stock market has been an essential factor in the growth and stability of the national economy. Investment in the stock market is risky because of its price complexity and unpredictable nature. Deep learning is an emerging approach in stock market prediction modeling that can learn the non-linearity and complexity of stock market data. To date, few studies on stock market prediction in Malaysia employ deep learning prediction models, especially for handling univariate and multivariate data. This study aims to develop univariate and multivariate stock market forecasting models using three deep learning algorithms and to compare the performance of those models. The models predict the close price of the Malaysian stock market using the Axiata Group Berhad and Petronas Gas Berhad datasets from Bursa Malaysia, listed in the Kuala Lumpur Composite Index (KLCI). Three deep learning algorithms, Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM), are used to develop the prediction models. The deep learning models achieved the highest accuracy and outperformed the baseline models in both short- and long-term forecasts. The results also show that LSTM is the best deep learning model for the Malaysian stock market, achieving the lowest prediction error among the models. Deep learning demonstrates the ability to handle univariate and multivariate data while preserving important information for forecasting the stock market. This finding is significant, as deep learning works well with high-dimensional datasets.
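A minimal Keras sketch of the univariate LSTM variant, assuming a sliding-window framing of the close-price series; the synthetic series, window length, and network size are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, lag):
    """Turn a univariate series into (lag-window, next-value) pairs."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X[..., None], y   # LSTM expects (samples, timesteps, features)

# Synthetic random-walk stand-in for a KLCI close-price series.
prices = np.cumsum(np.random.default_rng(0).normal(size=500)) + 100.0
X, y = make_windows(prices, lag=20)
split = int(0.8 * len(X))    # earlier windows train, later windows test

model = Sequential([
    LSTM(32, input_shape=(20, 1)),
    Dense(1),                # predicted next close price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], y[:split], epochs=10, verbose=0)

mse = model.evaluate(X[split:], y[split:], verbose=0)
print("test MSE:", mse)
```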
Multi layer perceptron modelling in the housing market
The study examines the use of a multilayer perceptron (MLP) network in predicting the price of terrace houses in Kuala Lumpur (KL). Nine factors that significantly influence the price were used. Housing data from 1994 to 1996 were presented to the network for training. Results from the model for the various tested years were compared using regression analysis. The study demonstrates the predictive ability of the trained MLP model, which can be used as an alternative predictor in real estate analysis.
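A minimal sketch of this kind of model using scikit-learn's MLPRegressor, with synthetic data standing in for the nine KL housing factors; the architecture and evaluation metric are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in: nine influencing factors and a price target,
# in place of the 1994-1996 KL terrace house data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 9))
price = X @ rng.normal(size=9) + rng.normal(scale=0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=0)
mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                 random_state=0))
mlp.fit(X_tr, y_tr)

# Compare predictions against held-out prices, echoing the study's
# regression analysis on the tested years.
print("R^2 on test data:", r2_score(y_te, mlp.predict(X_te)))
```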
An improved artificial dendrite cell algorithm for abnormal signal detection
In the dendrite cell algorithm (DCA), the abnormality of a data point is determined by comparing the multi-context antigen value (MCAV) with an anomaly threshold. The limitation of the existing threshold is that its value must be determined before mining, based on previous information, and the existing MCAV is inefficient when exposed to extreme values. This causes the DCA to fail to detect new data points whose patterns behave differently from previous information, and it affects detection accuracy. This paper proposes an improved anomaly threshold for the DCA using the statistical cumulative sum (CUSUM), with the aim of improving its detection capability. In the proposed approach, the MCAV is normalized with an upper CUSUM, and the new anomaly threshold is calculated at run time by considering the acceptance value and the minimum MCAV. Experiments on 12 benchmark datasets and two outbreak datasets show that the improved DCA achieves better detection results than its previous version in terms of sensitivity, specificity, false detection rate, and accuracy.
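A minimal sketch of an upper CUSUM applied to an MCAV stream; the target, slack, acceptance value, and threshold formula are assumptions reconstructed from the abstract's ingredients, not the paper's exact equations.

```python
import numpy as np

def upper_cusum(values, target, slack):
    """One-sided (upper) CUSUM: accumulates positive drift above target."""
    s = np.zeros(len(values))
    for i, x in enumerate(values):
        prev = s[i - 1] if i > 0 else 0.0
        s[i] = max(0.0, prev + (x - target - slack))
    return s

# Hypothetical MCAV stream from a DCA run; values near 1 suggest anomaly.
mcav = np.array([0.1, 0.2, 0.15, 0.9, 0.95, 0.2, 0.85, 0.1])

# Normalize the MCAV with an upper CUSUM, then derive a run-time
# threshold from an acceptance value and the minimum MCAV.
s = upper_cusum(mcav, target=mcav.mean(), slack=0.05)
acceptance = 0.5                       # assumed acceptance value
threshold = acceptance + mcav.min()    # assumed combination rule
anomalous = s > threshold
print(list(zip(mcav, s.round(2), anomalous)))
```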
An Affective Decision Making Engine Framework for Practical Software Agents
The framework of the Affective Decision Making Engine outlined here provides a blueprint for creating software agents that emulate psychological affect when making decisions in complex and dynamic problem environments. The influence of affect on the agent's decisions is mimicked by measuring the correlation of feature values, possessed by objects and/or events in the environment, against the outcome of goals that are set for measuring the agent's overall performance. The use of correlation in the Affective Decision Making Engine provides a statistical justification for preference when prioritizing goals, particularly when it is not possible to realize all agent goals. The simplification of the agent algorithm retains the function of affect for summarizing feature-rich dynamic environments during decision making.
Keywords: Affective decision making, correlative adaptation, affective agent
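A minimal sketch of correlation-driven goal prioritization in the spirit described; the features, goals, and scoring rule are invented for illustration and are not the framework's actual algorithm.

```python
import numpy as np

# Hypothetical history: feature values observed for objects/events in the
# environment, and the outcome of each agent goal on those episodes.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 4))   # 4 environment features
goal_outcomes = {
    "collect_resource": features[:, 0] * 0.8 + rng.normal(scale=0.3, size=100),
    "avoid_threat":     features[:, 2] * -0.6 + rng.normal(scale=0.5, size=100),
}

def goal_priority(current_features):
    """Rank goals by correlating each feature with each goal's outcome,
    then weighting the current observation by those correlations."""
    scores = {}
    for goal, outcome in goal_outcomes.items():
        corrs = [np.corrcoef(features[:, j], outcome)[0, 1]
                 for j in range(features.shape[1])]
        scores[goal] = float(np.dot(corrs, current_features))
    return sorted(scores, key=scores.get, reverse=True)

print(goal_priority(rng.normal(size=4)))  # goals ranked by expected payoff
```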
Nonlinear regression in tax evasion with uncertainty: a variational approach
One of the major problems in today's economy is the phenomenon of tax evasion. The linear regression method is one way to find a formula that captures the effect of each variable on the final tax evasion rate. Since the tax evasion data in this study have a high degree of uncertainty and the relationships between variables are nonlinear, a Bayesian method is used to address the uncertainty, together with six nonlinear basis functions to tackle the nonlinearity. Furthermore, a variational method is applied to Bayesian linear regression on the tax evasion data to approximate the model evidence. The dataset covers tax evasion in Malaysia for the period 1963 to 2013, with eight input variables. Results from the variational method are compared with the Maximum Likelihood Estimation technique for Bayesian linear regression, and the variational method provides more accurate predictions. This study suggests that, in order to reduce the final tax evasion rate relative to the current situation, the Malaysian government should decrease the direct tax and taxpayer income variables and increase the indirect tax and government regulation variables by 5% for small amounts of change (10%-30%), and should decrease direct tax and taxpayer income and increase indirect tax and government regulation by 90% for large amounts of change (70%-90%).
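A minimal sketch of Bayesian linear regression over nonlinear basis functions; for self-containedness it uses the exact conjugate posterior (Bishop, ch. 3) rather than the paper's variational approximation, and the basis choice, prior/noise precisions, and synthetic data are assumptions.

```python
import numpy as np

# Synthetic stand-in for the 1963-2013 tax series (51 years, 8 variables).
rng = np.random.default_rng(0)
X = rng.normal(size=(51, 8))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=51)

def design(X):
    """Nonlinear basis expansion (illustrative choice, not the paper's six)."""
    return np.hstack([np.ones((len(X), 1)), X, np.sin(X), X ** 2])

Phi = design(X)
alpha, beta = 1.0, 100.0   # prior precision, noise precision (assumed)

# Closed-form posterior of Bayesian linear regression:
# S_N^{-1} = alpha*I + beta*Phi^T Phi,  m_N = beta * S_N Phi^T y
S_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
m = beta * np.linalg.solve(S_inv, Phi.T @ y)

pred = design(X) @ m       # posterior-mean predictions
print("train RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
```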