45 research outputs found

    Taxonomy learning from Malay texts using artificial immune system based clustering

    In taxonomy learning from texts, the extracted features used to describe the context of a term are usually erroneous and sparse. Various attempts to overcome data sparseness and noise have been made using clustering algorithms such as Hierarchical Agglomerative Clustering (HAC), Bisecting K-means and Guided Agglomerative Hierarchical Clustering (GAHC). However, these methods suffer from low recall. Therefore, the purpose of this study is to investigate the application of two hybridized artificial immune system (AIS) algorithms in taxonomy learning from Malay text and to develop a Google-based Text Miner (GTM) for feature selection to reduce data sparseness. Two novel taxonomy learning algorithms have been proposed and compared with the benchmark methods (i.e., HAC, GAHC and Bisecting K-means). The first algorithm, GCAINT (Guided Clustering and aiNet for Taxonomy Learning), is designed through the hybridization of GAHC and the Artificial Immune Network (aiNet). The GCAINT algorithm exploits a Hypernym Oracle (HO) to guide the hierarchical clustering process and produces better results than the benchmark methods. However, the Malay HO introduces erroneous hypernym-hyponym pairs, which affects the results. Therefore, a second novel algorithm, CLOSAT (Clonal Selection Algorithm for Taxonomy Learning), is proposed by hybridizing the Clonal Selection Algorithm (CLONALG) with Bisecting K-means. CLOSAT produces the best results compared to the benchmark methods and GCAINT. To reduce sparseness in the obtained dataset, the GTM is proposed. However, the experimental results reveal that the GTM introduces too much noise into the dataset, which leads to many false-positive hypernym-hyponym pairs. The effect of different combinations of affinity measurement (i.e., Hamming, Jaccard and Rand) on the performance of the developed methods was also studied. Jaccard was found to be better than Hamming and Rand in measuring the similarity distance between terms.
    In addition, the use of Particle Swarm Optimization (PSO) for automatic parameter tuning of GCAINT and CLOSAT was also proposed. Experimental results demonstrate that in most cases, PSO-tuned CLOSAT and GCAINT produce better results than the benchmark methods and are able to reduce data sparseness and noise in the dataset.
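    A minimal sketch of the Jaccard affinity that the abstract reports as the best of the three measures, assuming terms are described by binary context-feature sets (the Malay terms and features below are invented for illustration, not taken from the study):

    ```python
    # Jaccard affinity between two term-context feature sets:
    # |intersection| / |union|. Used here as a similarity measure
    # between terms, as in the clustering experiments described above.
    def jaccard(a: set, b: set) -> float:
        if not a and not b:
            return 1.0  # two empty contexts are treated as identical
        return len(a & b) / len(a | b)

    # Hypothetical Malay terms with hypothetical context features.
    features = {
        "kucing": {"haiwan", "bulu", "makan"},
        "harimau": {"haiwan", "liar", "makan"},
        "kereta": {"enjin", "roda"},
    }
    print(jaccard(features["kucing"], features["harimau"]))  # 0.5
    print(jaccard(features["kucing"], features["kereta"]))   # 0.0
    ```

    A higher Jaccard score between two terms makes them more likely to be clustered under a common hypernym.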

    Understanding requirements dependency in requirements prioritization: a systematic literature review

    Requirements prioritization (RP) is a crucial task in managing requirements, as it determines the order of implementation and, thus, the delivery of a software system. Improper RP may cause software project failures due to budget and schedule overruns as well as a low-quality product. Several factors influence RP, one of which is requirements dependency. Inappropriate handling of requirements dependencies can lead to software development failures: if a requirement that serves as a prerequisite for other requirements is given low priority, it affects the overall project completion time. Despite its importance, little is known about requirements dependency in RP, particularly its impacts, types, and techniques. This study, therefore, aims to understand the phenomenon by analyzing the existing literature. It addresses three objectives, namely, to investigate the impacts of requirements dependency on RP, to identify different types of requirements dependency, and to discover the techniques used for requirements dependency problems in RP. To fulfill these objectives, this study adopts the Systematic Literature Review (SLR) method. Applying the SLR protocol, this study selected forty primary articles, comprising 58% journal papers, 32% conference proceedings, and 10% book sections. The results of data synthesis indicate that requirements dependency has significant impacts on RP, and there are a number of requirements dependency types as well as techniques for addressing requirements dependency problems in RP. The techniques discovered include graphs for dependency visualization, machine learning for handling large-scale RP, decision-making methods for handling multiple criteria, and optimization techniques based on evolutionary algorithms.
    The study also reveals that the existing techniques face serious limitations in terms of scalability, time consumption, interdependencies of requirements, and the limited types of requirement dependencies they cover.
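    The graph-based view of requirements dependency mentioned above can be sketched as follows: prerequisite edges form a directed acyclic graph, and a topological order yields an implementation sequence that never schedules a requirement before its prerequisites. The requirement IDs and dependencies are hypothetical, not from any reviewed study:

    ```python
    # Kahn's algorithm over a requirements-dependency graph.
    # deps maps each requirement to the list of its prerequisites.
    from collections import deque

    def topo_order(deps):
        indeg = {r: 0 for r in deps}
        out = {r: [] for r in deps}
        for r, pres in deps.items():
            for p in pres:
                out[p].append(r)   # prerequisite -> dependant edge
                indeg[r] += 1
        q = deque(sorted(r for r, d in indeg.items() if d == 0))
        order = []
        while q:
            r = q.popleft()
            order.append(r)
            for n in out[r]:
                indeg[n] -= 1
                if indeg[n] == 0:
                    q.append(n)
        return order

    # R4 depends on R2 and R3, which both depend on R1.
    deps = {"R1": [], "R2": ["R1"], "R3": ["R1"], "R4": ["R2", "R3"]}
    print(topo_order(deps))  # ['R1', 'R2', 'R3', 'R4']
    ```

    Any prioritization that violates this order would schedule a dependant before its prerequisite, the failure mode the abstract warns about.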

    Self-adaptive Based Model for Ambiguity Resolution of The Linked Data Query for Big Data Analytics

    Integration of heterogeneous data sources is a crucial step in big data analytics, although it creates ambiguity issues during mapping between the sources due to variations in query terms, data structure and granularity conflicts. However, there is limited research on effective big data integration that addresses the ambiguity issue for big data analytics. This paper introduces a self-adaptive model for big data integration that exploits the data structure during querying in order to mitigate and resolve ambiguities. An assessment of preliminary work on the Geography and Quran datasets is reported to illustrate the feasibility of the proposed model, which motivates future work such as solving complex queries.

    Normalization of common noisy terms in Malaysian online media

    This paper proposes a normalization technique for noisy terms that occur in Malaysian micro-texts. Noisy terms are common in online messages and influence the results of activities such as text classification and information retrieval. Even though many researchers have studied methods to solve this problem, few have looked into it for languages other than English. In this study, about 5000 noisy texts were extracted from 15000 documents created by Malaysians. The normalization process was executed using specific translation rules as part of the preprocessing steps in opinion mining of movie reviews. The results show up to a 5% improvement in the accuracy of opinion mining.
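    A minimal sketch of what rule-based normalization of noisy terms can look like, assuming a hand-built lookup table of common Malaysian micro-text shortenings (the entries below are illustrative examples, not the study's actual rule set):

    ```python
    # Dictionary-based normalization: each noisy token is replaced by
    # its standard Malay form if a translation rule exists for it.
    import re

    RULES = {
        "x": "tidak", "sy": "saya", "yg": "yang",
        "dgn": "dengan", "tq": "terima kasih",
    }

    def normalize(text: str) -> str:
        # Split into alternating word / non-word runs so punctuation
        # and spacing survive the substitution unchanged.
        tokens = re.findall(r"\w+|\W+", text.lower())
        return "".join(RULES.get(t, t) for t in tokens)

    print(normalize("sy x suka filem yg ini"))
    # saya tidak suka filem yang ini
    ```

    In the study's pipeline, a step like this would run before feature extraction, so the opinion-mining classifier sees standard forms instead of noisy variants.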

    Semantic Web Portal in University Research Community Framework

    One way to overcome the weaknesses of the Semantic Web and make it more user friendly is by displaying, browsing and semantically querying data. In this research, we propose the Semantic Web Research Community Portal at the Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (FTSM RC) as a lightweight Semantic Web platform. The platform assists users in managing content and visualizing relevant semantic data that is updated periodically. In this way, it strengthens the information related to research, publications, departments, organizations, events, and groups of researchers. Moreover, it streamlines the publishing process, making it easier for academic staff, support staff, and the faculty itself to publish faculty and research information. In the end, this provides end users with a better view of the structure of research at the university, allowing them to conduct cross-communication between the faculty and study groups through the search facility.

    Using Bayesian Network for Determining The Recipient of Zakat in BAZNAS Pekanbaru

    The National Amil-Zakat Agency (Baznas) in Pekanbaru has the function of collecting and distributing zakat in Pekanbaru city. Baznas Pekanbaru should be able to determine Mustahik properly; a Mustahik is a person eligible to receive zakat. The Baznas committee interviews and observes every Mustahik candidate to decide who should receive zakat. The current Mustahik determination process can lead to subjective assessment, due to the large number of zakat recipient applicants and the complexity of the rules for determining a Mustahik. Therefore, this study utilizes artificial intelligence in determining Mustahik, applying the Bayesian network method as an inference engine. Based on the experimental results, we found that the Bayesian network produces a good accuracy of 93.24% and is effective on datasets with an uneven class distribution. In addition, experiments show that setting the alpha estimator's value between 0.6 and 1.0 can increase the accuracy of the Bayesian network to 95.95%. Keywords: Bayesian network, Baznas Pekanbaru, Mustahik, zakat
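    The role of the alpha estimator can be illustrated with a naive Bayes classifier, the simplest Bayesian-network structure, where alpha acts as a Laplace-style smoothing parameter on the conditional probability estimates. The binary screening features, labels and records below are invented for illustration and are not the Baznas data:

    ```python
    # Naive Bayes over binary features with an alpha smoothing
    # estimator, sketching how tuning alpha shifts the estimated
    # conditional probabilities.
    def train(rows, labels, alpha=1.0):
        n_feat = len(rows[0])
        classes = sorted(set(labels))
        prior = {c: labels.count(c) / len(labels) for c in classes}
        like = {}
        for c in classes:
            sub = [r for r, y in zip(rows, labels) if y == c]
            for j in range(n_feat):
                ones = sum(r[j] for r in sub)
                # smoothed estimate of P(feature_j = 1 | class c)
                like[(c, j)] = (ones + alpha) / (len(sub) + 2 * alpha)
        return prior, like, classes, n_feat

    def predict(model, x):
        prior, like, classes, n_feat = model
        def score(c):
            p = prior[c]
            for j in range(n_feat):
                pj = like[(c, j)]
                p *= pj if x[j] else (1 - pj)
            return p
        return max(classes, key=score)

    # Hypothetical features: (low income, no assets).
    X = [(1, 1), (1, 0), (0, 1), (0, 0)]
    y = ["mustahik", "mustahik", "not", "not"]
    model = train(X, y, alpha=0.6)
    print(predict(model, (1, 1)))  # mustahik
    ```

    A full Bayesian network generalizes this by allowing dependencies between the features themselves rather than assuming them conditionally independent.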

    Time Series Prediction of Bitcoin Cryptocurrency Price Based on Machine Learning Approach

    Over the past few years, Bitcoin has attracted the attention of numerous parties, ranging from academic researchers to institutional investors. Bitcoin is the first and most widely used cryptocurrency to date. Due to the significant volatility of the Bitcoin price and the fact that its trading does not require a third party, it has gained great popularity since its inception in 2009 among a wide range of individuals. Given the previous difficulties in predicting the price of cryptocurrencies, this project develops and implements a time-series prediction model using machine learning algorithms, including Support Vector Machine Regression (SVR), K-Nearest Neighbor Regression (KNN), Extreme Gradient Boosting (XGBoost), and Long Short-Term Memory (LSTM), to determine the trend of Bitcoin price movement and to assess the effectiveness of the machine learning models. The data used are the close prices of Bitcoin from the year 2018 up to the year 2023. The performance of the machine learning models is evaluated by comparing the results for R-squared, mean absolute error (MAE) and root mean squared error (RMSE), and through a visualization graph of the original and predicted close prices of Bitcoin in a dashboard. Among the models compared, LSTM emerged as the most accurate, followed by SVR, while XGBoost and KNN exhibited comparatively lower performance.
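    The three evaluation metrics named above can be written out in plain Python; the price series used here are invented, not the study's Bitcoin data:

    ```python
    # MAE, RMSE and R-squared between actual and predicted prices.
    import math

    def mae(y, p):
        return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

    def rmse(y, p):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))

    def r_squared(y, p):
        mean = sum(y) / len(y)
        ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
        ss_tot = sum((a - mean) ** 2 for a in y)
        return 1 - ss_res / ss_tot  # 1.0 is a perfect fit

    actual = [100.0, 102.0, 101.0, 105.0]
    pred = [101.0, 101.0, 102.0, 104.0]
    print(mae(actual, pred))                 # 1.0
    print(rmse(actual, pred))                # 1.0
    print(round(r_squared(actual, pred), 3)) # 0.714
    ```

    Note that MAE and RMSE are scale-dependent (they share the units of the price), whereas R-squared is unitless, which is why reports usually quote all three together.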

    An evolutionary variable neighbourhood search for the unrelated parallel machine scheduling problem

    This article addresses a challenging industrial problem known as the unrelated parallel machine scheduling problem (UPMSP) with sequence-dependent setup times. In UPMSP, we have a set of machines and a group of jobs, and the goal is to find the optimal way to schedule jobs for execution on one of the several available machines. UPMSP has been classified as an NP-hard optimisation problem and thus cannot be solved efficiently by exact methods; meta-heuristic algorithms are commonly used to find sub-optimal solutions. However, large-scale UPMSP instances pose a significant challenge to meta-heuristic algorithms. To effectively solve large-scale UPMSP instances, this article introduces a two-stage evolutionary variable neighbourhood search (EVNS) methodology. The proposed EVNS integrates a variable neighbourhood search algorithm and an evolutionary descent framework in an adaptive manner. The evolutionary framework is employed in the first stage, using a mix of crossover and mutation operators to generate diverse solutions. In the second stage, an adaptive variable neighbourhood search exploits the area around the solutions generated in the first stage. A dynamic strategy determines the switching time between the two stages, and a diversity-based fitness function guides the search towards promising areas by exploring different locations in the search landscape. We demonstrate the competitiveness of the proposed EVNS by presenting computational results and comparisons on the 1640 UPMSP benchmark instances commonly used in the literature. The experimental results show that our EVNS obtains better results than the compared algorithms on several UPMSP instances.
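    A minimal sketch of the variable-neighbourhood-search stage on a toy unrelated-machine instance: jobs are assigned to machines, the objective is the makespan, and two neighbourhoods (move a job, swap two jobs) are cycled in the standard VNS manner. Sequence-dependent setup times and the evolutionary first stage are omitted, and the instance data are invented:

    ```python
    # Variable neighbourhood search for a toy UPMSP instance.
    import random

    P = [[3, 5], [4, 2], [6, 4], [2, 7]]  # P[job][machine] processing time

    def makespan(assign):
        loads = [0] * len(P[0])
        for j, m in enumerate(assign):
            loads[m] += P[j][m]
        return max(loads)

    def move(a):
        # Reassign one random job to a random machine.
        b = a[:]
        b[random.randrange(len(a))] = random.randrange(len(P[0]))
        return b

    def swap(a):
        # Exchange the machine assignments of two random jobs.
        i, j = random.sample(range(len(a)), 2)
        b = a[:]
        b[i], b[j] = b[j], b[i]
        return b

    def vns(assign, iters=500):
        random.seed(0)
        best, best_cost = assign[:], makespan(assign)
        hoods, k = [move, swap], 0
        for _ in range(iters):
            cand = hoods[k](best)
            cost = makespan(cand)
            if cost < best_cost:      # improvement: restart at first hood
                best, best_cost, k = cand, cost, 0
            else:                     # no gain: try the next neighbourhood
                k = (k + 1) % len(hoods)
        return best, best_cost

    sol, cost = vns([0, 0, 0, 0])     # start with everything on machine 0
    print(cost)
    ```

    The full EVNS additionally seeds this stage with diverse solutions from the evolutionary framework and switches between the stages dynamically.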

    Automatic Rule Generator via FP-Growth for Eye Diseases Diagnosis

    The conventional approach to developing a rule-based expert system usually involves a tedious, lengthy and costly knowledge acquisition process, widely known as the bottleneck in expert system development. Furthermore, manual knowledge acquisition can lead to errors in decision-making and an ineffectively functioning expert system. Another dilemma for knowledge engineers is handling conflicts of interest and the high variance of inter- and intrapersonal decisions among domain experts during the knowledge elicitation stage. The aim of this research is to improve the knowledge acquisition process using a data mining technique. This paper investigates the effectiveness of an association rule mining technique in generating new rules for an expert system. FP-Growth is the machine learning technique used to acquire rules from the eye disease diagnosis records collected from the Sumatera Eye Center (SMEC) Hospital in Pekanbaru, Riau, Indonesia. The developed system was tested with 17 cases, and ophthalmologists inspected the results of the automatic rule generator for eye disease diagnosis. We found that introducing FP-Growth association rules into the eye disease knowledge-based system produces acceptable and promising diagnosis results, with approximately 88% average accuracy. Based on the test results, we can conclude that conjunctivitis and presbyopia are the most prevalent eye diseases in Indonesia. In conclusion, FP-Growth association rules have strong potential as an automatic rule generator, but there is still plenty of room for improvement in the context of eye disease diagnosis.
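    The kind of support/confidence rules that FP-Growth extracts can be illustrated with brute-force counting in place of the FP-tree (a deliberate simplification, named as such; the symptom/diagnosis records below are invented, not the SMEC data):

    ```python
    # Association rules by exhaustive support/confidence counting,
    # a simplified stand-in for FP-Growth's frequent-pattern mining.
    from itertools import combinations

    records = [
        {"red_eye", "itching", "conjunctivitis"},
        {"red_eye", "itching", "conjunctivitis"},
        {"blurred_near_vision", "age>40", "presbyopia"},
        {"red_eye", "discharge", "conjunctivitis"},
    ]

    def support(itemset):
        return sum(itemset <= r for r in records) / len(records)

    def rules(min_support=0.5, min_conf=0.8):
        items = sorted({i for r in records for i in r})
        out = []
        for size in (2, 3):
            for combo in combinations(items, size):
                s = support(set(combo))
                if s < min_support:
                    continue  # infrequent itemsets yield no rules
                for k in range(1, size):
                    for lhs in combinations(combo, k):
                        rhs = tuple(i for i in combo if i not in lhs)
                        conf = s / support(set(lhs))
                        if conf >= min_conf:
                            out.append((lhs, rhs, round(conf, 2)))
        return out

    for lhs, rhs, conf in rules():
        print(lhs, "=>", rhs, conf)
    ```

    FP-Growth reaches the same rule set without enumerating every candidate itemset, which is what makes it practical on real diagnosis databases.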