279 research outputs found

    Text documents clustering using modified multi-verse optimizer

    In this study, a multi-verse optimizer (MVO) is utilised for the text document clustering (TDC) problem. TDC is treated as a discrete optimization problem, and an objective function based on the Euclidean distance is applied as the similarity measure. TDC is tackled by dividing the documents into clusters; documents belonging to the same cluster are similar, whereas those belonging to different clusters are dissimilar. MVO, a recent metaheuristic optimization algorithm established for continuous optimization problems, can intelligently navigate different areas in the search space and search deeply in each area using a particular learning mechanism. The proposed algorithm, called MVOTDC, adopts the convergence behaviour of the MVO operators to deal with discrete, rather than continuous, optimization problems. To evaluate MVOTDC, a comprehensive comparative study is conducted on six text document datasets with various numbers of documents and clusters. The quality of the final results is assessed using precision, recall, F-measure, entropy, accuracy, and purity measures. Experimental results reveal that the proposed method performs competitively in comparison with state-of-the-art algorithms. Statistical analysis is also conducted and shows that MVOTDC can produce significant results in comparison with three well-established methods.
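
As a rough illustration of the kind of objective and quality measure this abstract mentions, the sketch below scores a discrete cluster assignment by the sum of Euclidean distances from each document vector to its cluster centroid, and computes purity against known labels. The exact objective used by MVOTDC is not given in the abstract, so the function names and the centroid-based formulation here are assumptions, not the authors' method.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def clustering_cost(docs, assignment, k):
    """Sum of distances from each document to its cluster centroid.

    `assignment[i]` is the cluster index (0..k-1) of `docs[i]`, so a
    candidate solution is a discrete vector. Lower cost is better.
    This formulation is an assumption made for illustration.
    """
    total = 0.0
    for c in range(k):
        members = [d for d, a in zip(docs, assignment) if a == c]
        if not members:
            continue  # empty clusters contribute nothing
        centre = centroid(members)
        total += sum(euclidean(d, centre) for d in members)
    return total


def purity(assignment, labels):
    """Fraction of documents that carry the majority label of their cluster."""
    by_cluster = {}
    for a, label in zip(assignment, labels):
        by_cluster.setdefault(a, Counter())[label] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)
```

A metaheuristic such as MVO would then search over `assignment` vectors to minimise `clustering_cost`, with `purity` used only afterwards to judge the result against ground-truth labels.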

    Robust Optimization Model for Twitter Sentiment Analysis of PeduliLindungi Application

    Technological advances during the COVID-19 pandemic in Indonesia gave rise to the PeduliLindungi application, which was developed by the government to prevent the spread of COVID-19. The advantages and disadvantages of PeduliLindungi can be seen from users' responses and opinions, including those posted on Twitter. A person's opinion about PeduliLindungi, as expressed in a tweet, can be classified as positive, negative, or neutral using a machine learning approach with the Support Vector Machine (SVM) algorithm. In this paper, multiobjective optimization modeling is used to maximize the performance metrics, namely Accuracy, Precision, Recall, and F1-Score. The values of the performance metrics are considered to contain uncertainty. Therefore, the optimization problem is solved using Robust Optimization to handle this uncertainty. The uncertain data are assumed to belong to a polyhedral uncertainty set, so the resulting robust counterpart is computationally tractable. A numerical experiment is presented to complete the discussion.

    Hybrid fuzzy multi-objective particle swarm optimization for taxonomy extraction

    Ontology learning refers to the automatic extraction of an ontology to produce the ontology learning layer cake, which consists of five kinds of output: terms, concepts, taxonomy relations, non-taxonomy relations and axioms. Term extraction is a prerequisite for all aspects of ontology learning. It is the automatic mining of complete terms from the input document. Another important part of an ontology is the taxonomy, or the hierarchy of concepts. It presents a tree view of the ontology and shows the inheritance between subconcepts and superconcepts. In this research, two methods are proposed for improving the performance of the extraction result. The first method uses particle swarm optimization to optimize the weights of features. The advantage of particle swarm optimization is that it can calculate and adjust the weight of each feature to an appropriate value, and here it is used to improve the performance of term and taxonomy extraction. The second method is a hybrid technique that combines multi-objective particle swarm optimization with fuzzy systems, ensuring that the membership functions and fuzzy system rule sets are optimized. The advantage of using a fuzzy system is that imprecise and uncertain values of feature weights can be tolerated during the extraction process. This method is used to improve the performance of taxonomy extraction. In the term extraction experiment, five features were extracted for each term from the document. These features were represented by feature vectors consisting of domain relevance, domain consensus, term cohesion, first occurrence and length of noun phrase. For taxonomy extraction, matches of Hearst lexico-syntactic patterns in documents and on the web, together with hypernym information from WordNet, were used as the features that represent each pair of terms from the texts. Both proposed methods are evaluated using a dataset that contains documents about tourism.
For term extraction, the proposed method is compared with benchmark algorithms such as Term Frequency Inverse Document Frequency, Weirdness, Glossary Extraction and Term Extractor, using precision as the performance measure. For taxonomy extraction, the proposed methods are compared with the benchmark methods of feature-based extraction and weighting by Support Vector Machine, using F-measure, precision and recall as the performance measures. For the first method, the experimental results show that using particle swarm optimization to optimize the feature weights in term and taxonomy extraction improves the accuracy of the extraction result compared to the benchmark algorithms. For the second method, the results show that the hybrid technique combining multi-objective particle swarm optimization and fuzzy systems improves the performance of taxonomy extraction compared to the benchmark methods, while adjusting the fuzzy membership functions and keeping the number of fuzzy rules to a minimum with a high degree of accuracy.
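
The first method described above, optimising feature weights with particle swarm optimization, can be sketched roughly as follows. This is a generic single-objective PSO over weights in [0, 1], not the authors' implementation; the inertia and acceleration constants and the `fitness` callback are assumptions. In the setting of the abstract, `fitness` would score a weight vector by the extraction quality (for example precision) it yields.

```python
import random


def pso(fitness, dim, n_particles=15, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise `fitness` over weight vectors in [0, 1]^dim with basic PSO.

    w, c1 and c2 are commonly used illustrative values, not values
    taken from the abstract.
    """
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_fit = [fitness(p) for p in pos]      # personal best scores
    g = max(range(n_particles), key=lambda i: best_fit[i])
    gbest, gfit = best_pos[g][:], best_fit[g]  # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (best_pos[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # Keep each weight inside [0, 1].
                pos[i][j] = min(1.0, max(0.0, pos[i][j] + vel[i][j]))
            fit = fitness(pos[i])
            if fit > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], fit
                if fit > gfit:
                    gbest, gfit = pos[i][:], fit
    return gbest, gfit
```

With five term features, as in the abstract, `dim` would be 5 and each candidate position would hold one weight per feature.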

    Hybrid harmony search algorithm for continuous optimization problems

    The Harmony Search (HS) algorithm has been extensively adopted in the literature to address optimization problems in many different fields, such as industrial design, civil engineering, and electrical and mechanical engineering. To ensure its search performance, HS requires extensive tuning of its four control parameters, namely harmony memory size (HMS), harmony memory consideration rate (HMCR), pitch adjustment rate (PAR), and bandwidth (BW). However, the tuning process is often cumbersome and problem dependent, and no single setting fits all problems. Additionally, despite many useful works, HS and its variants still suffer from weak exploitation, which can lead to poor convergence. To address these issues, this thesis proposes to augment HS with adaptive tuning using the Grey Wolf Optimizer (GWO). To enhance its exploitation, this thesis also proposes to adopt a new variant of the opposition-based learning (OBL) technique. Taken together, the proposed hybrid algorithm, called IHS-GWO, aims to address continuous optimization problems. IHS-GWO is evaluated using two standard benchmarking sets and two real-world optimization problems. The first benchmarking set consists of 24 classical unimodal and multimodal benchmark functions, whilst the second contains 30 state-of-the-art benchmark functions from the Congress on Evolutionary Computation (CEC). The two real-world optimization problems are the three-bar truss and spring design problems. Statistical analysis using the Wilcoxon rank-sum and Friedman tests, comparing IHS-GWO's results with recent HS variants and other metaheuristics, demonstrates superior performance.
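
To make the role of the four control parameters concrete, here is a minimal sketch of plain Harmony Search for minimisation. It shows only the baseline algorithm with fixed parameters; the GWO-based adaptive tuning and the OBL variant of IHS-GWO are not reproduced, and the default parameter values are illustrative assumptions.

```python
import random


def improvise(memory, hmcr, par, bw, lower, upper, rng):
    """Build one new harmony from the harmony memory."""
    new = []
    for j in range(len(memory[0])):
        if rng.random() < hmcr:
            # Memory consideration: reuse a stored value for this variable.
            value = rng.choice(memory)[j]
            if rng.random() < par:
                # Pitch adjustment: nudge the value within the bandwidth.
                value += rng.uniform(-bw, bw)
        else:
            # Random selection from the whole range.
            value = rng.uniform(lower, upper)
        new.append(min(max(value, lower), upper))
    return new


def harmony_search(f, dim, lower, upper, hms=10, hmcr=0.9, par=0.3,
                   bw=0.1, iters=2000, seed=0):
    """Minimise f over [lower, upper]^dim with fixed HS parameters."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(hms)]
    for _ in range(iters):
        candidate = improvise(memory, hmcr, par, bw, lower, upper, rng)
        worst = max(range(hms), key=lambda i: f(memory[i]))
        if f(candidate) < f(memory[worst]):
            memory[worst] = candidate  # replace the worst stored harmony
    return min(memory, key=f)
```

Because `hms`, `hmcr`, `par` and `bw` are all fixed up front here, the sketch also shows why tuning is problem dependent: a bandwidth that suits one search range can be far too coarse or too fine for another.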

    An Improved Binary Grey-Wolf Optimizer with Simulated Annealing for Feature Selection

    This paper proposes improvements to the binary grey-wolf optimizer (BGWO) to solve the feature selection (FS) problem associated with high data dimensionality and irrelevant, noisy, and redundant data, which in turn allows machine learning algorithms to attain better classification/clustering accuracy in less training time. We propose three variants of BGWO in addition to the standard variant, applying different transfer functions to tackle the FS problem. Because BGWO generates continuous values and FS needs discrete values, a number of V-shaped, S-shaped, and U-shaped transfer functions were investigated for incorporation with BGWO to convert its continuous values to binary. After investigation, we note that the performance of BGWO is affected by the selection of the transfer function. In the first variant, we seek to reduce the local minima problem by integrating an exploration capability that updates the position of the grey wolf randomly within the search space with a certain probability; this variant is abbreviated as IBGWO. Next, a novel mutation strategy is proposed that selects a number of the worst grey wolves in the population and updates them either toward the best solution or randomly within the search space, with the choice made according to a certain probability. The number of worst grey wolves selected by this strategy increases linearly with the iterations. This strategy is then combined with IBGWO to produce the second variant of BGWO, abbreviated as LIBGWO. In the last variant, simulated annealing (SA) is integrated with LIBGWO to search around the best-so-far solution at the end of each iteration in order to identify better solutions. The performance of the proposed variants was validated on 32 datasets taken from the UCI repository and compared with six wrapper feature selection methods.
The experiments show the superiority of the proposed improved variants in producing better classification accuracy than the other selected wrapper feature selection algorithms
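
The conversion from continuous positions to binary feature masks can be sketched with two widely used transfer functions: the sigmoid as an S-shaped function and |tanh(x)| as a V-shaped one. The abstract does not say which specific S-, V- or U-shaped functions were chosen, so these two and the binarisation rules below are common conventions assumed for illustration only.

```python
import math
import random


def s_shaped(x):
    """Sigmoid transfer function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def v_shaped(x):
    """|tanh(x)| transfer function: 0 at x = 0, approaching 1 as |x| grows."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: the transfer value is the probability of selecting a feature."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: the transfer value is the probability of flipping the current bit."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]
```

The two families behave differently near zero: an S-shaped function selects a feature with probability 0.5 when the position is 0, while a V-shaped function leaves the current bit unchanged, which is one reason the choice of transfer function can affect search behaviour.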

    Feature Selection for Document Classification : Case Study of Meta-heuristic Intelligence and Traditional Approaches

    Doctor of Philosophy (Computer Engineering), 2020. Nowadays, the way people access news around the world has changed from paper to electronic format, and the rate of publication of newspapers and magazines on websites has increased dramatically. Meanwhile, text feature selection for automatic document classification (ADC) is becoming a big challenge because of the unstructured nature of text features, which is called the “multi-dimension feature problem”. Although various powerful schemes for text feature selection are being developed continuously, there still exists a research gap for the “optimization of feature selection problem (OFSP)”, which searches for the globally optimal features. The capacity of meta-heuristic intelligence in the knowledge discovery process (KDP) has also become critical for overcoming the NP-hard OFSP by providing effective performance and efficient computation time. Therefore, a meta-heuristic based approach for optimization of feature selection is proposed in this research to search for the globally optimal features for ADC. In this thesis, a case study of meta-heuristic intelligence and traditional approaches for the feature selection optimization process in document classification is carried out. It includes eleven meta-heuristic algorithms, namely Ant Colony search, Artificial Bee Colony search, Bat search, Cuckoo search, Evolutionary search, Elephant search, Firefly search, Flower search, Genetic search, Rhinoceros search, and Wolf search, for searching for the optimal feature subset for document classification. The results of the proposed model are then compared with three traditional search algorithms: Best First search (BFS), Greedy Stepwise (GS), and Ranker search (RS). In addition, a data mining framework is applied.
It involves data preprocessing, feature engineering, building the learning model and evaluating the performance of the proposed meta-heuristic intelligence-based feature selection using various performance and computational complexity evaluation schemes. In data preprocessing, tokenization, stop-word handling, stemming and lemmatizing, and normalization are applied. In the feature engineering process, n-gram TF-IDF feature extraction is used to build the feature vector, and both filter and wrapper approaches are applied to observe different cases. In addition, three different classifiers, J48, Naïve Bayes, and Support Vector Machine, are used for building the document classification model. According to the results, the proposed system can dramatically reduce the number of selected features, removing those that can deteriorate the learning model's performance. In addition, the selected global feature subset can yield better performance than traditional search according to the single objective function of the proposed model.
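
A minimal sketch of n-gram TF-IDF feature extraction, the feature engineering step named above, is given below. TF-IDF has several variants; this one uses relative term frequency and an unsmoothed logarithmic inverse document frequency, which is an assumption, since the thesis abstract does not state the exact formula. Whitespace tokenisation stands in for the fuller preprocessing pipeline.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {n-gram: weight} dict per document.

    weight = (count / n-grams in the document) * log(N / document frequency).
    An n-gram that occurs in every document therefore gets weight 0.
    """
    grams = [ngrams(d.lower().split(), n) for d in docs]
    df = Counter(g for doc in grams for g in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in grams:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({g: (c / total) * math.log(n_docs / df[g])
                        for g, c in counts.items()})
    return vectors
```

A feature selection search, whether meta-heuristic or traditional, would then operate on a binary mask over the n-gram vocabulary produced by this step.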

    Recent Trends in Computational Intelligence

    Traditional models struggle to cope with complexity, noise, and changing environments, while Computational Intelligence (CI) offers solutions to complicated problems as well as inverse problems. The main feature of CI is adaptability, spanning the fields of machine learning and computational neuroscience. CI also comprises biologically-inspired technologies, such as swarm intelligence as part of evolutionary computation, and encompasses wider areas such as image processing, data collection, and natural language processing. This book discusses the use of CI for optimally solving various applications, demonstrating its wide reach and relevance. Combining optimization methods with data mining strategies makes a strong and reliable prediction tool for handling real-life applications.

    Review of data mining applications for quality assessment in manufacturing industry: Support Vector Machines

    In many modern manufacturing industries, data that characterize the manufacturing process are electronically collected and stored in databases. Due to advances in data collection systems and analysis tools, data mining (DM) has been widely applied for quality assessment (QA) in manufacturing industries. In DM, the choice of technique for analyzing a dataset and assessing quality depends on the understanding of the analyst. With the advent of improved and efficient prediction techniques, an analyst also needs to know which tool performs best for a particular type of dataset. Although a few review papers have recently been published on DM applications in manufacturing for QA, this paper provides an extensive review of the application of one particular DM technique, namely the support vector machine (SVM), to QA problems. The review provides a comprehensive analysis of the literature from various points of view: DM preliminaries, data preprocessing, DM applications for each quality task, SVM preliminaries, and application results. Summary tables and figures are provided alongside the analyses. Finally, conclusions and future research directions are provided.