    An efficient Particle Swarm Optimization approach to cluster short texts

    This is the author’s version of a work that was accepted for publication in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Sciences, VOL 265, MAY 1 2014, DOI 10.1016/j.ins.2013.12.010.
    Short texts such as evaluations of commercial products, news, FAQs and scientific abstracts are important resources on the Web because people constantly need to use this online information in real life. In this context, the clustering of short texts is a significant analysis task, and a discrete Particle Swarm Optimization (PSO) algorithm named CLUDIPSO has recently shown promising performance on this type of problem. CLUDIPSO obtained high-quality results with small corpora although, with larger corpora, a significant deterioration of performance was observed. This article presents CLUDIPSO*, an improved version of CLUDIPSO, which includes a different representation of particles, a more efficient evaluation of the function to be optimized and some modifications in the mutation operator. Experimental results with corpora containing scientific abstracts, news and short legal documents obtained from the Web show that CLUDIPSO* is an effective clustering method for short-text corpora of small and medium size. (C) 2013 Elsevier Inc. All rights reserved.
    The research work is partially funded by the European Commission as part of the WIQ-EI IRSES research project (Grant No. 269180) within the FP 7 Marie Curie People Framework and it has been developed in the framework of the Microcluster VLC/Campus (International Campus of Excellence) on Multimodal Intelligent Systems. The research work of the first author is partially funded by the program PAID-02-10 2257 (Universitat Politecnica de Valencia) and CONICET (Argentina).
    Cagnina, L.; Errecalde, M.; Ingaramo, D.; Rosso, P. (2014). An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences. 265:36-49. https://doi.org/10.1016/j.ins.2013.12.010

    An Improved Similarity Matching based Clustering Framework for Short and Sentence Level Text

    Text clustering plays a key role in navigation and browsing. For efficient text clustering, large amounts of information are grouped into meaningful clusters. Many text clustering techniques do not address issues such as high time and space complexity, inability to capture the relational and contextual attributes of words, low robustness, and privacy exposure risks. To address these issues, an efficient text-based clustering framework is proposed. The Reuters dataset is chosen as the input dataset. Once the input dataset is preprocessed, the similarity between words is computed using cosine similarity. The similarities between the components are compared and the vector data is created. From the vector data, the clustering particle is computed. To optimize the clustering results, mutation is applied to the vector data. The performance of the proposed text-based clustering framework is analyzed using metrics such as Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR) and processing time. The experimental results show that the proposed framework produced the best MSE, PSNR and processing time when compared with the existing Fuzzy C-Means (FCM) and Pairwise Random Swap (PRS) methods.
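The cosine similarity step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the bag-of-words representation, the lowercased whitespace tokenization, and the function name `cosine_similarity` are all assumptions made here for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    # Assumption: lowercased whitespace tokenization stands in for real preprocessing.
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())

    # Dot product over the terms the two texts share.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())

    # Euclidean norms of the two term-frequency vectors.
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))

    # An empty text has no direction, so similarity is defined as 0 here.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With this sketch, two texts with no shared terms score 0, identical texts score 1 (up to floating-point rounding), and `cosine_similarity("short text", "short news")` gives 0.5 because one of the two terms in each text is shared.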

    An Enhanced Expectation Maximization Text Document Clustering Algorithm for E-Content Analysis

    Nowadays, there are many types of digital materials that can be used in the classroom. Students and scholars are migrating from textbooks to digital study materials because textbooks are too large and expensive. Teachers and college students can use and modify materials that are available freely or with some constraints for their learning and teaching. E-content can be designed, evolved, utilized, re-used, and distributed electronically from anywhere at any time. Because of the flexibility of time, place, and speed of learning, e-content is becoming extremely popular. It can be readily and instantly shared and communicated with an unlimited number of clients all across the globe. Document clustering is most commonly used to group documents that are related to a specific topic. Text document clustering can be used to group a collection of documents according to the information they contain and to deliver search results when a user searches the internet. This paper mainly focuses on text document clustering to cope with massive collections of e-content documents. An Enhanced Expectation Maximization Text Document Clustering (EEMTDC) algorithm is proposed and compared with the Expectation Maximization (EM), K-Means, and Hierarchical clustering (HC) algorithms. The experiment shows that the proposed EEMTDC algorithm achieves greater clustering accuracy than the existing clustering algorithms.

    Advances in Meta-Heuristic Optimization Algorithms in Big Data Text Clustering

    This paper presents a comprehensive survey of meta-heuristic optimization algorithms in text clustering applications and highlights their main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods due to their success in solving machine learning problems, especially text clustering problems. This paper reviews all of the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. In addition, the main procedures of text clustering and critical discussions are given. Hence, this review reports the advantages and disadvantages of these methods and recommends potential future research paths. The main keywords that have been considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.

    An Ensemble Classification and Hybrid Feature Selection Approach for Fake News Stance Detection

    The development of the Internet and social media has revolutionised how news is represented and disseminated. News spreads quickly and cheaply on social media. Amid this rapid distribution, dangerous or seductive information such as user-generated false news spreads equally fast. Distinguishing true incidents from false news creates key challenges. This study suggests applying dimensionality reduction approaches before sending the feature vectors to the classifier. These methods would not significantly affect the result, though. Furthermore, utilising dimensionality reduction techniques significantly reduces the time needed to complete a prediction. This paper presents a hybrid feature selection method to overcome the above-mentioned issues. Fake news is classified using ensembles that identify connections between the stories and headlines of news items. In the first step, data is pre-processed to transform unstructured data into structured form for ease of processing. In the second step, unidentified features of false news are extracted from the diverse connections among news articles using PCA (Principal Component Analysis). In the third step, FPSO (Fuzzy Particle Swarm Optimization) is used to select features for the feature reduction procedure. To efficiently learn how news items are represented and to spot fake news, this study creates ELMs (Ensemble Learning Models). The study uses a dataset obtained from Kaggle. Four assessment metrics are used to evaluate the performance of the classification models.

    An adaptive clustering and classification algorithm for Twitter data streaming in Apache Spark

    Ongoing big data from social network sites such as Twitter or Facebook has been an attractive subject of investigation for researchers in recent decades because of aspects including timeliness, accessibility and popularity; however, there may be a trade-off in accuracy. Moreover, clustering of Twitter data has caught the attention of researchers. As such, an algorithm that can cluster data in less computational time, especially for data streaming, is needed. The presented adaptive clustering and classification algorithm, used for data streaming in Apache Spark to overcome the existing problems, proceeds in two phases. In the first phase, the pre-processed Twitter input data is clustered using an Improved Fuzzy C-means clustering, and the proposed clustering is further improved by an Adaptive Particle Swarm Optimization (PSO) algorithm. The clustered data stream is then evaluated using the Spark engine. In the second phase, the pre-processed Higgs input data is classified using the modified support vector machine (MSVM) classifier with grid search optimization. Finally, the optimized data is evaluated in the Spark engine and the resulting values are used to obtain a confusion matrix. The proposed work uses the Twitter dataset and the Higgs dataset for data streaming in Apache Spark. The computational experiments demonstrate the superiority of the presented approach over existing methods in terms of precision, recall, F-score, convergence, ROC curve and accuracy.
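For orientation, the membership update at the core of standard Fuzzy C-means can be sketched as follows. This is the textbook FCM rule, not the "Improved" variant the abstract refers to (whose details are not given here); the function name `fcm_memberships`, the restriction to 1-D points, and the default fuzzifier `m = 2.0` are assumptions made for illustration.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard Fuzzy C-means membership update for 1-D points.

    Returns rows where row[i][j] is the degree to which points[i]
    belongs to centers[j]; each row sums to 1.
    """
    # The fuzzifier m > 1 controls how soft the assignment is.
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers:
            # split full membership among those centers.
            zeros = dists.count(0.0)
            row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
        else:
            # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
            row = [
                1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                for d_j in dists
            ]
        memberships.append(row)
    return memberships
```

For example, a point at 1.0 with centers at 0.0 and 4.0 receives memberships of 0.9 and 0.1 under this rule with `m = 2.0`, so closer centers get proportionally more weight rather than a hard assignment.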

    Word-level Textual Adversarial Attacking as Combinatorial Optimization

    Adversarial attacks are carried out to reveal the vulnerability of deep neural networks. Textual adversarial attacking is challenging because text is discrete and a small perturbation can bring significant change to the original input. Word-level attacking, which can be regarded as a combinatorial optimization problem, is a well-studied class of textual attack methods. However, existing word-level attack models are far from perfect, largely because unsuitable search space reduction methods and inefficient optimization algorithms are employed. In this paper, we propose a novel attack model, which incorporates the sememe-based word substitution method and particle swarm optimization-based search algorithm to solve the two problems separately. We conduct exhaustive experiments to evaluate our attack model by attacking BiLSTM and BERT on three benchmark datasets. Experimental results demonstrate that our model consistently achieves much higher attack success rates and crafts more high-quality adversarial examples as compared to baseline methods. Also, further experiments show our model has higher transferability and can bring more robustness enhancement to victim models by adversarial training. All the code and data of this paper can be obtained on https://github.com/thunlp/SememePSO-Attack.
    Comment: Accepted at ACL 2020 as a long paper (a typo is corrected as compared with the official conference camera-ready version). 16 pages, 3 figures

    A Collection of Challenging Optimization Problems in Science, Engineering and Economics

    Function optimization and finding simultaneous solutions of a system of nonlinear equations (SNE) are two closely related and important optimization problems. However, unlike in the case of function optimization in which one is required to find the global minimum and sometimes local minima, a database of challenging SNEs where one is required to find stationary points (extrema and saddle points) is not readily available. In this article, we initiate building such a database of important SNEs (which also includes related function optimization problems), arising from Science, Engineering and Economics. After providing a short review of the most commonly used mathematical and computational approaches to find solutions of such systems, we provide a preliminary list of challenging problems by writing the mathematical formulation down, briefly explaining the origin and importance of the problem and giving a short account of the currently known results, for each of the problems. We anticipate that this database will not only help benchmarking novel numerical methods for solving SNEs and function optimization problems but also will help advancing the corresponding research areas.
    Comment: Accepted as an invited contribution to the special session on Evolutionary Computation for Nonlinear Equation Systems at the 2015 IEEE Congress on Evolutionary Computation (at Sendai International Center, Sendai, Japan, from 25th to 28th May, 2015).
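The close relationship between the two problems named in this abstract can be stated compactly. The following is a sketch in notation assumed here rather than taken from the paper: $F$ denotes the system, $f$ a smooth scalar objective, and $\Phi$ a least-squares merit function.

```latex
% A system of nonlinear equations (SNE) in n unknowns:
F(x) = 0, \qquad F : \mathbb{R}^n \to \mathbb{R}^n .

% The stationary points (extrema and saddle points) of a smooth
% objective f are exactly the solutions of the SNE given by its gradient:
\nabla f(x) = 0 .

% Conversely, an SNE can be cast as a minimization problem, since the
% merit function is nonnegative and vanishes only at solutions:
\Phi(x) = \tfrac{1}{2}\,\lVert F(x) \rVert_2^2 \ge 0,
\qquad
\Phi(x^\ast) = 0 \iff F(x^\ast) = 0 .
```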