
    A Review on K-means Clustering Based on Quantum Particle Swarm Optimisation Algorithm

    Unsupervised clustering techniques play a vital role in data mining, with a wide range of applications in unsupervised classification. Clustering is a method used to categorise data into meaningful groups. The k-means algorithm is a well-known clustering algorithm that aims to minimise the squared distance between the feature values of points within the same cluster. In many applications, using an evolutionary computation technique called Quantum Particle Swarm Optimization (QPSO) in conjunction with the k-means algorithm has proven effective in finding suboptimal solutions. In this algorithm, the cluster centres are simulated as particles, allowing for the identification of suitable and stable cluster centres. This paper discusses recent improvements to the QPSO-k-means clustering algorithm, focusing on swarm initialisation and algorithm parameter optimisation. We validate the algorithm using the UCI healthcare dataset and demonstrate its ability to address suboptimal clustering by optimising parameters such as the number of iterations, the error rate, and the optimal solution for the cluster centres. The minimisation factor of the validation parameter indicates the compactness and validity of the clustering algorithm.
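
The abstract above describes cluster centres being treated as particles and scored by squared distances within clusters. A minimal sketch of how one such particle might be scored is given below; it assumes the fitness is the within-cluster sum of squared Euclidean distances, and the names `squared_distance` and `within_cluster_sse` are hypothetical, not taken from the paper. The QPSO position update itself is not shown.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_sse(points: Sequence[Point], centres: Sequence[Point]) -> float:
    """Score one candidate set of centres (one 'particle', by assumption).

    Each point is assigned to its nearest centre, and the squared
    distances to those centres are summed. Lower is better.
    """
    return sum(min(squared_distance(p, c) for c in centres) for p in points)


# Two well-separated groups of points (illustrative data only).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_centres = [(0.0, 1.0), (10.0, 1.0)]  # one centre per group
poor_centres = [(5.0, 0.0), (5.0, 2.0)]   # both centres between the groups

print(within_cluster_sse(points, good_centres))
print(within_cluster_sse(points, poor_centres))
```

A swarm-based optimiser would evaluate this score for every particle on each iteration and move the particles towards candidates with lower values.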

    Improving Ant Collaborative Filtering on Sparsity via Dimension Reduction

    Recommender systems should be able to handle highly sparse training data that continues to change over time. Among the many solutions, Ant Colony Optimization, as a kind of optimization algorithm modeled on the actions of an ant colony, enjoys the favorable characteristic of being optimal, which has not been easily achieved by other kinds of algorithms. A recent work adopting genetic optimization proposes a collaborative filtering scheme, Ant Collaborative Filtering (ACF), which models the pheromone of ants for a recommender system in two ways: (1) it uses the pheromone exchange to model the ratings given by users with respect to items; (2) it uses the evaporation of existing pheromone to model the change in users’ preferences over time. This mechanism helps to identify the most related users and items, even in the case of sparsity, and can capture the drift of user preferences over time. However, it turns out that many users share the same preference over items, which means it is not necessary to initialize each user with a unique type of pheromone, as was done in ACF. Regarding the sparsity problem, this work goes one step further and improves the performance of Ant Collaborative Filtering by adding a clustering step in the initialization phase to reduce the dimension of the rating matrix, so that K << #users, where K is the number of clusters, which stands for the maximum number of types of pheromone carried by all users. We call this revised version the Improved Ant Collaborative Filtering (IACF). Experiments are conducted on larger datasets than in the previous work, based on three typical recommender systems: (1) movie recommendations, (2) music recommendations, and (3) book recommendations. For movie recommendation, a larger dataset, MovieLens 10M, was used instead of MovieLens 1M. For book recommendation and music recommendation, we used new datasets with a much larger number of samples from Douban and NetEase. The results illustrate that our IACF algorithm can better deal with practical recommendation scenarios that involve sparse datasets.

    Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensemble: A Survey

    Data analysis plays a prominent role in interpreting various phenomena. Data mining is the process of hypothesizing useful knowledge from extensive data. Based upon classical statistical prototypes, the data can be exploited beyond its storage and management. Cluster analysis, a primary investigation with little or no prior knowledge, consists of research and development across a wide variety of communities. Cluster ensembles are a melange of individual solutions obtained from different clusterings to produce a final quality clustering, which is required in wider applications. The method arises from the perspective of increasing robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions included in cluster ensembles. The survey analyzes the various techniques and cluster ensemble methods.

    An efficient Particle Swarm Optimization approach to cluster short texts

    This is the author’s version of a work that was accepted for publication in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Sciences, VOL 265, MAY 1 2014, DOI 10.1016/j.ins.2013.12.010.
    Short texts such as evaluations of commercial products, news, FAQs and scientific abstracts are important resources on the Web due to the constant need of people to use this online information in real life. In this context, the clustering of short texts is a significant analysis task, and a discrete Particle Swarm Optimization (PSO) algorithm named CLUDIPSO has recently shown promising performance on this type of problem. CLUDIPSO obtained high quality results with small corpora although, with larger corpora, a significant deterioration of performance was observed. This article presents CLUDIPSO*, an improved version of CLUDIPSO, which includes a different representation of particles, a more efficient evaluation of the function to be optimized and some modifications in the mutation operator. Experimental results with corpora containing scientific abstracts, news and short legal documents obtained from the Web show that CLUDIPSO* is an effective clustering method for short-text corpora of small and medium size. (C) 2013 Elsevier Inc. All rights reserved.
    The research work is partially funded by the European Commission as part of the WIQ-EI IRSES research project (Grant No. 269180) within the FP 7 Marie Curie People Framework and it has been developed in the framework of the Microcluster VLC/Campus (International Campus of Excellence) on Multimodal Intelligent Systems. The research work of the first author is partially funded by the program PAID-02-10 2257 (Universitat Politecnica de Valencia) and CONICET (Argentina).
    Cagnina, L.; Errecalde, M.; Ingaramo, D.; Rosso, P. (2014). An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences. 265:36-49. https://doi.org/10.1016/j.ins.2013.12.010

    Partial Discharge Spectral Characterization in HF, VHF and UHF Bands Using Particle Swarm Optimization

    The measurement of partial discharge (PD) signals in the radio frequency (RF) range has gained popularity among utilities and specialized monitoring companies in recent years. Unfortunately, on most occasions the data are hidden by noise and coupled interference that hinder their interpretation and render them useless, especially in acquisition systems in the ultra high frequency (UHF) band, where the signals of interest are weak. This paper focuses on a method that uses a selective spectral signal characterization to describe each signal, whether a type of partial discharge or interference/noise, by the power contained in the most representative frequency bands. The technique can be considered a dimensionality reduction problem in which all the energy information contained in the frequency components is condensed into a reduced number of UHF or high frequency (HF) and very high frequency (VHF) bands. In general, dimensionality reduction methods make the interpretation of results a difficult task because the inherent physical nature of the signal is lost in the process. The proposed selective spectral characterization is a preprocessing tool that facilitates further main processing. The starting point is a clustering of signals that could form the core of a PD monitoring system. Therefore, the dimensionality reduction technique should discover the best frequency bands to enhance the affinity between signals in the same cluster and the differences between signals in different clusters. This is done by maximizing the minimum Mahalanobis distance between clusters using particle swarm optimization (PSO). The tool is tested with three sets of experimental signals to demonstrate its capabilities in separating noise and PDs with low signal-to-noise ratio and in separating different types of partial discharges measured in the UHF and HF/VHF bands.
    The work done in this paper has been funded by the Spanish Government (MINECO) and the European Regional Development Fund (ERDF) under contract DPI2015-66478-C2-1-R (MINECO/FEDER, UE).
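
The criterion named in the abstract above, the minimum Mahalanobis distance between clusters, can be illustrated with a small sketch. It assumes each signal is reduced to the power in two candidate frequency bands and that the distance between two clusters is measured between their means using a pooled covariance. Both choices are assumptions made here for illustration; the abstract does not specify them, and all function names are hypothetical. The PSO search over candidate bands is not shown.

```python
import math
from itertools import combinations


def mean2(rows):
    """Mean of a list of 2-D feature vectors (power in two bands)."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def pooled_cov2(c1, c2):
    """Pooled 2x2 covariance of two clusters, returned as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    for rows in (c1, c2):
        mx, my = mean2(rows)
        for x, y in rows:
            dx, dy = x - mx, y - my
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(c1) + len(c2) - 2
    return sxx / dof, sxy / dof, syy / dof


def mahalanobis2(c1, c2):
    """Mahalanobis distance between the means of two 2-D clusters."""
    (m1x, m1y), (m2x, m2y) = mean2(c1), mean2(c2)
    sxx, sxy, syy = pooled_cov2(c1, c2)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("pooled covariance is singular")
    dx, dy = m1x - m2x, m1y - m2y
    # d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


def min_pairwise_distance(clusters):
    """Fitness to maximize: the smallest distance over all cluster pairs."""
    return min(mahalanobis2(a, b) for a, b in combinations(clusters, 2))
```

An optimizer would evaluate `min_pairwise_distance` for each candidate pair of bands and prefer the pair that yields the largest value, since that keeps even the two closest clusters apart.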

    A state-of-art optimization method for analyzing the tweets of earthquake-prone region

    With the increase in accumulated data and usage of the Internet, social media such as Twitter has become a fundamental tool to access all kinds of information. Processing and preparing Twitter data and eliminating unnecessary information are therefore rapidly gaining importance. In particular, it is very important to analyze the information and make it available in emergencies such as disasters. In the proposed study, an earthquake with a magnitude of Mw = 6.8 on the Richter scale that occurred on January 24, 2020, in Elazig province, Turkey, is analyzed in detail. Tweets under twelve hashtags are clustered separately by utilizing the Social Spider Optimization (SSO) algorithm with some modifications. The sum of intra-cluster distances (SICD) is utilized to measure the performance of the proposed clustering algorithm. In addition, SICD, which works by assigning a new solution to its nearest node, is used as an integer programming model to be solved with the GUROBI package on the test datasets. Optimal results are gathered and compared with the proposed SSO results. In the study, center tweets with optimal results are found by utilizing the modified SSO. Moreover, the results of the proposed SSO algorithm are compared with the K-means clustering technique, which is the most popular clustering technique. The proposed SSO algorithm gives better results. In this way, the general situation of society after an earthquake is deduced in order to provide moral and material support.

    Hoax classification and sentiment analysis of Indonesian news using Naive Bayes optimization

    Currently, the spread of hoax news has increased significantly, especially on social media networks. Hoax news is very dangerous and can provoke readers, so it requires special handling. This research proposes a hoax news detection system using searching, snippet and cosine similarity methods to classify hoax news. This method is proposed because the searching method does not require training data, so it is practical to use and always up to date. In addition, one of the drawbacks of the existing approaches is that they are not equipped with a sentiment analysis feature. In our system, sentiment analysis is carried out after hoax news is detected. The goal is to extract the true hidden sentiment inside a hoax, whether positive or negative. In the process of sentiment analysis, the Naïve Bayes (NB) method was used, optimized using the Particle Swarm Optimization (PSO) method. Based on the results of an experiment on 30 hoax news samples that are widely spread on social media networks, hoax news detection reaches an average accuracy of 77%, where each news item is correctly identified as a hoax with an accuracy in the range between 66% and 91%. In addition, the proposed sentiment analysis method proved to have a better performance than the previous sentiment analysis method.
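
The abstract above mentions cosine similarity as part of the hoax detection pipeline. A minimal sketch of that measure follows, assuming a news text and a search-result snippet are each turned into a bag of lowercase, whitespace-separated words. The tokenization and the function name `cosine_similarity` are assumptions made for illustration, not details from the paper.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Dot product over the words the two texts share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))

    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Illustrative strings only; not data from the study.
headline = "free fuel for all citizens"
snippet = "officials deny claim of free fuel"
print(cosine_similarity(headline, snippet))
```

A score near 1 means the two texts use nearly the same words in the same proportions, and a score of 0 means they share no words.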

    An adaptive algorithm for clustering cumulative probability distribution functions using the Kolmogorov–Smirnov two-sample test

    This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.) of a continuous random variable, observed in different populations, into the minimum number of homogeneous clusters, making no parametric assumptions about the c.p.d.f.’s. The proposed distance function for clustering c.p.d.f.’s is based on the Kolmogorov–Smirnov two-sample statistic. This test is able to detect differences in position, dispersion or shape of the c.p.d.f.’s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether it is necessary to add more clusters or not. In this sense, the proposed algorithm is adaptive, as it automatically increases the number of clusters only as necessary; therefore, there is no need to fix the number of clusters in advance. The outputs of the algorithm are, for each cluster, the common c.p.d.f. of all observed data in the cluster (the centroid) and the Kolmogorov–Smirnov statistic between the centroid and the most distant c.p.d.f. The proposed algorithm has been used for a large data set of solar global irradiation spectra distributions. The results obtained make it possible to reduce the information in more than 270,000 c.p.d.f.’s to only 6 different clusters that correspond to 6 different c.p.d.f.’s.
    This research has been partially supported by the Spanish Consejería de Economía, Innovación y Ciencia of the Junta de Andalucía under projects TIC-6441 and P11-RNM7115, and the Spanish MEC under project ECO2011–29751.
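
The distance described in the abstract above is based on the Kolmogorov–Smirnov two-sample statistic, which is the largest vertical gap between two empirical cumulative distribution functions. A minimal sketch of that statistic follows. The function name `ks_statistic` is hypothetical, and the adaptive clustering procedure that decides when to add a cluster is not shown.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")

    largest_gap = 0.0
    # Both empirical CDFs are step functions that change only at sample
    # values, so checking every observed value is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

The statistic is 0 for identical samples and 1 for samples whose ranges do not overlap, so it reacts to differences in position, dispersion or shape without any parametric assumption.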