761 research outputs found

    Density propagation based adaptive multi-density clustering algorithm

    This research was supported by the Science & Technology Development Foundation of Jilin Province (Grant Nos. 20160101259JC, 20180201045GX), the National Natural Science Foundation of China (Grant No. 61772227) and the Natural Science Foundation of Xinjiang Province (Grant No. 2015211C127). This research is also supported by the Engineering and Physical Sciences Research Council (EPSRC) funded project on New Industrial Systems: Manufacturing Immortality (EP/R020957/1).

    On exploring data lakes by finding compact, isolated clusters

    Data engineers are very interested in data lake technologies due to the incredible abundance of datasets. They typically use clustering to understand the structure of the datasets before applying other methods to infer knowledge from them. This article presents the first proposal that explores how to use a meta-heuristic to address the problem of multi-way single-subspace automatic clustering, which is very appropriate in the context of data lakes. It was confronted with five strong competitors that combine the state-of-the-art attribute selection proposal with three classical single-way clustering proposals, a recent quantum-inspired one, and a recent deep-learning one. The evaluation focused on exploring their ability to find compact and isolated clusterings as well as the extent to which such clusterings can be considered good classifications. The statistical analyses conducted on the experimental results prove that it ranks first in effectiveness according to six standard coefficients and is very efficient in terms of CPU time, not to mention that it did not result in any degraded clusterings or timeouts. Summing up: this proposal contributes to the array of techniques that data engineers can use to explore their data lakes. Funding: Ministerio de Economía y Competitividad TIN2016-75394-R; Ministerio de Ciencia e Innovación PID2020-112540RB-C44; Junta de Andalucía P18-RT-1060; Junta de Andalucía US-138137.
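    The coefficients mentioned above can be reproduced with off-the-shelf tooling. Below is a minimal sketch, assuming scikit-learn and synthetic data, of how compactness and isolation are typically scored with three standard coefficients; the KMeans baseline and all parameter values are illustrative assumptions, not the meta-heuristic proposed in the article.

```python
# Sketch: scoring a clustering for compactness/isolation with standard
# coefficients.  The data and the KMeans baseline are illustrative
# assumptions, not the article's meta-heuristic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=500, centers=4, n_features=8, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print("silhouette        :", silhouette_score(X, labels))         # higher is better
print("calinski-harabasz :", calinski_harabasz_score(X, labels))  # higher is better
print("davies-bouldin    :", davies_bouldin_score(X, labels))     # lower is better
```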

    Adaptive firefly algorithm for hierarchical text clustering

    Text clustering is essentially used by search engines to increase recall and precision in information retrieval. As search engines operate on Internet content that is constantly being updated, there is a need for a clustering algorithm that offers automatic grouping of items without prior knowledge of the collection. Existing clustering methods have problems in determining the optimal number of clusters and in producing compact clusters. In this research, an adaptive hierarchical text clustering algorithm is proposed based on the Firefly Algorithm. The proposed Adaptive Firefly Algorithm (AFA) consists of three components: document clustering, cluster refining, and cluster merging. The first component introduces the Weight-based Firefly Algorithm (WFA), which automatically identifies initial centers and their clusters for any given text collection. In order to refine the obtained clusters, a second algorithm, termed the Weight-based Firefly Algorithm with Relocate (WFAR), is proposed. Such an approach allows the relocation of a pre-assigned document into a newly created cluster. The third component, the Weight-based Firefly Algorithm with Relocate and Merging (WFARM), aims to reduce the number of produced clusters by merging non-pure clusters into pure ones. Experiments were conducted to compare the proposed algorithms against seven existing methods. AFA obtained the optimal number of clusters in 100% of cases, with purity and F-measure of 83%, higher than the benchmarked methods. As for the entropy measure, AFA produced the lowest value (0.78) when compared to the existing methods. The results indicate that the Adaptive Firefly Algorithm can produce compact clusters. This research contributes to the text mining domain, as hierarchical text clustering facilitates the indexing of documents and information retrieval processes.
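    As a rough illustration of how a firefly-style search can drive centroid-based clustering, the sketch below moves candidate centroid sets toward brighter (lower within-cluster error) solutions. It is a generic firefly update under assumed parameters (beta0, gamma, alpha) and synthetic data, not the WFA/WFAR/WFARM algorithms themselves.

```python
# Generic firefly-style update over candidate centroid sets; brightness is
# the negated within-cluster SSE.  Parameters and data are assumptions for
# illustration, not the WFA/WFAR/WFARM components described above.
import numpy as np

def sse(X, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.sum(np.min(d, axis=1) ** 2))

def firefly_step(X, population, beta0=1.0, gamma=0.01, alpha=0.1, rng=None):
    # population: shape (n_fireflies, k, n_features), one centroid set per firefly.
    rng = rng if rng is not None else np.random.default_rng()
    brightness = np.array([-sse(X, c) for c in population])
    new_pop = population.copy()
    for i in range(len(population)):
        for j in range(len(population)):
            if brightness[j] > brightness[i]:          # j is brighter: i moves toward j
                r2 = np.sum((population[i] - population[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)
                new_pop[i] += (beta * (population[j] - population[i])
                               + alpha * rng.normal(size=population[i].shape))
    return new_pop

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
pop = rng.normal(size=(10, 3, 2))                      # 10 fireflies, 3 centroids each
for _ in range(20):
    pop = firefly_step(X, pop, rng=rng)
best = min(pop, key=lambda c: sse(X, c))
print("best SSE:", sse(X, best))
```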

    Improved EMD-Based Complex Prediction Model for Wind Power Forecasting

    As a response to the rapidly increasing penetration of wind power generation in modern electric power grids, accurate prediction models are crucial to deal with the associated uncertainties. Due to the highly volatile and chaotic nature of wind power, employing complex intelligent prediction tools is necessary. Accordingly, this article proposes a novel improved version of empirical mode decomposition (IEMD) to decompose wind measurements. The decomposed signal is provided as input to a hybrid forecasting model built on a bagging neural network (BaNN) combined with K-means clustering. Moreover, a new intelligent optimization method named ChB-SSO is applied to automatically tune the BaNN parameters. The performance of the proposed forecasting framework is tested using different seasonal subsets of real-world wind farm case studies (Alberta and Sotavento) through a comprehensive comparative analysis against other well-known prediction strategies. Furthermore, to analyze the effectiveness of the proposed framework, different forecast horizons have been considered in different test cases. Several error assessment criteria were used, and the obtained results demonstrate the superiority of the proposed method for wind forecasting compared to the other methods in all test cases. © 2020 Institute of Electrical and Electronics Engineers.
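    To make the decompose-cluster-ensemble idea concrete, the sketch below chains a plain EMD (not the improved IEMD), K-means grouping of training windows, and a bagged neural network per group. The PyEMD package, the synthetic series, the window length and all hyper-parameters are assumptions for illustration, not the paper's setup.

```python
# Rough pipeline sketch: EMD decomposition -> K-means grouping -> bagged NN
# per group.  PyEMD (pip install EMD-signal), the synthetic series and all
# hyper-parameters are illustrative assumptions, not the proposed IEMD/ChB-SSO
# framework.
import numpy as np
from PyEMD import EMD
from sklearn.cluster import KMeans
from sklearn.ensemble import BaggingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
wind = np.sin(np.linspace(0, 40, 2000)) + 0.3 * rng.standard_normal(2000)

imfs = EMD().emd(wind)          # intrinsic mode functions, shape (n_imfs, n_samples)
features = imfs.T               # one feature vector per time step

# Sliding windows: predict wind[t] from the previous `lag` decomposed steps.
lag = 24
X = np.stack([features[t - lag:t].ravel() for t in range(lag, len(wind))])
y = wind[lag:]

# Group similar operating conditions, then fit one bagged NN per group.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
models = {g: BaggingRegressor(MLPRegressor(hidden_layer_sizes=(32,), max_iter=500),
                              n_estimators=5, random_state=0)
              .fit(X[groups == g], y[groups == g])
          for g in np.unique(groups)}
```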

    Comparative Study On Cooperative Particle Swarm Optimization Decomposition Methods for Large-scale Optimization

    The vast majority of real-world optimization problems can be put into the class of large-scale global optimization problems (LSOPs). Over the past few years, an abundance of cooperative coevolutionary (CC) algorithms has been proposed to combat the challenges of LSOPs. When CC algorithms attempt to address large-scale problems, the effects of interconnected variables, known as variable dependencies, cause extreme performance degradation. The literature has extensively reviewed approaches to decomposing problems whose variables are interdependent during optimization, often with a wide range of base optimizers. In this thesis, we use the cooperative particle swarm optimization (CPSO) algorithm as the base optimizer and perform an extensive scalability study with a range of decomposition methods to determine ideal divide-and-conquer approaches when using a CPSO. Experimental results demonstrate that dynamic regrouping of variables, as seen in the merging CPSO (MCPSO) and the decomposition CPSO (DCPSO), as well as varying the total fitness evaluations per dimension, resulted in high-quality solutions when compared to six state-of-the-art decomposition approaches.
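    A toy version of the cooperative coevolution loop discussed in the thesis is sketched below: the decision vector is split into sub-components by random regrouping, and each sub-component is optimized by a simple PSO against a fixed context vector. The objective, grouping scheme and PSO settings are illustrative assumptions, not the MCPSO/DCPSO variants compared in the study.

```python
# Toy cooperative-coevolution sketch: random regrouping + a simple PSO per
# sub-component against a shared context vector.  All settings here are
# illustrative assumptions, not the decomposition methods compared above.
import numpy as np

def sphere(x):                                  # separable benchmark objective
    return float(np.sum(x ** 2))

def pso_subproblem(f, context, idx, iters=30, swarm=20, rng=None):
    # Optimize only the dimensions in `idx`; the rest of `context` stays fixed.
    rng = rng if rng is not None else np.random.default_rng()
    pos = rng.uniform(-5, 5, (swarm, len(idx)))
    vel = np.zeros_like(pos)
    def full(sub):
        x = context.copy(); x[idx] = sub; return f(x)
    pbest, pval = pos.copy(), np.array([full(p) for p in pos])
    for _ in range(iters):
        g = pbest[np.argmin(pval)]
        vel = (0.7 * vel + 1.5 * rng.random(pos.shape) * (pbest - pos)
                         + 1.5 * rng.random(pos.shape) * (g - pos))
        pos = pos + vel
        val = np.array([full(p) for p in pos])
        improved = val < pval
        pbest[improved], pval[improved] = pos[improved], val[improved]
    return pbest[np.argmin(pval)]

D, n_groups, rng = 100, 5, np.random.default_rng(0)
context = rng.uniform(-5, 5, D)
for cycle in range(10):
    perm = rng.permutation(D)                   # random regrouping each cycle
    for sub in np.array_split(perm, n_groups):
        context[sub] = pso_subproblem(sphere, context, sub, rng=rng)
    print(f"cycle {cycle}: f = {sphere(context):.4f}")
```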

    An efficient Particle Swarm Optimization approach to cluster short texts

    This is the author’s version of a work that was accepted for publication in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Sciences, Vol. 265, May 1, 2014, DOI 10.1016/j.ins.2013.12.010. Short texts such as evaluations of commercial products, news, FAQs and scientific abstracts are important resources on the Web due to the constant requirements of people to use this online information in real life. In this context, the clustering of short texts is a significant analysis task, and a discrete Particle Swarm Optimization (PSO) algorithm named CLUDIPSO has recently shown promising performance on this type of problem. CLUDIPSO obtained high-quality results with small corpora although, with larger corpora, a significant deterioration of performance was observed. This article presents CLUDIPSO*, an improved version of CLUDIPSO, which includes a different representation of particles, a more efficient evaluation of the function to be optimized and some modifications in the mutation operator. Experimental results with corpora containing scientific abstracts, news and short legal documents obtained from the Web show that CLUDIPSO* is an effective clustering method for short-text corpora of small and medium size. (C) 2013 Elsevier Inc. All rights reserved. The research work is partially funded by the European Commission as part of the WIQ-EI IRSES research project (Grant No. 269180) within the FP7 Marie Curie People Framework and has been developed in the framework of the Microcluster VLC/Campus (International Campus of Excellence) on Multimodal Intelligent Systems. The research work of the first author is partially funded by the program PAID-02-10 2257 (Universitat Politecnica de Valencia) and CONICET (Argentina). Cagnina, L.; Errecalde, M.; Ingaramo, D.; Rosso, P. (2014). An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences, 265:36-49. https://doi.org/10.1016/j.ins.2013.12.010
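    The discrete particle representation that CLUDIPSO-style algorithms rely on can be illustrated in a few lines: each particle is a vector of cluster labels, one per document, and its fitness is a cluster-quality score. The tiny corpus, the silhouette-based fitness and the mutation rate below are assumptions for illustration only; they are not the CLUDIPSO* operators.

```python
# Sketch of a discrete particle for document clustering: a label per document,
# scored with a standard quality measure.  Corpus, fitness choice and mutation
# rate are illustrative assumptions, not the CLUDIPSO* operators.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

docs = [
    "cheap laptop deal with free shipping",
    "new laptop reviews and benchmarks",
    "football match ends in late penalty drama",
    "transfer news: striker signs for rival club",
    "smartphone camera comparison and specs",
    "league table tightens after weekend fixtures",
    "tablet versus laptop for students",
    "coach praises defenders after clean sheet",
]
X = TfidfVectorizer().fit_transform(docs).toarray()

k, rng = 2, np.random.default_rng(0)
particle = rng.integers(0, k, size=len(docs))   # discrete position: one label per doc

def fitness(labels):
    # Penalize degenerate particles that collapse everything into one cluster.
    if len(np.unique(labels)) < 2:
        return -1.0
    return silhouette_score(X, labels)

def mutate(labels, rate=0.25, rng=rng):
    # Reassign a fraction of documents to random clusters.
    out = labels.copy()
    flip = rng.random(len(out)) < rate
    out[flip] = rng.integers(0, k, size=int(flip.sum()))
    return out

print(fitness(particle), fitness(mutate(particle)))
```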

    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technology that identifies clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201
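    For readers unfamiliar with what "soft" means here, the sketch below shows one classic flavour of SSC: each cluster keeps its own soft feature weights, updated from per-feature dispersion in an entropy-weighted k-means style (weight proportional to exp(-D/gamma)). The data, gamma and the single-step update are illustrative assumptions, not a summary of the CSSC/ISSC/XSSC categories surveyed in the paper.

```python
# One classic soft-subspace-clustering ingredient: per-cluster feature weights
# updated from within-cluster dispersion (entropy-weighted k-means style).
# Data and gamma are illustrative assumptions.
import numpy as np

def update_feature_weights(X, labels, centers, gamma=1.0):
    k, d = centers.shape
    W = np.zeros((k, d))
    for c in range(k):
        pts = X[labels == c]
        if len(pts) == 0:
            W[c] = 1.0 / d
            continue
        D = np.mean((pts - centers[c]) ** 2, axis=0)   # per-feature dispersion in cluster c
        w = np.exp(-(D - D.min()) / gamma)             # smaller dispersion -> larger weight
        W[c] = w / w.sum()                             # soft weights sum to 1 per cluster
    return W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, [0.2, 0.2, 2.0, 2.0], (50, 4)),   # cluster 0 lives in features 0-1
               rng.normal(3, [2.0, 2.0, 0.2, 0.2], (50, 4))])  # cluster 1 lives in features 2-3
labels = np.repeat([0, 1], 50)
centers = np.vstack([X[labels == c].mean(axis=0) for c in (0, 1)])
print(update_feature_weights(X, labels, centers))
```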