761 research outputs found

    Density propagation based adaptive multi-density clustering algorithm

    This research was supported by the Science & Technology Development Foundation of Jilin Province (Grant Nos. 20160101259JC, 20180201045GX), the National Natural Science Foundation of China (Grant No. 61772227) and the Natural Science Foundation of Xinjiang Province (Grant No. 2015211C127). This research is also supported by the Engineering and Physical Sciences Research Council (EPSRC) funded project on New Industrial Systems: Manufacturing Immortality (EP/R020957/1).

    On exploring data lakes by finding compact, isolated clusters

    Data engineers are very interested in data lake technologies due to the incredible abundance of datasets. They typically use clustering to understand the structure of the datasets before applying other methods to infer knowledge from them. This article presents the first proposal that explores how to use a meta-heuristic to address the problem of multi-way single-subspace automatic clustering, which is very appropriate in the context of data lakes. It was confronted with five strong competitors that combine the state-of-the-art attribute selection proposal with three classical single-way clustering proposals, a recent quantum-inspired one, and a recent deep-learning one. The evaluation focused on exploring their ability to find compact and isolated clusterings as well as the extent to which such clusterings can be considered good classifications. The statistical analyses conducted on the experimental results prove that it ranks first in effectiveness according to six standard coefficients and is very efficient in terms of CPU time, not to mention that it did not result in any degraded clusterings or timeouts. Summing up: this proposal contributes to the array of techniques that data engineers can use to explore their data lakes. Funding: Ministerio de Economía y Competitividad TIN2016-75394-R; Ministerio de Ciencia e Innovación PID2020-112540RB-C44; Junta de Andalucía P18-RT-1060; Junta de Andalucía US-138137.
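    The coefficients mentioned above can be reproduced with off-the-shelf tooling. Below is a minimal sketch, assuming scikit-learn and synthetic data, of how compactness and isolation are typically scored with three standard coefficients; the KMeans baseline and all parameter values are illustrative assumptions, not the meta-heuristic proposed in the article.

```python
# Sketch: scoring a clustering for compactness/isolation with standard
# coefficients.  The data and the KMeans baseline are illustrative
# assumptions, not the article's meta-heuristic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=500, centers=4, n_features=8, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print("silhouette        :", silhouette_score(X, labels))         # higher is better
print("calinski-harabasz :", calinski_harabasz_score(X, labels))  # higher is better
print("davies-bouldin    :", davies_bouldin_score(X, labels))     # lower is better
```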

    Adaptive firefly algorithm for hierarchical text clustering

    Text clustering is essentially used by search engines to increase recall and precision in information retrieval. As search engines operate on Internet content that is constantly being updated, there is a need for a clustering algorithm that offers automatic grouping of items without prior knowledge of the collection. Existing clustering methods have problems in determining the optimal number of clusters and in producing compact clusters. In this research, an adaptive hierarchical text clustering algorithm is proposed based on the Firefly Algorithm. The proposed Adaptive Firefly Algorithm (AFA) consists of three components: document clustering, cluster refining, and cluster merging. The first component introduces the Weight-based Firefly Algorithm (WFA), which automatically identifies initial centers and their clusters for any given text collection. In order to refine the obtained clusters, a second algorithm, termed the Weight-based Firefly Algorithm with Relocate (WFAR), is proposed. Such an approach allows the relocation of a pre-assigned document into a newly created cluster. The third component, the Weight-based Firefly Algorithm with Relocate and Merging (WFARM), aims to reduce the number of produced clusters by merging non-pure clusters into pure ones. Experiments were conducted to compare the proposed algorithms against seven existing methods. AFA obtained the optimal number of clusters in 100% of cases, with purity and F-measure of 83%, higher than the benchmarked methods. As for the entropy measure, AFA produced the lowest value (0.78) when compared to the existing methods. The results indicate that the Adaptive Firefly Algorithm can produce compact clusters. This research contributes to the text mining domain, as hierarchical text clustering facilitates the indexing of documents and information retrieval processes.
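    As a rough illustration of how a firefly-style search can drive centroid-based clustering, the sketch below moves candidate centroid sets toward brighter (lower within-cluster error) solutions. It is a generic firefly update under assumed parameters (beta0, gamma, alpha) and synthetic data, not the WFA/WFAR/WFARM algorithms themselves.

```python
# Generic firefly-style update over candidate centroid sets; brightness is
# the negated within-cluster SSE.  Parameters and data are assumptions for
# illustration, not the WFA/WFAR/WFARM components described above.
import numpy as np

def sse(X, centroids):
    # Sum of squared distances from each point to its nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.sum(np.min(d, axis=1) ** 2))

def firefly_step(X, population, beta0=1.0, gamma=0.01, alpha=0.1, rng=None):
    # population: shape (n_fireflies, k, n_features), one centroid set per firefly.
    rng = rng if rng is not None else np.random.default_rng()
    brightness = np.array([-sse(X, c) for c in population])
    new_pop = population.copy()
    for i in range(len(population)):
        for j in range(len(population)):
            if brightness[j] > brightness[i]:          # j is brighter: i moves toward j
                r2 = np.sum((population[i] - population[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)
                new_pop[i] += (beta * (population[j] - population[i])
                               + alpha * rng.normal(size=population[i].shape))
    return new_pop

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
pop = rng.normal(size=(10, 3, 2))                      # 10 fireflies, 3 centroids each
for _ in range(20):
    pop = firefly_step(X, pop, rng=rng)
best = min(pop, key=lambda c: sse(X, c))
print("best SSE:", sse(X, best))
```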

    Improved EMD-Based Complex Prediction Model for Wind Power Forecasting

    As a response to the rapidly increasing penetration of wind power generation in modern electric power grids, accurate prediction models are crucial to deal with the associated uncertainties. Due to the highly volatile and chaotic nature of wind power, employing complex intelligent prediction tools is necessary. Accordingly, this article proposes a novel improved version of empirical mode decomposition (IEMD) to decompose wind measurements. The decomposed signal is provided as input to a hybrid forecasting model built on a bagging neural network (BaNN) combined with K-means clustering. Moreover, a new intelligent optimization method named ChB-SSO is applied to automatically tune the BaNN parameters. The performance of the proposed forecasting framework is tested using different seasonal subsets of real-world wind farm case studies (Alberta and Sotavento) through a comprehensive comparative analysis against other well-known prediction strategies. Furthermore, to analyze the effectiveness of the proposed framework, different forecast horizons have been considered in different test cases. Several error assessment criteria were used, and the obtained results demonstrate the superiority of the proposed method for wind forecasting compared to the other methods in all test cases. © 2020 Institute of Electrical and Electronics Engineers.
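    To make the decompose-cluster-ensemble idea concrete, the sketch below chains a plain EMD (not the improved IEMD), K-means grouping of training windows, and a bagged neural network per group. The PyEMD package, the synthetic series, the window length and all hyper-parameters are assumptions for illustration, not the paper's setup.

```python
# Rough pipeline sketch: EMD decomposition -> K-means grouping -> bagged NN
# per group.  PyEMD (pip install EMD-signal), the synthetic series and all
# hyper-parameters are illustrative assumptions, not the proposed IEMD/ChB-SSO
# framework.
import numpy as np
from PyEMD import EMD
from sklearn.cluster import KMeans
from sklearn.ensemble import BaggingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
wind = np.sin(np.linspace(0, 40, 2000)) + 0.3 * rng.standard_normal(2000)

imfs = EMD().emd(wind)          # intrinsic mode functions, shape (n_imfs, n_samples)
features = imfs.T               # one feature vector per time step

# Sliding windows: predict wind[t] from the previous `lag` decomposed steps.
lag = 24
X = np.stack([features[t - lag:t].ravel() for t in range(lag, len(wind))])
y = wind[lag:]

# Group similar operating conditions, then fit one bagged NN per group.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
models = {g: BaggingRegressor(MLPRegressor(hidden_layer_sizes=(32,), max_iter=500),
                              n_estimators=5, random_state=0)
              .fit(X[groups == g], y[groups == g])
          for g in np.unique(groups)}
```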

    Comparative Study On Cooperative Particle Swarm Optimization Decomposition Methods for Large-scale Optimization

    The vast majority of real-world optimization problems can be put into the class of large-scale global optimization problems (LSOPs). Over the past few years, an abundance of cooperative coevolutionary (CC) algorithms has been proposed to combat the challenges of LSOPs. When CC algorithms attempt to address large-scale problems, the effects of interconnected variables, known as variable dependencies, cause extreme performance degradation. The literature has extensively reviewed approaches to decomposing problems whose variables are interdependent during optimization, often with a wide range of base optimizers. In this thesis, we use the cooperative particle swarm optimization (CPSO) algorithm as the base optimizer and perform an extensive scalability study with a range of decomposition methods to determine ideal divide-and-conquer approaches when using a CPSO. Experimental results demonstrate that dynamic regrouping of variables, as seen in the merging CPSO (MCPSO) and the decomposition CPSO (DCPSO), as well as varying the total fitness evaluations per dimension, resulted in high-quality solutions when compared to six state-of-the-art decomposition approaches.
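    A toy version of the cooperative coevolution loop discussed in the thesis is sketched below: the decision vector is split into sub-components by random regrouping, and each sub-component is optimized by a simple PSO against a fixed context vector. The objective, grouping scheme and PSO settings are illustrative assumptions, not the MCPSO/DCPSO variants compared in the study.

```python
# Toy cooperative-coevolution sketch: random regrouping + a simple PSO per
# sub-component against a shared context vector.  All settings here are
# illustrative assumptions, not the decomposition methods compared above.
import numpy as np

def sphere(x):                                  # separable benchmark objective
    return float(np.sum(x ** 2))

def pso_subproblem(f, context, idx, iters=30, swarm=20, rng=None):
    # Optimize only the dimensions in `idx`; the rest of `context` stays fixed.
    rng = rng if rng is not None else np.random.default_rng()
    pos = rng.uniform(-5, 5, (swarm, len(idx)))
    vel = np.zeros_like(pos)
    def full(sub):
        x = context.copy(); x[idx] = sub; return f(x)
    pbest, pval = pos.copy(), np.array([full(p) for p in pos])
    for _ in range(iters):
        g = pbest[np.argmin(pval)]
        vel = (0.7 * vel + 1.5 * rng.random(pos.shape) * (pbest - pos)
                         + 1.5 * rng.random(pos.shape) * (g - pos))
        pos = pos + vel
        val = np.array([full(p) for p in pos])
        improved = val < pval
        pbest[improved], pval[improved] = pos[improved], val[improved]
    return pbest[np.argmin(pval)]

D, n_groups, rng = 100, 5, np.random.default_rng(0)
context = rng.uniform(-5, 5, D)
for cycle in range(10):
    perm = rng.permutation(D)                   # random regrouping each cycle
    for sub in np.array_split(perm, n_groups):
        context[sub] = pso_subproblem(sphere, context, sub, rng=rng)
    print(f"cycle {cycle}: f = {sphere(context):.4f}")
```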

    An efficient Particle Swarm Optimization approach to cluster short texts

    This is the author’s version of a work that was accepted for publication in Information Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Information Sciences, Vol. 265, May 1, 2014, DOI 10.1016/j.ins.2013.12.010. Short texts such as evaluations of commercial products, news, FAQs and scientific abstracts are important resources on the Web due to the constant requirements of people to use this online information in real life. In this context, the clustering of short texts is a significant analysis task, and a discrete Particle Swarm Optimization (PSO) algorithm named CLUDIPSO has recently shown promising performance on this type of problem. CLUDIPSO obtained high-quality results with small corpora although, with larger corpora, a significant deterioration of performance was observed. This article presents CLUDIPSO*, an improved version of CLUDIPSO, which includes a different representation of particles, a more efficient evaluation of the function to be optimized and some modifications in the mutation operator. Experimental results with corpora containing scientific abstracts, news and short legal documents obtained from the Web show that CLUDIPSO* is an effective clustering method for short-text corpora of small and medium size. (C) 2013 Elsevier Inc. All rights reserved. The research work is partially funded by the European Commission as part of the WIQ-EI IRSES research project (Grant No. 269180) within the FP7 Marie Curie People Framework and has been developed in the framework of the Microcluster VLC/Campus (International Campus of Excellence) on Multimodal Intelligent Systems. The research work of the first author is partially funded by the program PAID-02-10 2257 (Universitat Politecnica de Valencia) and CONICET (Argentina). Cagnina, L.; Errecalde, M.; Ingaramo, D.; Rosso, P. (2014). An efficient Particle Swarm Optimization approach to cluster short texts. Information Sciences, 265:36-49. https://doi.org/10.1016/j.ins.2013.12.010
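    The discrete particle representation that CLUDIPSO-style algorithms rely on can be illustrated in a few lines: each particle is a vector of cluster labels, one per document, and its fitness is a cluster-quality score. The tiny corpus, the silhouette-based fitness and the mutation rate below are assumptions for illustration only; they are not the CLUDIPSO* operators.

```python
# Sketch of a discrete particle for document clustering: a label per document,
# scored with a standard quality measure.  Corpus, fitness choice and mutation
# rate are illustrative assumptions, not the CLUDIPSO* operators.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

docs = [
    "cheap laptop deal with free shipping",
    "new laptop reviews and benchmarks",
    "football match ends in late penalty drama",
    "transfer news: striker signs for rival club",
    "smartphone camera comparison and specs",
    "league table tightens after weekend fixtures",
    "tablet versus laptop for students",
    "coach praises defenders after clean sheet",
]
X = TfidfVectorizer().fit_transform(docs).toarray()

k, rng = 2, np.random.default_rng(0)
particle = rng.integers(0, k, size=len(docs))   # discrete position: one label per doc

def fitness(labels):
    # Penalize degenerate particles that collapse everything into one cluster.
    if len(np.unique(labels)) < 2:
        return -1.0
    return silhouette_score(X, labels)

def mutate(labels, rate=0.25, rng=rng):
    # Reassign a fraction of documents to random clusters.
    out = labels.copy()
    flip = rng.random(len(out)) < rate
    out[flip] = rng.integers(0, k, size=int(flip.sum()))
    return out

print(fitness(particle), fitness(mutate(particle)))
```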

    A Survey on Soft Subspace Clustering

    Subspace clustering (SC) is a promising clustering technology that identifies clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201
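    For readers unfamiliar with what "soft" means here, the sketch below shows one classic flavour of SSC: each cluster keeps its own soft feature weights, updated from per-feature dispersion in an entropy-weighted k-means style (weight proportional to exp(-D/gamma)). The data, gamma and the single-step update are illustrative assumptions, not a summary of the CSSC/ISSC/XSSC categories surveyed in the paper.

```python
# One classic soft-subspace-clustering ingredient: per-cluster feature weights
# updated from within-cluster dispersion (entropy-weighted k-means style).
# Data and gamma are illustrative assumptions.
import numpy as np

def update_feature_weights(X, labels, centers, gamma=1.0):
    k, d = centers.shape
    W = np.zeros((k, d))
    for c in range(k):
        pts = X[labels == c]
        if len(pts) == 0:
            W[c] = 1.0 / d
            continue
        D = np.mean((pts - centers[c]) ** 2, axis=0)   # per-feature dispersion in cluster c
        w = np.exp(-(D - D.min()) / gamma)             # smaller dispersion -> larger weight
        W[c] = w / w.sum()                             # soft weights sum to 1 per cluster
    return W

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, [0.2, 0.2, 2.0, 2.0], (50, 4)),   # cluster 0 lives in features 0-1
               rng.normal(3, [2.0, 2.0, 0.2, 0.2], (50, 4))])  # cluster 1 lives in features 2-3
labels = np.repeat([0, 1], 50)
centers = np.vstack([X[labels == c].mean(axis=0) for c in (0, 1)])
print(update_feature_weights(X, labels, centers))
```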