2 research outputs found

Hybrid Fuzzy C-Means Clustering Algorithm Oriented to Big Data Realms

Authors: Crispín Zavala-Díaz
Joaquín Pérez-Ortega
Juan Frausto Solís
Nelva Nely Almanza-Ortega
Sandra Silvia Roblero-Aguilar
Vanesa Landero-Nájera
Yasmín Hernández
Publication venue: 'MDPI AG'
Publication date: 31/07/2022

A hybrid variant of the Fuzzy C-Means and K-Means algorithms is proposed for clustering large datasets such as those found in Big Data. The Fuzzy C-Means algorithm is sensitive to the initial values of the membership matrix, so a special configuration of the matrix can accelerate the convergence of the algorithm. To exploit this, a new approach is proposed, called Hybrid OK-Means Fuzzy C-Means (HOFCM), which optimizes the initial values of the membership matrix. The approach consists of three steps: (a) generate a set of n solutions for a dataset x by applying a variant of the K-Means algorithm; (b) select the best solution as the basis for generating the optimized membership matrix; (c) solve the dataset x with Fuzzy C-Means. Experimental results with four real datasets and one synthetic dataset show that HOFCM reduces the solution time by up to 93.94% compared to the average time of the standard Fuzzy C-Means, while the quality of the solution was reduced by 2.51% in the worst case.
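
The three steps (a) to (c) named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions that the abstract does not confirm: plain Lloyd's K-Means stands in for the paper's K-Means variant, "best solution" is taken to mean the lowest sum of squared errors, and the initial membership matrix is derived from the selected centroids with the standard Fuzzy C-Means membership formula. All function and parameter names (`kmeans`, `memberships`, `fcm`, `hofcm_sketch`, `n_solutions`) are hypothetical.

```python
import numpy as np

def kmeans(X, c, rng, iters=50):
    """Plain Lloyd's K-Means (assumed stand-in for the paper's K-Means variant)."""
    centroids = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(iters):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep the old centroid if a cluster ends up empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(c)])
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.min(axis=1).sum()  # centroids and their SSE

def memberships(X, centroids, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ij proportional to d_ij^(-2/(m-1))."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, U, m=2.0, tol=1e-5, max_iter=300):
    """Standard Fuzzy C-Means iterations starting from membership matrix U."""
    for it in range(1, max_iter + 1):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centroids, m)
        if np.abs(U_new - U).max() < tol:
            return centroids, U_new, it
        U = U_new
    return centroids, U, max_iter

def hofcm_sketch(X, c, n_solutions=5, m=2.0, seed=0):
    rng = np.random.default_rng(seed)
    # (a) generate n candidate solutions with K-Means
    candidates = [kmeans(X, c, rng) for _ in range(n_solutions)]
    # (b) select the best one (assumption: lowest SSE) and build the
    #     initial membership matrix from its centroids
    best_centroids, _ = min(candidates, key=lambda s: s[1])
    U0 = memberships(X, best_centroids, m)
    # (c) solve the dataset with Fuzzy C-Means from that starting point
    return fcm(X, U0, m)
```

Calling `hofcm_sketch(X, c=2)` on an array `X` of shape `(n_samples, n_features)` returns the final centroids, the membership matrix, and the number of Fuzzy C-Means iterations used. Starting Fuzzy C-Means from memberships that already reflect a good hard clustering is the design idea the abstract describes; how much time this saves depends on the dataset.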

Multidisciplinary Digital Publishing Institute

POFCM: A Parallel Fuzzy Clustering Algorithm for Large Datasets

Authors: Crispín Zavala-Díaz
César David Rey-Figueroa
Joaquín Pérez-Ortega
Nelva Nely Almanza-Ortega
Salomón García-Paredes
Sandra Silvia Roblero-Aguilar
Vanesa Landero-Nájera
Publication venue: 'MDPI AG'
Publication date: 01/04/2023

Clustering algorithms have proven to be a useful tool to extract knowledge and support decision making by processing large volumes of data. Hard and fuzzy clustering algorithms have been used successfully to identify patterns and trends in many areas, such as finance, healthcare, and marketing. However, the solution time of these algorithms increases significantly as the size of the datasets to be solved increases, making their use unfeasible. In this sense, the parallel processing of algorithms has proven to be an efficient alternative to reduce their solution time. It has been established that the parallel implementation of algorithms requires their redesign to optimise the hardware resources of the platform that will be used. In this article, we propose a new parallel implementation in OpenMP of the Hybrid OK-Means Fuzzy C-Means (HOFCM) algorithm, which is an efficient variant of Fuzzy C-Means. An advantage of using OpenMP is its scalability. The efficiency of the implementation is compared against the HOFCM algorithm. The experimental results of processing large real and synthetic datasets show that our implementation tends to solve instances with a large number of clusters and dimensions more efficiently. Additionally, the implementation shows excellent results concerning speedup and parallel efficiency metrics. Our main contribution is a Fuzzy clustering algorithm for large datasets that is scalable and not limited to a specific domain.
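
The speedup and parallel efficiency metrics mentioned in the abstract are commonly defined as the ratio of sequential to parallel run time, and that ratio divided by the number of threads. The sketch below assumes these standard definitions, since the abstract does not spell out the exact formulas used. The function names are hypothetical, and any timings passed in are the caller's own measurements, not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel (standard definition, assumed here)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, n_threads):
    """Parallel efficiency E = S / p, where p is the number of threads."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return speedup(t_serial, t_parallel) / n_threads
```

As a purely illustrative case with made-up timings, a run that takes 120 s sequentially and 20 s on 8 threads gives `speedup(120.0, 20.0)` of 6.0 and `parallel_efficiency(120.0, 20.0, 8)` of 0.75. An efficiency close to 1 indicates that added threads are being used well.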

Directory of Open Access Journals