49 research outputs found

    A Short Survey on Data Clustering Algorithms

    With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally speaking, given a set of data instances, a clustering algorithm is expected to divide the set into subsets that maximize the intra-subset similarity and the inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology. Different clustering paradigms are discussed, followed by advanced clustering algorithms. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.
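
    The formal definition in this abstract can be illustrated with a small sketch. It is not taken from the survey: it assumes 1-D numeric points, uses the absolute difference as the dissimilarity, and the function name `mean_intra_inter_distance` is hypothetical. It compares two partitions of the same points by their mean within-cluster and between-cluster distances.

```python
from itertools import combinations


def mean_intra_inter_distance(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumptions (not from the source): 1-D numeric points, absolute
    difference as the dissimilarity, and at least one pair of each kind
    (otherwise the corresponding mean is undefined).
    """
    intra, inter = [], []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        # A pair counts as "intra" when both points share a cluster label.
        bucket = intra if labels[i] == labels[j] else inter
        bucket.append(abs(p - q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


points = [1.0, 2.0, 10.0, 11.0]
good = mean_intra_inter_distance(points, [0, 0, 1, 1])  # groups nearby points
bad = mean_intra_inter_distance(points, [0, 1, 0, 1])   # mixes distant points
print(good, bad)
```

    A partition that matches the definition should show a small within-cluster distance and a large between-cluster distance, which is the case for the first labelling but not the second.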

    Multi-learner based recursive supervised training

    In this paper, we propose the Multi-Learner Based Recursive Supervised Training (MLRT) algorithm, which uses the existing framework of recursive task decomposition: training on the entire dataset, picking out the best-learnt patterns, and then repeating the process with the remaining patterns. Instead of having a single learner classify all datasets during each recursion, an appropriate learner is chosen from a set of three learners based on the subset of data being trained, thereby avoiding the time overhead associated with the genetic algorithm learner used in previous approaches. In this way MLRT seeks to identify the inherent characteristics of the dataset and use them to train on the data accurately and efficiently. Empirically, MLRT performs considerably well compared to RPHP and other systems on benchmark data, with an 11% improvement in accuracy on the SPAM dataset and comparable performance on the VOWEL and TWO-SPIRAL problems. In addition, for most datasets, the time taken by MLRT is considerably lower than that of the other systems with comparable accuracy. Two heuristic versions, MLRT-2 and MLRT-3, are also introduced to improve the efficiency of the system and to make it more scalable for future updates. The performance of these versions is similar to that of the original MLRT system.

    Robust Methods for Soft Clustering of Multidimensional Time Series

    Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021. Three robust algorithms for clustering multidimensional time series from the perspective of underlying processes are proposed. The methods are robust extensions of a fuzzy C-means model based on estimates of the quantile cross-spectral density. Robustness to the presence of anomalous elements is achieved by using the so-called metric, noise and trimmed approaches. Analyses from a wide simulation study indicate that the algorithms are substantially effective in coping with the presence of outlying series, clearly outperforming alternative procedures. The usefulness of the suggested methods is also highlighted by means of a specific application. This research has been supported by MINECO (MTM2017-82724-R and PID2020-113578RB-100), the Xunta de Galicia (ED431C-2020-14), and “CITIC” (ED431G 2019/01).

    Possibilistic and fuzzy clustering methods for robust analysis of non-precise data

    This work focuses on robust clustering of data affected by imprecision. The imprecision is managed in terms of fuzzy sets. The clustering process is based on the fuzzy and possibilistic approaches. In both approaches the observations are assigned to the clusters by means of membership degrees. In fuzzy clustering the membership degrees express the degree to which each observation is shared among the clusters. In contrast, in possibilistic clustering the membership degrees are degrees of typicality. These two sources of information are complementary because the former helps to discover the best fuzzy partition of the observations while the latter reflects how well the observations are described by the centroids and, therefore, helps to identify outliers. First, a fully possibilistic k-means clustering procedure is suggested. Then, in order to exploit the benefits of both approaches, a joint possibilistic and fuzzy clustering method for fuzzy data is proposed. A selection procedure for choosing the parameters of the new clustering method is introduced. The effectiveness of the proposal is investigated by means of simulated and real-life data.
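
    The contrast between shared memberships and typicalities can be sketched as follows. This is not the method of the paper, which handles fuzzy data: the sketch assumes crisp 2-D points, fixed centroids, the textbook fuzzy c-means membership formula, and the classical possibilistic typicality formula with a hypothetical scale parameter `eta` per cluster. All function names are illustrative.

```python
def squared_distances(point, centroids):
    """Squared Euclidean distance from one point to every centroid."""
    return [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]


def fuzzy_memberships(sq_dists, m=2.0):
    """Fuzzy c-means memberships of one observation; they always sum to 1."""
    zeros = [i for i, d in enumerate(sq_dists) if d == 0.0]
    if zeros:  # the observation coincides with one or more centroids
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(sq_dists))]
    power = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** power for d in sq_dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def typicalities(sq_dists, etas, m=2.0):
    """Possibilistic typicalities; each lies in (0, 1] and they need not sum to 1."""
    power = 1.0 / (m - 1.0)
    return [1.0 / (1.0 + (d / eta) ** power) for d, eta in zip(sq_dists, etas)]


centroids = [(0.0, 0.0), (4.0, 0.0)]
etas = [4.0, 4.0]  # assumed scale parameters, one per cluster

inlier = squared_distances((2.0, 0.0), centroids)
outlier = squared_distances((2.0, 100.0), centroids)

u_inlier, u_outlier = fuzzy_memberships(inlier), fuzzy_memberships(outlier)
t_inlier, t_outlier = typicalities(inlier, etas), typicalities(outlier, etas)
print(u_inlier, u_outlier, t_inlier, t_outlier)
```

    Both points are equidistant from the two centroids, so their fuzzy memberships are identical, while the typicalities of the distant point drop towards zero. This is the sense in which typicalities help flag outliers.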

    Particle Swarm Optimization Algorithm and Application to Clustering Analysis

    Project number: NSC92-2213-E032-027. Research period: 2003-08 to 2004-07. Research funding: 416,000. Sponsorship: 行政院國家科學委員

    LIVIA - software for unsupervised classification of leaf areas infected by coffee rust.

    The purpose of this communication is to present the Java implementation of the LIVIA (Library for Visual Image Analysis) software. It is a digital image processing module applied to agriculture, developed at Embrapa Informática Agropecuária (Campinas/SP) at the request of Embrapa Meio Ambiente (Jaguariúna/SP). bitstream/item/11836/1/comtec87.pd

    Robust Fuzzy Clustering via Trimming and Constraints

    A methodology for robust fuzzy clustering is proposed. This methodology can be widely applied in very different statistical problems given that it is based on probability likelihoods. Robustness is achieved by trimming a fixed proportion of “most outlying” observations, which are self-determined by the data set at hand. Constraints on the clusters’ scatters are also needed to obtain mathematically well-defined problems and to avoid the detection of non-interesting spurious clusters. The main lines for computationally feasible algorithms are provided, and some simple guidelines on how to choose tuning parameters are briefly outlined. The proposed methodology is illustrated through two applications. The first one is aimed at heterogeneous clustering under multivariate normal assumptions, and the second one might be useful in fuzzy clusterwise linear regression problems. Ministerio de Economía, Industria y Competitividad (MTM2014-56235-C2-1-P); Junta de Castilla y León (programa de apoyo a proyectos de investigación – Ref. VA212U13
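
    The trimming idea can be sketched for a single step with fixed cluster centers. This is a simplification, not the likelihood-based procedure of the paper: it swaps in squared Euclidean distance to the nearest center as the measure of outlyingness, and the function name `trim_farthest` and the rounding rule (floor of `alpha * n`) are assumptions.

```python
import math


def trim_farthest(points, centers, alpha):
    """Split point indices into (kept, trimmed) for one trimming step.

    The floor(alpha * n) points farthest from their nearest center are
    trimmed. Assumption: squared Euclidean distance stands in for the
    likelihood-based criterion described in the abstract.
    """
    def nearest_sq_dist(p):
        return min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)

    n_trim = math.floor(alpha * len(points))
    # Indices ordered from most to least outlying.
    order = sorted(range(len(points)),
                   key=lambda i: nearest_sq_dist(points[i]),
                   reverse=True)
    return sorted(order[n_trim:]), sorted(order[:n_trim])


points = [(0.9, 0.0), (1.1, 0.0), (5.0, 0.0), (5.2, 0.0), (40.0, 0.0)]
centers = [(1.0, 0.0), (5.1, 0.0)]
kept, trimmed = trim_farthest(points, centers, alpha=0.2)
print(kept, trimmed)
```

    In an iterative algorithm the centers would be re-estimated from the kept points and the trimmed set re-selected at every iteration, which is how the trimmed observations end up being determined by the data rather than fixed in advance.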

    Method for estimating cluster structure and clustering data

    The paper addresses the problem of developing clustering methods that are robust to initialization (number of clusters and initial cluster parameters), to clusters of different sizes, and to outliers in the data. A method for estimating cluster structure and clustering data is proposed, based on computing proximity values between data objects in a multidimensional feature space. The proposed method is robust to the initialization of clustering parameters and to outliers, and it determines the cluster structure and the number of clusters in the course of self-organization of the data objects.

    Fuzzy clustering with entropy regularization for interval-valued data with an application to scientific journal citations

    In recent years, research on statistical methods for analyzing complex data structures has increased. In particular, a lot of attention has been focused on interval-valued data. In a classical cluster analysis framework, an interesting line of research has focused on the clustering of interval-valued data based on fuzzy approaches. Following the partitioning around medoids fuzzy approach, a new fuzzy clustering model for interval-valued data is suggested. In particular, we propose a new model based on the use of entropy as a regularization function in the fuzzy clustering criterion. The model uses a robust weighted dissimilarity measure to smooth noisy data and to weigh the center and radius components of the interval-valued data, respectively. To show the good performance of the proposed clustering model, we provide a simulation study and an application to the clustering of scientific journals in research evaluation.
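
    A minimal sketch of entropy-regularized memberships for interval data is given below. It is not the model of the paper: the dissimilarity is a plain weighted squared difference of centers and radii (the robust smoothing is omitted), the weights `w_center` and `w_radius` and the regularization parameter `p` are assumed values, and the medoids are fixed by hand. It relies only on the standard result that adding an entropy term to the clustering criterion yields memberships proportional to exp(-d / p).

```python
import math


def center_radius(interval):
    """Convert (lower, upper) bounds into (center, radius)."""
    lower, upper = interval
    return (lower + upper) / 2.0, (upper - lower) / 2.0


def weighted_sq_dissimilarity(a, b, w_center=0.5, w_radius=0.5):
    """Weighted squared dissimilarity between two intervals (assumed form)."""
    ca, ra = center_radius(a)
    cb, rb = center_radius(b)
    return (w_center * (ca - cb)) ** 2 + (w_radius * (ra - rb)) ** 2


def entropy_memberships(dissims, p=1.0):
    """Memberships proportional to exp(-d / p); larger p gives a softer partition."""
    d_min = min(dissims)  # shift for numerical stability; ratios are unchanged
    weights = [math.exp(-(d - d_min) / p) for d in dissims]
    total = sum(weights)
    return [w / total for w in weights]


medoids = [(0.0, 2.0), (10.0, 14.0)]  # hypothetical interval-valued medoids
observation = (1.0, 3.0)

dissims = [weighted_sq_dissimilarity(observation, m) for m in medoids]
u_sharp = entropy_memberships(dissims, p=1.0)
u_soft = entropy_memberships(dissims, p=100.0)
print(dissims, u_sharp, u_soft)
```

    With a small `p` the observation belongs almost entirely to the nearest medoid, while a large `p` spreads the membership more evenly across clusters.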