704 research outputs found

    Fuzzy clustering of univariate and multivariate time series by genetic multiobjective optimization

    Given a set of time series, it is of interest to discover subsets that share similar properties. For instance, this may be useful for identifying and estimating a single model that conveniently fits several time series, instead of performing the usual identification and estimation steps for each one. Moreover, time series in the same cluster are related with respect to the measures assumed for cluster analysis and are suitable for building multivariate time series models. Though many approaches to clustering time series exist, in this view the most effective method seems to rely on choosing some features relevant to the problem at hand and seeking clusters according to their measurements, for instance the autoregressive coefficients, spectral measures or the eigenvectors of the covariance matrix. Some new indexes based on goodness-of-fit criteria are proposed in this paper for fuzzy clustering of multivariate time series. A general-purpose fuzzy clustering algorithm may be used to estimate the proper cluster structure according to some internal criteria of cluster validity. Such indexes are known to measure distinct, often conflicting cluster properties: compactness or connectedness, for instance, or distribution, orientation, size and shape. It is argued that multiobjective optimization supported by genetic algorithms is a most effective choice in such a difficult context. In this paper we use the Xie-Beni index and the C-means functional as objective functions to evaluate cluster validity in a multiobjective optimization framework. The concept of Pareto optimality in multiobjective genetic algorithms is used to evolve a set of potential solutions towards a set of optimal non-dominated solutions. Genetic algorithms are well suited to difficult optimization problems where objective functions do not usually have good mathematical properties such as continuity, differentiability or convexity. In addition, genetic algorithms, as population-based methods, may yield a complete Pareto front at each step of the iterative evolutionary procedure. The method is illustrated by means of a set of real data and an artificial multivariate time series data set.
    Keywords: Fuzzy clustering, Internal criteria of cluster validity, Genetic algorithms, Multiobjective optimization, Time series, Pareto optimality
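
The two objective functions named in this abstract, and the Pareto filter applied to them, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes squared Euclidean distance on feature vectors, a membership matrix `u[i][k]` (cluster `i`, observation `k`), and the common generalization of the Xie-Beni index that uses the fuzzifier `m`. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cmeans_functional(data, centers, u, m=2.0):
    """Fuzzy C-means functional: sum over i, k of u[i][k]^m * ||x_k - v_i||^2."""
    return sum(
        u[i][k] ** m * sq_dist(data[k], centers[i])
        for i in range(len(centers))
        for k in range(len(data))
    )


def xie_beni(data, centers, u, m=2.0):
    """Xie-Beni index: compactness divided by n times the minimum
    squared distance between two distinct centers (lower is better).
    Needs at least two centers, and the centers must not coincide."""
    min_sep = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return cmeans_functional(data, centers, u, m) / (len(data) * min_sep)


def dominates(q, p):
    """True if q is no worse than p on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def non_dominated(points):
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

In a genetic algorithm, each candidate partition would be scored with the pair `(xie_beni(...), cmeans_functional(...))`, and `non_dominated` would pick out the current approximation of the Pareto front from the population's scores.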

    Benchmarking in cluster analysis: A white paper

    To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been frequently seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, the theoretical and conceptual underpinnings of benchmarking in the field of cluster analysis are discussed, by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.

    A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection

    Considering the concept of clustering, the main idea of the present study is that the stocks to be chosen and ranked will not necessarily all belong to one cluster. Taking this point into account, the study aims at offering a new methodology for making decisions concerning the formation of a portfolio of stocks in the stock market. To this end, Multiple-Criteria Decision-Making, Data Mining, and Multi-objective Optimization were employed. First, candidate stocks were clustered using the two-step clustering method. The stocks in each cluster were then ranked independently using grey relational analysis. The firefly algorithm was employed for Pareto analysis of risk and ranking. The clustering results revealed that not all candidate stocks were placed in one cluster. The robustness analysis applied to the ranking method verified the accuracy of the grey relational analysis calculations through the repetition of candidate stocks in each cluster.
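
The ranking step can be illustrated with a textbook form of grey relational analysis. The abstract does not give the paper's exact formulation, so this sketch makes several assumptions: every criterion is larger-is-better, criteria are min-max normalized, the reference sequence is the ideal value 1.0 on each criterion, and the distinguishing coefficient `zeta` is set to the conventional 0.5. The function name is hypothetical.

```python
def grey_relational_grades(rows, zeta=0.5):
    """Return one grey relational grade per alternative (row).

    rows: list of equal-length lists, one value per criterion.
    Assumes larger-is-better criteria and an ideal reference of 1.0
    after min-max normalization (assumptions, not taken from the paper).
    """
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]

    # Min-max normalize each criterion; a constant criterion maps to 1.0.
    norm = [
        [(v - l) / (h - l) if h > l else 1.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

    # Absolute deviation from the ideal reference sequence.
    deltas = [[1.0 - v for v in row] for row in norm]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        return [1.0] * len(rows)  # every alternative equals the reference

    # Grey relational coefficient per criterion, averaged into a grade.
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / len(row)
        for row in deltas
    ]
```

Sorting the stocks in a cluster by descending grade gives the within-cluster ranking. Cost-type criteria, such as risk, would need to be inverted before normalization.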

    Multi-Class Clustering of Cancer Subtypes through SVM Based Ensemble of Pareto-Optimal Solutions for Gene Marker Identification

    With the advancement of microarray technology, it is now possible to study the expression profiles of thousands of genes across different experimental conditions or tissue samples simultaneously. Microarray cancer datasets, organized in a samples-versus-genes fashion, are being used for classification of tissue samples into benign and malignant or their subtypes. They are also useful for identifying potential gene markers for each cancer subtype, which helps in successful diagnosis of particular cancer types. In this article, we present an unsupervised cancer classification technique based on multiobjective genetic clustering of the tissue samples. A real-coded encoding of the cluster centers is used, and cluster compactness and separation are optimized simultaneously. The resulting set of near-Pareto-optimal solutions contains a number of non-dominated solutions. A novel approach is proposed that combines the clustering information held by the non-dominated solutions through a Support Vector Machine (SVM) classifier. The final clustering is obtained by consensus among the clusterings yielded by different kernel functions. The performance of the proposed multiobjective clustering method is compared with that of several other microarray clustering algorithms on three publicly available benchmark cancer datasets. Moreover, statistical significance tests have been conducted to establish the statistical superiority of the proposed clustering method. Furthermore, relevant gene markers have been identified using the clustering result produced by the proposed method and demonstrated visually. Biological relationships among the gene markers are also studied based on gene ontology. The results obtained are promising and may have an important impact on unsupervised cancer classification as well as gene marker identification for multiple cancer subtypes.

    A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

    Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, using more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two conflicting objective functions is proposed, based on maximum data compactness within clusters (the degree of proximity of the data) and maximum cluster separation (the degree of remoteness of the clusters' centers). To solve this model, a recently proposed optimization method, the Multi-objective Improved Teaching Learning Based Optimization (MOITLBO) algorithm, is used. The algorithm is tested on several datasets and its clusters are compared with the results of some single-objective algorithms. Furthermore, with respect to noise, a comparison of the proposed model with another multi-objective model shows that it is robust to noisy data sets and thus can be efficiently used for multi-objective fuzzy clustering.

    Recent Development in Electricity Price Forecasting Based on Computational Intelligence Techniques in Deregulated Power Market

    The development of artificial intelligence (AI) based techniques for electricity price forecasting (EPF) provides essential information to electricity market participants and managers because of their greater capability for handling complex input-output relationships. Therefore, this research investigates and analyzes the performance of different optimization methods in the training phase of the artificial neural network (ANN) and the adaptive neuro-fuzzy inference system (ANFIS) to enhance the accuracy of EPF. In this work, a multi-objective optimization-based feature selection technique capable of eliminating non-linear and interacting features is implemented to create an efficient day-ahead price forecast. First, the multi-objective binary backtracking search algorithm (MOBBSA)-based feature selection technique is used to examine various combinations of input variables and choose suitable feature subsets that simultaneously minimize both the number of features and the estimation error. Next, the selected features are passed to the machine learning-based techniques, which map the input variables to the output in order to forecast the electricity price. Furthermore, to increase the forecasting accuracy, a backtracking search algorithm (BSA) is applied as an efficient evolutionary search algorithm in the learning procedure of the ANFIS approach. The performance of the forecasting methods is investigated for the Queensland power market in the year 2018, described as the most competitive market in the world, and compared to show the superiority of the proposed methods over other selected methods.
    © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
    Peer reviewed.