31 research outputs found
Robust Methods for Soft Clustering of Multidimensional Time Series
Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021. [Abstract] Three robust algorithms for clustering multidimensional time series from the perspective of the underlying processes are proposed. The methods are robust extensions of a fuzzy C-means model based on estimates of the quantile cross-spectral density. Robustness to the presence of anomalous elements is achieved by using the so-called metric, noise and trimmed approaches. Analyses from a wide simulation study indicate that the algorithms are highly effective in coping with the presence of outlying series, clearly outperforming alternative procedures. The usefulness of the suggested methods is also highlighted by means of a specific application. This research has been supported by MINECO (MTM2017-82724-R and PID2020-113578RB-100), the Xunta de Galicia (ED431C-2020-14), and CITIC (ED431G 2019/01).
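As a rough illustration of the noise approach mentioned above, the sketch below implements plain fuzzy C-means augmented with a virtual noise cluster on generic feature vectors (a stand-in for per-series quantile cross-spectral estimates). The function name, the deterministic initialization, and the parameter choices are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def noise_fuzzy_cmeans(X, k, delta=2.0, m=2.0, n_iter=50):
    """Fuzzy C-means with a virtual 'noise' cluster at fixed distance delta.

    Outlying points end up with high noise membership, so they barely pull
    the real cluster centers. X: (n, p) feature matrix. Returns (U, centers)
    where U[i, j] is the membership of point i in real cluster j; the
    residual 1 - U[i].sum() is the noise membership.
    """
    X = np.asarray(X, float)
    n = X.shape[0]
    centers = X[:k].copy()                       # naive deterministic init
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        # append the constant squared distance to the noise cluster
        d2_all = np.concatenate([d2, np.full((n, 1), delta ** 2)], axis=1)
        inv = d2_all ** (-1.0 / (m - 1.0))       # standard FCM membership rule
        U = (inv / inv.sum(axis=1, keepdims=True))[:, :k]
        w = U ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return U, centers
```

With two tight clusters and one distant outlier, the outlier's membership mass flows almost entirely to the noise cluster, leaving the real centers uncontaminated.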
Clustering time series: an application to COVID-19 data
In this paper we present an attempt at clustering time series, focusing on Italian data about COVID-19.
From a methodological point of view, we first review the most important methods in the literature for time series clustering. As in cross-sectional clustering, time series clustering starts from the choice of a suitable algorithm to produce clusters. Several algorithms have been developed for time series clustering, and the choice of the most appropriate one depends on both the aim of the analysis and the type of data at hand.
We apply some of these methods to the daily time series of intensive care and deaths for COVID-19, stretching from 23/02/2020 to 15/02/2022 and from 23/02/2020 to 29/03/2022, respectively. These data refer to the 19 Italian regions and the two autonomous provinces of Trento and Bolzano.
Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects
Time course microarray data provide insight about dynamic biological processes. While several clustering methods have been proposed for the analysis of these data structures, the comparison and selection of appropriate clustering methods are seldom discussed. We compared probabilistic clustering methods and distance-based clustering methods for time course microarray data. Among probabilistic methods, we considered: smoothing spline clustering (SSC), also known as model-based functional data analysis (MFDA); functional clustering models for sparsely sampled data (FCM); and model-based clustering (MCLUST). Among distance-based methods, we considered: weighted gene co-expression network analysis (WGCNA), clustering with dynamic time warping distance (DTW) and clustering with autocorrelation-based distance (ACF). We studied these algorithms in both simulated settings and case study data. Our investigations showed that FCM performed very well when gene curves were short and sparse. DTW and WGCNA performed well when gene curves were medium or long. SSC performed very well when there were clusters of gene curves similar to one another. Overall, ACF performed poorly in these applications. In terms of computation time, FCM, SSC and DTW were considerably slower than MCLUST and WGCNA. WGCNA outperformed MCLUST by generating more accurate and biologically meaningful clustering results. WGCNA and MCLUST are the best methods among the six compared when performance and computation time are both taken into account. WGCNA outperforms MCLUST, but MCLUST provides model-based inference and uncertainty measures for clustering results.
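For reference, the dynamic time warping distance behind the DTW method compared above can be computed with the classic dynamic-programming recursion. This is a minimal 1-D sketch with absolute-difference local cost, not the specific implementation used in the study.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping distance between two 1-D series.

    Fills an (n+1) x (m+1) cost table where D[i, j] is the cheapest
    alignment of x[:i] with y[:j]; O(n * m) time and memory.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Unlike the Euclidean distance, DTW can absorb local time shifts: a series and its lagged copy can still align at zero cost.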
Clustering stationary and non-stationary time series based on autocorrelation distance of hierarchical and k-means algorithms
Observing high-dimensional time series can be time-consuming; time series clustering is one approach to identification and classification. This study aimed to compare the accuracy of two algorithms, hierarchical clustering and K-means clustering, using the ACF distance for clustering stationary and non-stationary time series data. This research uses both simulated and real datasets. The simulation generates 7 stationary and 7 non-stationary data models. The real dataset consists of daily temperature data for 34 cities in Indonesia. As a result, the K-means algorithm has the highest accuracy for both data models.
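A minimal sketch of the ACF-based dissimilarity such algorithms rely on: estimate the sample autocorrelations of each series up to a chosen lag and take Euclidean distances between the ACF vectors. The lag choice and implementation details here are illustrative, not taken from the study; the resulting matrix can then be fed to a hierarchical or K-means-style procedure.

```python
import numpy as np

def acf(x, nlags=10):
    """Sample autocorrelations of a 1-D series at lags 1..nlags."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

def acf_distance_matrix(series, nlags=10):
    """Pairwise Euclidean distances between the ACF vectors of the series."""
    A = np.array([acf(s, nlags) for s in series])
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))
```

Two series with the same dependence structure (e.g. two phase-shifted sinusoids, or two white-noise paths) end up close under this distance even when they are far apart pointwise.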
Copula-based fuzzy clustering of spatial time series
This paper contributes to the existing literature on the analysis of spatial time series by presenting a new clustering algorithm called COFUST, i.e. COpula-based FUzzy clustering algorithm for Spatial Time series. The underlying idea of this algorithm is to perform a fuzzy Partitioning Around Medoids (PAM) clustering using a copula-based approach to interpret comovements of time series. This generalisation both extends the usual clustering methods for time series based on Pearson's correlation and captures the uncertainty that arises when assigning units to clusters. Furthermore, its flexibility permits the spatial information to be included directly in the algorithm. Our approach is presented and discussed using both simulated and real data, highlighting its main advantages.
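Copula-based comovement can be illustrated with rank correlations, which depend only on the copula of a pair of series and not on their marginal distributions. The sketch below uses Spearman's rho as a simple stand-in for a copula-based dissimilarity; this is an assumption for illustration, not the COFUST measure itself.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Because it uses only ranks, it is invariant to monotone transforms
    of either series -- the property that makes it copula-based.
    """
    rx = np.argsort(np.argsort(x)) - (len(x) - 1) / 2.0   # centered ranks
    ry = np.argsort(np.argsort(y)) - (len(y) - 1) / 2.0
    return float(np.dot(rx, ry) / np.sqrt(np.dot(rx, rx) * np.dot(ry, ry)))

def comovement_dissimilarity(x, y):
    """Map rho in [-1, 1] to a dissimilarity in [0, 1]."""
    return (1.0 - spearman_rho(x, y)) / 2.0
```

A fuzzy PAM procedure would then operate on the matrix of these pairwise dissimilarities.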
Fuzzy clustering with entropy regularization for interval-valued data with an application to scientific journal citations
In recent years, research on statistical methods to analyze complex data structures has increased. In particular, a lot of attention has been focused on interval-valued data. In a classical cluster analysis framework, an interesting line of research has focused on the clustering of interval-valued data based on fuzzy approaches. Following the partitioning-around-medoids fuzzy research line, a new fuzzy clustering model for interval-valued data is suggested. In particular, we propose a new model based on the use of entropy as a regularization function in the fuzzy clustering criterion. The model uses a robust weighted dissimilarity measure to smooth noisy data and to weigh the center and radius components of the interval-valued data. To show the good performance of the proposed clustering model, we provide a simulation study and an application to the clustering of scientific journals in research evaluation.
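Entropy regularization replaces the usual fuzziness exponent with an entropy penalty weighted by a parameter lambda, which turns the membership update into a softmax of negative squared distances. Below is a minimal sketch for plain point data, not the paper's medoid-based, interval-valued model; the function name and initialization are illustrative.

```python
import numpy as np

def entropy_fuzzy_kmeans(X, k, lam=1.0, n_iter=50):
    """Fuzzy k-means with entropy regularization.

    Memberships follow u[i, j] proportional to exp(-d2[i, j] / lam),
    so lam controls fuzziness instead of the classic exponent m.
    """
    X = np.asarray(X, float)
    centers = X[:k].copy()                    # naive deterministic init
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        logits = -d2 / lam
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        U = np.exp(logits)
        U /= U.sum(axis=1, keepdims=True)             # rows sum to 1
        centers = (U.T @ X) / U.sum(axis=0)[:, None]
    return U, centers
```

Large lam spreads membership across clusters (high entropy); small lam recovers nearly crisp k-means assignments.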
Spatio-temporal clustering: Neighbourhoods based on median seasonal entropy
In this research, a new uncertainty clustering method has been developed and applied to spatial time series with seasonality. The new unsupervised grouping method is based on Neighbourhoods and Median Seasonal Entropy. This classification method aims to discover similar behaviours within a group of time series and to find a dissimilarity measure with respect to a reference series r. The Neighbourhood's Internal Verification Coefficient criterion makes it possible to measure intra-group similarity. This clustering criterion is flexible for spatial information. Our empirical approach allows us to measure accommodation decisions for tourists who visit Spain and decide to stay either in hotels or in tourist apartments. The results show the existence of dynamic seasonal patterns of behaviour. These insights support the decisions of economic agents. This research is associated with the "Social Indicators-SEJ157" group of the Faculty of Economic and Business Sciences at the University of Málaga, which has funded the professional editing service in English. Funding for open access charge: Universidad de Málaga/CBUA.
Entropy-based fuzzy clustering of interval-valued time series
This paper proposes a fuzzy C-medoids-based clustering method with entropy regularization to address the problem of grouping complex data such as interval-valued time series. The dual nature of these data, which are both time-varying and interval-valued, needs to be considered and embedded in clustering techniques. In this work, a new dissimilarity measure based on Dynamic Time Warping is proposed. The performance of the new clustering procedure is evaluated through a simulation study and an application to financial time series.
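The fuzzy C-medoids step can be sketched independently of the dissimilarity used: given any precomputed matrix of pairwise dissimilarities (for instance DTW distances between interval-valued series), memberships and medoids are updated in alternation. The initialization and update rules below are one common variant, assumed for illustration rather than taken from the paper.

```python
import numpy as np

def fuzzy_c_medoids(D, k, m=2.0, n_iter=30):
    """Fuzzy C-medoids on a precomputed symmetric dissimilarity matrix D.

    Returns memberships U (n, k) and the list of medoid indices.
    """
    medoids = [0]                    # greedy farthest-point initialization
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    for _ in range(n_iter):
        d = D[:, medoids] + 1e-12
        inv = d ** (-2.0 / (m - 1.0))            # FCM-style membership rule
        U = inv / inv.sum(axis=1, keepdims=True)
        w = U ** m
        # each medoid becomes the object minimizing its weighted total cost
        new = [int(np.argmin(D @ w[:, j])) for j in range(k)]
        if new == medoids:
            break                                # medoids stable: converged
        medoids = new
    return U, medoids
```

Because the update only ever reads D, the same routine works for Euclidean, DTW, or any other dissimilarity, which is what makes C-medoids convenient for non-vector data such as time series.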
Clustering networked funded European research activities through rank-size laws
This paper addresses a well-established public evaluation problem: the analysis of funded research projects. We specifically deal with the collection of research actions funded by the European Union under the 7th Framework Programme for Research and Technological Development and Horizon 2020; the reference period is 2007–2020. The study is developed through three methodological steps. First, we consider the networked scientific institutions, stating a link between two organizations when they are partners in the same funded project. In doing so, we build yearly complex networks and compute four nodal centrality measures with relevant informative content for each of them. Second, we implement a rank-size procedure on each network and each centrality measure, testing four meaningful classes of parametric curves to fit the ranked data; at the end of this step, we derive the best-fit curve and the calibrated parameters. Third, we perform a clustering procedure based on the best-fit curves of the ranked data to identify regularities and deviations among years of research and scientific institutions. The joint employment of the three methodological approaches allows a clear view of research activity in Europe in recent years.
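The rank-size step can be illustrated with the simplest parametric family, a pure power law fitted on the log-log scale. The study tests four classes of curves, so this is only a sketch of the general procedure with an assumed functional form.

```python
import numpy as np

def fit_rank_size(values):
    """Least-squares fit of the rank-size curve s(r) = a * r**(-b).

    Sorts the (positive) values into decreasing order, assigns ranks
    1, 2, ..., and regresses log s on log r. Returns the calibrated (a, b).
    """
    s = np.sort(np.asarray(values, float))[::-1]   # sizes, largest first
    r = np.arange(1, len(s) + 1)                   # ranks 1, 2, ...
    slope, intercept = np.polyfit(np.log(r), np.log(s), 1)
    return float(np.exp(intercept)), float(-slope)
```

The calibrated pairs (a, b), one per year and per centrality measure, could then serve as the features on which the final clustering step operates.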
Distribution-based entropy weighting clustering of skewed and heavy tailed time series
The goal of clustering is to identify common structures in a data set by forming groups of homogeneous objects. The observed characteristics of many economic time series have motivated the development of classes of distributions that can accommodate properties such as heavy tails and skewness. Thanks to its flexibility, the skewed exponential power distribution (also called the skewed generalized error distribution) provides a unified and general framework for clustering possibly skewed and heavy-tailed time series. This paper develops a model-based clustering procedure, assuming that the time series are generated by the same underlying probability distribution but with different parameters. Moreover, we propose to optimally combine the estimated parameters to form the clusters with an entropy weighting k-means approach. The usefulness of the proposal is shown by means of an application to financial time series, demonstrating also how the obtained clusters can be used to form portfolios of stocks.