
40 research outputs found

Quantile-Based Fuzzy Clustering of Multivariate Time Series in the Frequency Domain

Author: D'Urso Pierpaolo
López-Oriona Ángel
Vilar José
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022

Funded for open-access publication: Universidade da Coruña/CISUG.

[Abstract] A novel procedure to perform fuzzy clustering of multivariate time series generated from different dependence models is proposed. Differing degrees of dissimilarity between the generating models, or changes in dynamic behaviour over time, justify a fuzzy approach in which each series is associated with all the clusters with specific membership levels. Our procedure considers quantile-based cross-spectral features and consists of three stages: (i) each element is characterized by a vector of proper estimates of the quantile cross-spectral densities, (ii) principal component analysis is carried out to capture the main differences while reducing the effects of noise, and (iii) the squared Euclidean distance between the first retained principal components is used to perform clustering through the standard fuzzy C-means and fuzzy C-medoids algorithms. The performance of the proposed approach is evaluated in a broad simulation study where several types of generating processes are considered, including linear, nonlinear and dynamic conditional correlation models. Assessment is done in two different ways: by directly measuring the quality of the resulting fuzzy partition, and by taking into account the ability of the technique to determine the overlapping nature of series located equidistant from well-defined clusters. The procedure is compared with the few alternatives suggested in the literature and substantially outperforms all of them, regardless of the underlying process and the evaluation scheme. Two specific applications involving air quality and financial databases illustrate the usefulness of our approach.

The authors are grateful to the anonymous referees for their comments and suggestions. The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG.

Xunta de Galicia; ED431C-2020-14
Xunta de Galicia; ED431G 2019/0
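Stages (ii) and (iii) of the abstract above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the quantile cross-spectral estimates of stage (i) are replaced by a hypothetical `toy_features` matrix (one row per series), and the function names, the fuzzifier `m = 2` and the number of retained components are choices made here, not taken from the paper.

```python
import numpy as np

def pca_scores(features, n_components):
    """Project centred feature vectors onto the leading principal components."""
    centred = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-8, seed=0):
    """Standard fuzzy C-means with squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix whose rows sum to one.
    u = rng.random((points.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        centroids = (w.T @ points) / w.sum(axis=0)[:, None]
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return u, centroids

# Hypothetical stand-in for stage (i): 20 series, 6 features, two groups.
rng = np.random.default_rng(1)
toy_features = np.vstack([rng.normal(0.0, 0.1, (10, 6)),
                          rng.normal(1.0, 0.1, (10, 6))])
scores = pca_scores(toy_features, n_components=2)
memberships, _ = fuzzy_c_means(scores, n_clusters=2)
```

Each row of `memberships` holds the membership degrees of one series in every cluster, which is what allows a series to belong partially to several groups.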

Repositorio da Universidade da Coruña

Archivio della ricerca- Università di Roma La Sapienza

New methodological contributions in time series clustering

Author: Lafuente Rego Borja Raúl
Publication date: 01/01/2017

Official Doctoral Programme in Statistics and Operations Research (Programa Oficial de Doutoramento en Estatística e Investigación Operativa). 555V01

[Abstract] This thesis presents new procedures for the cluster analysis of time series. First, a two-stage procedure based on comparing the frequencies and magnitudes of the absolute maxima of the spectral densities is proposed. Assuming that the purpose of clustering is to group series according to their underlying dependence structures, a detailed study is also carried out of the clustering behaviour of a dissimilarity based on comparing estimated quantile autocovariance functions (QAF). A prediction-based resampling algorithm proposed by Dudoit and Fridlyand is adapted to select the optimal number of clusters. The asymptotic behaviour of the sample quantile autocovariances is studied, and an algorithm to determine optimal combinations of lags and pairs of quantile levels for clustering is introduced. The proposed metric is used to perform hard and soft partitioning-based clustering. First, a broad simulation study examines the behaviour of the proposed metric in crisp clustering using hierarchical and PAM procedures. Then, a novel fuzzy C-medoids algorithm based on the QAF dissimilarity is proposed. Three different robust versions of this fuzzy algorithm are also presented to deal with data containing outlier time series. Finally, other approaches to soft clustering are explored, namely probabilistic D-clustering and clustering based on mixture models.
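The QAF-based dissimilarity mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual sample estimate of the quantile autocovariance, i.e. the empirical probability that X_t and X_{t+l} fall below their respective sample quantiles minus the product of the two levels. The function names, the default lags and the default quantile levels are choices made for this example; the thesis selects lags and levels with its own algorithm.

```python
import numpy as np

def qaf_features(series, lags=(1, 2), levels=(0.1, 0.5, 0.9)):
    """Sample quantile autocovariances for every lag and pair of quantile levels."""
    x = np.asarray(series, dtype=float)
    quantiles = np.quantile(x, levels)
    # indicators[t, i] is 1 when x[t] does not exceed the i-th sample quantile.
    indicators = (x[:, None] <= quantiles[None, :]).astype(float)
    tau = np.asarray(levels)
    features = []
    for lag in lags:
        # Estimate of P(X_t <= q_tau, X_{t+lag} <= q_tau') for all level pairs.
        joint = indicators[:-lag].T @ indicators[lag:] / (len(x) - lag)
        features.append((joint - np.outer(tau, tau)).ravel())
    return np.concatenate(features)

def qaf_dissimilarity(series_a, series_b, **kwargs):
    """Squared Euclidean distance between the QAF feature vectors of two series."""
    diff = qaf_features(series_a, **kwargs) - qaf_features(series_b, **kwargs)
    return float(diff @ diff)

# Toy data: two white-noise series and one AR(1) series with coefficient 0.8.
rng = np.random.default_rng(0)
noise_a = rng.normal(size=2000)
noise_b = rng.normal(size=2000)
ar = np.zeros(2000)
for t in range(1, 2000):
    ar[t] = 0.8 * ar[t - 1] + rng.normal()

d_same_model = qaf_dissimilarity(noise_a, noise_b)
d_diff_model = qaf_dissimilarity(noise_a, ar)
```

In this toy setting the dissimilarity between the two white-noise series should be much smaller than the one between white noise and the autoregressive series, which is the property a clustering algorithm relies on.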

Repositorio da Universidade da Coruña

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Hierarchical clustering for smart meter electricity loads based on quantile autocovariances

Author: Alonso Fernández Andrés Modesto
Nogales Martín Francisco Javier
Ruiz Mora Carlos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2020

In order to improve the efficiency and sustainability of electricity systems, most countries worldwide are deploying advanced metering infrastructures, and in particular household smart meters, in the residential sector. This technology can record electricity load time series at very high frequency, information that can be exploited to develop new clustering models that group individual households by similar consumption patterns. To this end, in this work we propose three hierarchical clustering methodologies that capture different characteristics of the time series. These are based on a set of “dissimilarity” measures computed over different features: quantile autocovariances, and simple and partial autocorrelations. The main advantage is that they summarize each time series in a few representative features, so that they are computationally efficient, robust against outliers, easy to automate, and scalable to hundreds of thousands of smart meter series. We evaluate the performance of each clustering model on a real-world smart meter dataset with thousands of half-hourly time series. The results show how the obtained clusters identify relevant consumption behaviors of households and capture part of their geo-demographic segmentation. Moreover, we apply a supervised classification procedure to explore which features are most relevant in defining each cluster.

This work was supported in part by the Spanish Government through Project under Grant MTM2017-88979-P, and in part by the Fundación Iberdrola through “Ayudas a la Investigación en Energía y Medio Ambiente 2018.” The work of Andrés M. Alonso was supported in part by the Spanish Government through Project under Grant ECO2015-66593-P. Paper no. TSG-01702-2019

Universidad Carlos III de Madrid e-Archivo

Quantile Cross-Spectral Density: A Novel and Effective Tool for Clustering Multivariate Time Series

Author: López-Oriona Ángel
Vilar José
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

Funded for open-access publication: Universidade da Coruña/CISUG.

[Abstract] Clustering of multivariate time series is a central problem in data mining with applications in many fields. Frequently, the clustering target is to identify groups of series generated by the same multivariate stochastic process. Most approaches to this problem either include a prior dimensionality reduction step, which may result in a loss of information, or consider dissimilarity measures based on correlations and cross-correlations that ignore the serial dependence structure. We propose a novel approach to measure dissimilarity between multivariate time series aimed at jointly capturing both cross dependence and serial dependence. Specifically, each series is characterized by a set of matrices of estimated quantile cross-spectral densities, where each matrix corresponds to a pair of quantile levels. Then the dissimilarity between every pair of series is evaluated by comparing their estimated quantile cross-spectral densities, and the pairwise dissimilarity matrix is taken as the starting point to develop a partitioning around medoids algorithm. Since the quantile-based cross-spectra capture dependence in quantiles of the joint distribution, the proposed metric has a high capability to discriminate between high-level dependence structures. An extensive simulation study shows that our clustering procedure outperforms a wide range of alternative methods and is robust to the noise distribution, besides being computationally efficient. A real data application involving bivariate financial time series illustrates the usefulness of the proposed approach. The procedure is also applied to cluster nonstationary series from the UEA multivariate time series classification archive.

This research has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG.

Xunta de Galicia; ED431C-2020-14
Xunta de Galicia; ED431G 2019/0
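The abstract above feeds a pairwise dissimilarity matrix into a partitioning around medoids algorithm. The sketch below shows how any precomputed dissimilarity matrix can drive a medoid-based partition. Note that it uses the simpler alternating (assign, then update) k-medoids iteration rather than PAM's full swap search, and that the function name and the toy matrix are assumptions made for illustration.

```python
import numpy as np

def k_medoids(dissimilarity, n_clusters, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed (n, n) dissimilarity matrix."""
    d = np.asarray(dissimilarity, dtype=float)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(d.shape[0], size=n_clusters, replace=False)
    for _ in range(n_iter):
        # Assignment step: each series joins its closest medoid.
        labels = d[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for k in range(n_clusters):
            members = np.flatnonzero(labels == k)
            if members.size:
                # Update step: the new medoid minimises the total
                # dissimilarity to the other members of its cluster.
                within = d[np.ix_(members, members)].sum(axis=1)
                new_medoids[k] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, d[:, medoids].argmin(axis=1)

# Toy dissimilarity matrix built from six points on a line.
points = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
toy_dissimilarity = np.abs(points[:, None] - points[None, :])
medoids, labels = k_medoids(toy_dissimilarity, n_clusters=2)
```

Because only the matrix is needed, the same routine works with any dissimilarity between series, including one computed from quantile cross-spectral estimates.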

Repositorio da Universidade da Coruña

Fuzzy clustering of ordinal time series based on two novel distances with economic applications

Author: López-Oriona Ángel
Vilar José Antonio
Weiss Christian
Publication date: 24/04/2023

Time series clustering is a central machine learning task with applications in many fields. While the majority of methods focus on real-valued time series, very few works consider series with a discrete response. In this paper, the problem of clustering ordinal time series is addressed. To this end, two novel distances between ordinal time series are introduced and used to construct fuzzy clustering procedures. Both metrics are functions of the estimated cumulative probabilities, thus automatically taking advantage of the ordering inherent in the series' range. The resulting clustering algorithms are computationally efficient and able to group series generated from similar stochastic processes, reaching accurate results even when the series come from a wide variety of models. Since the dynamics of the series may vary over time, we adopt a fuzzy approach, thus enabling the procedures to assign each series to several clusters with different membership degrees. An extensive simulation study shows that the proposed methods outperform several alternative procedures. Weighted versions of the clustering algorithms are also presented and their advantages with respect to the original methods are discussed. Two specific applications involving economic time series illustrate the usefulness of the proposed approaches.
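The idea of a distance built from estimated cumulative probabilities can be sketched as follows. The abstract does not give the exact definition of either distance, so this example is an assumption: it compares estimated cumulative marginal probabilities and lagged cumulative joint probabilities through a squared Euclidean distance. The function names and the integer coding of the ordinal states are hypothetical.

```python
import numpy as np

def cumulative_features(series, n_states, lags=(1,)):
    """Estimated cumulative marginal and lagged joint probabilities.

    The ordinal series is assumed to be coded as integers 0, ..., n_states - 1.
    """
    x = np.asarray(series)
    # The last state is skipped because its cumulative probability is always 1.
    thresholds = np.arange(n_states - 1)
    below = (x[:, None] <= thresholds[None, :]).astype(float)
    features = [below.mean(axis=0)]  # estimates of P(X_t <= s_i)
    for lag in lags:
        # Estimates of P(X_t <= s_i, X_{t-lag} <= s_j).
        joint = below[lag:].T @ below[:-lag] / (len(x) - lag)
        features.append(joint.ravel())
    return np.concatenate(features)

def ordinal_distance(series_a, series_b, n_states, lags=(1,)):
    """Squared Euclidean distance between cumulative-probability features."""
    fa = cumulative_features(series_a, n_states, lags)
    fb = cumulative_features(series_b, n_states, lags)
    return float(((fa - fb) ** 2).sum())

toy_series = [0, 1, 2, 1, 0, 2]
toy_features = cumulative_features(toy_series, n_states=3)
```

Working with cumulative rather than pointwise probabilities is what makes the features sensitive to the ordering of the states: relabelling the categories changes the result.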

arXiv.org e-Print Archive

Copula-based fuzzy clustering of spatial time series

Author: Alonso
Athanasopoulos
Basford
Birant
Bárdossy
Caiado
Campello
Coppi
De Luca
Di Lascio
Durante
D’Urso
Ester
Everitt
Fouedjio
Garcia-Escudero
Genest
Grabisch
Guthke
Handl
Hu
Hubert
Hwang
Hyndman
Hüllermeier
Ienco
Izakian
James
Joe
Kamdar
Kaufman
Kazianka
Klement
Krishnapuram
Lafuente-Rego
Maharaj
Montes
Nelsen
Otranto
Patton
Piccolo
Rand
Shekhar
Torabi
Vilar
Viroli
Wang
Warren Liao
Wedel
Xie
Yager
Yang
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

This paper contributes to the existing literature on the analysis of spatial time series by presenting a new clustering algorithm called COFUST, i.e. COpula-based FUzzy clustering algorithm for Spatial Time series. The underlying idea of this algorithm is to perform a fuzzy Partitioning Around Medoids (PAM) clustering using a copula-based approach to interpret comovements of time series. This generalisation makes it possible both to extend the usual clustering methods for time series based on Pearson’s correlation and to capture the uncertainty that arises when assigning units to clusters. Furthermore, its flexibility allows spatial information to be included directly in the algorithm. Our approach is presented and discussed using both simulated and real data, highlighting its main advantages.

Crossref

Bournemouth University Research Online

Archivio della ricerca- Università di Roma La Sapienza

Archivio Istituzionale della Ricerca- Università del Salento

Archivio istituzionale della ricerca - Università di Padova

The Bootstrap for Testing the Equality of Two Multivariate Stochastic Processes with an Application to Financial Markets

Author: López-Oriona Ángel
Vilar José
Publication venue: 'MDPI AG'
Publication date: 01/01/2022

[Abstract] The problem of testing the equality of the generating processes of two multivariate time series is addressed in this work. To this end, we construct two tests based on a distance measure between stochastic processes. The metric is defined in terms of the quantile cross-spectral densities of both processes. A proper estimate of this dissimilarity is the cornerstone of the proposed tests. Both techniques are based on the bootstrap. Specifically, extensions of the moving block bootstrap and the stationary bootstrap are used for their construction. The approaches are assessed in a broad range of scenarios under the null and the alternative hypotheses. The results show that the procedure based on the stationary bootstrap exhibits the best overall performance in terms of both size and power. The proposed techniques are used to answer the question of whether the dotcom bubble crash of the 2000s permanently impacted global market behavior.

This research has been supported by MINECO (MTM2017-82724-R and PID2020-113578RB-100), the Xunta de Galicia (ED431C-2020-14), and “CITIC” (ED431G 2019/01).

Xunta de Galicia; ED431C-2020-14
Xunta de Galicia; ED431G 2019/0
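For readers unfamiliar with the resampling scheme named in the abstract above, the following sketch draws one replicate with the generic stationary bootstrap (blocks of geometrically distributed length, with wrap-around). The paper uses extensions of this scheme whose details are not given in the abstract, so the code illustrates the basic idea only. The function name and the `mean_block_length` parameter are assumptions.

```python
import numpy as np

def stationary_bootstrap(series, mean_block_length, rng):
    """One stationary-bootstrap replicate of a (T, d) multivariate series.

    Whole rows are resampled, so the dependence between components at each
    time point is preserved.
    """
    x = np.asarray(series)
    n = x.shape[0]
    p = 1.0 / mean_block_length  # probability of starting a new block
    indices = np.empty(n, dtype=int)
    indices[0] = rng.integers(n)
    for t in range(1, n):
        if rng.random() < p:
            indices[t] = rng.integers(n)           # start a new block
        else:
            indices[t] = (indices[t - 1] + 1) % n  # continue, wrapping around
    return x[indices]

# Toy bivariate series whose rows are (0, 1), (2, 3), ..., (18, 19).
toy_series = np.arange(20).reshape(10, 2)
replicate = stationary_bootstrap(toy_series, mean_block_length=3,
                                 rng=np.random.default_rng(0))
```

A test statistic recomputed on many such replicates gives a bootstrap approximation to its sampling distribution. The moving block bootstrap differs mainly in using blocks of fixed length.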

Repositorio da Universidade da Coruña

Spatio-temporal clustering: Neighbourhoods based on median seasonal entropy

Author: Ruiz Reina Miguel Ángel
Publication venue: 'Elsevier BV'
Publication date: 24/08/2021

In this research, a new uncertainty clustering method has been developed and applied to spatial time series with seasonality. The new unsupervised grouping method is based on Neighbourhoods and Median Seasonal Entropy. This classification method aims to discover similar behaviours within a group of time series and to find a dissimilarity measure with respect to a reference series r. The Neighbourhood’s Internal Verification Coefficient criterion makes it possible to measure intra-group similarity. This clustering criterion is flexible with respect to spatial information. Our empirical approach allows us to measure the accommodation decisions of tourists who visit Spain and decide to stay either in hotels or in tourist apartments. The results show the existence of dynamic seasonal patterns of behaviour. These insights support the decisions of economic agents.

This research is associated with the group of the Faculty of Economic and Business Sciences at the University of Malaga: “Social Indicators-SEJ157”. The research group has funded the professional editing service in English. Research Funders: “Funding for open access charge: Universidad de Málaga/CBUA”

Repositorio Institucional Universidad de Málaga

Similarity Approaches for High-Dimensional Financial Time Series - with an Application to Pairs Trading

Author: El-Oraby Karem
Publication date: 01/01/2019

Open Access LMU

Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences

Author: D'Urso Pierpaolo
López-Oriona Ángel
Vilar José
Publication venue: Elsevier
Publication date: 01/01/2023

Funded for open-access publication: Universidade da Coruña/CISUG.

[Abstract] Two novel distances between categorical time series are introduced. Both of them measure discrepancies between extracted features describing the underlying serial dependence patterns. One distance is based on well-known association measures, namely Cramér's V and Cohen's κ. The other one relies on the so-called binarization of a categorical process, which indicates the presence of each category by means of a canonical vector. Binarization is used to construct a set of innovative association measures that make it possible to identify different types of serial dependence. The metrics are used to perform crisp and fuzzy clustering of nominal series. The proposed approaches are able to group together series generated from similar stochastic processes, achieve accurate results with series coming from a broad range of models, and are computationally efficient. Extensive simulation studies show that both hard and soft clustering algorithms outperform several alternative procedures proposed in the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques.

Xunta de Galicia; ED431G 2019/01
Xunta de Galicia; ED431C-2020-14

The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia “CITIC” grant ED431G 2019/01; all of them through the European Regional Development Fund (ERDF). This work has received funding for open access charge by Universidade da Coruña/CISUG. The author Ángel López-Oriona is very grateful to researcher Maite Freire for her lessons about DNA theory.
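Two ingredients named in the abstract above, binarization and Cramér's V as a measure of serial dependence, can be sketched briefly. The code assumes the common lag-l version of Cramér's V for a categorical process, computed from the estimated joint probabilities of (X_t, X_{t-l}) and the marginal probabilities. The function names and the integer coding of categories are choices made for this example, and the way the paper combines such features into a distance is not reproduced here.

```python
import numpy as np

def binarize(series, n_categories):
    """Map each category code 0, ..., n_categories - 1 to its canonical vector."""
    return np.eye(n_categories)[np.asarray(series)]

def cramers_v(series, n_categories, lag):
    """Cramér's V between X_t and X_{t-lag}: unsigned serial dependence in [0, 1]."""
    y = binarize(series, n_categories)
    marginal = y.mean(axis=0)
    # joint[i, j] estimates P(X_t = i, X_{t-lag} = j).
    joint = y[lag:].T @ y[:-lag] / (len(y) - lag)
    expected = np.outer(marginal, marginal)
    mask = expected > 0  # skip categories that never occur
    chi2 = ((joint[mask] - expected[mask]) ** 2 / expected[mask]).sum()
    return float(np.sqrt(chi2 / (n_categories - 1)))

# A strictly periodic series is perfectly dependent; an i.i.d. one is not.
periodic = [0, 1, 2] * 100
independent = np.random.default_rng(0).integers(0, 3, size=5000)
v_periodic = cramers_v(periodic, n_categories=3, lag=1)
v_independent = cramers_v(independent, n_categories=3, lag=1)
```

Values near 1 indicate that the previous observation almost determines the current one, and values near 0 indicate no detectable dependence at that lag. Collecting such values over several lags gives a feature vector that two series can be compared on.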

Repositorio da Universidade da Coruña