7 research outputs found

    The Influence of Global Constraints on Similarity Measures for Time-Series Databases

    A time series consists of a series of values or events obtained over repeated measurements in time. Time-series analysis is an important tool in many application areas, such as stock market analysis, process and quality control, observation of natural phenomena, medical treatments, etc. A vital component of many types of time-series analysis is the choice of an appropriate distance/similarity measure. Numerous measures have been proposed to date, the most successful of which are based on dynamic programming. Because these measures have quadratic time complexity, global constraints are often employed to limit the search space in the dynamic-programming matrix and thereby speed up computation. Furthermore, it has been reported that such constrained measures can also achieve better accuracy. In this paper, we investigate two representative time-series distance/similarity measures based on dynamic programming, Dynamic Time Warping (DTW) and Longest Common Subsequence (LCS), and the effects of global constraints on them. Through extensive experiments on a large number of time-series data sets, we demonstrate how global constraints can significantly reduce the computation time of DTW and LCS. We also show that, if the constraint parameter is tight enough (less than 10-15% of the time-series length), the constrained measure becomes significantly different from its unconstrained counterpart, in the sense of producing qualitatively different 1-nearest-neighbor graphs. This observation explains the potential for accuracy gains when using constrained measures and highlights the need for careful tuning of constraint parameters to achieve a good trade-off between speed and accuracy.
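    A minimal sketch of how a global constraint limits the dynamic-programming matrix of both measures. It assumes a Sakoe-Chiba band (one common constraint; the abstract does not name the shapes tested), DTW with squared pointwise differences and a final square root, and the real-valued LCS variant in which two points match when they differ by at most a threshold `eps`. `window` is the band radius in cells, so a 10% constraint on series of length 100 corresponds to `window=10`. The helper `nn_graph` is a hypothetical name for building the 1-nearest-neighbor graph mentioned above, so constrained and unconstrained graphs can be compared.

```python
import math

def dtw(x, y, window=None):
    """DTW distance with an optional Sakoe-Chiba band of radius `window` (in cells)."""
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide, otherwise cell (n, m) is unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= w are filled; this is where the speed-up comes from.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

def lcs_similarity(x, y, eps, window=None):
    """Real-valued LCS similarity in [0, 1]: points match if |x_i - y_j| <= eps and |i - j| <= window."""
    n, m = len(x), len(y)
    w = max(n, m) if window is None else window
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # For simplicity the band is applied as a matching condition here;
            # a faster version would skip out-of-band cells as the DTW loop does.
            if abs(i - j) <= w and abs(x[i - 1] - y[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m] / min(n, m)

def nn_graph(data, dist):
    """Index of each series' nearest neighbor (excluding itself) under `dist`."""
    return [
        min((j for j in range(len(data)) if j != i), key=lambda j: dist(data[i], data[j]))
        for i in range(len(data))
    ]
```

    Comparing `nn_graph(data, dtw)` with `nn_graph(data, lambda a, b: dtw(a, b, window=2))` on the same data is one way to check whether a tight band changes which neighbor each series picks.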

    Uloga mera sličnosti u analizi vremenskih serija (The Role of Similarity Measures in Time-Series Analysis)

    The subject of this dissertation is a comprehensive overview and analysis of the impact of the Sakoe-Chiba global constraint on the most commonly used elastic similarity measures in time-series data mining, with a focus on classification accuracy. The choice of similarity measure is one of the most significant aspects of time-series analysis: it should correctly reflect the resemblance between data presented in the form of time series. Similarity measures are a critical component of many time-series mining tasks, including classification, clustering, prediction, and anomaly detection. The research covered by this dissertation focuses on several issues: 1. a review of the effects of global constraints on the performance of computing similarity measures, 2. a detailed analysis of the influence of constraining elastic similarity measures on the accuracy of classical classification techniques, 3. an extensive study of the impact of different weighting schemes on the classification of time series, and 4. the development of an open-source library (Framework for Analysis and Prediction, FAP) that integrates the main techniques and methods required for the analysis and mining of time series, and which was used to carry out these experiments.
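    A worked illustration of why the Sakoe-Chiba constraint reduces the cost of computing elastic measures: with a band of radius r on two series of length n (r < n), only the cells with |i - j| <= r are filled, which is n(2r + 1) - r(r + 1) cells instead of n². The sketch assumes the window is given as an integer percentage of the series length; the function names are illustrative.

```python
def band_cells(n, r):
    """Cells (i, j) of an n x n DP matrix with |i - j| <= r (Sakoe-Chiba band)."""
    return sum(min(n - 1, i + r) - max(0, i - r) + 1 for i in range(n))

def band_fraction(n, percent):
    """Fraction of the full matrix filled when the band radius is `percent`% of n."""
    r = (n * percent) // 100
    return band_cells(n, r) / (n * n)

for pct in (0, 5, 10, 20, 100):
    print(f"{pct:3d}% window: {band_fraction(100, pct):.3f} of the full matrix")
```

    For n = 100 the loop reports fractions of 0.010, 0.107, 0.199, 0.368 and 1.000, so a 10% band fills roughly a fifth of the matrix.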

    Optimizing Dynamic Time Warping’s Window Width for Time Series Data Mining Applications

    Dynamic Time Warping (DTW) is a highly competitive distance measure for most time series data mining problems. Obtaining the best performance from DTW requires setting its only parameter, the maximum amount of warping (w). In the supervised case with ample data, w is typically set by cross-validation in the training stage. However, this method is likely to yield suboptimal results for small training sets. For the unsupervised case, learning via cross-validation is not possible because we do not have access to labeled data. Many practitioners have thus resorted to assuming that “the larger the better”, and they use the largest value of w permitted by the computational resources. However, as we will show, in most circumstances this is a naïve approach that produces inferior clusterings. Moreover, the best warping window width is generally non-transferable between the two tasks, i.e., for a single dataset, practitioners cannot simply apply the best w learned for classification to clustering or vice versa. In addition, we will demonstrate that the appropriate amount of warping depends not only on the data structure but also on the dataset size. Thus, even if a practitioner knows the best setting for a given dataset, they will likely be at a loss if they apply that setting to a larger version of that data. All these issues seem largely unknown or at least unappreciated in the community. In this work, we demonstrate the importance of setting DTW’s warping window width correctly, and we also propose novel methods to learn this parameter in both supervised and unsupervised settings. The algorithms we propose to learn w can produce significant improvements in classification accuracy and clustering quality. We demonstrate the correctness of our novel observations and the utility of our ideas by testing them on more than one hundred publicly available datasets.
    Our forceful results allow us to make a perhaps unexpected claim: an underappreciated “low-hanging fruit” in optimizing DTW’s performance can produce improvements that make it an even stronger baseline, closing most or all of the improvement gap with the more sophisticated methods proposed in recent years.
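    The conventional supervised baseline mentioned in the abstract, setting w by cross-validation on the training set, can be sketched as a leave-one-out search for a 1-nearest-neighbor DTW classifier. This is not the authors' proposed learning method, which the abstract does not detail; the candidate grid (0% to 20% of the series length), the tie-breaking rule (prefer the smallest w), and the function names are assumptions made for illustration.

```python
import math

def dtw_sq(x, y, w):
    """Squared DTW with a Sakoe-Chiba band of radius w (squaring does not change rankings)."""
    n, m = len(x), len(y)
    w = max(w, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def loocv_error(X, labels, w):
    """Leave-one-out error of 1-NN DTW with window w."""
    errors = 0
    for i, q in enumerate(X):
        nn = min((j for j in range(len(X)) if j != i), key=lambda j: dtw_sq(q, X[j], w))
        errors += labels[nn] != labels[i]
    return errors / len(X)

def learn_window(X, labels, max_percent=20):
    """Return (w, error) with the lowest LOOCV error; the smallest w wins ties."""
    length = len(X[0])
    candidates = sorted({(length * p) // 100 for p in range(max_percent + 1)})
    best_w, best_err = 0, math.inf
    for w in candidates:
        err = loocv_error(X, labels, w)
        if err < best_err:  # strict '<' keeps the smallest w among ties
            best_w, best_err = w, err
    return best_w, best_err
```

    As the abstract notes, a w chosen this way depends on the dataset size, so a value learned on a small training set may not carry over to a larger version of the same data.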

    Data mining per serie storiche (Data Mining for Time Series)

    High-resolution gridded climate dataset for data-scarce region

    Knowledge of the spatiotemporal distribution of climate variables is essential for most hydro-climatic studies. However, the scarcity or sparsity of long-term observations is one of the major obstacles to such studies. The main objective of this study is to develop a methodological framework for generating high-resolution gridded historical and future climate projection data for a data-scarce region. Egypt and its densely populated central north region (CNE) were selected as the study area. First, several existing gridded datasets were evaluated for their ability to reproduce the historical climate. The performance of five high-resolution satellite-based daily precipitation products was evaluated against gauge records using continuous and categorical metrics and selected intensity categories. In addition, two intelligent algorithms, symmetrical uncertainty (SU) and random forest (RF), are proposed for evaluating gridded monthly climate datasets. Second, a new framework is proposed to develop high-resolution daily maximum and minimum temperature (Tmx and Tmn) datasets, using the robust kernel density distribution mapping method to correct the bias in interpolated observation estimates and WorldClim v.2 temperature climatology to adjust the spatial variability in temperature. Third, a new framework is proposed for selecting Global Climate Models (GCMs) based on their ability to reproduce the spatial patterns of different climate variables. The Kling-Gupta efficiency (KGE) was used to assess GCMs in simulating the annual spatial patterns of Tmx, Tmn, and rainfall. The mean and standard deviation of the KGEs were incorporated into a multi-criteria decision-making approach known as a global performance indicator for ranking the GCMs. Fourth, several bias-correction methods were evaluated to identify the most suitable method for downscaling the selected GCM simulations to produce high-resolution gridded climate projections.
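    For reference, the Kling-Gupta efficiency in its original formulation combines the correlation r, the variability ratio α = σ_sim/σ_obs and the bias ratio β = μ_sim/μ_obs as KGE = 1 - sqrt((r - 1)² + (α - 1)² + (β - 1)²), so a perfect simulation scores 1. The abstract does not say which KGE variant was used (a later modified version replaces α with a ratio of coefficients of variation), so this sketch assumes the original form and treats the inputs as flattened grids of, for example, annual mean Tmx per grid cell.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency (original form); 1.0 means a perfect match."""
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    r = np.corrcoef(sim, obs)[0, 1]      # linear correlation of the two patterns
    alpha = np.std(sim) / np.std(obs)    # variability ratio
    beta = np.mean(sim) / np.mean(obs)   # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

    The study then summarizes the mean and standard deviation of such scores in a global performance indicator; that aggregation step is not shown because its exact form is not given here.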
    The results revealed relatively better performance of GSMaP compared with the other satellite-based rainfall products. SU and RF were found to be efficient methods for evaluating gridded monthly climate datasets and to avoid the contradictory results often obtained with conventional statistics. Applying SU and RF identified the GPCC rainfall and UDel temperature datasets as the best products for Egypt. Validation of the 0.05°×0.05° CNE datasets showed a remarkable improvement in replicating the spatiotemporal variability of observed temperature. The new approach proposed for selecting GCMs revealed that MRI-CGCM3 gives the best performance over Egypt, followed by FGOALS-g2, GFDL-ESM2G, GFDL-CM3 and, lastly, MPI-ESM-MR. The selected GCMs projected increases in Tmx and Tmn in the ranges of 2.42 to 4.20°C and 2.34 to 4.43°C, respectively, for different scenarios by the end of the century. Winter temperatures are projected to increase more than summer temperatures. For rainfall, a 62% reduction is projected over the northern coastline, where rain is currently most abundant, along with an increase in rainfall over the dry southern zones. Linear and variance scaling methods were found suitable for developing bias-free high-resolution projections of rainfall and temperature, respectively. For the CNE, the high-resolution projections showed a rise in maximum (1.80 to 3.48°C) and minimum (1.88 to 3.49°C) temperatures and a change in rainfall depth (-96.04 to 36.51%) by the end of the century, which could have severe implications for this highly populated region.
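    The two bias-correction methods favored in the results are usually defined as follows: linear scaling multiplies simulated rainfall by the ratio of the observed to the simulated historical mean, and variance scaling of temperature rescales simulated anomalies so that the historical simulation reproduces the observed mean and standard deviation. This sketch applies one correction to the whole record for brevity; in practice these corrections are commonly computed separately for each calendar month, and the function names are illustrative.

```python
import numpy as np

def linear_scaling_rain(obs_hist, sim_hist, sim):
    """Multiplicative correction: keeps dry days dry and matches the historical mean."""
    factor = np.mean(obs_hist) / np.mean(sim_hist)
    return np.asarray(sim, dtype=float) * factor

def variance_scaling_temp(obs_hist, sim_hist, sim):
    """Shift and rescale so the historical simulation matches the observed mean and std."""
    mu_sim, sd_sim = np.mean(sim_hist), np.std(sim_hist)
    mu_obs, sd_obs = np.mean(obs_hist), np.std(obs_hist)
    return (np.asarray(sim, dtype=float) - mu_sim) * (sd_obs / sd_sim) + mu_obs

# Usage: statistics come from the historical period and are applied to a future run.
obs_t, hist_t, future_t = [10, 20, 30], [0, 5, 10], [15]
print(variance_scaling_temp(obs_t, hist_t, future_t))
```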

    Realidad aumentada bajo tecnología móvil basada en el contexto aplicada a destinos turísticos (Context-Based Augmented Reality on Mobile Technology Applied to Tourist Destinations)

    This work defines a framework for personalizing tourist content through context-aware recommender systems within an augmented reality system. Augmented reality makes it possible to display information in an intuitive, fast, interactive, and attractive way. These characteristics explain why its use in sectors such as tourism, heritage, culture, and advertising is growing considerably. One of the main drawbacks of current augmented reality systems is that they tend to display an excessively large number of POIs, so techniques are needed to show only the information the tourist is actually interested in, that is, personalized information. It is therefore necessary to apply recommender systems to augmented reality systems used in tourism. Another drawback of current tourism recommender systems is that they do not use tourists' contextual information and do not generate recommendations for groups of tourists traveling together. The objective of this work is to define a theoretical foundation for creating and configuring a recommender system for an augmented reality tool at a tourist destination, where users have tools to plan visits or tourist routes, individually or as a group, taking their preferences and context into account. Any established destination can have thousands of points, so the number of operations required to obtain a recommendation is very high. It is therefore very useful to have filtering mechanisms that significantly reduce the number of points fed into the recommendation engines. The proposed solution is the combined use of different recommendation engines.
    The ultimate goal is to ensure that the recommended items fit the tourist's context, tastes, and preferences as closely as possible. The proposed framework is theoretically grounded in the use of ontologies to represent information, formal concept analysis, the fuzzy linguistic approach, probability theory, and Markov chains. In addition, for group visits, the framework uses aspects of the group members' personalities to weight the recommendations. The thesis is organized into three main blocks. The first block studies the techniques and architectures of augmented reality systems, their application to tourism, and the advantages and drawbacks of using them at a destination. It also studies the main recommendation techniques and different types of structures for organizing a tourist destination's information. The second block describes the proposed framework in detail, developing different filtering and recommendation models. Specifically, it presents the following contextual filtering and recommendation engines:
    - Contextual pre-filtering model based on logical implications.
    - Content-based recommendation model.
    - Collaborative recommendation model.
    - Demographic recommendation model.
    - Historical recommendation model.
    - Group recommendation model.
    - Dynamic route generator.
    The third block presents the conclusions drawn from the study and the proposal, as well as future lines of research. Two appendices are also included: one on the developed ontology and another on the features of a prototype for the Costa del Sol Occidental tourist destination.
    The bibliography consists mainly of articles from high-impact journals and conference papers on the following topics: recommender systems, augmented reality systems, information representation with ontologies, fuzzy logic, formal concept analysis, etc. Books and studies on the application of technology to tourism were also used.
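    The abstract describes filtering a destination's points of interest by context before passing the survivors to several recommendation engines whose outputs are combined. A minimal sketch of that pipeline, under loud assumptions: context rules are plain predicates (a stand-in for the thesis's logical-implication model), each engine returns a score in [0, 1], and scores are combined by a fixed weighted sum. The `POI` fields, the `max_km` context key, the user profile, and the example engines are all hypothetical, not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class POI:
    # Hypothetical point-of-interest record.
    name: str
    category: str
    distance_km: float
    open_now: bool

Rule = Callable[[POI, dict], bool]  # context rule: keep the POI only if it returns True
Engine = Callable[[POI], float]     # recommendation engine: score in [0, 1]

def prefilter(pois: List[POI], context: dict, rules: List[Rule]) -> List[POI]:
    """Drop every POI that violates at least one context rule (cheap, runs first)."""
    return [p for p in pois if all(rule(p, context) for rule in rules)]

def hybrid_rank(pois: List[POI], engines: Dict[str, Engine],
                weights: Dict[str, float], top_k: int = 3) -> List[str]:
    """Weighted sum of engine scores; ties are broken alphabetically by name."""
    scored = [(sum(weights[k] * eng(p) for k, eng in engines.items()), p.name) for p in pois]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:top_k]]

pois = [
    POI("Castle", "heritage", 1.2, True),
    POI("Beach", "nature", 0.5, True),
    POI("Museum", "heritage", 3.0, False),
    POI("Market", "shopping", 8.0, True),
]
rules = [lambda p, c: p.open_now, lambda p, c: p.distance_km <= c["max_km"]]
prefs = {"heritage": 1.0, "nature": 0.4}  # hypothetical user profile
engines = {
    "content": lambda p: prefs.get(p.category, 0.0),
    "proximity": lambda p: 1.0 / (1.0 + p.distance_km),
}
candidates = prefilter(pois, {"max_km": 5.0}, rules)
print(hybrid_rank(candidates, engines, {"content": 0.7, "proximity": 0.3}))  # ['Castle', 'Beach']
```

    Running the cheap rule check first keeps the engines' workload small when a destination has thousands of points, which is the motivation the abstract gives for pre-filtering.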