    Tropical Cyclone Event Sequence Similarity Search via Dimensionality Reduction and Metric Learning

    The Earth Observing System Data and Information System (EOSDIS) is a comprehensive data and information system that archives, manages, and distributes Earth science data from the EOS spacecraft. One capability that EOSDIS lacks is the retrieval of satellite sensor data based on the output of similarity queries over weather events (such as tropical cyclones). In this paper, we propose a framework to solve the similarity search problem, given user-defined instance-level constraints, for tropical cyclone events represented by arbitrary-length multidimensional spatio-temporal data sequences. A critical component of such a problem is the similarity/metric function used to compare the data sequences. We describe a novel Longest Common Subsequence (LCSS) parameter learning approach driven by nonlinear dimensionality reduction and distance metric learning. Intuitively, arbitrary-length multidimensional data sequences are projected into a fixed-dimensional manifold for LCSS parameter learning. Similarity search is achieved through consensus among the (similar) instance-level constraints, based on ranking orders computed using the LCSS-based similarity measure. Experimental results using a combination of synthetic and real tropical cyclone event data sequences are presented to demonstrate the feasibility of our parameter learning approach and its robustness to variability in the instance constraints. We then use a similarity query example on real tropical cyclone event data sequences from 2000 to 2008 to discuss (i) a problem of scientific interest, and (ii) challenges and issues related to the weather event similarity search problem.
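To make the LCSS-based similarity measure mentioned in the abstract more concrete, the following is a minimal sketch of one commonly used LCSS formulation for multidimensional sequences. It is not the paper's method: the names `lcss_similarity`, `epsilon` (spatial matching threshold), and `delta` (maximum index offset) are assumptions chosen for illustration. The parameters are fixed by hand here, whereas the paper describes learning them.

```python
import numpy as np


def lcss_similarity(a, b, epsilon, delta):
    """Illustrative LCSS similarity in [0, 1] between two sequences.

    a, b    : array-likes of shape (n, d) and (m, d); lengths may differ
    epsilon : two points match if every coordinate differs by <= epsilon
    delta   : two points may match only if their indices differ by <= delta
    (All names are hypothetical; the paper learns such parameters.)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat 1-D input as a sequence of scalar observations.
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0

    # table[i, j] = LCSS length of the first i points of a and first j of b.
    table = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_time = abs(i - j) <= delta
            close_in_space = bool(np.all(np.abs(a[i - 1] - b[j - 1]) <= epsilon))
            if close_in_time and close_in_space:
                table[i, j] = table[i - 1, j - 1] + 1
            else:
                table[i, j] = max(table[i - 1, j], table[i, j - 1])

    # Normalise by the shorter length so identical sequences score 1.0.
    return table[n, m] / min(n, m)
```

A ranking of candidate events for a query could then be obtained by sorting the candidates by this score in descending order; how the paper combines rankings across constraints is not specified in the abstract.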

    Uloga mera sličnosti u analizi vremenskih serija (The Role of Similarity Measures in Time Series Analysis)

    This dissertation presents a comprehensive overview and analysis of the impact of the Sakoe-Chiba global constraint on the most commonly used elastic similarity measures in time-series data mining, with a focus on classification accuracy. The choice of similarity measure is one of the most significant aspects of time-series analysis: it should correctly reflect the resemblance between data represented as time series. Similarity measures are a critical component of many time-series mining tasks, including classification, clustering, prediction, and anomaly detection. The research addresses several issues: (1) a review of the effects of global constraints on the performance of computing similarity measures; (2) a detailed analysis of how constraining the elastic similarity measures influences the accuracy of classical classification techniques; (3) an extensive study of the impact of different weighting schemes on time-series classification; and (4) the development of an open source library (Framework for Analysis and Prediction, FAP) that integrates the main techniques and methods required for time-series analysis and mining, and which was used to carry out these experiments.
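As an illustration of what a Sakoe-Chiba global constraint does to an elastic measure, here is a minimal sketch of dynamic time warping (DTW) restricted to a band around the diagonal. It is not taken from the dissertation or from the FAP library: the function name `dtw_sakoe_chiba` and the parameter `window` are assumptions, and squared pointwise difference is chosen arbitrarily as the local cost.

```python
import math


def dtw_sakoe_chiba(x, y, window):
    """Illustrative DTW distance between two 1-D sequences with a band constraint.

    window : half-width of the Sakoe-Chiba band; cell (i, j) is evaluated
             only when |i - j| <= window (hypothetical parameter name).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # The band must be at least |n - m| wide, otherwise no warping path
    # can reach the final cell.
    w = max(window, abs(n - m))

    inf = math.inf
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = local + min(
                cost[i - 1][j],      # step in x only
                cost[i][j - 1],      # step in y only
                cost[i - 1][j - 1],  # step in both
            )
    return math.sqrt(cost[n][m])
```

With `window=0` and equal-length inputs, the band collapses to the diagonal and the result equals the Euclidean distance. Widening the band allows more warping at a higher computational cost, which is the trade-off between performance and classification accuracy that the abstract describes studying.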