5 research outputs found

    Similarity Search of Time Series with Moving Average Based Indexing

    This paper proposes MABI (moving average based indexing), a method for handling ε-search queries in subsequence matching. MABI rests on two theorems that the paper states and proves: the distance reduction theorem and the DRR (distance reduction rate) relation theorem. The DRR relation theorem has strong pruning power: it eliminates most unqualified candidate sequences during a similarity query, which makes fast similarity search possible. The index itself is built by modifying BATON*, the multi-way balanced tree structure introduced by Jagadish et al., and it significantly speeds up similarity search. Experiments on a stock exchange dataset show that MABI performs well. Supported by the National Natural Science Foundation of China under Grant No. 60473051 and the National High-Tech Research and Development Plan of China (863 Program) under Grant Nos. 2007AA01Z191 and 2006AA01Z230.
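The pruning idea in this abstract can be illustrated with a minimal sketch. It is not the paper's MABI index or its DRR relation theorem. It assumes only the standard fact that the Euclidean distance between the k-point moving averages of two equal-length sequences never exceeds the Euclidean distance between the sequences themselves, so a candidate whose moving-average distance already exceeds ε can be discarded without a full comparison. The function names `moving_average`, `euclidean`, and `epsilon_search` are hypothetical.

```python
import math


def moving_average(seq, k):
    """Return the k-point moving average of seq (len(seq) - k + 1 values)."""
    return [sum(seq[i:i + k]) / k for i in range(len(seq) - k + 1)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def epsilon_search(query, candidates, eps, k):
    """Return the candidates within distance eps of query.

    Assumption: all candidates have the same length as query.
    A real index would precompute and store the candidates' moving
    averages; they are recomputed here only to keep the sketch short.
    """
    query_ma = moving_average(query, k)
    hits = []
    for cand in candidates:
        # Cheap filter: the moving-average distance is a lower bound on the
        # raw distance, so skipping here cannot drop a true match.
        if euclidean(query_ma, moving_average(cand, k)) > eps:
            continue
        # Exact check on the survivors of the filter.
        if euclidean(query, cand) <= eps:
            hits.append(cand)
    return hits
```

Because the filter only ever discards candidates that the exact check would also reject, the result is the same as a full scan; the saving comes from comparing shorter, smoothed sequences first.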

    Techniques to explore time-related correlation in large datasets

    The next generation of database management and computing systems will be significantly more complex, with data distributed in both functionality and operation. The complexity arises, at least in part, from the data types involved and the kinds of information requests issued by database users. Time sequence databases are generated in many practical applications. Detecting similar sequences and subsequences within these databases is an important research area that has attracted considerable interest recently. Previous studies have concentrated on computing similarity between (sub)sequences of equal size. Comparing (sub)sequences of unequal size has been an open problem for some time, and it is both important and non-trivial. In this dissertation, we propose a solution to the problem of finding sequences, in a database of unequal-sized sequences, that are similar to a given query sequence. We also present a paradigm for finding pairs of similar subsequences, of equal or unequal size, within a pair of sequences. We put forward new approaches for sequence time-scale reduction, feature aggregation, and object recognition. To make the search for similar sequences efficient, we propose a technique for indexing the unequal-sized sequence database. We also introduce an indexing technique for the subsequences identified within a reference sequence; this index is then used to report similar pairs of subsequences when a query sequence is presented. We present several experimental results and compare the proposed framework with previous work in this area.

    Data-mining massive time series astronomical data sets — A case study
