12 research outputs found

    Similarity Search of Time Series with Moving Average Based Indexing

    This paper proposes MABI (moving average based indexing) to handle ε-queries in subsequence matching efficiently. MABI rests on two theorems that the paper states and proves: the distance reduction theorem and the DRR (distance reduction rate) relation theorem. The DRR relation theorem prunes most unqualified candidate sequences during a similarity query, which makes the search fast. The index itself is built from the time series as a multi-way balanced tree, obtained by modifying BATON*, the structure introduced by Jagadish et al., and it significantly speeds up similarity search. Experiments on a stock exchange dataset show that MABI achieves good performance.
    Supported by the National Natural Science Foundation of China under Grant No. 60473051 and the National High-Tech Research and Development Plan of China (863 Program) under Grant Nos. 2007AA01Z191 and 2006AA01Z230.
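The pruning idea in the abstract above can be illustrated with a standard property of moving averages: the Euclidean distance between the k-point moving averages of two equal-length sequences never exceeds the distance between the sequences themselves. The sketch below uses that generic lower bound and is not necessarily the paper's distance reduction theorem or its DRR relation theorem. The function names and the parameters `k` and `eps` are assumptions made for illustration.

```python
import math
import random


def moving_average(seq, k):
    """Return the k-point moving average of seq (length len(seq) - k + 1)."""
    window_sum = sum(seq[:k])
    out = [window_sum / k]
    for i in range(k, len(seq)):
        # Slide the window one step: add the new value, drop the oldest.
        window_sum += seq[i] - seq[i - k]
        out.append(window_sum / k)
    return out


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def can_prune(query, candidate, k, eps):
    """True if the candidate cannot be within eps of the query.

    Assumption: we rely only on the generic lower bound
    dist(MA(query), MA(candidate)) <= dist(query, candidate),
    so a smoothed distance above eps rules the candidate out.
    """
    return euclidean(moving_average(query, k), moving_average(candidate, k)) > eps
```

A candidate for which `can_prune` returns `False` still has to be checked against the full distance; the bound only discards sequences, it never confirms a match.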

    Research on Materialized View Selection

    This survey defines the view selection problem in data warehousing and discusses the related topics: cost models, benefit functions, cost computation, constraints, and view indexes. It then presents three categories of view selection methods (static, dynamic, and hybrid) and introduces representative work for each. Finally, it outlines future research directions in the area.
    Supported by the National Natural Science Foundation of China under Grant No. 60473051 and the National High-Tech Research and Development Plan of China (863 Program) under Grant Nos. 2007AA01Z191 and 2006AA01Z230.

    Change Data Capture in Real-Time Active Data Warehouses: A Survey

    This paper was published while the author was pursuing a doctoral degree at the Database Laboratory of Peking University. The real-time active data warehouse is the most recent stage in the evolution of data warehouses and the direction in which they are heading. It supports both strategic and tactical decisions for an organization. A real-time active data warehouse holds two types of data, real-time and non-real-time, and accordingly needs two kinds of change data capture methods: those that support real-time capture and ordinary ones that do not. Drawing on research experience in this field, the paper systematically discusses the change data capture methods that can be used in real-time active data warehouses and compares their preconditions, advantages, disadvantages, and suitable scenarios.
    Supported by the National Natural Science Foundation of China (60473015), the National 863 High-Tech Research and Development Program of China (2006AAl2Z217), and a joint project with HP Labs China.

    MBA: A market-based approach to data allocation and dynamic migration for cloud database

    With the shift to cloud computing, cloud databases are emerging to provide database services over the Internet. In a cloud-based environment, data are distributed at Internet scale and the system must handle a huge number of simultaneous user queries without delay. How data are distributed among the servers has a crucial impact on query load distribution and system response time. In this paper, we propose a market-based control method, called MBA, that achieves query load balance through reasonable data distribution. In MBA, database nodes are treated as traders in a market, and market rules are used to decide data allocation and migration. We built a prototype system and conducted extensive experiments. The results show that MBA significantly improves system performance in terms of average query response time and fairness.

    Research on Requirement-based Real-time Data Integration in Real-time Active Data Warehouses

    This paper was published while the author was pursuing a doctoral degree at the Database Laboratory of Peking University. Real-time data integration is an important problem in research on real-time active data warehouses. Existing work approaches it from a technological point of view and does not consider concrete business application requirements. In large business applications, even when filtering rules capture only the changed data of interest, a large amount of data integration work remains. This causes unnecessarily heavy system overhead and can lead to slow system response and unmet user requirements. Taking the application point of view, this paper proposes requirement-based real-time data integration methods for real-time active data warehouses: real-time integration of frequently requested data, real-time integration for suddenly arising requests, and user-decided real-time integration. Applying a different integration strategy to each kind of business requirement satisfies the different types of application needs well.
    Supported by the National Natural Science Foundation of China (60473015), the National 863 High-Tech Research and Development Program of China (2006AAl2Z217), and a joint project with HP Labs China.

    Keyword Search over Relational Databases

    This survey first presents the research background of keyword search over relational databases. It then describes the two main classes of solutions, data graph based methods and schema graph based methods, and explains the principles, advantages, and disadvantages of each. Finally, it outlines future research directions in the area.
    Supported by the National Natural Science Foundation of China under Grant No. 50604012 and the National High-Tech Research and Development Plan of China (863 Program) under Grant No. 2009AA01Z150.

    Materialized Views Selection of Multi-Dimensional Data in Real-Time Active Data Warehouses

    This paper mines the log of the active decision engine to find the CUBE usage patterns of analysis rules, which provide an important basis for selecting materialized views of multi-dimensional data. On this basis, a 3A probability model is designed, and a greedy view selection algorithm called PGreedy (probability greedy) is proposed that takes the access probability distribution of CUBEs into account. A dynamic view adjustment algorithm that incorporates a view keeping rule is also given. Experimental results show that PGreedy outperforms the BPUS (benefit per unit space) algorithm in a real-time active data warehouse environment.
    Supported by the National Natural Science Foundation of China under Grant No. 60473051 and the China HP Co. and Peking University Joint Project.
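The difference between ranking views by benefit per unit space and ranking them with access probabilities can be sketched as follows. This is a deliberately simplified illustration, not the paper's PGreedy algorithm or its 3A model: it assumes each candidate view has a fixed, independent benefit, whereas lattice-based greedy methods such as BPUS recompute benefits after every selection. The tuple layout, the function name, and the sample numbers are all hypothetical.

```python
def probability_weighted_greedy(views, space_budget):
    """Pick views by (access probability * benefit / size) within a space budget.

    views: list of (name, size, benefit, access_probability) tuples.
    Assumption: benefits are independent of which views are already chosen.
    """
    ranked = sorted(views, key=lambda v: v[3] * v[2] / v[1], reverse=True)
    chosen, used = [], 0
    for name, size, _benefit, _prob in ranked:
        if used + size <= space_budget:
            chosen.append(name)
            used += size
    return chosen


# Hypothetical candidates: (name, size, benefit, access probability).
candidates = [("A", 10, 100, 0.1), ("B", 10, 50, 0.8), ("C", 5, 20, 0.1)]
selected = probability_weighted_greedy(candidates, space_budget=15)
```

With these sample numbers, view B is preferred over view A even though A has the larger raw benefit, because B is accessed far more often.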

    Dealing with Query Contention Issue in Real-time Data Warehouses by Dynamic Multi-level Caches

    Query contention and scalability are the most difficult issues facing organizations that deploy real-time data warehouse solutions. The contention between complex selects and continuous inserts tends to severely limit the scalability of the data warehouse. In this paper, we present a new method, called dynamic multi-level caches, to deal effectively with query contention and scalability in real-time data warehouses. We differentiate between queries with various data freshness requirements and use multi-level caches to satisfy these different requirements. Every query arriving at the system is automatically redirected to the corresponding cache to access the required data, so the query load is distributed across the multi-level caches instead of being blocked in a single cache by contention between query and update operations. Extensive experiments on several real datasets show that our method can effectively balance the query load among multi-level caches and achieve desirable system performance.
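The redirection step described in the abstract above can be sketched as a small routing function. The abstract does not say how cache levels are defined or how a query is matched to one, so everything here is an assumption: each level is modeled by a hypothetical worst-case staleness in seconds, and a query goes to the stalest level that still meets its freshness requirement, which keeps load off the level that receives continuous inserts.

```python
from dataclasses import dataclass


@dataclass
class CacheLevel:
    name: str
    max_staleness_s: float  # hypothetical: worst-case age of data in this level


def route(levels, required_freshness_s):
    """Return the stalest cache level whose data is still fresh enough.

    Assumption: a level qualifies when its worst-case staleness does not
    exceed the staleness the query is willing to tolerate.
    """
    eligible = [c for c in levels if c.max_staleness_s <= required_freshness_s]
    if not eligible:
        raise ValueError("no cache level is fresh enough for this query")
    return max(eligible, key=lambda c: c.max_staleness_s)


# Hypothetical three-level setup.
levels = [
    CacheLevel("realtime", 0),
    CacheLevel("minute", 60),
    CacheLevel("hourly", 3600),
]
```

Under this setup, a query that tolerates data up to five minutes old is sent to the `minute` level, while a query that needs current data is sent to `realtime`.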

    User-oriented Materialized View Selection

    Materialized view selection has long been studied, and many approaches have been proposed. However, all methods proposed to date aim to improve overall query performance rather than being user-oriented. In this paper, we propose a new user-oriented method, called SOMES (uSerOriented Materialized viEw Selection), that aims at better performance for the view selection problem. SOMES takes the query characteristics of different users into account: users are classified into groups according to their query characteristics, and each group is given its own user view window containing the views involved in its queries. Experimental results show that our method achieves desirable performance improvements over other methods such as BPUS and FPUS.