OLAP-Sequential Mining: Summarizing Trends from Historical Multidimensional Data using Closed Multidimensional Sequential Patterns

Abstract

Data warehouses are now well recognized as the way to store historical data for future queries and analysis. In this context, several challenges remain open, among them the problem of mining such data. OLAP mining, introduced by J. Han in 1997, aims at coupling data mining techniques with data warehousing. These techniques have to take the specificities of such data into account. One specificity that classical data mining methods often fail to address is that data warehouses describe data through several dimensions. Moreover, the data are stored through time, and we thus argue that sequential patterns are one of the best ways to summarize the trends in such databases. Sequential pattern mining aims at discovering correlations among events through time. However, the number of patterns can become very large when several analysis dimensions are taken into account, as is the case in multidimensional databases. This is why we propose a condensed representation without loss of information: closed multidimensional sequential patterns. This representation introduces properties that allow the search space to be pruned deeply. In this paper, we also define algorithms that do not require candidate set maintenance. Experiments on synthetic and real data are reported and emphasize the interest of our proposal.
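To make the idea of a closed pattern as a lossless condensed representation concrete, here is a minimal sketch. It is not the algorithm of the paper: it assumes a single dimension, sequences of single items, and the usual definition of a closed sequential pattern (a frequent pattern with no proper super-sequence of the same support). It also enumerates candidates by brute force, which the paper's algorithms avoid. All function names and the toy database are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of data sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, min_support):
    """Brute-force enumeration (illustration only): test every subsequence
    of every data sequence against the support threshold."""
    candidates = set()
    for seq in database:
        for k in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for pattern in candidates:
        s = support(pattern, database)
        if s >= min_support:
            result[pattern] = s
    return result


def closed_patterns(frequent):
    """Keep only patterns with no longer frequent super-sequence of equal support."""
    closed = {}
    for p, s in frequent.items():
        absorbed = any(
            len(q) > len(p) and sq == s and is_subsequence(p, q)
            for q, sq in frequent.items()
        )
        if not absorbed:
            closed[p] = s
    return closed


def recover_support(pattern, closed):
    """Support of a frequent pattern, recovered from the closed set alone:
    the maximum support among closed patterns that contain it."""
    return max(s for q, s in closed.items() if is_subsequence(pattern, q))


# Toy database (hypothetical): three sequences of events ordered in time.
database = [
    ("a", "b", "c"),
    ("a", "b", "c"),
    ("a", "c"),
]

frequent = frequent_patterns(database, min_support=2)
closed = closed_patterns(frequent)
```

On this toy database, seven patterns are frequent but only two are closed, and `recover_support` returns the exact support of each of the seven from the closed set alone. This is the sense in which the representation is condensed without loss of information. The multidimensional setting of the paper adds further structure that this sketch does not model.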
