OLEMAR: An Online Environment for Mining Association Rules in Multidimensional Data
Data warehouses and OLAP (online analytical processing) provide tools to explore and navigate through data cubes in order to extract interesting information under different perspectives and levels of granularity. Nevertheless, OLAP techniques do not allow the identification of relationships, groupings, or exceptions that could hold in a data cube. To that end, we propose to enrich OLAP techniques with data mining facilities to benefit from the capabilities they offer. In this chapter, we propose an online environment for mining association rules in data cubes. Our environment, called OLEMAR (online environment for mining association rules), is designed to extract associations from multidimensional data. It allows the extraction of inter-dimensional association rules from data cubes according to a sum-based aggregate measure, a more general indicator than the aggregate values provided by the traditional COUNT measure. In our approach, OLAP users are able to drive a mining process guided by a meta-rule that meets their analysis objectives. In addition, the environment is based on a formalization that exploits aggregate measures to revisit the definition of the support and the confidence of discovered rules. This formalization also helps evaluate the interestingness of association rules according to two additional quality measures: lift and the Loevinger index. Furthermore, in order to focus on the discovered associations and validate them, we provide a visual representation based on graphic semiology principles. This representation consists of a graphic encoding of frequent patterns and association rules in the same multidimensional space as the one associated with the mined data cube. We have developed our approach as a component of a general online analysis platform called Miningcubes, using an Apriori-like algorithm that extracts inter-dimensional association rules directly from materialized multidimensional structures of data. To illustrate the effectiveness and efficiency of our proposal, we analyze a real-life case study on breast cancer data and run performance experiments on the mining process.
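As a rough illustration of the sum-based measures mentioned in this abstract, the Python sketch below computes support, confidence, lift, and the Loevinger index for a rule X -> Y over a toy list of facts. The fact layout, the `measure` field, the sample values, and the helper names are assumptions made for this example; they are not taken from OLEMAR or Miningcubes.

```python
# Toy fact table: each fact has dimension values and a numeric measure.
# Field names and values are hypothetical, chosen only for illustration.
FACTS = [
    {"age": "40-49", "stage": "I", "measure": 10},
    {"age": "40-49", "stage": "II", "measure": 30},
    {"age": "50-59", "stage": "I", "measure": 20},
    {"age": "50-59", "stage": "II", "measure": 40},
]


def sum_measure(facts, pattern):
    """Sum the measure over facts matching every (dimension, value) in pattern."""
    return sum(
        f["measure"]
        for f in facts
        if all(f.get(dim) == val for dim, val in pattern.items())
    )


def rule_metrics(facts, body, head):
    """Sum-based support, confidence, lift and Loevinger index of body -> head.

    Assumes the body has a non-zero sum and the head does not cover the
    whole cube; otherwise the ratios below are undefined.
    """
    total = sum_measure(facts, {})  # empty pattern matches every fact
    both = sum_measure(facts, {**body, **head})
    p_body = sum_measure(facts, body) / total
    p_head = sum_measure(facts, head) / total
    support = both / total
    confidence = support / p_body
    lift = confidence / p_head
    loevinger = (confidence - p_head) / (1 - p_head)
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "loevinger": loevinger,
    }


metrics = rule_metrics(FACTS, {"age": "50-59"}, {"stage": "II"})
```

Setting every `measure` to 1 reduces these definitions to the usual COUNT-based support and confidence, which is one way to read the claim that the sum-based measure is the more general indicator.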
A Connectionist Approach for Extending OLAP to Prediction Capabilities
National audience. Online analytical processing (OLAP) tools let users carry out exploratory tasks in data cubes. However, they offer no means of predicting or explaining facts. To strengthen the decision-support process, several studies have proposed extending online analysis with more advanced capabilities. In this article, we propose a new two-phase approach for extending online analysis to prediction. The first phase reduces the dimensions of the data cubes and relies on principal component analysis (PCA). The second is a prediction phase in which we introduce a new architecture of multilayer perceptrons (MLP). Our experimental study showed a promising prediction capability, as well as good robustness in the case of a sparse cube.
A Data Mining-Based OLAP Aggregation of Complex Data: Application on XML Documents
International audience. Nowadays, most organizations deal with complex data that have different formats and come from different sources. The XML formalism is evolving and becoming a promising solution for modelling and warehousing these data in decision support systems. Nevertheless, classical OLAP tools are still not capable of analyzing such data. In this paper, we combine OLAP and data mining to support advanced analysis of complex data. We provide a generalized OLAP operator, called OpAC, based on agglomerative hierarchical clustering (AHC). OpAC is suited to all types of data since it deals with data cubes modelled in XML. Our operator produces significant aggregates of facts that express semantic similarities. Evaluation criteria for partitions of aggregates are proposed to help choose the best partition. Furthermore, we developed a Web application for our operator. We also provide performance experiments and conduct a case study on XML documents from the breast cancer research domain.
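To make the clustering step behind such an operator more concrete, here is a minimal Python sketch of agglomerative hierarchical clustering. It is not OpAC itself: the Euclidean distance, the average-linkage criterion, the sample points, and all function names are assumptions for illustration, since the abstract does not specify them.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def ahc(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # combinations yields index pairs (a, b) with a < b
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical facts, each described by two numeric descriptors.
POINTS = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0)]
partition = ahc(POINTS, 3)
```

Each resulting cluster lists the indices of the facts that would be grouped into one aggregate. Stopping at different values of `n_clusters` yields the alternative partitions among which evaluation criteria would then have to choose.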
A Multiple Correspondence Analysis to Organize Data Cubes
International audience. On-Line Analytical Processing (OLAP) is a technology created to provide users with tools to explore and navigate data cubes. Unfortunately, in huge and sparse data, exploration becomes a tedious task, and the user's intuition or experience alone does not lead to efficient results. In this paper, we propose to exploit the results of Multiple Correspondence Analysis (MCA) in order to enhance data cube representations and make them more suitable for visualization and thus easier to analyze. Our approach addresses the issues of organizing data in an interesting way and detecting relevant facts. Our purpose is to help the interpretation of multidimensional data through efficient and simple visual effects. To validate our approach, we assess its efficiency by measuring the quality of the resulting multidimensional data representations. To do so, we propose a homogeneity criterion to measure the visual relevance of data representations. This criterion is based on the concept of geometric neighborhood and similarity between cells. Experimental results on real data have shown the value of our approach on sparse data cubes.
NAP-SC: a neural approach for prediction over sparse cubes
International audience. OLAP techniques provide efficient solutions to navigate through data cubes. However, they are not equipped with frameworks that empower users to investigate interesting information; they are restricted to exploration tasks. Recently, various studies have tried to extend OLAP with new capabilities by coupling it with data mining algorithms. However, most of these algorithms are not designed to deal with sparsity, which is an unavoidable consequence of the multidimensional structure of OLAP cubes. In [1], we proposed a novel approach that embeds Multilayer Perceptrons into the OLAP environment to extend it to prediction. This approach largely met its goals on cubes with limited sparsity. However, its performance decreased progressively as cube sparsity increased. In this paper, we propose a substantially modified version of our previous approach, called NAP-SC (Neural Approach for Prediction over Sparse Cubes). Its main contribution is to minimize the effect of sparsity on the measure prediction process by applying a cube transformation step based on a dedicated aggregation technique. Experiments demonstrate the effectiveness and robustness of NAP-SC on highly sparse data cubes.
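The link between aggregation and sparsity can be sketched in a few lines of Python. The abstract does not describe the dedicated aggregation technique used by NAP-SC, so the example below substitutes a plain roll-up along an assumed month-to-quarter hierarchy; the cube contents, the hierarchy, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sparse cube: (city, month) -> measure; missing keys are empty cells.
CUBE = {
    ("Lyon", "Jan"): 5,
    ("Lyon", "Mar"): 7,
    ("Paris", "Feb"): 3,
    ("Nice", "Apr"): 4,
}
CITIES = ["Lyon", "Paris", "Nice"]
MONTHS = ["Jan", "Feb", "Mar", "Apr"]
QUARTER_OF = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}  # assumed hierarchy


def sparsity(cube, *dims):
    """Fraction of empty cells in the cross product of the dimension members."""
    n_cells = 1
    for members in dims:
        n_cells *= len(members)
    return 1 - len(cube) / n_cells


def roll_up_months(cube):
    """Aggregate the month level to quarters by summing measures."""
    rolled = defaultdict(int)
    for (city, month), value in cube.items():
        rolled[(city, QUARTER_OF[month])] += value
    return dict(rolled)


before = sparsity(CUBE, CITIES, MONTHS)         # 4 filled cells out of 12
rolled = roll_up_months(CUBE)
after = sparsity(rolled, CITIES, ["Q1", "Q2"])  # 3 filled cells out of 6
```

In this toy cube the roll-up lowers the share of empty cells from two thirds to one half, which gives a sense of why a transformation step based on aggregation may leave a predictor with denser input.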
A neural-based approach for extending OLAP to prediction
International audience. In Data Warehouse (DW) technology, On-line Analytical Processing (OLAP) is an application suite that empowers decision makers to explore and navigate a multidimensional structure of precomputed measures, referred to as a data cube. However, OLAP is poorly equipped for forecasting and for predicting the empty measures of data cubes. Usually, empty measures correspond to nonexistent facts in the DW and in most cases are a source of frustration for enterprise management, especially when strategic decisions need to be taken. In recent years, various studies have tried to add prediction capabilities to OLAP applications. For this purpose, Data Mining and Machine Learning methods have been widely used to predict new measure values in DWs. In this paper, we introduce a novel approach that extends OLAP to prediction. Our approach operates in two main stages. The first is a preprocessing stage that uses Principal Component Analysis (PCA) to reduce the dimensionality of the data cube and then generates ad hoc training sets. The second stage proposes a novel OLAP-oriented architecture of Multilayer Perceptron (MLP) networks that learns from each training set and outputs predicted measures for nonexistent facts. Experiments demonstrate the effectiveness of our proposal and the performance of its predictive capabilities.
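One way to picture how a data cube can feed a supervised learner is sketched below in Python: non-empty cells become training pairs and empty cells become inputs whose measure is to be predicted. This is only an assumed encoding for illustration. It uses one-hot coordinates, omits both the PCA reduction and the MLP described in the abstract, and every name and value in it is hypothetical.

```python
from itertools import product

# Hypothetical two-dimensional cube; missing keys are empty cells to predict.
DIMS = {"city": ["Lyon", "Paris"], "year": [2005, 2006, 2007]}
CUBE = {
    ("Lyon", 2005): 12.0,
    ("Lyon", 2006): 15.0,
    ("Paris", 2005): 8.0,
    ("Paris", 2007): 11.0,
}


def one_hot(cell, dims):
    """Encode a cell's coordinates as concatenated one-hot vectors."""
    vector = []
    for value, members in zip(cell, dims.values()):
        vector.extend(1.0 if value == m else 0.0 for m in members)
    return vector


def split_cells(cube, dims):
    """Return (training pairs from non-empty cells, inputs for empty cells)."""
    training, to_predict = [], []
    for cell in product(*dims.values()):
        features = one_hot(cell, dims)
        if cell in cube:
            training.append((features, cube[cell]))
        else:
            to_predict.append((cell, features))
    return training, to_predict


training, to_predict = split_cells(CUBE, DIMS)
```

A regression model trained on `training` could then be applied to the feature vectors in `to_predict` to estimate the measures of the empty cells.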
Evaluation of a MCA-Based Approach to Organize Data Cubes
On-Line Analytical Processing (OLAP) is a technology created to provide users with tools to explore and navigate data cubes. Unfortunately, in huge and sparse data volumes, exploration becomes a tedious task, and the user's intuition or experience alone does not always lead to efficient results. In this paper, we propose to exploit the results of Multiple Correspondence Analysis (MCA) in order to enhance a data cube representation. Our approach addresses the issues of organizing data in an interesting way and detecting relevant facts. We also treat the problem of evaluating the quality of a data representation in a multidimensional space. For this, we propose a new criterion to measure the relevance of data representations. This criterion is based on the concept of geometric neighborhood and similarity between the cells of a data cube. The experiments we conducted on real data samples have shown the value and the efficiency of our approach.