1,713 research outputs found

    Data Cube Approximation and Mining using Probabilistic Modeling

    On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes along different analysis axes (dimensions) and at different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data. Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
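
To make the log-linear idea concrete, the following is a minimal sketch of the simplest such model, mutual independence of all dimensions, fitted to a tiny cube. It is not the authors' procedure: the function names (`independence_fit`, `pearson_residual`), the dimension members, and the sample counts are all hypothetical, chosen only to show how a few margins can approximate every cell and flag cells that deviate from the model.

```python
from collections import defaultdict
from math import sqrt


def independence_fit(cube):
    """Fit the mutual-independence log-linear model to a cube.

    `cube` maps a tuple of dimension members to a non-negative count.
    Returns a function that estimates the count of any cell from the
    one-dimensional margins alone (the compressed representation).
    """
    total = sum(cube.values())
    n_dims = len(next(iter(cube)))
    # margins[d][member] = sum of all cells having `member` on dimension d
    margins = [defaultdict(float) for _ in range(n_dims)]
    for cell, value in cube.items():
        for d, member in enumerate(cell):
            margins[d][member] += value

    def expected(cell):
        estimate = total
        for d, member in enumerate(cell):
            estimate *= margins[d][member] / total
        return estimate

    return expected


def pearson_residual(observed, expected):
    """Standardized deviation of a cell from the model; large values hint at outliers."""
    return (observed - expected) / sqrt(expected) if expected > 0 else 0.0


# Hypothetical (region, product, year) cube with made-up counts.
sample_cube = {
    ("north", "a", 2020): 10, ("north", "a", 2021): 10,
    ("north", "b", 2020): 20, ("north", "b", 2021): 20,
    ("south", "a", 2020): 30, ("south", "a", 2021): 30,
    ("south", "b", 2020): 40, ("south", "b", 2021): 40,
}
estimate = independence_fit(sample_cube)

for cell, observed in sample_cube.items():
    fitted = estimate(cell)
    print(cell, observed, round(fitted, 2), round(pearson_residual(observed, fitted), 2))
```

Under these assumptions, an approximate answer to an aggregate query is obtained by summing `estimate(cell)` over the requested cells instead of reading the stored values. Richer log-linear models add interaction terms between dimensions, which this sketch omits.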

    Designing data warehouses for geographic OLAP querying by using MDA

    Data aggregation is a desirable feature in Geographic Information Systems (GIS), and spatial data are integrated into OLAP engines for this purpose. However, developing and operating such systems remains a complex task because of the methodologies followed. Existing ad hoc solutions deal only with isolated aspects and do not provide developers and analysts with an intuitive, integrated, and standard framework for designing all relevant parts. To overcome these problems, we have defined a model-driven approach to Geographic Data Warehouse (GDW) development, together with the data model required to implement and query spatial data. The model is defined and implemented as an extension of the UML metamodel and is formalized in OCL. In addition, the proposal has been verified against an example scenario with sample data sets, using a development tool that we built on the Eclipse platform and the MDA standard. The main advantage of this solution is that developers can include spatial data directly at the conceptual level, while decision makers can pose geographic queries conceptually without being aware of logical details. This work has been partially supported by the ESPIA project (TIN2007-67078) from the Spanish Ministry of Education and Science and by the QUASIMODO project (PAC08-0157-0668) from the Castilla-La Mancha Ministry of Education and Science (Spain). Octavio Glorio is funded by the University of Alicante under the 11th Latin American grant program.

    SOLAP+: extending the interaction model

    Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfillment of the requirements for the degree of Master in Computer Science. Decision making is a crucial process that can dictate success or failure in today’s businesses and organizations. Decision Support Systems (DSS) are designed to help human users with decision-making activities. Within the large family of DSSs is OnLine Analytical Processing (OLAP), an approach to answering multidimensional queries quickly and effectively. Even though OLAP is recognized as an efficient technique and is widely used in almost every area, it offers neither spatial analysis nor spatial data visualization and exploration. Geographic Information Systems (GIS) have grown considerably in recent years, and acquiring and storing spatial data is easier than ever. To exploit this potential and bring spatial data and spatial analysis features to OLAP, Bédard introduced Spatial OLAP (SOLAP). Although it is a relatively new area, many proposals towards SOLAP’s standardization and consolidation have been made, as well as functional tools for different application areas. There are, however, many issues and topics in SOLAP that are either not covered or are addressed only by incompatible or non-general proposals. We propose to define a generic model for SOLAP interaction based on previous works, extending it to include new visualization options, components, and cases; to create and present a component-driven architecture proposal for such a tool, including descriptive metamodels, an aggregate navigator to increase performance, and a communication protocol; and finally, to develop an example prototype that partially implements the proposed interaction features, taking into consideration guidelines for a user-friendly, yet powerful and flexible application.

    OLEMAR: An Online Environment for Mining Association Rules in Multidimensional Data

    Data warehouses and OLAP (online analytical processing) provide tools to explore and navigate through data cubes in order to extract interesting information under different perspectives and levels of granularity. Nevertheless, OLAP techniques do not allow the identification of relationships, groupings, or exceptions that could hold in a data cube. To that end, we propose to enrich OLAP techniques with data mining facilities to benefit from the capabilities they offer. In this chapter, we propose an online environment for mining association rules in data cubes. Our environment, called OLEMAR (online environment for mining association rules), is designed to extract associations from multidimensional data. It allows the extraction of inter-dimensional association rules from data cubes according to a sum-based aggregate measure, a more general indicator than the aggregate values provided by the traditional COUNT measure. In our approach, OLAP users are able to drive a mining process guided by a meta-rule that meets their analysis objectives. In addition, the environment is based on a formalization that exploits aggregate measures to revisit the definition of the support and the confidence of discovered rules. This formalization also helps evaluate the interestingness of association rules according to two additional quality measures: lift and Loevinger. Furthermore, in order to focus on the discovered associations and validate them, we provide a visual representation based on the principles of graphic semiology. Such a representation consists of a graphic encoding of frequent patterns and association rules in the same multidimensional space as the one associated with the mined data cube. We have developed our approach as a component in a general online analysis platform called Miningcubes, according to an Apriori-like algorithm that helps extract inter-dimensional association rules directly from materialized multidimensional structures of data. In order to illustrate the effectiveness and the efficiency of our proposal, we analyze a real-life case study about breast cancer data and conduct performance experimentation of the mining process.
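
As a rough illustration of sum-based rule evaluation, the sketch below computes support, confidence, lift, and the Loevinger criterion for one inter-dimensional rule over a small fact table. It is an assumption-laden sketch, not OLEMAR's implementation: the formulas are the usual textbook definitions with the measure sum substituted for the cell count, and the names (`measure_sum`, `rule_metrics`), the `sales` measure, and the sample rows are hypothetical. It also assumes the measure is non-negative, the body matches at least one fact, and the head does not cover the whole cube, so no division by zero occurs.

```python
def measure_sum(facts, predicate, measure):
    """Sum `measure` over the facts whose dimension values match `predicate`."""
    return sum(
        row[measure]
        for row in facts
        if all(row.get(dim) == member for dim, member in predicate.items())
    )


def rule_metrics(facts, body, head, measure):
    """Evaluate the rule body => head using a SUM-based aggregate measure.

    `body` and `head` are dicts {dimension: member} over different dimensions.
    """
    total = measure_sum(facts, {}, measure)
    both = measure_sum(facts, {**body, **head}, measure)
    body_sum = measure_sum(facts, body, measure)
    head_sum = measure_sum(facts, head, measure)

    support = both / total
    support_body = body_sum / total
    support_head = head_sum / total
    confidence = both / body_sum
    return {
        "support": support,
        "confidence": confidence,
        "lift": support / (support_body * support_head),
        "loevinger": (confidence - support_head) / (1 - support_head),
    }


# Hypothetical fact table with a made-up `sales` measure.
sample_facts = [
    {"region": "north", "product": "a", "sales": 10},
    {"region": "north", "product": "b", "sales": 20},
    {"region": "south", "product": "a", "sales": 30},
    {"region": "south", "product": "b", "sales": 40},
]
metrics = rule_metrics(sample_facts, {"region": "south"}, {"product": "b"}, "sales")
print(metrics)
```

Setting the measure to 1 on every fact reduces these quantities to the familiar COUNT-based support and confidence, which is one way to read the claim that the sum-based measure is the more general indicator.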