1,713 research outputs found
Data Cube Approximation and Mining using Probabilistic Modeling
On-line Analytical Processing (OLAP) techniques commonly used in data warehouses allow the exploration of data cubes according to different analysis axes (dimensions) and under different abstraction levels in a dimension hierarchy. However, such techniques are not aimed at mining multidimensional data.
Since data cubes are nothing but multi-way tables, we propose to analyze the potential of two probabilistic modeling techniques, namely non-negative multi-way array factorization and log-linear modeling, with the ultimate objective of compressing and mining aggregate and multidimensional values. With the first technique, we compute the set of components that best fit the initial data set and whose superposition coincides with the original data; with the second technique, we identify a parsimonious model (i.e., one with a reduced set of parameters), highlight strong associations among dimensions, and discover possible outliers in data cells. A real-life example will be used to (i) discuss the potential benefits of the modeling output on cube exploration and mining, (ii) show how OLAP queries can be answered in an approximate way, and (iii) illustrate the strengths and limitations of these modeling approaches.
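As a rough illustration of the log-linear idea described in this abstract, the Python sketch below fits the simplest such model (independence of two dimensions) to a small two-way table, treats the fitted values as approximate cell values, and flags cells with large Pearson residuals as candidate outliers. The table values, the function names, and the threshold of 2 are assumptions made for illustration; the paper's own models and data are not reproduced here.

```python
from math import sqrt


def independence_fit(table):
    """Fitted cell values under the two-way independence log-linear model.

    Each fitted value is row_total * column_total / grand_total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_residuals(table, expected):
    """(observed - expected) / sqrt(expected); assumes every expected value > 0."""
    return [
        [(o - e) / sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


# Hypothetical 3 x 3 slice of a cube (for example, region by product counts).
cube = [
    [20, 30, 50],
    [10, 15, 25],
    [40, 60, 10],
]

expected = independence_fit(cube)          # approximate answers to cell queries
residuals = pearson_residuals(cube, expected)

# Cells whose residual exceeds an assumed threshold are candidate outliers.
THRESHOLD = 2.0
outliers = [
    (i, j)
    for i, row in enumerate(residuals)
    for j, value in enumerate(row)
    if abs(value) > THRESHOLD
]
print(outliers)
```

The fitted table needs only the row and column totals, which is the sense in which a parsimonious model can compress a cube; richer log-linear models would add interaction terms.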
Designing data warehouses for geographic OLAP querying by using MDA
Data aggregation in Geographic Information Systems (GIS) is a desirable feature, and spatial data are integrated into OLAP engines for this purpose. However, the development and operation of those systems is still a complex task because of the methodologies followed. The existing ad hoc solutions deal only with isolated aspects and do not provide developers and analysts with an intuitive, integrated, and standard framework for designing all relevant parts. To overcome these problems, we have defined a model-driven approach to Geographic Data Warehouse (GDW) development. We have then defined the data model required to implement and query spatial data. The model is defined and implemented as an extension of the UML metamodel and is formalized in OCL. In addition, the proposal has been verified against an example scenario with sample data sets. For this purpose, we have developed a tool based on the Eclipse platform and the MDA standard. The great advantage of this solution is that developers can directly include spatial data at the conceptual level, while decision makers can make geographic queries conceptually without being aware of logical details. This work has been partially supported by the ESPIA project (TIN2007-67078) from the Spanish Ministry of Education and Science and by the QUASIMODO project (PAC08-0157-0668) from the Castilla-La Mancha Ministry of Education and Science (Spain). Octavio Glorio is funded by the University of Alicante under the 11th Latin American grant program.
SOLAP+: extending the interaction model
Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfillment of the requirements for the degree of Master in Computer Science.
Decision making is a crucial process that can dictate success or failure in today’s businesses and organizations. Decision Support Systems (DSS) are designed to help human users with decision-making activities. Within the large family of DSSs is OnLine Analytical Processing (OLAP), an approach to answering multidimensional queries quickly and effectively.
Even though OLAP is recognized as an efficient technique and is widely used in almost every area, it offers neither spatial analysis nor spatial data visualization and exploration. Geographic Information Systems (GIS) have grown considerably in recent years, and acquiring and storing spatial data is easier than ever. To exploit this potential and bring spatial data and spatial analysis features to OLAP, Bédard introduced Spatial OLAP (SOLAP). Although it is a relatively new area, many proposals towards SOLAP’s standardization and consolidation have been made, as well as functional tools for different application areas.
There are, however, many issues and topics in SOLAP that are either not covered or addressed only by incompatible or non-general proposals. We propose to define a generic model for SOLAP interaction based on previous works, extending it to include new visualization options, components, and cases; to create and present a component-driven architecture proposal for such a tool, including descriptive metamodels, an aggregate navigator to increase performance, and a communication protocol; and finally, to develop an example prototype that partially implements the proposed interaction features, taking into consideration guidelines for a user-friendly, yet powerful and flexible application.
OLEMAR: An Online Environment for Mining Association Rules in Multidimensional Data
Data warehouses and OLAP (online analytical processing) provide tools to explore and navigate through data cubes in order to extract interesting information under different perspectives and levels of granularity. Nevertheless, OLAP techniques do not allow the identification of relationships, groupings, or exceptions that could hold in a data cube. To that end, we propose to enrich OLAP techniques with data mining facilities to benefit from the capabilities they offer. In this chapter, we propose an online environment for mining association rules in data cubes. Our environment, called OLEMAR (online environment for mining association rules), is designed to extract associations from multidimensional data. It allows the extraction of inter-dimensional association rules from data cubes according to a sum-based aggregate measure, a more general indicator than the aggregate values provided by the traditional COUNT measure. In our approach, OLAP users are able to drive a mining process guided by a meta-rule that meets their analysis objectives. In addition, the environment is based on a formalization that exploits aggregate measures to revisit the definition of the support and the confidence of discovered rules. This formalization also helps evaluate the interestingness of association rules according to two additional quality measures: lift and Loevinger. Furthermore, in order to focus on the discovered associations and validate them, we provide a visual representation based on the principles of graphic semiology. Such a representation consists of a graphic encoding of frequent patterns and association rules in the same multidimensional space as the one associated with the mined data cube. We have developed our approach as a component in a general online analysis platform called Miningcubes according to an Apriori-like algorithm, which helps extract inter-dimensional association rules directly from materialized multidimensional structures of data.
In order to illustrate the effectiveness and the efficiency of our proposal, we analyze a real-life case study about breast cancer data and conduct performance experimentation of the mining process
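To make the notion of sum-based support and confidence more concrete, the following Python sketch evaluates a single inter-dimensional rule over a toy set of cube cells. The formulas are an assumed, straightforward generalization of the count-based definitions (sums of a measure replace counts), and the Loevinger index is taken in its usual form, (confidence - support(head)) / (1 - support(head)). The dimension names, the `sales` measure, and the data are hypothetical; the chapter's exact formalization may differ.

```python
def measure_sum(cells, measure, condition):
    """Sum of `measure` over the cells matching every (dimension, value) pair."""
    return sum(
        cell[measure]
        for cell in cells
        if all(cell[dim] == val for dim, val in condition.items())
    )


def rule_metrics(cells, measure, body, head):
    """Sum-based quality measures for the rule body => head (assumed definitions)."""
    total = measure_sum(cells, measure, {})
    supp_body = measure_sum(cells, measure, body) / total
    supp_head = measure_sum(cells, measure, head) / total
    supp_rule = measure_sum(cells, measure, {**body, **head}) / total
    confidence = supp_rule / supp_body
    return {
        "support": supp_rule,
        "confidence": confidence,
        "lift": supp_rule / (supp_body * supp_head),
        "loevinger": (confidence - supp_head) / (1 - supp_head),
    }


# Hypothetical cube cells with two dimensions and one additive measure.
cells = [
    {"city": "A", "product": "x", "sales": 30},
    {"city": "A", "product": "y", "sales": 10},
    {"city": "B", "product": "x", "sales": 20},
    {"city": "B", "product": "y", "sales": 40},
]

metrics = rule_metrics(cells, "sales", body={"city": "A"}, head={"product": "x"})
print(metrics)
```

Replacing the `sales` values with 1 for every cell would reduce these measures to the familiar COUNT-based support and confidence.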
Interactive Visual Analysis of Heterogeneous Cohort Study Data
Cohort studies in medicine are conducted to enable the study of medical hypotheses in large samples. Often, a large amount of heterogeneous data is acquired from many subjects. The analysis is usually hypothesis-driven, i.e., a specific subset of such data is studied to confirm or reject specific hypotheses. In this paper, we demonstrate how we enable the interactive visual exploration and analysis of such data, helping with the generation of new hypotheses and contributing to the process of validating them. We propose a data-cube based model which handles partially overlapping data subsets during the interactive visualization. This model enables seamless integration of the heterogeneous data, as well as linking spatial and non-spatial views on these data. We implemented this model in an application prototype, and used it to analyze data acquired in the context of a cohort study on cognitive aging. We present case-study analyses of selected aspects of brain connectivity by using the prototype implementation of the presented model, to demonstrate its potential and flexibility
Semantics-Space-Time Cube. A Conceptual Framework for Systematic Analysis of Texts in Space and Time
We propose an approach to analyzing data in which texts are associated with spatial and temporal references with the aim to understand how the text semantics vary over space and time. To represent the semantics, we apply probabilistic topic modeling. After extracting a set of topics and representing the texts by vectors of topic weights, we aggregate the data into a data cube with the dimensions corresponding to the set of topics, the set of spatial locations (e.g., regions), and the time divided into suitable intervals according to the scale of the planned analysis. Each cube cell corresponds to a combination (topic, location, time interval) and contains aggregate measures characterizing the subset of the texts concerning this topic and having the spatial and temporal references within these location and interval. Based on this structure, we systematically describe the space of analysis tasks on exploring the interrelationships among the three heterogeneous information facets, semantics, space, and time. We introduce the operations of projecting and slicing the cube, which are used to decompose complex tasks into simpler subtasks. We then present a design of a visual analytics system intended to support these subtasks. To reduce the complexity of the user interface, we apply the principles of structural, visual, and operational uniformity while respecting the specific properties of each facet. The aggregated data are represented in three parallel views corresponding to the three facets and providing different complementary perspectives on the data. The views have similar look-and-feel to the extent allowed by the facet specifics. Uniform interactive operations applicable to any view support establishing links between the facets. The uniformity principle is also applied in supporting the projecting and slicing operations on the data cube. 
We evaluate the feasibility and utility of the approach by applying it in two analysis scenarios using geolocated social media data for studying people's reactions to social and natural events of different spatial and temporal scales
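The cube structure and the slicing and projecting operations mentioned in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the cube is a dictionary keyed by (topic, location, interval) holding one aggregate count per cell, slicing fixes one facet to a single value, and projecting sums a facet away. The facet values and the function names are hypothetical, and the paper's own definitions of these operations may be richer.

```python
from collections import defaultdict

AXES = ("topic", "location", "interval")


def slice_cube(cube, axis, value):
    """Keep only the cells whose coordinate on `axis` equals `value`."""
    i = AXES.index(axis)
    return {key: count for key, count in cube.items() if key[i] == value}


def project_cube(cube, drop_axis):
    """Aggregate (sum) over `drop_axis`, leaving a two-facet table."""
    i = AXES.index(drop_axis)
    table = defaultdict(int)
    for key, count in cube.items():
        table[key[:i] + key[i + 1:]] += count
    return dict(table)


# Hypothetical cube: number of texts per (topic, location, interval).
cube = {
    ("storm", "north", "week1"): 5,
    ("storm", "south", "week1"): 2,
    ("storm", "north", "week2"): 7,
    ("sports", "north", "week1"): 3,
    ("sports", "south", "week2"): 4,
}

storm_only = slice_cube(cube, "topic", "storm")
topic_by_location = project_cube(cube, "interval")
print(storm_only)
print(topic_by_location)
```

Combining the two operations, for instance slicing on a topic and then projecting out time, is one way a complex question about all three facets could be reduced to a simpler two-facet subtask.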