    Combining Objects with Rules to Represent Aggregation Knowledge in Data Warehouse and OLAP Systems

    Data warehouses are based on multidimensional modeling. Using On-Line Analytical Processing (OLAP) tools, decision makers navigate through and analyze multidimensional data. Typically, users need to analyze data at different aggregation levels (using roll-up and drill-down functions). Therefore, aggregation knowledge should be adequately represented in conceptual multidimensional models, and mapped in subsequent logical and physical models. However, current conceptual multidimensional models poorly represent aggregation knowledge, which (1) has a complex structure and dynamics and (2) is highly contextual. In order to account for the characteristics of this knowledge, we propose to represent it with objects (UML class diagrams) and rules in Production Rule Representation (PRR) language. Static aggregation knowledge is represented in the class diagrams, while rules represent the dynamics (i.e. how aggregation may be performed depending on context). We present the class diagrams, and a typology and examples of associated rules. We argue that this representation of aggregation knowledge allows early modeling of user requirements in a data warehouse project.
    Keywords: Aggregation; Conceptual Multidimensional Model; Data Warehouse; On-line Analytical Processing (OLAP); Production Rule; UML
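
The idea of context-dependent aggregation can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the fact rows, the `RULES` table, and the `roll_up` helper are hypothetical and are not the UML/PRR representation proposed in the paper. The rule table simply says that a stock level may be summed across geography but should be averaged across time, while units sold may be summed along either dimension.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact rows at the finest granularity (city, month).
FACTS = [
    {"city": "Paris", "country": "France", "month": "2024-01", "units_sold": 10, "stock_level": 100},
    {"city": "Lyon", "country": "France", "month": "2024-01", "units_sold": 5, "stock_level": 40},
    {"city": "Paris", "country": "France", "month": "2024-02", "units_sold": 7, "stock_level": 90},
    {"city": "Berlin", "country": "Germany", "month": "2024-01", "units_sold": 8, "stock_level": 60},
]

# Hypothetical rule table: the aggregate function depends on the context,
# here the pair (measure, dimension being rolled up).
RULES = {
    ("units_sold", "geography"): sum,
    ("units_sold", "time"): sum,
    ("stock_level", "geography"): sum,
    ("stock_level", "time"): mean,  # stock levels are not additive over time
}


def roll_up(facts, group_by, measure, dimension):
    """Group rows by one attribute and aggregate a measure using the rule for this context."""
    aggregate = RULES[(measure, dimension)]
    groups = defaultdict(list)
    for row in facts:
        groups[row[group_by]].append(row[measure])
    return {key: aggregate(values) for key, values in groups.items()}


# Roll up units sold from city level to country level.
units_by_country = roll_up(FACTS, "country", "units_sold", "geography")

# Roll up the Paris stock level across months: the rule selects the mean.
paris_rows = [row for row in FACTS if row["city"] == "Paris"]
paris_stock = roll_up(paris_rows, "city", "stock_level", "time")
```

Keeping the rules in a table separate from the data mirrors, in a very reduced form, the split between static structure and dynamic rules that the abstract describes.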

    IS THERE STILL A NEED FOR MULTIDIMENSIONAL DATA MODELS?

    Organizational and technical changes challenge standards of data warehouse design and initiate a redesign of contemporary Business Intelligence and Analytics environments. As a result, the use of multidimensional models for performance-oriented reasons is not necessarily taken for granted. Simple data models or operational structures emerge as a basis for complex analyses. The paper therefore conducts a laboratory experiment to examine, from a non-technical perspective, the influence of different data modeling types on the representational information quality experienced by end users. A comparison is made between the multidimensional model, the transactional model, and the flat file model. The experiment involves 78 participants and aims to compare perceived and observed representational information quality aspects of ad hoc analyses with regard to the data modeling type. The results indicate a higher observed quality for multidimensionally modeled data, while different types of data models do not influence the end users' perception of the representational information quality.

    Latent space models for multidimensional network data

    Network data are any relational data recorded among a group of individuals, the nodes. When multiple relations are recorded among the same set of nodes, a more complex object arises, which we refer to as a “multidimensional network”, or “multiplex”, in which different relations correspond to different networks. In the past, statistical analysis of networks has mainly focused on single-relation network data, referring to a single relation of interest. Only in recent years have statistical models specifically tailored for multiplex data begun to be developed. In this context, only a few works have been introduced in the literature with the aim of extending the latent space modeling framework to multiplex data. This framework postulates that nodes may be characterized by latent positions in a p-dimensional Euclidean space and that the presence/absence of an edge between any two nodes depends on such positions. When considering multidimensional network data, latent space models can help capture the associations between the nodes and summarize the observed structure in the different networks composing a multiplex. This dissertation discusses some latent space models for multidimensional network data, to account for different features that observed multiplex data may present. A first proposal jointly represents the different networks in a single latent space, so that average similarities between the nodes may be captured as proximities in such space. A second work introduces a class of latent space models with node-specific effects, in order to deal with different degrees of heterogeneity within and between networks in multiplex data, corresponding to different types of node-specific behaviours. A third work addresses the issue of clustering of the nodes in the latent space, a frequently observed feature in many real-world network and multidimensional network data. Here, clusters of nodes in the latent space correspond to communities of nodes in the multiplex.
    The proposed models are illustrated both via simulation studies and real-world applications, to study their performance and capabilities.
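
As a rough illustration of how edge presence can depend on latent positions, the sketch below uses one common distance-based form: the log-odds of an edge equal a network-specific intercept minus the Euclidean distance between the two nodes. This specific link function, the node positions, and the intercept values are assumptions chosen for the example; the abstract does not state the exact formulation used in the dissertation.

```python
import math


def edge_probability(z_i, z_j, intercept):
    """Probability of an edge under a distance-based latent space model (assumed form).

    log-odds = intercept - Euclidean distance between the latent positions.
    """
    distance = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(intercept - distance)))


# Hypothetical latent positions in a 2-dimensional space, shared by all networks.
positions = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (3.0, 4.0)}

# Hypothetical network-specific intercepts for a two-layer multiplex.
intercepts = {"friendship": 1.0, "work": 0.0}

# Nodes a and b are close (distance 1), nodes a and c are far apart (distance 5).
p_ab_friend = edge_probability(positions["a"], positions["b"], intercepts["friendship"])
p_ac_friend = edge_probability(positions["a"], positions["c"], intercepts["friendship"])
p_ab_work = edge_probability(positions["a"], positions["b"], intercepts["work"])
```

Sharing one set of positions across layers while letting only the intercept vary is one simple way to express the "single latent space" idea; closer nodes get a higher edge probability in every layer.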

    On the use of Structural Equation Models and PLS Path Modeling to build composite indicators

    Nowadays there is a pre-eminent need to measure very complex phenomena like poverty, progress, well-being, etc. As is well known, the main feature of a composite indicator is that it summarizes complex and multidimensional issues. Thanks to its features, Structural Equation Modeling seems to be a useful tool for building systems of composite indicators. Among the several methods that have been developed to estimate Structural Equation Models we focus on the PLS Path Modeling approach (PLS-PM), because of the key role that estimation of the latent variables (i.e. the composite indicators) plays in the estimation process. In this work, first we present Structural Equation Models and PLS-PM. Then we provide a suite of statistical methodologies for handling categorical indicators in PLS-PM. In particular, in order to take categorical indicators into account, we propose to use a modified version of the PLS-PM algorithm recently presented by Russolillo [2009]. This new approach provides a quantification of the categorical indicators in such a way that the weight of each quantified indicator is coherent with the explanatory ability of the corresponding categorical indicator. To conclude, an application involving data taken from a paper by Russet [1964] will be presented.
    Keywords: PLS Path Modeling; Categorical Indicators; Structural Equation Modeling; Composite Indicators

    Design of a Multidimensional Model Using Object Oriented Features in UML

    A data warehouse is a single repository of data which includes data generated from various operational systems. Conceptual modeling is an important step in the successful design of a data warehouse. The Unified Modeling Language (UML) has become a standard for object modeling during the analysis and design steps of software system development. The paper proposes an object-oriented approach to model the process of data warehouse design. The hierarchies of each data element can be explicitly defined, thus highlighting the data granularity. We propose a UML multidimensional model using various data sources based on UML schemas. We present a conceptual-level integration framework for diverse UML data sources on which OLAP operations can be performed. Our integration framework takes into account the benefits of UML (its concepts, relationships and extended features), which is closer to the real world and can model even complex problems easily and accurately. Two steps are involved in our integration framework. The first one is to convert UML schemas into UML class diagrams. The second is to build a multidimensional model from the UML class diagrams. The paper focuses on the transformations used in the second step. We describe how to represent a multidimensional model using a UML star or snowflake diagram with the help of a case study. To the best of our knowledge, we are the first to present a UML snowflake diagram that integrates heterogeneous UML data sources.

    Multidimensional modeling and analysis of large and complex watercourse data: an OLAP-based solution

    This paper presents the application of Data Warehouse (DW) and On-Line Analytical Processing (OLAP) technologies to the field of water quality assessment. The European Water Framework Directive (DCE, 2000) underlined the necessity of having operational tools to help in the interpretation of the complex and abundant information regarding running waters and their functioning. Several studies have exemplified the interest in DWs for integrating large volumes of data and in OLAP tools for data exploration and analysis. Based on free software tools, we propose an extensible relational OLAP system for the analysis of physicochemical and hydrobiological watercourse data. This system includes: (i) two data cubes; (ii) an Extract, Transform and Load (ETL) tool for data integration; and (iii) tools for OLAP exploration. Many examples of OLAP analysis (thematic, temporal, spatiotemporal, and multiscale) are provided. We have extended an existing framework with complex aggregate functions that are used to define complex analysis indicators. Additional analysis dimensions are also introduced to allow their calculation and also for purposes of rendering information. Finally, we propose two strategies to address the problem of summarizing heterogeneous measurement units by: (i) transforming source data at the ETL tier, and (ii) introducing an additional analysis dimension at the OLAP server tier.
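
The first strategy, converting source data to a common unit before aggregation, can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the station names, the `TO_MG_PER_L` conversion table, and the helper functions are hypothetical and do not come from the system described in the paper.

```python
from collections import defaultdict

# Hypothetical conversion factors to a common unit (milligrams per litre).
TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 0.001, "g/L": 1000.0}


def normalize(records):
    """ETL-style transform: express every measurement in mg/L."""
    return [
        {"station": r["station"], "value_mg_per_l": r["value"] * TO_MG_PER_L[r["unit"]]}
        for r in records
    ]


def average_by_station(normalized):
    """Aggregate only after all values share the same unit."""
    groups = defaultdict(list)
    for r in normalized:
        groups[r["station"]].append(r["value_mg_per_l"])
    return {station: sum(values) / len(values) for station, values in groups.items()}


# Hypothetical raw measurements reported in heterogeneous units.
raw = [
    {"station": "S1", "value": 2.0, "unit": "mg/L"},
    {"station": "S1", "value": 500.0, "unit": "ug/L"},
    {"station": "S2", "value": 0.001, "unit": "g/L"},
]

averages = average_by_station(normalize(raw))
```

The second strategy in the abstract would instead keep the original values and add the unit as a further analysis dimension, so that aggregation never mixes units; that variant is not shown here.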