136 research outputs found

    Integration of Data Mining and Data Warehousing: a practical methodology

    The ever-growing repository of data in all fields poses new challenges to modern analytical systems. Real-world datasets, with mixed numeric and nominal variables, are difficult to analyze and require effective visual exploration that conveys the semantic relationships in the data. Traditional data mining techniques such as clustering handle only numeric data. Little research has addressed the problem of clustering high-cardinality nominal variables to gain better insight into the underlying dataset. Several works in the literature have shown that data mining can be integrated with data warehousing to discover knowledge from data. For seamless integration, the mined data has to be modeled in the form of a data warehouse schema. Schema generation is a complex manual task that requires familiarity with both the domain and warehousing. Automated techniques are required to generate warehouse schemas and remove these dependencies. To meet growing analytical needs and overcome the existing limitations, we propose a novel methodology that permits efficient analysis of mixed numeric and nominal data, effective visual data exploration, automatic warehouse schema generation, and integration of data mining and warehousing. The proposed methodology is evaluated through a case study on a real-world data set. Results show that multidimensional analysis can be performed in an easier and more flexible way to discover meaningful knowledge from large datasets.

    Computer-Aided Warehouse Engineering (CAWE): Leveraging MDA and ADM for the Development of Data Warehouses

    During the last decade, data warehousing has reached a high level of maturity and has become a well-accepted technology in decision support systems. Nevertheless, development and maintenance remain tedious tasks, since the systems grow over time and complex architectures have been established. This paper adapts the concepts of Model Driven Architecture (MDA) and Architecture Driven Modernization (ADM) from the software engineering discipline to the data warehousing discipline. We survey the work already available, outline further research directions, and give hints for the implementation of Computer-Aided Warehouse Engineering systems.

    Dimensional enrichment of statistical linked open data

    On-Line Analytical Processing (OLAP) is a data analysis technique typically used for local and well-prepared data. However, initiatives like Open Data and Open Government bring new and publicly available data to the web that are to be analyzed in the same way. The use of semantic web technologies in this context is especially encouraged by the Linked Data initiative. A considerable number of statistical linked open data sets have already been published using the RDF Data Cube Vocabulary (QB), which is designed for this purpose. However, QB lacks some essential schema constructs (e.g., dimension levels) needed to support OLAP. Thus, the QB4OLAP vocabulary has been proposed to extend QB with the necessary constructs and be fully compliant with OLAP. In this paper, we focus on the enrichment of an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits of QB4OLAP. Then, we propose a series of steps to automate the enrichment of QB data sets with specific QB4OLAP semantics, the most important being the definition of aggregate functions and the detection of new concepts during the construction of dimension hierarchies. The proposed steps form a semi-automatic enrichment method, which is implemented in a tool that enables the enrichment in an interactive and iterative fashion. The user can enrich the QB data set with QB4OLAP concepts (e.g., full-fledged dimension hierarchies) by choosing among the candidate concepts automatically discovered by the proposed steps. Finally, we conduct experiments with 25 users and use three real-world QB data sets to evaluate our approach. The evaluation demonstrates the feasibility of our approach and shows that, in practice, our tool facilitates, speeds up, and guarantees the correct results of the enrichment process.

    Interactive multidimensional modeling of linked data for exploratory OLAP

    Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of Linked Open Data (LOD). While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll-up relationships between RDF concepts, then translates these patterns into aggregation hierarchies to enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and generalization respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness, we compare it with a related approach in the literature, propose a case study based on DBpedia, and discuss the results of a test made with real users.
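
To make the idea of an association-based roll-up pattern more concrete, here is a minimal sketch in Python. It is not the iMOLD algorithm, which the abstract does not spell out; it assumes a simple heuristic in which a property is a roll-up candidate when it maps every dimension member to exactly one object (many-to-one) and groups the members into fewer parents. The function name `rollup_candidates`, the sample triples, and the property names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, property, object) triples about cities.
TRIPLES = [
    ("Paris", "country", "France"),
    ("Lyon", "country", "France"),
    ("Rome", "country", "Italy"),
    ("Paris", "label", "Paris"),
    ("Lyon", "label", "Lyon"),
    ("Rome", "label", "Roma"),
    ("Paris", "twinCity", "Rome"),
    ("Paris", "twinCity", "Kyoto"),
]
MEMBERS = {"Paris", "Lyon", "Rome"}  # members of the dimension being enriched


def rollup_candidates(triples, members):
    """Return {property: {member: parent}} for many-to-one properties.

    Assumed heuristic (not taken from the paper): a property qualifies if it
    is defined for every member, has exactly one value per member, and
    yields fewer distinct parents than there are members.
    """
    objects = defaultdict(lambda: defaultdict(set))  # property -> member -> objects
    for subject, prop, obj in triples:
        if subject in members:
            objects[prop][subject].add(obj)

    candidates = {}
    for prop, by_member in objects.items():
        is_total = set(by_member) == set(members)
        is_functional = all(len(values) == 1 for values in by_member.values())
        if not (is_total and is_functional):
            continue
        mapping = {m: next(iter(values)) for m, values in by_member.items()}
        if len(set(mapping.values())) < len(members):  # it actually groups members
            candidates[prop] = mapping
    return candidates


if __name__ == "__main__":
    print(rollup_candidates(TRIPLES, MEMBERS))
```

In this sketch only `country` survives: `label` is one-to-one and so adds no aggregation level, and `twinCity` is multi-valued and incomplete. In a user-guided process such as the one the abstract describes, candidates like these would presumably be offered to the user for confirmation rather than applied automatically.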

    Multidimensional modeling and analysis of large and complex watercourse data: an OLAP-based solution

    This paper presents the application of Data Warehouse (DW) and On-Line Analytical Processing (OLAP) technologies to the field of water quality assessment. The European Water Framework Directive (DCE, 2000) underlined the necessity of having operational tools to help in the interpretation of the complex and abundant information regarding running waters and their functioning. Several studies have exemplified the interest in DWs for integrating large volumes of data and in OLAP tools for data exploration and analysis. Based on free software tools, we propose an extensible relational OLAP system for the analysis of physicochemical and hydrobiological watercourse data. This system includes: (i) two data cubes; (ii) an Extract, Transform and Load (ETL) tool for data integration; and (iii) tools for OLAP exploration. Many examples of OLAP analysis (thematic, temporal, spatiotemporal, and multiscale) are provided. We have extended an existing framework with complex aggregate functions that are used to define complex analysis indicators. Additional analysis dimensions are also introduced to allow the calculation of these indicators and to support the rendering of information. Finally, we propose two strategies to address the problem of summarizing heterogeneous measurement units: (i) transforming source data at the ETL tier, and (ii) introducing an additional analysis dimension at the OLAP server tier.
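
The two strategies for heterogeneous measurement units can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the system described in the paper: the conversion table `TO_MG_PER_L`, the sample rows, and both function names are hypothetical, and a plain average stands in for whatever aggregate functions the actual cubes use.

```python
from collections import defaultdict

# Hypothetical conversion factors to a common unit (mg/L).
TO_MG_PER_L = {"mg/L": 1.0, "ug/L": 0.001, "g/L": 1000.0}

# Hypothetical fact rows: (station, measured value, unit).
ROWS = [
    ("S1", 2.0, "mg/L"),
    ("S1", 4000.0, "ug/L"),
    ("S2", 1.0, "mg/L"),
]


def average(values):
    return sum(values) / len(values)


def aggregate_after_etl_conversion(rows):
    """Strategy (i): convert every value to mg/L first, then average per station."""
    groups = defaultdict(list)
    for station, value, unit in rows:
        groups[station].append(value * TO_MG_PER_L[unit])
    return {station: average(values) for station, values in groups.items()}


def aggregate_with_unit_dimension(rows):
    """Strategy (ii): keep the unit as an extra dimension, so values in
    different units are never mixed inside one aggregate cell."""
    groups = defaultdict(list)
    for station, value, unit in rows:
        groups[(station, unit)].append(value)
    return {key: average(values) for key, values in groups.items()}


if __name__ == "__main__":
    print(aggregate_after_etl_conversion(ROWS))
    print(aggregate_with_unit_dimension(ROWS))
```

The trade-off the sketch exposes is that converting at load time gives one comparable figure per station but requires a known conversion for every unit, whereas a unit dimension needs no conversion but leaves the analyst with one aggregate per unit.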
