645 research outputs found

    Using Ontologies for the Design of Data Warehouses

    Implementing a data warehouse is a complex task that forces designers to acquire wide knowledge of the domain, which requires a high level of expertise and makes the task failure-prone. Based on our experience, we have identified a set of situations encountered in real-world projects in which we believe that the use of ontologies will improve several aspects of data warehouse design. The aim of this article is to describe several shortcomings of current data warehouse design approaches and to discuss the benefits of using ontologies to overcome them. This work is a starting point for discussing the convenience of using ontologies in data warehouse design. Comment: 15 pages, 2 figures

    Using Semantic Web technologies in the development of data warehouses: A systematic mapping

    The exploration and use of Semantic Web technologies have attracted considerable attention from researchers examining data warehouse (DW) development. However, the impact of this research and the maturity level of its results are still unclear. The objective of this study is to examine recently published research articles that take into account the use of Semantic Web technologies in the DW arena with the intention of summarizing their results, classifying their contributions to the field according to publication type, evaluating the maturity level of the results, and identifying future research challenges. Three main conclusions were derived from this study: (a) there is a major technological gap that inhibits the wide adoption of Semantic Web technologies in the business domain; (b) there is limited evidence that the results of the analyzed studies are applicable and transferable to industrial use; and (c) interest in researching the relationship between DWs and Semantic Web has decreased because new paradigms, such as linked open data, have attracted the interest of researchers. This study was supported by the Universidad de La Frontera, Chile, PROY. DI15-0020. Universidad de La Frontera, Chile, Grant Numbers: DI15-0020 and DI17-0043

    A conceptual framework and a risk management approach for interoperability between geospatial datacubes

    Today, we observe wide use of geospatial databases that are implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among those possibilities, the multidimensional datacube is more appropriate to support interactive analysis and to guide the organization’s strategic decisions, especially when different epochs and levels of information granularity are involved. However, one may need to use several geospatial multidimensional datacubes, which may be semantically heterogeneous and have different degrees of appropriateness to the context of use. Overcoming the semantic problems related to semantic heterogeneity and to the difference in appropriateness to the context of use, in a manner that is transparent to users, has been the principal aim of interoperability for the last fifteen years. However, in spite of successful initiatives, today's solutions have evolved in a non-systematic way. Moreover, no solution has been found to address specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we assume that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. To that end, we first describe interoperability between geospatial datacubes. Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes. In order to resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing the geospatial datacubes involved in the interoperability process communicate with each other. Such communication aims at exchanging information about the content of the geospatial datacubes. Then, in order to help agents make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of the production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility.

    Interactive multidimensional modeling of linked data for exploratory OLAP

    Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of LOD. While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll-up relationships between RDF concepts, then translates these patterns into aggregation hierarchies to enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and generalization respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness we compare it with a related approach in the literature, we propose a case study based on DBpedia, and we discuss the results of a test made with real users. Peer reviewed. Postprint (author's final draft)
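The association-based family of roll-up patterns mentioned in the abstract above can be illustrated with a small sketch. This is not the iMOLD algorithm, which the abstract does not specify; it only assumes, for illustration, that a property is a roll-up candidate when every dimension member has exactly one value for it (many-to-one) and the values are fewer than the members. The function name `candidate_rollups` and the sample triples are hypothetical.

```python
from collections import defaultdict

def candidate_rollups(triples, members):
    """Return {property: {member: parent}} for properties that look like roll-ups.

    Assumption (not taken from the paper): a property qualifies when it is
    defined for every member, maps each member to exactly one object, and
    has fewer distinct objects than there are members.
    """
    by_property = defaultdict(lambda: defaultdict(set))  # property -> subject -> objects
    for subject, prop, obj in triples:
        if subject in members:
            by_property[prop][subject].add(obj)

    result = {}
    for prop, by_subject in by_property.items():
        covers_all = set(by_subject) == set(members)
        functional = all(len(objs) == 1 for objs in by_subject.values())
        if not (covers_all and functional):
            continue
        mapping = {s: next(iter(objs)) for s, objs in by_subject.items()}
        if len(set(mapping.values())) < len(members):  # genuinely aggregates
            result[prop] = mapping
    return result

# Hypothetical LOD fragment: cities as members of a dimension level.
members = {"Bologna", "Rome", "Paris"}
triples = [
    ("Bologna", "country", "Italy"),
    ("Rome", "country", "Italy"),
    ("Paris", "country", "France"),
    ("Bologna", "label", "Bologna"),   # one-to-one, no aggregation
    ("Rome", "label", "Roma"),
    ("Paris", "label", "Paris"),
    ("Rome", "twinnedWith", "Paris"),  # not defined for every member
]
rollups = candidate_rollups(triples, members)
```

Here only the hypothetical `country` property survives the filter, giving a city-to-country hierarchy level that a user could then accept or reject.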

    Big Data guided Digital Petroleum Ecosystems for Visual Analytics and Knowledge Management

    The North West Shelf (NWS), interpreted as a Total Petroleum System (TPS), is the Super Westralian Basin, with active onshore and offshore basins through which shelf, slope, and deep-oceanic geological events are construed. In addition to their data associativity, the TPS emerges with geographic connectivity through the phenomena of a digital petroleum ecosystem. The super basin has a multitude of sub-basins; each basin is associated with several petroleum systems, and each system comprises multiple oil and gas fields with either known or unknown areal extents. Such hierarchical ontologies make connections between attribute relationships of diverse petroleum systems. Moreover, the NWS offers scope for storing large volumes of instances in a data-warehousing environment, both for analysis and to motivate the creation of new business opportunities. Furthermore, the big exploration data, characterized as heterogeneous and multidimensional, can complicate the data integration process, precluding interpretation of data views drawn from TPS metadata in new knowledge domains. The research objective is to develop an integrated framework that can unify the exploration data and other interrelated multidisciplinary data into holistic TPS metadata for visualization and valued interpretation. A petroleum digital ecosystem is prototyped as a digital oil field solution, with a multitude of big data tools. Big data associated with elements and processes of petroleum systems are examined using prototype solutions. With the conceptual framework of Digital Petroleum Ecosystems and Technologies (DPEST), we manage the interconnectivity between diverse petroleum systems and their linked basins. The ontology-based data warehousing and mining articulations ascertain the collaboration through data artefacts and the coexistence between different petroleum systems and their linked oil and gas fields, which benefit the explorers. The connectivity between systems further provides us with presentable exploration data views, improving visualization and interpretation. The metadata with meta-knowledge in diverse knowledge domains of digital petroleum ecosystems ensures the quality of untapped reservoirs and their associativity between Westralian basins.

    Using metarules to integrate knowledge in knowledge based systems. An application in the woodworking industry

    The current study addresses the integration of knowledge obtained from Data Mining structures and models into existing Knowledge Based solutions. It presents a technique adapted from CommonKADS and the spiral methodology to develop an initial knowledge solution using a traditional approach for requirement analysis, knowledge acquisition, and implementation. After an initial prototype is created and verified, the solution is enhanced by incorporating new knowledge obtained from Online Analytical Processing, specifically from Data Mining models and structures, using meta rules. Every meta rule is also verified prior to being included in the selection and translation of rules into the Expert System notation. Once an initial iteration was completed, responses from test cases were compared using an agreement index and a kappa index. The problem domain was restricted to remake and rework operations in a cabinet making company. For the Data Mining models, 8,674 cases of Price of Non Conformance (PONC) covering a period of 3 months were used. Initial results indicated that the technique presented sufficient formalism to be used in the development of new systems, using the Trillium scale. The use of 50 additional cases randomly selected from different departments indicated that responses from the original system and from the solution that incorporated new knowledge from Data Mining differed significantly. Further inspection of the responses indicated that the new solution, with 68 additional rules, was able to answer, although with an incorrect alternative, in 28 additional cases for which the initial solution was not able to provide a conclusion.
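The comparison of responses by an agreement index and a kappa index, mentioned in the abstract above, can be made concrete with a short sketch. It assumes that the agreement index is the plain proportion of matching responses and that the kappa index is Cohen's kappa; the abstract does not define either, and the response labels below are invented for illustration.

```python
from collections import Counter

def agreement_index(a, b):
    """Fraction of cases on which the two systems give the same response."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty response lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_observed = agreement_index(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both pick the same label independently.
    labels = counts_a.keys() | counts_b.keys()
    p_chance = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_chance == 1.0:  # both systems always give the same single label
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical responses of the original and the enhanced system on six cases.
original = ["rework", "remake", "rework", "none", "remake", "rework"]
enhanced = ["rework", "remake", "remake", "rework", "remake", "rework"]

observed = agreement_index(original, enhanced)
kappa = cohen_kappa(original, enhanced)
```

A kappa near 1 means the two systems agree far more often than chance would predict, while a value near 0 means the observed agreement is about what chance alone would produce.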

    Interactive Multidimensional Modeling of Linked Data for Exploratory OLAP

    Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of LOD. While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll-up relationships between RDF concepts, then translates these patterns into aggregation hierarchies to enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and generalization respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness we compare it with a related approach in the literature, we propose a case study based on DBpedia, and we discuss the results of a test made with real users.

    SUPPORTING FINANCIAL DATA WAREHOUSE DEVELOPMENT: A COMMUNICATION THEORY-BASED APPROACH

    Data warehouses play an increasingly important role in the information technology landscape of the financial industry. However, semantic heterogeneity is high in banking: data is defined differently by different banks, business units, and users. Data integration in financial data warehouse development projects therefore relies on the knowledge, know-how, and judgment of human experts. So far, methodical support has been missing for the communication process among experts who determine and negotiate a shared understanding of requirements. In contrast to ontology-driven or schema-matching approaches, which propose the automatic resolution of differences ex-post, we introduce an approach that addresses data integration already in early project phases. Our approach supports the development of a shared understanding of domain concepts and data fields in financial data warehouse projects, good communication among all participants as the project progresses, and early detection of errors within projects. In this way, we prevent problems that result from the ex-post resolution of semantic heterogeneity.

    Implementation of the multidimensional schemas integration method ORE

    The goal of the project is the implementation of a semi-automatic method, named ORE, for creating multidimensional schemas for data warehouses by integrating information requirements in an iterative way.