645 research outputs found
Using Ontologies for the Design of Data Warehouses
Obtaining an implementation of a data warehouse is a complex task that forces designers to acquire wide knowledge of the domain; it therefore requires a high level of expertise and is prone to failure. Based on our experience, we have identified a set of situations encountered in real-world projects in which we believe the use of ontologies would improve several aspects of data warehouse design. The aim of this article is to describe several shortcomings of current data warehouse design approaches and to discuss the benefit of using ontologies to overcome them. This work is a starting point for discussing the convenience of using ontologies in data warehouse design.
Comment: 15 pages, 2 figures
Using Semantic Web technologies in the development of data warehouses: A systematic mapping
The exploration and use of Semantic Web technologies have attracted considerable attention from researchers examining data warehouse (DW) development. However, the impact of this research and the maturity level of its results are still unclear. The objective of this study is to examine recently published research articles that take into account the use of Semantic Web technologies in the DW arena, with the intention of summarizing their results, classifying their contributions to the field according to publication type, evaluating the maturity level of the results, and identifying future research challenges. Three main conclusions were derived from this study: (a) there is a major technological gap that inhibits the wide adoption of Semantic Web technologies in the business domain; (b) there is limited evidence that the results of the analyzed studies are applicable and transferable to industrial use; and (c) interest in researching the relationship between DWs and the Semantic Web has decreased because new paradigms, such as linked open data, have attracted the interest of researchers. This study was supported by the Universidad de La Frontera, Chile, Grant Numbers DI15-0020 and DI17-0043.
A conceptual framework and a risk management approach for interoperability between geospatial datacubes
Today, we observe wide use of geospatial databases that are implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among those possibilities, the multidimensional datacube is the most appropriate to support interactive analysis and to guide an organization's strategic decisions, especially when different epochs and levels of information granularity are involved. However, one may need to use several geospatial multidimensional datacubes, which may be semantically heterogeneous and have different degrees of appropriateness to the context of use. Overcoming the semantic problems related to this heterogeneity and to the difference in appropriateness to the context of use, in a manner that is transparent to users, has been the principal aim of interoperability for the last fifteen years. However, in spite of successful initiatives, today's solutions have evolved in a non-systematic way. Moreover, no solution has been found that addresses the specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we suppose that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. To that end, we first describe interoperability between geospatial datacubes. Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes.
In order to resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing the geospatial datacubes involved in the interoperability process communicate with each other. Such communication aims at exchanging information about the content of the geospatial datacubes. Then, in order to help the agents make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of their production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility.
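The thesis evaluates fitness-for-use indicators on datacube schemas and production metadata to guide agent decisions. A minimal sketch of such an indicator is a weighted average of per-aspect quality scores; note that the indicator names, weights, and scores below are invented for illustration and are not taken from the thesis.

```python
# Hypothetical fitness-for-use score for a geospatial datacube: a weighted
# average of [0, 1] quality indicators derived from schema and metadata.
# Indicator names and weights are illustrative assumptions, not the thesis's.
def fitness_for_use(indicators, weights):
    """Weighted average of quality indicators; missing indicators count as 0."""
    total = sum(weights.values())
    return sum(indicators.get(k, 0.0) * w for k, w in weights.items()) / total

# Scores an agent might have computed for one candidate datacube.
cube_a = {
    "semantic_similarity": 0.9,       # closeness to the target vocabulary
    "metadata_completeness": 0.6,     # how fully the production context is described
    "spatial_resolution_match": 0.8,  # fit with the required granularity
}
weights = {"semantic_similarity": 0.5,
           "metadata_completeness": 0.2,
           "spatial_resolution_match": 0.3}

print(round(fitness_for_use(cube_a, weights), 2))  # → 0.81
```

An agent could compare such scores across candidate datacubes and prefer the one whose external quality best fits the user's context of use.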
Interactive multidimensional modeling of linked data for exploratory OLAP
Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of LOD. While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll-up relationships between RDF concepts, then translates these patterns into aggregation hierarchies to enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and generalization respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness we compare it with a related approach in the literature, we propose a case study based on DBpedia, and we discuss the results of a test made with real users.
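The generalization-based family of patterns mentioned above can be illustrated with a toy example: rdf:type and rdfs:subClassOf chains in LOD suggest candidate roll-up levels. The sketch below is not the iMOLD algorithm, only a minimal illustration of the idea over hand-written triples; the DBpedia-style identifiers are assumptions.

```python
# Toy illustration of a generalization-based roll-up pattern: follow an
# rdf:type link and then rdfs:subClassOf links upward to propose aggregation
# levels (finest to coarsest). Triples are hand-written, DBpedia-style.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = [
    ("dbr:Bologna",    TYPE,     "dbo:City"),
    ("dbr:Turin",      TYPE,     "dbo:City"),
    ("dbo:City",       SUBCLASS, "dbo:Settlement"),
    ("dbo:Settlement", SUBCLASS, "dbo:Place"),
]

def rollup_chain(resource, triples):
    """Candidate aggregation levels for a resource, finest to coarsest.
    For simplicity, assumes at most one type/superclass per subject."""
    links = {(s, p): o for (s, p, o) in triples}
    chain = []
    level = links.get((resource, TYPE))
    while level is not None:
        chain.append(level)
        level = links.get((level, SUBCLASS))
    return chain

print(rollup_chain("dbr:Bologna", triples))
# → ['dbo:City', 'dbo:Settlement', 'dbo:Place']
```

In the paper's setting, chains like this (and association-based patterns over object properties) are what get translated into aggregation hierarchies enriching the RDF cube, with the user guiding which candidates to keep.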
Big Data guided Digital Petroleum Ecosystems for Visual Analytics and Knowledge Management
The North West Shelf (NWS), interpreted as a Total Petroleum System (TPS), is a super Westralian basin with active onshore and offshore basins through which shelf, slope, and deep-oceanic geological events are construed. In addition to their data associativity, the TPS emerges with geographic connectivity through the phenomena of a digital petroleum ecosystem. The super basin has a multitude of sub-basins; each basin is associated with several petroleum systems, and each system comprises multiple oil and gas fields with either known or unknown areal extents. Such hierarchical ontologies make connections between attribute relationships of diverse petroleum systems. Besides, the NWS offers scope for storing volumes of instances in a data-warehousing environment for analysis and for motivating new business opportunities. Furthermore, the big exploration data, characterized as heterogeneous and multidimensional, can complicate the data integration process, precluding interpretation of data views drawn from TPS metadata in new knowledge domains. The research objective is to develop an integrated framework that can unify the exploration and other interrelated multidisciplinary data into holistic TPS metadata for visualization and valued interpretation. The digital petroleum ecosystem is prototyped as a digital oilfield solution with a multitude of big data tools. Big data associated with the elements and processes of petroleum systems are examined using prototype solutions. With the conceptual framework of Digital Petroleum Ecosystems and Technologies (DPEST), we manage the interconnectivity between diverse petroleum systems and their linked basins. The ontology-based data warehousing and mining articulations ascertain the collaboration through data artefacts and the coexistence between different petroleum systems and their linked oil and gas fields, which benefits explorers. The connectivity between systems further provides presentable exploration data views, improving visualization and interpretation. The metadata with meta-knowledge in diverse knowledge domains of digital petroleum ecosystems ensures the quality of untapped reservoirs and their associativity between Westralian basins.
Using metarules to integrate knowledge in knowledge-based systems: An application in the woodworking industry
The current study addresses the integration of knowledge obtained from Data Mining structures and models into existing Knowledge-Based solutions. It presents a technique adapted from CommonKADS and the spiral methodology to develop an initial knowledge solution using a traditional approach to requirement analysis, knowledge acquisition, and implementation. After an initial prototype is created and verified, the solution is enhanced by incorporating new knowledge obtained from Online Analytical Processing, specifically from Data Mining models and structures, using metarules. Every metarule is also verified prior to being included in the selection and translation of rules into the Expert System notation. Once an initial iteration was completed, responses to test cases were compared using an agreement index and the kappa index.
The problem domain was restricted to remake and rework operations in a cabinet-making company. For the Data Mining models, 8,674 cases of Price of Non-Conformance (PONC) covering a period of three months were used.
Initial results indicated that the technique presented sufficient formalism, on the Trillium scale, to be used in the development of new systems. The use of 50 additional cases randomly selected from different departments indicated that responses from the original system and from the solution incorporating new knowledge from Data Mining differed significantly. Further inspection of the responses indicated that the new solution, with 68 additional rules, was able to answer 28 additional cases for which the initial solution could not provide a conclusion, although sometimes with an incorrect alternative.
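The comparison method described above, an agreement index alongside Cohen's kappa, can be sketched in a few lines. The response labels below are hypothetical stand-ins for the two systems' conclusions, not data from the study.

```python
# Sketch of comparing two systems' responses on the same test cases with a
# simple agreement index and Cohen's kappa (agreement corrected for chance).
# The labels are invented examples, not the study's actual responses.
from collections import Counter

def agreement_index(a, b):
    """Fraction of cases on which the two systems give the same response."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    po = agreement_index(a, b)                 # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

original = ["rework", "remake", "rework", "accept", "remake", "rework"]
enhanced = ["rework", "remake", "accept", "accept", "remake", "rework"]

print(round(agreement_index(original, enhanced), 3))  # → 0.833
print(round(cohens_kappa(original, enhanced), 3))     # → 0.75
```

Kappa is the more informative of the two here, since raw agreement can look high purely by chance when one response class dominates.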
SUPPORTING FINANCIAL DATA WAREHOUSE DEVELOPMENT: A COMMUNICATION THEORY-BASED APPROACH
Data warehouses play increasingly important roles in the information technology landscape of the financial industry. However, semantic heterogeneity is high in banking: data is defined differently by different banks, business units, and users. Therefore, data integration in financial data warehouse development projects relies on the knowledge, know-how, and judgment of human experts. Up to now, methodical support has been missing for the communication process among experts who determine and negotiate a shared understanding of requirements. In contrast to ontology-driven or schema-matching approaches proposing the automatic resolution of differences ex post, we introduce an approach that addresses data integration already in early project phases. Our approach supports developing a shared understanding of domain concepts and data fields in financial data warehouse projects, good communication among all participants as the project progresses, and early detection of errors within projects. In this way, we prevent problems that result from the ex-post resolution of semantic heterogeneity.
Implementation of the multidimensional schemas integration method ORE
The goal of the project is the implementation of the semi-automatic method, named ORE, for creating multidimensional schemas for data warehouses by integrating information requirements in an iterative way.