
    Using Semantic Web technologies in the development of data warehouses: A systematic mapping

    The exploration and use of Semantic Web technologies have attracted considerable attention from researchers examining data warehouse (DW) development. However, the impact of this research and the maturity level of its results are still unclear. The objective of this study is to examine recently published research articles that consider the use of Semantic Web technologies in the DW arena, with the intention of summarizing their results, classifying their contributions to the field according to publication type, evaluating the maturity level of the results, and identifying future research challenges. Three main conclusions were derived from this study: (a) there is a major technological gap that inhibits the wide adoption of Semantic Web technologies in the business domain; (b) there is limited evidence that the results of the analyzed studies are applicable and transferable to industrial use; and (c) interest in researching the relationship between DWs and the Semantic Web has decreased because new paradigms, such as linked open data, have attracted the interest of researchers. This study was supported by the Universidad de La Frontera, Chile, grant numbers DI15-0020 and DI17-0043.

    A BPMN-Based Design and Maintenance Framework for ETL Processes

    Business Intelligence (BI) applications require the design, implementation, and maintenance of processes that extract, transform, and load suitable data for analysis. The development of these processes (known as ETL) is an inherently complex problem that is typically costly and time-consuming. In previous work, we proposed a vendor-independent language to reduce the design complexity caused by disparate ETL languages tailored to specific design tools with steep learning curves. Nevertheless, the designer still faces two major issues during the development of ETL processes: (i) how to implement the designed processes in an executable language, and (ii) how to maintain the implementation when the organization's data infrastructure evolves. In this paper, we propose a model-driven framework that provides automatic code generation capability and improves the maintenance support of our ETL language. We present a set of model-to-text transformations able to produce code for different commercial ETL tools, as well as model-to-model transformations that automatically update the ETL models in order to keep the generated code consistent with data source evolution. As an initial validation, we conduct a demonstration using an example to show that the framework, covering modeling, code generation, and maintenance, could be used in practice.
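The model-to-text idea above can be sketched in a few lines: a tool-neutral ETL step model is rendered into executable code for a concrete target. The model format and the SQL target here are illustrative assumptions, not the paper's actual BPMN-based language or its generators.

```python
# Hypothetical vendor-neutral ETL step model rendered as SQL (model-to-text).
# The dictionary schema and target dialect are illustrative, not the paper's.

def generate_insert_select(model: dict) -> str:
    """Render an ETL step model as an SQL INSERT ... SELECT statement."""
    cols = ", ".join(model["columns"])
    sql = (f"INSERT INTO {model['target']} ({cols})\n"
           f"SELECT {cols} FROM {model['source']}")
    if model.get("filter"):                      # optional row filter
        sql += f"\nWHERE {model['filter']}"
    return sql

etl_step = {
    "source": "staging_orders",
    "target": "dw_orders",
    "columns": ["order_id", "customer_id", "amount"],
    "filter": "amount > 0",
}
print(generate_insert_select(etl_step))
```

A model-to-model maintenance transformation would then rewrite `etl_step` itself (e.g., renaming a source column) so that regenerated code stays in sync with the sources.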

    A hyperconnected manufacturing collaboration system using the semantic web and Hadoop ecosystem system

    With the explosive growth of digital data communications in synergistic operating networks and cloud computing services, hyperconnected manufacturing collaboration systems face the challenges of extracting, processing, and analyzing data from multiple distributed web sources. Although semantic web technologies provide a solution to web data interoperability by storing the semantic web standard in relational databases for processing and analyzing web-accessible heterogeneous digital data, web data storage and retrieval via the predefined schema of relational/SQL databases has become increasingly inefficient with the advent of big data. In response to this problem, the Hadoop ecosystem is being adopted to reduce the complexity of moving data to and from the big data cloud platform. This paper proposes a novel approach using a set of Hadoop tools for information integration and interoperability across hyperconnected manufacturing collaboration systems. In the Hadoop approach, data is "Extracted" from the web sources, "Loaded" into a set of NoSQL Hadoop Database (HBase) tables, and then "Transformed" and integrated into the desired format model with Hive's schema-on-read. A case study was conducted to illustrate that the Hadoop Extract-Load-Transform (ELT) approach to syntactic and semantic web data integration could be adopted across the global smartphone value chain.
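The ELT pattern described above, load first, impose a schema only at read time, can be sketched without HBase or Hive. This is a minimal stand-in, assuming JSON-line records and a simple field projection; the real systems distribute storage and query execution.

```python
import json

# Sketch of Extract-Load-Transform with schema-on-read: raw records are
# loaded verbatim, and a schema is only imposed when reading. The in-memory
# list stands in for an HBase table; the projection stands in for Hive.

raw_store = []  # opaque rows, no schema enforced at load time

def extract_and_load(json_lines):
    for line in json_lines:      # "Extract" from a web source
        raw_store.append(line)   # "Load" untouched

def read_with_schema(schema):
    """'Transform' at query time: project each raw row onto a schema."""
    for line in raw_store:
        record = json.loads(line)
        yield {field: record.get(field) for field in schema}

extract_and_load(['{"model": "X1", "vendor": "Acme", "extra": 42}',
                  '{"model": "Y2", "vendor": "Beta"}'])
rows = list(read_with_schema(["model", "vendor"]))
print(rows)  # fields outside the read schema are simply ignored
```

The design point is that heterogeneous records with different shapes coexist in the store; each consumer chooses its own schema at read time instead of agreeing on one up front.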

    User Integrity Constraints in SOLAP Systems: An Application in Agroforestry

    Spatial Data Warehouses and Spatial On-Line Analytical Processing (SOLAP) are decision-support technologies that offer spatial and multidimensional analysis of data stored in multidimensional structures. They also aim to support geographic knowledge discovery, helping decision-makers reach appropriate decisions. However, if data quality in the spatial hypercubes, and the way they are explored, is not taken into account, the analysis may produce unreliable results. In this paper, we propose a system for implementing user integrity constraints in SOLAP, named "UIC-SOLAP". It corresponds to a methodology for guaranteeing the quality of results in an analytical process performed by different users exploiting several fact tables within the same hypercube. We integrate user integrity constraints (ICs) by specifying visualization ICs according to user preferences, and we define inter-fact ICs for this case. To validate our proposal, we present multidimensional modeling with a UML profile to support the constellation schema of a hypercube with several fact tables related to subjects of analysis in forestry management. We then propose the implementation of some ICs related to the users of such a system.
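An inter-fact integrity constraint of the kind described, relating measures from two fact tables of the same hypercube, can be illustrated with a toy check. The rule, the parcel identifiers, and the volumes below are hypothetical, not taken from the paper's UIC-SOLAP system.

```python
# Illustrative inter-fact integrity constraint over two fact tables of the
# same hypercube: harvested volume may not exceed planted volume for the
# same forest parcel. Data and rule are made up for the sketch.

def check_inter_fact_ic(planted, harvested):
    """Return the parcels whose harvested volume violates the constraint."""
    violations = []
    for parcel, volume in harvested.items():
        if volume > planted.get(parcel, 0):
            violations.append(parcel)
    return violations

planted = {"p1": 100, "p2": 80}     # fact table 1: planted volume
harvested = {"p1": 90, "p2": 95}    # fact table 2: harvested volume
print(check_inter_fact_ic(planted, harvested))  # → ['p2']
```

In a SOLAP setting such a check would run before results are visualized, so that each user's query is validated against the ICs attached to their profile.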

    The potential of semantic paradigm in warehousing of big data

    Big data have analytical potential that was hard to realize with previously available technologies. After new storage paradigms intended for big data, such as NoSQL databases, emerged, traditional systems were pushed out of focus. Current research concentrates on reconciling the two at different levels, or on outright paradigm replacement. In the same way, the emergence of NoSQL databases has started to push traditional (relational) data warehouses out of both research and practical focus. Data warehousing is known for its strict modelling process, which captures the essence of the business processes. For that reason, mere integration to bridge the NoSQL gap is not enough; the issue must be addressed at a higher abstraction level, during the modelling phase. NoSQL databases generally lack a clear, unambiguous schema, which makes their contents difficult to comprehend and their integration and analysis harder. This motivates involving Semantic Web technologies to enrich NoSQL database contents with additional meaning and context. This paper reviews the application of semantics in data integration and data warehousing and analyses its potential for integrating NoSQL data and traditional data warehouses, with particular focus on document stores. It also proposes directions of future pursuit for the big data warehouse modelling phases.
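The semantic-enrichment idea for document stores can be sketched as lifting schemaless JSON documents into subject-predicate-object triples, mapping known fields onto a shared vocabulary. The vocabulary mapping and prefixes below are assumptions for illustration, not a mapping proposed by the paper.

```python
# Sketch of enriching schemaless documents with semantics: each document
# becomes a set of triples, with field names mapped to a (hypothetical)
# shared vocabulary so documents from different stores become comparable.

VOCAB = {"name": "foaf:name", "works_for": "org:memberOf"}  # assumed mapping

def document_to_triples(doc_id, doc):
    subject = f"ex:{doc_id}"
    for key, value in doc.items():
        # unknown fields fall back to a local namespace term
        predicate = VOCAB.get(key, f"ex:{key}")
        yield (subject, predicate, value)

doc = {"name": "Ada", "works_for": "ACME", "shoe_size": 38}
triples = list(document_to_triples("emp1", doc))
print(triples)
```

Once lifted, such triples can be queried or integrated alongside warehouse dimensions, which is exactly where the comprehension problem of schemaless content eases.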

    Interactive Multidimensional Modeling of Linked Data for Exploratory OLAP

    Exploratory OLAP aims at coupling the precision and detail of corporate data with the information wealth of LOD. While some techniques to create, publish, and query RDF cubes are already available, little has been said about how to contextualize these cubes with situational data in an on-demand fashion. In this paper we describe an approach, called iMOLD, that enables non-technical users to enrich an RDF cube with multidimensional knowledge by discovering aggregation hierarchies in LOD. This is done through a user-guided process that recognizes in the LOD the recurring modeling patterns that express roll-up relationships between RDF concepts, then translates these patterns into aggregation hierarchies to enrich the RDF cube. Two families of aggregation patterns are identified, based on associations and generalization, respectively, and the algorithms for recognizing them are described. To evaluate iMOLD in terms of efficiency and effectiveness we compare it with a related approach in the literature, we propose a case study based on DBpedia, and we discuss the results of a test made with real users.
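The association-based pattern can be illustrated with a small check: a property qualifies as a roll-up edge when it maps each member to at most one coarser member (e.g., city to country). The property names and data are illustrative, not iMOLD's actual algorithm or DBpedia output.

```python
# Sketch of detecting a roll-up relationship among RDF-style triples:
# a property that assigns at most one target to every subject behaves
# like an aggregation-hierarchy edge. Data is made up for illustration.

def is_rollup(triples, prop):
    """True if `prop` maps each subject to exactly one object."""
    seen = {}
    for s, p, o in triples:
        if p != prop:
            continue
        if seen.setdefault(s, o) != o:
            return False  # a subject with two parents: not a hierarchy
    return bool(seen)

data = [("Bologna", "country", "Italy"),
        ("Turin", "country", "Italy"),
        ("Lyon", "country", "France"),
        ("Lyon", "twinned_with", "Frankfurt"),
        ("Lyon", "twinned_with", "Leipzig")]
print(is_rollup(data, "country"))       # → True  (valid roll-up)
print(is_rollup(data, "twinned_with"))  # → False (many-to-many)
```

Generalization-based patterns would instead follow subclass/type edges; both kinds, once recognized, become candidate levels of an aggregation hierarchy offered to the user.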

    Interacting with Statistical Linked Data via OLAP Operations

    Online Analytical Processing (OLAP) promises an interface for analysing Linked Data containing statistics that goes beyond other interaction paradigms such as follow-your-nose browsers, faceted-search interfaces, and query builders. Transforming statistical Linked Data into a star schema to populate a relational database and applying a common OLAP engine do not allow optimising OLAP queries on RDF or directly propagating changes of Linked Data sources to clients. Therefore, as a new way to interact with statistics published as Linked Data, we investigate the problem of executing OLAP queries via SPARQL on an RDF store. To that end, we first define projection, slice, dice, and roll-up operations on single data cubes published as Linked Data, reusing the RDF Data Cube vocabulary, and show how a nested set of operations leads to an OLAP query. Second, we show how to transform an OLAP query into a SPARQL query which generates all required tuples from the data cube. In a small experiment, we show the applicability of our OLAP-to-SPARQL mapping in answering a business question in the financial domain.
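The core of such an OLAP-to-SPARQL mapping is translating a roll-up over one dimension into a SPARQL aggregate query against RDF Data Cube observations. The sketch below generates such a query as a string; the dataset, dimension, and measure IRIs are placeholders, and the paper's exact mapping may differ.

```python
# Sketch of mapping a roll-up OLAP operation onto a SPARQL aggregate
# query over the RDF Data Cube vocabulary (qb:). IRIs are placeholders.

def rollup_to_sparql(dataset, dimension, measure):
    """Sum a measure grouped by one dimension of a qb: data cube."""
    return f"""PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT ?dim (SUM(?m) AS ?total)
WHERE {{
  ?obs a qb:Observation ;
       qb:dataSet <{dataset}> ;
       <{dimension}> ?dim ;
       <{measure}> ?m .
}}
GROUP BY ?dim"""

query = rollup_to_sparql("http://example.org/cube/finance",
                         "http://example.org/dim/year",
                         "http://example.org/measure/amount")
print(query)
```

Slice and dice then become additional triple patterns or FILTER clauses on dimension values, and nesting the operations composes into a single SELECT with the corresponding GROUP BY and filters.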

    Conem: a model for representing and analyzing spatial network information in data warehouses

    Master's dissertation, Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Computer Science, Florianópolis, 2011. A Spatio-Temporal Data Warehouse (STDW) handles conventional, spatial, and temporal data concurrently. One need not yet met by STDW technology is support for analyzing information about complex networks of spatial elements. To address this, the present work proposes a model for the analysis of complex networks in STDWs. Inspired by ideas from Geography, the model aims to represent the structure of the network and the states of its component elements, in order to support the analysis of how the state of different portions of the network evolves over time. The proposed model uses ontologies to describe hierarchies of network element types, based on conceptualizations specific to the application domain, as well as ontologies about partitions of space and time. Data mart dimensions can be generated from views of these ontologies to meet specific analysis needs. The proposed model extends a spatio-temporal dimensional model to support spatial OLAP (SOLAP) over the network elements, using analysis dimensions defined according to hierarchies contained in the ontologies. It also defines an operator called Trace that enables the analysis of the evolution of the state of the components of selected portions of the network, chosen according to the analysis dimensions defined for the data mart. The proposed model was implemented in a prototype. Its graphical interface, based on tables and maps, is integrated with the SOLAP module. While navigating the maps and tables that present the results of SOLAP operations, further SOLAP operations can be invoked and their results presented in new charts and tables. A slider allows the analysis of the temporal evolution of the state of portions of the network.
Finally, the model is evaluated in a case study in the electric power sector, which enables the investigation of spatio-temporal patterns and trends in different portions of an electric power distribution network.
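The Trace idea, retrieving the state history of a selected portion of the network over a time window, can be sketched over time-stamped element states. The element names, states, and time units below are hypothetical; the dissertation's operator works over SOLAP dimensions, not a flat list.

```python
# Illustrative Trace-style operation: given time-stamped states of network
# elements, return the chronological state history of a selected portion
# of the network within a time window. Data is invented for the sketch.

def trace(states, elements, t_start, t_end):
    """Chronological (time, element, state) records for the selection."""
    return sorted((t, e, s) for (t, e, s) in states
                  if e in elements and t_start <= t <= t_end)

states = [(1, "feeder_a", "ok"), (2, "feeder_a", "overload"),
          (3, "feeder_a", "ok"), (2, "feeder_b", "ok")]
print(trace(states, {"feeder_a"}, 1, 2))
```

In the prototype described above, the selection of elements would come from SOLAP dimension members (e.g., all feeders in one district), and the resulting history would drive the slider-based temporal visualization.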