
2,021 research outputs found

A unified view of data-intensive flows in business intelligence systems : a survey

Author: Abelló Gamazo Alberto
Jovanovic Petar
Romero Moral Óscar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that remain to be addressed and how current solutions can be applied to address them. Peer reviewed. Postprint (author's final draft).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Scaling out federated queries for life sciences data in production

Author: Constandt Hans
De Vocht Laurens
De Witte Dieter
Mannens Erik
Pattyn Filip
Verborgh Ruben
Publication date: 01/01/2016

Ghent University Academic Bibliography

A Semantic Safety Check System for Emergency Management

Author: Srividya K. Bansal
Yogesh Pandey
Publication venue: RonPub
Publication date: 01/01/2017

There has been exponential growth in the availability of both structured and unstructured data that can be leveraged to provide better emergency management in case of natural disasters and humanitarian crises. This paper is an extension of a semantics-based web application for safety check, which uses semantic web technologies to extract different kinds of relevant data about a natural disaster and alert its users. The goal of this work is to design and develop a knowledge-intensive application that identifies people who may have been affected by natural or man-made disasters at any geographical location and notifies them with safety instructions. This involves extraction of data from various sources for emergency alerts, weather alerts, and contacts data. The extracted data is integrated using a semantic data model and transformed into semantic data. Semantic reasoning is done through rules and queries. The system is built using front-end web development technologies and, at the back end, semantic web technologies such as RDF, OWL, SPARQL, Apache Jena, TDB, and Apache Fuseki server. We present the details of the overall approach, the process of data collection and transformation, and the system built. This extended version includes a detailed discussion of the semantic reasoning module, research challenges in building this software system, related work in this area, and future research directions, including the incorporation of geospatial components and standards.
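The "reasoning through rules and queries" step in this abstract can be pictured with a small sketch. It is not the paper's implementation, which uses Apache Jena, TDB, and Fuseki. It is a plain-Python stand-in over subject-predicate-object triples, and every `ex:` term, the sample facts, and the function names are assumptions made up for illustration.

```python
# A minimal sketch of rule-plus-query reasoning over triples.
# All ex: identifiers and facts below are hypothetical.
Triple = tuple[str, str, str]

FACTS: set[Triple] = {
    ("ex:alice", "ex:locatedIn", "ex:areaA"),
    ("ex:bob", "ex:locatedIn", "ex:areaB"),
    ("ex:flood1", "ex:affectsArea", "ex:areaA"),
}


def infer_at_risk(facts: set[Triple]) -> set[Triple]:
    """Rule: person locatedIn X and alert affectsArea X => person atRiskFrom alert."""
    located = [(s, o) for s, p, o in facts if p == "ex:locatedIn"]
    affected = [(s, o) for s, p, o in facts if p == "ex:affectsArea"]
    derived = {
        (person, "ex:atRiskFrom", alert)
        for person, area in located
        for alert, alert_area in affected
        if area == alert_area
    }
    return derived - facts  # only the newly inferred triples


def people_to_notify(facts: set[Triple]) -> list[str]:
    """Query: every subject of an atRiskFrom triple after applying the rule."""
    all_facts = facts | infer_at_risk(facts)
    return sorted({s for s, p, _ in all_facts if p == "ex:atRiskFrom"})
```

Calling `people_to_notify(FACTS)` applies the rule first and then runs the query over the enlarged fact set. In a real deployment the rule would more likely be an OWL or Jena rule and the query a SPARQL `SELECT`.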

RonPub -- Research Online Publishing

Using Semantic Web technologies in the development of data warehouses: A systematic mapping

Author: Ali A. A.
Bennett D.
Chen H.
Coral C.
El Sarraj L.
Elamin E.
Etcheverry L.
Henschen D.
Inmon W. H.
Jossen C.
Kalampokis E.
Kimball R.
Klyne G.
Lather S.
Martin A.
Petersen K.
Ramasamy V.
Salguero A.
Salguero A.
Sell D.
Steiner D.
Thenmozhi M.
Wache H.
Publication venue: Wiley
Publication date: 01/01/2019

The exploration and use of Semantic Web technologies have attracted considerable attention from researchers examining data warehouse (DW) development. However, the impact of this research and the maturity level of its results are still unclear. The objective of this study is to examine recently published research articles that take into account the use of Semantic Web technologies in the DW arena with the intention of summarizing their results, classifying their contributions to the field according to publication type, evaluating the maturity level of the results, and identifying future research challenges. Three main conclusions were derived from this study: (a) there is a major technological gap that inhibits the wide adoption of Semantic Web technologies in the business domain; (b) there is limited evidence that the results of the analyzed studies are applicable and transferable to industrial use; and (c) interest in researching the relationship between DWs and the Semantic Web has decreased because new paradigms, such as linked open data, have attracted the interest of researchers. This study was supported by the Universidad de La Frontera, Chile, grant numbers DI15-0020 and DI17-0043.

Repositorio Institucional de la Universidad de Alicante

Crossref

Data Mapping for XBRL: A Systematic Literature Review

Author: Acosta Bragança Henderson
Bernardino Piraja Gomes Nacles
Caetano da Silva Paulo
Publication venue: American Academic Scientific Research Journal for Engineering, Technology, and Sciences
Publication date: 10/10/2022

The use of eXtensible Business Reporting Language (XBRL) for financial reporting on the Internet is clearly growing, driven either by its advantages and benefits or by government mandates. However, the data carried by this language are mostly stored in databases, some relational and others NoSQL. The need to integrate XBRL with other data storage technologies has been growing continuously, and research is needed to find a solution for mapping data between these environments. The possible difficulties in integrating XBRL with other technologies, such as relational or NoSQL databases and CSV or JSON files, need to be mapped and overcome. Generating XBRL documents from a database can be costly, since database management systems offer no native way to export data as XBRL. Specific third-party systems are therefore needed to generate XBRL documents. Generally, these systems are proprietary and expensive. Integrating these different technologies adds complexity, since the documents are not connected to the database management system. These difficulties cause performance and storage problems, and complexity increases with large data volumes, such as data deliveries to government agencies. Thus, it is essential to study techniques and methods that point to a solution for performing this integration and/or mapping, preferably in a generic way that covers the XBRL data structure and the main data models currently in use, i.e., relational DBMSs, NoSQL, and JSON or CSV files. Through a systematic literature review, this work aims to identify the state of the art in the mapping of XBRL data.

American Scientific Research Journal for Engineering, Technology, and Sciences (ASRJETS)

A DIN Spec 91345 RAMI 4.0 Compliant Data Pipelining Model: An Approach to Support Data Understanding and Data Acquisition in Smart Manufacturing Environments

Author: Colombo Armando Walter
Nagorny Kevin
Oliveira José Barata
Scholze Sebastian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

Today, data scientists in the manufacturing domain are confronted with various communication standards, protocols and technologies used to save and transfer various kinds of data. These circumstances make it hard to understand, find, access and extract the data needed for use-case-dependent applications. One solution could be a data pipelining approach enforced by a semantic model that describes the smart manufacturing assets themselves and the access to their data along their life cycle. Many research contributions in smart manufacturing have already produced reference architectures such as RAMI 4.0, or standards for metadata description or asset classification. Our research builds upon these outcomes and introduces a semantic-model-based, DIN Spec 91345 (RAMI 4.0) compliant data pipelining approach, with the smart manufacturing domain as an exemplary use case. This paper focuses on the developed semantic model, which enables easy exploration, finding, access and extraction of data and is compatible with the various communication standards, protocols and technologies used to save and transfer data. Publisher's version; published.

Repositório da Universidade Nova de Lisboa

Ontology-based data integration between clinical and research systems

Author: Bürkle Thomas
Ganslandt Thomas
Köpcke Felix
Martin Marcus
Mate Sebastian
Prokosch Hans-Ulrich
Toddenroth Dennis
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

Data from the electronic medical record comprise numerous structured but uncoded elements, which are not linked to standard terminologies. Reuse of such data for secondary research purposes has gained in importance recently. However, the identification of relevant data elements and the creation of database jobs for extraction, transformation and loading (ETL) are challenging: with current methods such as data warehousing, it is not feasible to efficiently maintain and reuse semantically complex data extraction and transformation routines. We present an ontology-supported approach to overcome this challenge by making use of abstraction: instead of defining ETL procedures at the database level, we use ontologies to organize and describe the medical concepts of both the source system and the target system. Instead of using unique, specifically developed SQL statements or ETL jobs, we define declarative transformation rules within ontologies and illustrate how these constructs can then be used to automatically generate SQL code to perform the desired ETL procedures. This demonstrates how a suitable level of abstraction may not only aid the interpretation of clinical data, but can also foster the reutilization of methods for unlocking it.
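The idea of generating SQL from declarative transformation rules can be sketched in a few lines. This is an illustrative toy only, not the authors' ontology format or generator: the `MappingRule` fields, the table and column names, and the generic `target_facts` layout are all assumptions, and the rules are treated as trusted input, since identifiers are interpolated directly into the SQL text.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """Hypothetical declarative rule: one source column feeds one target concept."""
    target_concept: str       # concept name in an assumed target ontology
    source_table: str         # table in the source system
    source_column: str        # column holding the raw value
    transform: str = "{col}"  # SQL expression template; {col} is the source column


def generate_sql(rule: MappingRule) -> str:
    """Render one rule as an INSERT ... SELECT statement (rules are trusted input)."""
    expr = rule.transform.format(col=rule.source_column)
    return (
        "INSERT INTO target_facts (patient_id, concept, value) "
        f"SELECT patient_id, '{rule.target_concept}', {expr} "
        f"FROM {rule.source_table} "
        f"WHERE {rule.source_column} IS NOT NULL"
    )


def run_demo() -> list[tuple]:
    """Apply two made-up rules to a made-up source table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE lab_results (patient_id INTEGER, weight_g REAL, temp_c REAL);
        CREATE TABLE target_facts (patient_id INTEGER, concept TEXT, value REAL);
        INSERT INTO lab_results VALUES (1, 72000, 37.5), (2, NULL, 36.6);
        """
    )
    rules = [
        MappingRule("BodyWeightKg", "lab_results", "weight_g", "{col} / 1000.0"),
        MappingRule("BodyTemperatureC", "lab_results", "temp_c"),
    ]
    for rule in rules:
        conn.execute(generate_sql(rule))  # the ETL step is generated, not hand-written
    rows = conn.execute(
        "SELECT patient_id, concept, value FROM target_facts "
        "ORDER BY concept, patient_id"
    ).fetchall()
    conn.close()
    return rows
```

The point of the design is that adding a new data element means adding a rule, not writing another SQL statement. The unit conversion lives in the rule's `transform` template, so it can be reviewed and reused independently of any one ETL job.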

Berner Fachhochschule: ARBOR

Directory of Open Access Journals

PubMed Central

FigShare

Aspects of semantic ETL

Author: Deb Nath Rudra Pratap
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/10/2020

Thesis under joint supervision (cotutelle): Universitat Politècnica de Catalunya and Aalborg Universitet. Business Intelligence tools support making better business decisions by analyzing available organizational data. Data Warehouses (DWs), typically structured with the Multidimensional (MD) model, are used to store data from different internal and external sources processed using Extract-Transformation-Load (ETL) processes. On-Line Analytical Processing (OLAP) queries are applied on DWs to derive important business-critical knowledge. DW and OLAP technologies perform efficiently when they are applied on data that are static in nature and well organized in structure. Nowadays, Semantic Web (SW) technologies and the Linked Data principles inspire organizations to publish their semantic data, which allow machines to understand the meaning of data, using the Resource Description Framework (RDF) model. In addition to traditional (non-semantic) data sources, the incorporation of semantic data sources into a DW raises the additional challenges of schema derivation, semantic heterogeneity, and schema and data management model over traditional ETL tools. Furthermore, most SW data provided by business, academic and governmental organizations include facts and figures, which raise new requirements for BI tools to enable OLAP-like analyses over those semantic (RDF) data. In this thesis, we 1) propose a layer-based ETL framework for handling diverse semantic and non-semantic data sources by addressing the challenges mentioned above, 2) propose a set of high-level ETL constructs for processing semantic data, 3) implement appropriate environments (both programmable and GUI) to facilitate ETL processes and evaluate the proposed solutions. Our ETL framework is a semantic ETL framework because it integrates data semantically. We propose SETL, a unified framework for semantic ETL. The framework is divided into three layers: the Definition Layer, ETL Layer, and Data Warehouse Layer.
In the Definition Layer, the semantic DW (SDW) schema, sources, and the mappings among the sources and the target are defined. In the ETL Layer, ETL processes to populate the SDW from sources are designed. The Data Warehouse Layer manages the storage of transformed semantic data. The framework supports the inclusion of semantic (RDF) data in DWs in addition to relational data. It allows users to define an ontology of a DW and annotate it with MD constructs (such as dimensions, cubes, levels, etc.) using the Data Cube for OLAP (QB4OLAP) vocabulary. It supports traditional transformation operations and provides a method to generate semantic data from the source data according to the semantics encoded in the ontology. It also provides a method to connect internal SDW data with external knowledge bases. On top of SETL, we propose SETLCONSTRUCT, where we define a set of high-level ETL tasks/operations to process semantic data sources. We divide the integration process into two layers: the Definition Layer and Execution Layer. The Definition Layer includes two tasks that allow DW designers to define target (SDW) schemas and the mappings between (intermediate) sources and the (intermediate) target. To create mappings among the sources and target constructs, we provide a mapping vocabulary called S2TMAP. Different from other ETL tools, we propose a new paradigm: we characterize the ETL flow transformations at the Definition Layer instead of independently within each ETL operation (in the Execution Layer). This way, the designer has an overall view of the process, which generates metadata (the mapping file) that the ETL operators will read and use to parametrize themselves automatically. In the Execution Layer, we propose a set of high-level ETL operations to process semantic data sources. Finally, we develop a GUI-based semantic BI system, SETLBI, to define, process, integrate, and query semantic and non-semantic data.
In addition to the Definition Layer and the ETL Layer, SETLBI has the OLAP Layer, which provides an interactive interface to enable OLAP analysis over the semantic DW.
(Catalan) Business Intelligence (BI) tools support better business decision-making by analyzing the organization's available data. Data warehouses (DWs), typically structured according to the Multidimensional (MD) model, are used to store data from different internal and external sources, processed by means of Extract-Transformation-Load (ETL) processes. On-Line Analytical Processing (OLAP) queries are applied to DWs to extract business-critical knowledge. DWs and OLAP technologies work efficiently when applied to data that are static in nature and well structured. Nowadays, Semantic Web (SW) technologies and the Linked Data (LD) principles inspire organizations to publish their data in semantic formats, which allow machines to understand the meaning of the data, using the Resource Description Framework (RDF). One of the reasons semantic data have been so successful is that they can be managed and made available to third parties with little effort, and they do not depend on sophisticated data schemas. In addition to traditional (non-semantic) data sources, incorporating semantic data sources into a DW raises additional challenges such as schema derivation, semantic heterogeneity, and the representation of schema and data through ETL tools. Furthermore, most SW data provided by companies and by academic or governmental organizations include facts and figures, which pose new challenges for BI tools in enabling OLAP analysis over semantic (RDF) data.
In this thesis, we 1) propose a layer-based ETL framework for managing diverse semantic and non-semantic data sources while addressing the challenges mentioned above, 2) propose a set of ETL operations for processing semantic data, and 3) create appropriate development environments (both programmatic and GUI-based) to facilitate the creation and management of semantic DWs and ETL processes, and evaluate the proposed solutions. Our ETL framework is a semantic ETL framework because it is able to consider and integrate data semantically. The following paragraphs elaborate on these contributions. We propose SETL, a unified framework for semantic ETL. The framework is divided into three layers: the Definition Layer, the ETL Layer, and the DW Layer. In the Definition Layer, the semantic DW (SDW) schema, the sources, and the mappings between the sources and the DW schema are defined. In the ETL Layer, ETL processes are designed to populate the SDW from the sources. The DW Layer manages the storage of the transformed semantic data. Our framework supports the inclusion of semantic (RDF) data in DWs, in addition to relational data. It thus allows users to define an ontology of a DW and annotate it with MD constructs (such as dimensions, cubes, levels, etc.) using the Data Cube for OLAP (QB4OLAP) vocabulary. It also supports traditional transformation operations and provides a method to generate semantic data from the source data according to the semantics encoded in the ontology. It also provides a method to connect the SDW with external knowledge bases. It therefore creates a knowledge base, composed of an ontology and its instances, in which data are semantically connected with other external/internal data. To do so, we develop a high-level, Python-based programmatic method to perform the tasks mentioned above.
A comprehensive evaluation experiment was carried out comparing SETL with a solution built with traditional tools (which require much more coding). As a use case, we used the Danish Agricultural dataset, and the results show that SETL provides better performance and improves programmer productivity and knowledge base quality. The comparison between SETL and Pentaho Data Integration (PDI) shows that SETL is 13.5% faster than PDI. Besides being faster than PDI, it treats semantic data as first-class citizens, whereas PDI contains no operators specific to semantic data. On top of SETL, we propose SETLCONSTRUCT, where we define a set of high-level ETL tasks/operations to process semantic data sources, aimed at encapsulating and facilitating the creation of semantic ETL. We divide the integration process into two layers: the Definition Layer and the Execution Layer. The Definition Layer includes two tasks that allow DW designers to define target (SDW) schemas and mappings between sources (or intermediate results) and the SDW (potentially, other intermediate results). To create mappings between the sources and the SDW, we provide a mapping vocabulary called Source-To-Target Mapping (S2TMAP). Unlike other ETL tools, we propose a new paradigm: the ETL flow transformations are characterized at the Definition Layer, and not independently within each ETL operation (in the Execution Layer). This new paradigm gives the designer an overall view of the process, which generates metadata (the mapping file) that the individual ETL operators will read and use to parametrize themselves automatically. In the Execution Layer, we propose a set of high-level ETL operations to process semantic data sources. In addition to cleansing, joining, and transformation of semantic data, we propose operations to generate multidimensional semantics and to update the SDW to reflect changes in the sources.
In addition, we extend SETLCONSTRUCT to enable the automatic generation of ETL execution flows (we call it SETLAUTO). Finally, we provide an extensive evaluation comparing the productivity, development time, and performance of SETLCONSTRUCT and SETLAUTO with the previous framework, SETL. The evaluation shows that SETLCONSTRUCT improves considerably over SETL in terms of productivity, development time, and performance. The evaluation shows that 1) SETLCONSTRUCT uses 92% fewer typed characters (NOTC) than SETL, and SETLAUTO further reduces the number of used concepts (NOUC) by another 25%; 2) using SETLCONSTRUCT, development time is almost halved compared to SETL, and is reduced by another 27% using SETLAUTO; 3) SETLCONSTRUCT is scalable and has similar performance compared to SETL. Finally, we develop a GUI-based semantic BI system, SETLBI, to define, process, integrate, and query semantic and non-semantic data. In addition to the Definition Layer and the ETL Layer, SETLBI has an OLAP Layer, which provides an interactive interface to enable self-service OLAP analysis over the semantic DW. Each layer is composed of a set of operations/tasks. To formalize the intra- and inter-layer connections of the components of each layer, we use an ontology. The ETL Layer extends the Execution Layer of SETLCONSTRUCT by adding operations to process non-semantic data sources. Lastly, we demonstrate the final system using the Bangladesh population census (2011). The final solution of this thesis is the SETLBI tool. SETLBI enables (1) DW designers with little or no SW knowledge to semantically integrate data (semantic or not) and analyze them using OLAP, and (2) SW users to define views over semantic data, integrate them with non-semantic sources, visualize them according to the MD model, and perform OLAP analysis.
In addition, SW users can enrich the generated SDW schema with RDFS/OWL constructs. Taking this framework as a starting point, researchers can use it to create SDWs interactively and automatically. This project builds a bridge between BI and SW technologies, and opens the door to further research opportunities such as developing machine-understandable DW and ETL techniques.
(Danish) Business Intelligence (BI) tools support making better business decisions by analyzing available organizational data. Data Warehouses (DWs), typically built with the Multidimensional (MD) model, are used to store data from different internal and external sources, which are processed using Extract-Transformation-Load (ETL) processes. On-Line Analytical Processing (OLAP) queries are applied to DWs to derive important business-critical knowledge. DW and OLAP technologies work efficiently when applied to data that are static in nature and well organized in structure. Today, Semantic Web (SW) technologies and Linked Data (LD) principles inspire organizations to publish their semantic data, which allow machines to understand their meaning, using the Resource Description Framework (RDF) model. One of the reasons semantic data have become successful is that managing and publishing the data is easy and does not depend on a sophisticated schema. Beyond the problems of transferring traditional (non-semantic) databases to DWs, further challenges arise when transferring semantic databases, such as schema derivation, semantic heterogeneity, and the schema for data representation over traditional ETL tools. On the other hand, a large part of the semantic data published by companies, academics, and governments consists of figures and facts, which in turn raise new issues and requirements for BI tools in order to make OLAP-like analyses over the semantic data possible.
In this thesis we do the following: 1) propose a layer-based ETL framework for handling multiple semantic and non-semantic data sources by addressing the challenges mentioned above, 2) propose a set of ETL operations for processing semantic data, 3) implement suitable environments (both programmable and graphical user interfaces) to facilitate ETL processes and evaluate the proposed solution. Our ETL framework is a semantic ETL framework because it integrates data semantically. The following section explains our contributions. We propose SETL, a unified framework for semantic ETL. The framework is split into three layers: a Definition Layer, an ETL Layer, and a DW Layer. The semantic DW (SDW) schema, the data sources, and the mappings between data sources and their target are defined in the Definition Layer. In the ETL Layer, ETL processes are designed to populate the SDW from the data sources. The DW Layer manages the storage of transformed semantic data. The framework supports the inclusion of semantic (RDF) data in DWs in addition to relational data. It allows users to define an ontology for a DW and annotate it with MD constructs (such as dimensions, cubes, levels, etc.) using the Data Cube for OLAP (QB4OLAP) vocabulary. It supports traditional transformation operations and provides a method to generate semantic data from the source data according to the semantics encoded in the ontology. It also provides a method to connect internal SDW data with external knowledge bases. It thereby creates a knowledge base composed of an ontology and its instances, in which data are semantically connected with other external/internal data. We develop a high-level, Python-based programmable framework to perform the tasks mentioned above.
A comprehensive experimental evaluation comparing SETL with a traditional solution (which required much manual coding) on Danish agricultural and business datasets shows that SETL offers better performance, programmer productivity, and knowledge base quality. The comparison between SETL and Pentaho Data Integration (PDI) when processing a semantic source shows that SETL is 13.5% faster than PDI. On top of SETL, we propose SETLCONSTRUCT, where we define a set of high-level ETL operations for processing semantic data sources. We divide the integration process into two layers: the Definition Layer and the Execution Layer. The Definition Layer contains two tasks that allow DW designers to define target (SDW) schemas and the mappings between sources and the target. To create mappings between the sources and the targets, we provide a mapping vocabulary called Source-to-Target Mapping (S2TMAP). Unlike other ETL tools, we propose a new paradigm: we characterize the ETL flow transformations at the Definition Layer instead of independently within each ETL operation (in the Execution Layer). In this way, the designer has an overview of the process, which generates metadata (the mapping file) that the ETL operators will read and use to parametrize themselves automatically. In the Execution Layer, we propose a set of high-level ETL operations to process semantic data sources. In addition to cleansing, joining, and data-type-based transformations of semantic data, we propose operations to generate multidimensional semantics at the data level and operations to update an SDW to reflect changes in the source data. In addition, we extend SETLCONSTRUCT to enable automatic ETL execution flow generation (we call it SETLAUTO). Finally, we provide an extensive evaluation to compare the productivity, development time, and performance of SETLCONSTRUCT and SETLAUTO with the previous framework, SETL.
The evaluation shows that SETLCONSTRUCT improves markedly over SETL in terms of productivity, development time, and performance. The evaluation shows that 1) SETLCONSTRUCT uses 92% fewer typed characters (NOTC) than SETL, and SETLAUTO further reduces the number of used concepts (NOUC) by 25%; 2) using SETLCONSTRUCT, development time is almost halved compared to SETL, and is cut by a further 27% using SETLAUTO; 3) SETLCONSTRUCT is scalable and has similar performance compared to SETL. Finally, we develop a GUI-based semantic BI system, SETLBI, to define, process, integrate, and query semantic and non-semantic data. In addition to the Definition Layer and the ETL Layer, SETLBI has an OLAP Layer, which provides an interactive interface to enable self-service OLAP analyses over the semantic DW. Each layer is composed of a set of operations/tasks. We develop an ontology to formalize the intra- and inter-layer connections between the components and the layers. The ETL Layer extends the Execution Layer of SETLCONSTRUCT by adding operations to process non-semantic data sources. We demonstrate the system using the Bangladesh population census 2011 dataset. The outcome of this thesis is the BI tool SETLBI. SETLBI enables (1) DW designers with little or no SW knowledge to semantically integrate semantic and/or non-semantic data and analyze them in OLAP style, and (2) SW users with a basic MD background to define MD views over semantic data that enable OLAP-like analysis. In addition, SW users can enrich the generated SDW schema with RDFS/OWL constructs. Taking the framework as a foundation, researchers can aim to develop further interactive and automatic integration frameworks for SDWs.
This project builds a bridge between traditional BI technologies and SW technologies, which in turn will open the door to further research opportunities such as developing machine-understandable ETL and warehousing techniques. Postprint (published version).
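The paradigm this abstract describes, where the flow is characterized once at the Definition Layer and generic operators read that mapping to parametrize themselves, can be illustrated with a toy sketch. It does not use the S2TMAP vocabulary or any SETL API: the `MAPPING` dictionary layout, the `ex:` terms, the `transform` function, and the sample row are all assumptions invented for this example.

```python
from typing import Callable

Triple = tuple[str, str, object]

# Definition Layer (hypothetical layout): one mapping describes the whole flow.
MAPPING: dict = {
    "target_class": "ex:Observation",  # made-up target class
    "key": "district",                 # source column used to mint the subject
    "properties": {                    # target predicate -> (source column, cast)
        "ex:population": ("population", int),
        "ex:year": ("year", int),
    },
}


def transform(rows: list[dict], mapping: dict) -> list[Triple]:
    """Execution Layer operator: takes all of its parameters from the mapping."""
    triples: list[Triple] = []
    for row in rows:
        subject = f"ex:{row[mapping['key']]}"
        triples.append((subject, "rdf:type", mapping["target_class"]))
        for predicate, (column, cast) in mapping["properties"].items():
            value_cast: Callable[[str], object] = cast
            triples.append((subject, predicate, value_cast(row[column])))
    return triples


# Placeholder source row with invented values, not real census data.
SAMPLE_ROWS = [{"district": "DistrictA", "population": "100", "year": "2011"}]
```

Calling `transform(SAMPLE_ROWS, MAPPING)` yields one type triple plus one triple per mapped property. Changing the target schema then means editing `MAPPING`, while the operator code stays untouched, which is the separation of definition and execution that the abstract argues for.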

UPCommons. Portal del coneixement obert de la UPC

Graph BI & analytics: current state and future challenges

Author: Ghrab Amine
Jouili Salim
Romero Moral Óscar
Skhiri Sabri
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

In an increasingly competitive market, making well-informed decisions requires the analysis of a wide range of heterogeneous, large and complex data. This paper focuses on the emerging field of graph warehousing. Graphs are widespread structures that yield great expressive power. They are used for modeling highly complex and interconnected domains, and for efficiently solving emerging big data applications. This paper presents the current status and open challenges of graph BI and analytics, and motivates the need for new warehousing frameworks aware of the topological nature of graphs. We survey the topics of graph modeling, management, processing and analysis in graph warehouses. We then conclude by discussing future research directions and positioning them within a unified architecture of a graph BI and analytics framework. Peer reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC