342 research outputs found

    Semantic Query Reasoning in Distributed Environment

    Get PDF
    Master's thesis in Computer science. The Semantic Web aims to lift plain data on the WWW to a semantic layer, so that machine-processable knowledge can be shared more easily. Ontologies are one of the key technologies for realizing the Semantic Web, and semantic reasoning is an important step in semantic technology: for ontology developers, reasoning detects contradictions in an ontology definition and helps optimize it; for ontology users, reasoning derives implicit knowledge from known knowledge. The main topic of this thesis is reasoning for semantic data querying in a distributed environment, i.e., obtaining correct query results over semantic data given an ontology definition and its data. We studied two methods: data materialization and query rewriting. Using the Amazon cloud computing service and the LUBM benchmark, we compared the two methods and concluded that as the size of the queried data scales up, query rewriting is more feasible than data materialization. Based on this conclusion, we developed an application that manages and queries semantic data in a distributed environment. The application can serve both as a prototype for similar applications and as a tool for other Semantic Web research.
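The two strategies the thesis compares can be contrasted on a toy example. This is a minimal sketch, assuming a triple store of `(subject, predicate, object)` tuples and a single rdfs:subClassOf axiom; all names (`SUBCLASS_OF`, `materialize`, `rewrite_query`) are illustrative, not the thesis's actual implementation.

```python
# One subclass axiom: GraduateStudent is a subclass of Student.
SUBCLASS_OF = {"GraduateStudent": "Student"}

data = [("alice", "type", "GraduateStudent"),
        ("bob", "type", "Student")]

def materialize(triples):
    """Data materialization: pre-compute and store all inferred triples."""
    inferred = list(triples)
    for s, p, o in triples:
        if p == "type" and o in SUBCLASS_OF:
            inferred.append((s, "type", SUBCLASS_OF[o]))
    return inferred

def rewrite_query(wanted_class):
    """Query rewriting: expand the query into a union over subclasses,
    leaving the stored data untouched."""
    classes = {wanted_class}
    classes |= {sub for sub, sup in SUBCLASS_OF.items() if sup == wanted_class}
    return classes

# Both strategies return the same answer set for "all Students":
answers_mat = {s for s, p, o in materialize(data) if p == "type" and o == "Student"}
answers_rw = {s for s, p, o in data if p == "type" and o in rewrite_query("Student")}
assert answers_mat == answers_rw == {"alice", "bob"}
```

Materialization pays storage and update cost up front; rewriting pays per query, which is why it tends to scale better as the data grows.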

    Making Linked Open Data Writable with Provenance Semirings

    Get PDF
    The Linked Open Data cloud (LOD) is essentially read-only, restraining the possibility of collaborative knowledge construction. To support collaboration, we need to make the LOD writable. In this paper, we propose a vision for writable linked data where each LOD participant can define updatable materialized views over data hosted by other participants. Consequently, building a writable LOD can be reduced to the problem of SPARQL self-maintenance of Select-Union recursive materialized views. We propose TM-Graph, an RDF graph annotated with elements of a specialized provenance semiring, to maintain the consistency of these views, and we analyze the complexity in space and traffic.
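The core idea behind semiring-annotated views can be sketched as follows: each triple in a materialized view carries an annotation recording how it was derived, so updates can be applied without re-querying the sources. TM-Graph uses a specialized semiring; the counting semiring below is only an assumed stand-in to illustrate the principle, and every name in it is hypothetical.

```python
from collections import Counter

view = Counter()  # triple -> number of independent derivations

def insert(triple):
    # Semiring "plus": a new derivation combines with existing ones.
    view[triple] += 1

def delete(triple):
    # Removing one derivation; the triple stays in the view as long as
    # at least one other derivation supports it.
    view[triple] -= 1
    if view[triple] <= 0:
        del view[triple]

t = ("ex:alice", "foaf:knows", "ex:bob")
insert(t)        # derived from participant A's data
insert(t)        # independently derived from participant B's data
delete(t)        # A retracts: triple survives, B still supports it
assert t in view
delete(t)        # B retracts too: triple leaves the view
assert t not in view
```

With plain set semantics the first deletion would wrongly remove the triple; the annotation is what makes the view self-maintainable.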

    Storing and querying evolving knowledge graphs on the web

    Get PDF

    Metadata-driven data integration

    Get PDF
    Cotutela: Universitat Politècnica de Catalunya i Université Libre de Bruxelles, IT4BI-DC programme for the joint Ph.D. degree in computer science. Data has an undoubtable impact on society. Storing and processing large amounts of available data is currently one of the key success factors for an organization. Nonetheless, we are recently witnessing a change represented by huge and heterogeneous amounts of data: indeed, 90% of the data in the world has been generated in the last two years. Thus, in order to carry out these data exploitation tasks, organizations must first perform data integration, combining data from multiple sources to yield a unified view over them. Yet, the integration of massive and heterogeneous amounts of data requires revisiting the traditional integration assumptions to cope with the new requirements posed by such data-intensive settings. This PhD thesis aims to provide a novel framework for data integration in the context of data-intensive ecosystems, which entails dealing with vast amounts of heterogeneous data, from multiple sources and in their original format. To this end, we advocate an integration process consisting of sequential activities governed by a semantic layer, implemented via a shared repository of metadata. From a stewardship perspective, these activities are the deployment of a data integration architecture, followed by the population of the shared metadata. From a data consumption perspective, the activities are virtual and materialized data integration, the former an exploratory task and the latter a consolidation one. Following the proposed framework, we focus on providing contributions to each of the four activities. We begin by proposing a software reference architecture for semantic-aware data-intensive systems; this architecture serves as a blueprint to deploy a stack of systems, its core being the metadata repository. Next, we propose a graph-based metadata model as a formalism for metadata management. We focus on supporting schema and data source evolution, a predominant factor in the heterogeneous sources at hand. For virtual integration, we propose query rewriting algorithms that rely on the previously proposed metadata model; we additionally consider semantic heterogeneities in the data sources, which the proposed algorithms are capable of automatically resolving. Finally, the thesis focuses on the materialized integration activity and, to this end, proposes a method to select intermediate results to materialize in data-intensive flows. Overall, the results of this thesis serve as a contribution to the field of data integration in contemporary data-intensive ecosystems. (The record also carries Catalan and French translations of this abstract.) Postprint (published version)
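The virtual-integration activity described above can be sketched in miniature: a shared metadata repository maps a unified concept to heterogeneous source schemas, and a query over the unified view is rewritten into one fetch per source, leaving the data in its original format. The metadata layout, source names, and attribute names below are invented for illustration, not the thesis's actual model.

```python
# Shared metadata repository: unified attribute -> {source: local attribute}.
metadata = {
    "person_name": {"crm": "full_name", "hr": "employee"},
}

# Two heterogeneous sources, each keeping its own schema.
sources = {
    "crm": [{"full_name": "Alice"}],
    "hr":  [{"employee": "Bob"}],
}

def query_unified(attribute):
    """Rewrite a unified-schema query into per-source queries (virtual
    integration: nothing is copied or transformed ahead of time)."""
    results = []
    for source, local_attr in metadata[attribute].items():
        for row in sources[source]:
            results.append(row[local_attr])
    return sorted(results)

assert query_unified("person_name") == ["Alice", "Bob"]
```

Schema evolution then reduces to updating the metadata mapping rather than rewriting every consumer query.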

    Performance assessment of RDF graph databases for smart city services

    Get PDF
    Abstract: Smart cities provide advanced services by aggregating and exploiting data from different sources. Cities collect static data such as road graphs and service descriptions, as well as dynamic/real-time data like weather forecasts, traffic sensors, bus positions, city sensors, events, emergency data, flows, etc. RDF stores may be used to set up knowledge bases integrating heterogeneous information for web and mobile applications, which use the data to offer new advanced services to citizens and city administrators, exploiting inferential capabilities, temporal and spatial reasoning, and text indexing. In this paper, the needs and constraints for RDF stores used in smart-city services are evaluated together with the currently available RDF stores. The assessment model allows a full understanding of whether an RDF store is suitable as a basis for smart-city modeling and applications. The RDF assessment model is also supported by a benchmark which extends the state-of-the-art RDF store benchmarks. The comparison has been applied to a number of well-known RDF stores such as Virtuoso, GraphDB (formerly OWLIM), Oracle, StarDog, and many others. The paper also reports the adoption of the proposed Smart City RDF Benchmark on the basis of the Florence Smart City model, with data sets and tools accessible as Km4City (http://www.Km4City.org), adopted in the European Commission international smart-city projects RESOLUTE H2020 and REPLICATE H2020, and in the Sii-Mobility National Smart City project in Italy.
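An RDF-store benchmark of the kind described boils down to timing a query workload against each store. This is a minimal sketch of such a harness, assuming a trivial in-memory triple list in place of a real store (Virtuoso, GraphDB, etc.) and a fixed triple-pattern query; all names are illustrative.

```python
import time

# Stand-in store: a few smart-city-flavored triples.
store = [("bus42", "atStop", "stop7"),
         ("stop7", "inCity", "Florence")]

def run_query(pattern):
    """Match a triple pattern; None acts as a wildcard variable."""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def benchmark(pattern, repetitions=100):
    """Run the query repeatedly and report average latency in seconds."""
    start = time.perf_counter()
    for _ in range(repetitions):
        rows = run_query(pattern)
    elapsed = time.perf_counter() - start
    return rows, elapsed / repetitions

rows, avg_latency = benchmark((None, "inCity", "Florence"))
assert rows == [("stop7", "inCity", "Florence")]
```

A real benchmark would add the dimensions the paper assesses: dataset scale, inference support, spatial/temporal queries, and concurrent clients.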

    Web ontology reasoning with logic databases [online]

    Get PDF

    DataHub: Collaborative Data Science & Dataset Version Management at Scale

    Get PDF
    Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference, and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.
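The git-inspired model can be sketched as follows, under assumptions of my own: a version is an immutable, content-addressed snapshot of a set of records, a branch is a named pointer to a version, and differencing two versions is set difference over record hashes. None of these structures are DataHub's actual design.

```python
import hashlib
import json

versions = {}   # version id -> frozenset of record hashes
records = {}    # record hash -> record (deduplicated storage)
branches = {}   # branch name -> version id

def commit(branch, dataset):
    """Snapshot a dataset as a content-addressed version on a branch."""
    hashes = set()
    for rec in dataset:
        h = hashlib.sha1(json.dumps(rec, sort_keys=True).encode()).hexdigest()
        records[h] = rec
        hashes.add(h)
    vid = hashlib.sha1("".join(sorted(hashes)).encode()).hexdigest()
    versions[vid] = frozenset(hashes)
    branches[branch] = vid
    return vid

def diff(v1, v2):
    """Records added and removed going from version v1 to v2."""
    a, b = versions[v1], versions[v2]
    return [records[h] for h in b - a], [records[h] for h in a - b]

base = commit("main", [{"id": 1}, {"id": 2}])
edit = commit("fix",  [{"id": 1}, {"id": 3}])
added, removed = diff(base, edit)
assert added == [{"id": 3}] and removed == [{"id": 2}]
```

Content addressing means unchanged records are stored once across all versions, which is what makes branching large datasets cheap.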

    EIS: using the metadatabase approach for data integration and OLAP.

    Get PDF
    by Ho Kwok-Wai. Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 121-126). Abstract also in Chinese. Table of contents: Chapter 1, Introduction; Chapter 2, Literature Review (Executive Information System, On-line Analytical Processing, Data Warehousing, the Metadatabase Approach and TSER modeling); Chapter 3, Research Methodology; Chapter 4, Multidimensional Data Analysis (the Multidimensional Analysis Unit, MAU); Chapter 5, New Architecture for Executive Information System (the Metadatabase Management System and the ROLAP/MDB interface and analyzer); Chapter 6, Algorithm and Methods for the New EIS Architecture (indicator browsing, MAU schema storage, MAU sub-view materialization, and SQL statements for slice, dice, rotation, and drill-down/roll-up operations); Chapter 7, A Case Study Using the Prototyped EIS; Chapter 8, Evaluation of the New EIS Architecture; Chapter 9, Conclusion; Chapter 10, Direction of Future Studies; References; Appendix, Global Information Resources Dictionary (GIRD).
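The thesis's chapters on multidimensional analysis in relational manner describe OLAP operations expressed as SQL. A minimal sketch with SQLite: a fact table keyed by dimensions, a slice fixing one dimension, and a roll-up aggregating along another. The table and column names are invented for illustration, not taken from the thesis.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, year INT, product TEXT, amount INT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("East", 1997, "widget", 100),
    ("East", 1998, "widget", 120),
    ("West", 1998, "gadget", 80),
])

# Slice: fix the year dimension at 1998, keeping the other dimensions.
slice_1998 = con.execute(
    "SELECT region, product, amount FROM sales WHERE year = 1998").fetchall()

# Roll-up: aggregate the sliced cube along the product dimension.
rollup = con.execute(
    "SELECT region, SUM(amount) FROM sales WHERE year = 1998 "
    "GROUP BY region ORDER BY region").fetchall()
assert rollup == [("East", 120), ("West", 80)]
```

Dice is the same pattern with predicates on several dimensions at once, and drill-down simply re-runs the aggregation at a finer grouping level.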