29 research outputs found

    Semantic Web Based Relational Database Access With Conflict Resolution

    This thesis focuses on (1) accessing relational databases through Semantic Web technologies and (2) resolving the conflicts that usually arise when integrating data from heterogeneous source schemas and/or instances. In the first part of the thesis, we present an approach to access relational databases using Semantic Web technologies. Our approach is built on top of the Ontop framework for Ontology-Based Data Access. It extracts both Ontop mappings and an equivalent OWL ontology from an existing database schema. End users can then access the underlying data source through SPARQL queries. The proposed approach takes into consideration the different relationships between the entities of the database schema when it extracts the mappings and the equivalent ontology. Instead of extracting a flat ontology that is an exact copy of the database schema, it extracts a rich ontology. The extracted ontology can also be used as an intermediary between a domain ontology and the underlying database schema. Our approach covers independent or master entities that have no foreign references, dependent or detailed entities whose foreign keys reference other entities, recursive entities that contain self-references, binary join entities that relate two entities together, and n-ary join entities that map two or more entities in an n-ary relation. The implementation results indicate that the extracted Ontop mappings and ontology are accurate: end users can query all data (using SPARQL) from the underlying database source just as if they had written SQL queries. In the second part, we present an overview of conflict resolution approaches in both conventional data integration systems and collaborative data sharing communities. We focus on the latter because it supports the needs of scientific communities for data sharing and collaboration. We first introduce the purpose of the study and present a brief overview of data integration. Next, we discuss the problem of inconsistent data in conventional integration systems and summarize the conflict handling strategies used to handle such data. Then we focus on the problem of conflict resolution in collaborative data sharing communities. A collaborative data sharing community is a group of users who agree to share a common database instance, such that all users have access to the shared instance and can add to, update, and extend it. We discuss related work that adopts different conflict resolution strategies in the area of collaborative data sharing and provide a comparison between them. We find that a Collaborative Data Sharing System (CDSS) can best support the needs of certain communities, such as scientific communities. We then discuss some open research opportunities to improve the efficiency and performance of the CDSS. Finally, we summarize our work so far towards achieving these open research directions.
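    As a rough illustration of the access pattern described in the first part, the sketch below issues a SPARQL query against an Ontop-style endpoint from Python. The endpoint URL, the ex: namespace, and the Employee class and name property are invented placeholders, not details from the thesis.

```python
# Minimal sketch: querying a database exposed through an Ontop SPARQL
# endpoint. Endpoint URL, namespace, and class/property names are
# hypothetical placeholders, not details from the thesis.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:8080/sparql")  # assumed endpoint
sparql.setQuery("""
    PREFIX ex: <http://example.org/ontology#>
    SELECT ?employee ?name WHERE {
        ?employee a ex:Employee ;   # class extracted from a table
                  ex:name ?name .   # property extracted from a column
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["employee"]["value"], row["name"]["value"])
```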

    GeoTriples: Transforming geospatial data into RDF graphs using R2RML and RML mappings

    A lot of geospatial data has recently become available at no charge in many countries. Geospatial data currently made available by government agencies usually does not follow the linked data paradigm. In the few cases where government agencies do follow the linked data paradigm (e.g., Ordnance Survey in the United Kingdom), specialized scripts have been used to transform geospatial data into RDF. In this paper we present the open-source tool GeoTriples, which generates and processes extended R2RML and RML mappings that transform geospatial data from many input formats into RDF. GeoTriples allows the transformation of geospatial data stored in raw files (shapefiles, CSV, KML, XML, GML and GeoJSON) and spatially-enabled RDBMSs (PostGIS and MonetDB) into RDF graphs using well-known vocabularies like GeoSPARQL and stSPARQL, without being tightly coupled to a specific vocabulary. GeoTriples has been developed in the European projects LEO and Melodies and has been used to transform many geospatial data sources into linked data. We study the performance of GeoTriples experimentally using large publicly available geospatial datasets, and show that GeoTriples is efficient and scalable, especially when its mapping processor is implemented using Apache Hadoop.
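    To make the target representation concrete, here is a hand-rolled Python sketch of the kind of GeoSPARQL output such a mapping yields for a single GeoJSON feature. The feature and the resource namespace are invented, and the sketch uses rdflib and shapely rather than GeoTriples itself.

```python
# Sketch of GeoSPARQL-style output for one GeoJSON feature. The
# feature and the http://example.org/resource/ namespace are invented.
from rdflib import Graph, Literal, Namespace, RDF
from shapely.geometry import shape

GEO = Namespace("http://www.opengis.net/ont/geosparql#")
EX = Namespace("http://example.org/resource/")   # hypothetical namespace

feature = {  # a minimal GeoJSON feature standing in for a real dataset
    "id": "road_1",
    "geometry": {"type": "LineString",
                 "coordinates": [[23.7, 37.9], [23.8, 38.0]]},
}

g = Graph()
g.bind("geo", GEO)
subject = EX[feature["id"]]
geometry = EX[feature["id"] + "/geometry"]

g.add((subject, RDF.type, GEO.Feature))
g.add((subject, GEO.hasGeometry, geometry))
g.add((geometry, RDF.type, GEO.Geometry))
# Serialize the geometry as a GeoSPARQL WKT literal.
g.add((geometry, GEO.asWKT,
       Literal(shape(feature["geometry"]).wkt, datatype=GEO.wktLiteral)))

print(g.serialize(format="turtle"))
```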

    Semantic Linking Spatial RDF Data to the Web Data Sources

    Large amounts of spatial data are held in relational databases, and this data must be converted to RDF for Semantic Web applications; it is the key input for creating spatial RDF data. Linked Data is the preferred way to publish and share data from relational databases on the Web. In order to define the semantics of the data, links are provided to vocabularies (ontologies or other external web resources) that are common conceptualizations for a domain. Linking resource vocabularies with globally published concepts of domain resources combines different data sources and datasets, makes data more understandable, discoverable and usable, improves data interoperability and integration, enables automatic reasoning and prevents data duplication. The need to convert relational data to RDF arises from the semantic expressiveness of Semantic Web technologies. Ontologies are one of the key building blocks of the Semantic Web; an ontology is an "explicit specification of a conceptualization", and the semantics of spatial data relies on ontologies. Linking spatial data from relational databases to web data sources is not an easy task when the goal is sharing machine-readable, interlinked data on the Web. Tim Berners-Lee, the inventor of the World Wide Web and the advocate of the Semantic Web and Linked Data, laid down the Linked Data design principles. Based on these principles, first, spatial data in relational databases must be converted to RDF with the help of supporting tools. Second, spatial RDF data must be linked to upper-level domain ontologies and related web data sources. Third, external data sources (ontologies and web data sources) must be identified and the spatial RDF data must be linked to them. Finally, the spatial linked data must be published on the web. The main contribution of this study is to determine the requirements for finding RDF links and to point out the deficiencies in creating and publishing linked spatial data. To achieve this objective, this study surveys existing approaches, conversion tools and web data sources for converting relational data to spatial RDF. In this paper, we investigate the current state of spatial RDF data, standards, open source platforms (particularly D2RQ, Geometry2RDF, TripleGeo, GeoTriples, Ontop, etc.) and web data sources. Moreover, the process of converting spatial data to RDF and linking it to web data sources is described. The implementation of linking spatial RDF data to web data sources is demonstrated with an example use case: road data has been linked to DBpedia, one of the most popular related web data sources. Silk, a tool for discovering relationships between data items within different Linked Data sources, is used as the link discovery framework; we also evaluated other link discovery tools such as LIMES and compared the results of the matching/linking task. As a result, the linked road data is shared and represented as an information resource on the web, enriched with definitions from related resources. In this way, road datasets are also linked through the related classes, individuals, spatial relations and properties they cover, such as construction date, road length and coordinates.
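    The matching step that Silk performs can be pictured with a toy Python sketch: compare labels of local road resources against candidate DBpedia resources and emit owl:sameAs links above a threshold. The resources, labels, and threshold below are invented, and simple string similarity stands in for Silk's configurable comparison operators.

```python
# Toy stand-in for label-based link discovery (the role Silk plays):
# emit owl:sameAs when label similarity clears a threshold.
# All resources, labels, and the threshold are invented.
from difflib import SequenceMatcher
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import OWL

EX = Namespace("http://example.org/road/")   # hypothetical namespace
local_roads = {EX["e5"]: "European route E5"}
dbpedia_candidates = {
    URIRef("http://dbpedia.org/resource/European_route_E5"): "European route E5",
    URIRef("http://dbpedia.org/resource/Otoyol_5"): "Otoyol 5",
}

THRESHOLD = 0.8  # acceptance threshold, as in a link specification
g = Graph()

for local_uri, local_label in local_roads.items():
    for remote_uri, remote_label in dbpedia_candidates.items():
        score = SequenceMatcher(None, local_label.lower(),
                                remote_label.lower()).ratio()
        if score >= THRESHOLD:
            g.add((local_uri, OWL.sameAs, remote_uri))

print(g.serialize(format="nt"))
```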

    An Approach To Publish a Data Warehouse Content as Linked Data

    Master's in Informatics Engineering - Specialization in Knowledge and Decision Technologies. Organizations are still gathering huge amounts of data/information and storing them in data warehouses (DW) for reporting and data analysis purposes. Most of those DWs rely on relational database (RDB) management systems and are structured by a schema (e.g., star schema, snowflake schema). On the other hand, with the advent of the Semantic Web, organizations are being pushed to add semantics (i.e., metadata) to their own data in order to find, share, combine and reuse information more easily across applications, organizations and community boundaries. The goal of the Semantic Web is to give computers the ability to perform more complex jobs through the principles of Linked Data (cf. chapter 3). In that sense, the W3C proposes the adoption of standards like RDF, OWL and SPARQL, technologies that help expose and access data and its semantics by using logical structures called ontologies. Simply put, an ontology captures/represents the vocabulary and interpretation restrictions of a particular application domain (i.e., concepts, their relations and restrictions), which is further used to describe a set of specific data (instances) for that domain. In this context, the work described in this document explores and analyzes (i) the vocabulary recommended by the W3C to describe a data cube represented in RDF (cf. chapter 5) and (ii) the languages for mapping relational databases (RDB) to RDF (cf. chapter 4), also recommended by the W3C, in order to propose their application in a semi-automatic process that allows, in a quick and easy manner, semantically publishing the content of an existing DW stored in a relational database in accordance with the principles of Linked (Open) Data. The semi-automatic process can save time/money in creating a data repository with an ontology, which can then be used as a standard "facade" exposing the content of the data warehouse to Semantic Web technologies. The semi-automatic process consists of four sub-processes (cf. chapter 6). The first, the Setup and Configuration Process (cf. section 6.2.2), selects and categorizes the tables of the data warehouse (cf. chapter 2) from which the data will be extracted. The second, the RDF Data Cube Ontology Structure Definition Process (cf. section 6.2.3), creates an ontology structure, without data, based on (i) the vocabulary recommended by the W3C for describing data cubes (cf. chapter 5) and (ii) the results of the Setup and Configuration Process, so that the resulting ontology can be classified and used as a data cube. The third, the Mappings Specification Process (cf. section 6.2.4), creates a mapping between the data warehouse and the resulting ontology using R2RML, the RDB2RDF mapping language recommended by the W3C. The fourth and last, the Mapping Execution Process (cf. section 6.2.5), exposes the data of the data warehouse according to that ontology, through the mapping generated by the Mappings Specification Process.
    The thesis is divided into seven chapters. The first chapter introduces the context and goal of this document. The second chapter gives an overview of data warehouses, whose structures and data are used by the semi-automatic process to create the data repository. The third chapter analyses Linked Data, namely its concept, its principles and the languages that can express it; one of these languages (RDF or OWL), combined with a serialization (e.g., XML, N-Triples), is used to describe the data repository the semi-automatic process can create. The fourth chapter surveys RDB-to-RDF mapping languages and technologies, of which R2RML is used by the semi-automatic process to create mappings between a data warehouse and the data repository. The fifth chapter presents the vocabulary recommended by the W3C to describe a data cube, used to classify the data repository created by the semi-automatic process. The sixth chapter presents and describes the proposed semi-automatic process with an example that runs through each implemented step. The seventh and last chapter contains the conclusions of this work and some possible limitations, together with suggestions for future work that could extend the semi-automatic process.
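    As a rough picture of the target output of the process, the sketch below builds one dataset and one observation with the W3C Data Cube (qb) vocabulary using rdflib. The dimension and measure names are invented; a real run would derive them from the DW star schema and the R2RML mapping.

```python
# Sketch of RDF Data Cube output: one qb:DataSet and one
# qb:Observation. Dimension/measure names are invented placeholders.
from rdflib import Graph, Literal, Namespace, RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")
EX = Namespace("http://example.org/dw/")   # hypothetical namespace

g = Graph()
g.bind("qb", QB)

g.add((EX["salesDataset"], RDF.type, QB.DataSet))

obs = EX["obs1"]
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, EX["salesDataset"]))
g.add((obs, EX["refPeriod"], Literal("2015", datatype=XSD.gYear)))      # dimension
g.add((obs, EX["region"], EX["Porto"]))                                 # dimension
g.add((obs, EX["salesAmount"], Literal(1250.0, datatype=XSD.decimal)))  # measure

print(g.serialize(format="turtle"))
```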

    Semantic SPARQL Query in a Relational Database Based on Ontology Construction

    © 2015 IEEE. Constructing an ontology from a relational database (RDB) and querying it through that ontology is a fundamental problem for the development of the Semantic Web. This paper proposes an approach to extract an ontology directly from an RDB in the form of OWL/RDF triples, ensuring its availability on the Semantic Web. We automatically construct an OWL ontology from the RDB schema using direct mapping rules. The mapping rules provide the basic rules for generating RDF triples from RDB data, even for columns containing null values, and enable semantic query engines to answer more relevant queries. We then rewrite SQL queries as SPARQL by translating the SQL relational algebra into an equivalent SPARQL form. The proposed method is demonstrated with examples, and the effectiveness of the approach is evaluated by experimental results.
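    A compact sketch of direct-mapping-style triple generation is shown below. It follows the general shape of such rules (a row IRI built from the table name and primary key, one predicate per column) rather than the paper's exact rules, which additionally handle null-valued columns in a specific way; here nulls are simply skipped, and the base IRI is an assumption. The in-memory SQLite table keeps the example self-contained.

```python
# Direct-mapping-style RDF generation from an RDB (illustrative only;
# not the paper's exact rules, which also cover null-valued columns).
import sqlite3
from rdflib import Graph, Literal, Namespace, URIRef

BASE = Namespace("http://example.org/db/")   # hypothetical base IRI

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Alice', NULL)")

g = Graph()
cursor = conn.execute("SELECT * FROM person")
columns = [d[0] for d in cursor.description]
for values in cursor:
    row = dict(zip(columns, values))
    subject = URIRef(BASE + f"person/{row['id']}")   # IRI from table + key
    for col in columns:
        if row[col] is not None:                     # skip nulls in this sketch
            g.add((subject, URIRef(BASE + f"person#{col}"), Literal(row[col])))

print(g.serialize(format="turtle"))
```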

    A software system for agent-assisted ontology building

    This thesis investigates how one can design a team of intelligent software agents that helps its human partner develop a formal ontology from a relational database and enhance it with higher-level abstractions. The resulting efficiency of ontology development could facilitate the building of intelligent decision support systems that allow high-level semantic queries on legacy relational databases, autonomous implementation within a host organization, and incremental deployment without affecting the underlying database or its conventional use. We introduce a set of design principles, formulate the prototype system requirements and architecture, elaborate agent roles and interactions, develop suitable design techniques, and test the approach through practical implementation of selected features. We endow each agent with a model meta-ontology, which enables it to reason and communicate about the ontology, and a planning meta-ontology, which captures the role-specific know-how of the ontology building method. We also assess the maturity of development tools for a larger-scale implementation. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b214471
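    Purely as an illustration of the two-meta-ontologies idea (not the thesis's implementation), an agent can be modeled as carrying one graph for reasoning about the ontology under construction and another encoding its role-specific method:

```python
# Illustrative sketch only: an agent holding a model meta-ontology and
# a planning meta-ontology, as described in the abstract. The class,
# roles, and method are invented for the example.
from dataclasses import dataclass, field
from rdflib import Graph

@dataclass
class OntologyBuildingAgent:
    role: str
    model_meta_ontology: Graph = field(default_factory=Graph)     # talks *about* the ontology
    planning_meta_ontology: Graph = field(default_factory=Graph)  # encodes the role's know-how

    def propose_abstraction(self, working_ontology: Graph) -> list:
        """Placeholder for role-specific reasoning over the shared ontology."""
        return []

# A team pairs several role-specialized agents with the human partner.
team = [OntologyBuildingAgent(role="schema-analyst"),
        OntologyBuildingAgent(role="abstraction-suggester")]
```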

    A survey of RDB to RDF translation approaches and tools

    ISRN I3S/RR 2013-04-FR, 24 pages. Relational databases scattered over the web are generally opaque to regular web crawling tools. To address this concern, many RDB-to-RDF approaches have been proposed over recent years. In this paper, we propose a detailed review of seventeen RDB-to-RDF initiatives, considering end-to-end projects that delivered operational tools. The different tools are classified along three major axes: mapping description language, mapping implementation, and data retrieval method. We analyse the motivations, commonalities and differences between existing approaches. The expressiveness of existing mapping languages is not always sufficient to produce semantically rich data and make it usable, interoperable and linkable. We therefore briefly present various strategies investigated in the literature to produce additional knowledge. Finally, we show that R2RML, the W3C recommendation for describing RDB-to-RDF mappings, may not apply to all needs in the wide scope of RDB-to-RDF translation applications, leaving space for future extensions.

    Building a Knowledge Graph for Food, Energy, and Water Systems

    Title from PDF of title page, viewed January 30, 2018. Thesis advisor: Praveen R. Rao. Vita. Includes bibliographical references (pages 41-44). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2017. A knowledge graph represents millions of facts and reliable information about people, places, and things. Several companies like Microsoft, Amazon, and Google have developed knowledge graphs to improve customer experience. These knowledge graphs have proven reliable and useful for providing better search results, answering ambiguous questions about entities, and training semantic parsers to enhance semantic relationships over the Semantic Web. Motivated by these reasons, in this thesis we develop an approach to build a knowledge graph for the Food, Energy, and Water (FEW) systems, given the vast amount of data that is available from federal agencies like the United States Department of Agriculture (USDA), the National Oceanic and Atmospheric Administration (NOAA), the U.S. Geological Survey (USGS), and the National Drought Mitigation Center (NDMC). Our goal is to facilitate better analytics for FEW and enable domain experts to conduct data-driven research. To construct the knowledge graph, we employ Semantic Web technologies, namely the Resource Description Framework (RDF), the Web Ontology Language (OWL), and SPARQL. Starting with raw data (e.g., CSV files), we construct entities and relationships and extend them semantically using a tool called Karma. We enhance this initial knowledge graph by adding new relationships across entities, extracting information from ConceptNet via an efficient similarity-searching algorithm. We show initial performance results and discuss the quality of the knowledge graph on several datasets from the USDA. Contents: Introduction -- Challenges -- Background and related work -- Approach -- Evaluation -- Conclusion and future work.
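    The ConceptNet enrichment step might look roughly like the sketch below, which queries the public ConceptNet 5 relatedness endpoint and adds an edge when the score clears a threshold. The FEW namespace, the terms, the relation name, and the threshold are assumptions, and the thesis's actual similarity-search algorithm is not reproduced here.

```python
# Sketch of ConceptNet-based enrichment: add a relationship between two
# entities when their ConceptNet relatedness score is high enough.
# Namespace, terms, relation, and threshold are invented placeholders.
import requests
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/few/")   # hypothetical FEW namespace
THRESHOLD = 0.5

def relatedness(term1: str, term2: str) -> float:
    # Public ConceptNet 5 relatedness endpoint; returns {"value": score}.
    resp = requests.get("http://api.conceptnet.io/relatedness",
                        params={"node1": f"/c/en/{term1}",
                                "node2": f"/c/en/{term2}"},
                        timeout=10)
    return resp.json()["value"]

g = Graph()
if relatedness("corn", "maize") >= THRESHOLD:
    g.add((EX["corn"], EX["relatedTo"], EX["maize"]))

print(g.serialize(format="nt"))
```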