64 research outputs found

    Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML

    Full text link
    OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption, and as XML through a web service interface. As an intermediate format to facilitate statistical computations, CSV is generated internally. To interlink the OpenAIRE data with related data on the Web, we aim at exporting them as Linked Open Data (LOD). The LOD export is required to integrate into the overall data processing workflow, where derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach.We evaluated the performances of creating LOD by a MapReduce job on top of HBase, by mapping the intermediate CSV files, and by mapping the XML output.Comment: Accepted in 0th Metadata and Semantics Research Conferenc

    XQOWL: An Extension of XQuery for OWL Querying and Reasoning

    Full text link
    One of the main aims of the so-called Web of Data is to be able to handle heterogeneous resources where data can be expressed in either XML or RDF. The design of programming languages able to handle both XML and RDF data is a key target in this context. In this paper we present a framework called XQOWL that makes possible to handle XML and RDF/OWL data with XQuery. XQOWL can be considered as an extension of the XQuery language that connects XQuery with SPARQL and OWL reasoners. XQOWL embeds SPARQL queries (via Jena SPARQL engine) in XQuery and enables to make calls to OWL reasoners (HermiT, Pellet and FaCT++) from XQuery. It permits to combine queries against XML and RDF/OWL resources as well as to reason with RDF/OWL data. Therefore input data can be either XML or RDF/OWL and output data can be formatted in XML (also using RDF/OWL XML serialization).Comment: In Proceedings PROLE 2014, arXiv:1501.0169

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Get PDF
    Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”

    Supporting SPARQL Update Queries in RDF-XML Integration

    Full text link
    The Web of Data encourages organizations and companies to publish their data according to the Linked Data practices and offer SPARQL endpoints. On the other hand, the dominant standard for information exchange is XML. The SPARQL2XQuery Framework focuses on the automatic translation of SPARQL queries in XQuery expressions in order to access XML data across the Web. In this paper, we outline our ongoing work on supporting update queries in the RDF-XML integration scenario.Comment: 13th International Semantic Web Conference (ISWC '14

    Parallel RDF generation from heterogeneous big data

    Get PDF
    To unlock the value of increasingly available data in high volumes, we need flexible ways to integrate data across different sources. While semantic integration can be provided through RDF generation, current generators insufficiently scale in terms of volume. Generators are limited by memory constraints. Therefore, we developed the RMLStreamer, a generator that parallelizes the ingestion and mapping tasks of RDF generation across multiple instances. In this paper, we analyze what aspects are parallelizable and we introduce an approach for parallel RDF generation. We describe how we implemented our proposed approach, in the frame of the RMLStreamer, and how the resulting scaling behavior compares to other RDF generators. The RMLStreamer ingests data at 50% faster rate than existing generators through parallel ingestion

    A survey of RDB to RDF translation approaches and tools

    Get PDF
    ISRN I3S/RR 2013-04-FR 24 pagesRelational databases scattered over the web are generally opaque to regular web crawling tools. To address this concern, many RDB-to-RDF approaches have been proposed over the last years. In this paper, we propose a detailed review of seventeen RDB-to-RDF initiatives, considering end-to-end projects that delivered operational tools. The different tools are classified along three major axes: mapping description language, mapping implementation and data retrieval method. We analyse the motivations, commonalities and differences between existing approaches. The expressiveness of existing mapping languages is not always sufficient to produce semantically rich data and make it usable, interoperable and linkable. We therefore briefly present various strategies investigated in the literature to produce additional knowledge. Finally, we show that R2RML, the W3C recommendation for describing RDB to RDF mappings, may not apply to all needs in the wide scope of RDB to RDF translation applications, leaving space for future extensions
    corecore