161 research outputs found

    Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML

    OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption, and as XML through a web service interface. As an intermediate format to facilitate statistical computations, CSV is generated internally. To interlink the OpenAIRE data with related data on the Web, we aim to export them as Linked Open Data (LOD). The LOD export is required to integrate into the overall data processing workflow, where derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach. We evaluated the performance of creating LOD by a MapReduce job on top of HBase, by mapping the intermediate CSV files, and by mapping the XML output. Comment: Accepted in 0th Metadata and Semantics Research Conferenc

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Exposing not only human-centered information, but also machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: some providers choose XML, others RDF, still others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation, as application builders have to cope not only with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards more convenient, yet highly efficient data access in a “Web of Data”.

    Web and Semantic Web Query Languages

    A number of techniques have been developed to facilitate powerful data retrieval on the Web and Semantic Web. Three categories of Web query languages can be distinguished, according to the format of the data they can retrieve: XML, RDF and Topic Maps. This article introduces the spectrum of languages falling into these categories and summarises their salient aspects. The languages are introduced using common sample data and query types. Key aspects of the query languages considered are stressed in a conclusion.

    Format-independent and metadata-driven media resource adaptation using semantic web technologies

    Adaptation of media resources is an emerging field due to the growing amount of multimedia content on the one hand and an increasing diversity in usage environments on the other hand. Furthermore, to deal with a plethora of coding and metadata formats, format-independent adaptation systems are important. In this paper, we present a new format-independent adaptation system. The proposed adaptation system relies on a model that takes into account the structural metadata, semantic metadata, and scalability information of media bitstreams. The model is implemented using the Web Ontology Language (OWL). Existing coding formats are mapped to the structural part of the model, while existing metadata standards can be linked to the semantic part of the model. Our new adaptation technique, which is called RDF-driven content adaptation, is based on executing SPARQL Protocol and RDF Query Language (SPARQL) queries over instances of the model for media bitstreams. Using different criteria, RDF-driven content adaptation is compared to other adaptation techniques. In addition to real-time execution times, RDF-driven content adaptation provides a high abstraction level for the definition of adaptations and allows a seamless integration with existing semantic metadata standards.

    Survey over Existing Query and Transformation Languages

    A widely acknowledged obstacle to realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas.

    A distributed approach to XML interoperability

    "We expand upon previous research performed on creating a lightweight infrastructure for the purpose of achieving interoperability among XML data sources. The approach is based on enriching local sources with semantic declarations so as to enable interoperability. These declarations capture the information content and concepts of the sources through mappings to a common application specific vocabulary. We design and implement a peer-to-peer network architecture through which global queries are initiated and responded to. We further examine and implement a methodology for merging the XML results of global queries into a single XML formatted response."--Abstract from author supplied metadata

    Format-independent media resource adaptation and delivery


    Integration of Heterogeneous Data Sources in an Ontological Knowledge Base

    In this paper we present X2R, a system for integrating heterogeneous data sources in an ontological knowledge base. The main goal of the system is to create a unified view of information stored in relational, XML and LDAP data sources within an organization, expressed in RDF using a common ontology and valid according to a prescribed set of integrity constraints. X2R supports a wide range of source schemas and target ontologies by allowing the user to define potentially complex transformations of data between the original data source and the unified knowledge base. A rich set of integrity constraint primitives has been provided to ensure the quality of the unified data set. They are also leveraged in a novel approach towards semantic optimization of SPARQL queries.

    Format-Independent Rich Media Delivery Using the Bitstream Binding Language
