Search CORE

5,975 research outputs found

Managing polyglot systems metadata with hypergraphs

Author: F. Saltor
Jennie Duggan
Jérôme Euzenat
Lanjun Wang
Manos Karpathiotakis
P. Shvaiko
Paolo Atzeni
Ying-Ti Liao
Ágnes Vathy-Fogarassy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

A single type of data store can hardly fulfill every end-user requirements in the NoSQL world. Therefore, polyglot systems use different types of NoSQL datastores in combination. However, the heterogeneity of the data storage models makes managing the metadata a complex task in such systems, with only a handful of research carried out to address this. In this paper, we propose a hypergraph-based approach for representing the catalog of metadata in a polyglot system. Taking an existing common programming interface to NoSQL systems, we extend and formalize it as hypergraphs for managing metadata. Then, we define design constraints and query transformation rules for three representative data store types. Furthermore, we propose a simple query rewriting algorithm using the catalog itself for these data store types and provide a prototype implementation. Finally, we show the feasibility of our approach on a use case of an existing polyglot system.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

DI-fusion

A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003

Author: Price S
Publication venue: Department of Computer Science, University of Bristol
Publication date: 01/01/2004
Field of study

Explore Bristol Research

Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML

Author: Huang Jyun-Yao
Karim Farah
Lange Christoph
Vahdati Sahar
Publication venue
Publication date: 01/01/2015
Field of study

OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption, and as XML through a web service interface. As an intermediate format to facilitate statistical computations, CSV is generated internally. To interlink the OpenAIRE data with related data on the Web, we aim at exporting them as Linked Open Data (LOD). The LOD export is required to integrate into the overall data processing workflow, where derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach.We evaluated the performances of creating LOD by a MapReduce job on top of HBase, by mapping the intermediate CSV files, and by mapping the XML output.Comment: Accepted in 0th Metadata and Semantics Research Conferenc

arXiv.org e-Print Archive

Fraunhofer-ePrints

Automated metadata generation for linked data generation and publishing workflows

Author: De Nies Tom
Dimou Anastasia
Mannens Erik
Mechant Peter
Van de Walle Rik
Verborgh Ruben
Publication venue: CEUR-WS.org
Publication date: 01/01/2016
Field of study

Ghent University Academic Bibliography

A Review of the State of the Art of Machine Learning on the Semantic Web

Author: Price SN
Publication venue
Publication date: 01/09/2003
Field of study

Explore Bristol Research

Machine-interpretable dataset and service descriptions for heterogeneous data access and retrieval

Author: Alexander K.
Christensen E.
Das S.
de Bruijn J.
de Bruijn J.
Dimou A.
Lanthaler M.
Maali F.
Martin D.
Stadler C.
Tennison J.
Vitvar T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015
Field of study

Crossref

Ghent University Academic Bibliography

Exposing Provenance Metadata Using Different RDF Models

Author: Bodenreider Olivier
Bolton Evan
Dumontier Michel
Fu Gang
Furlong Laura I.
Nguyen Vinh
Rosinach Núria Queralt
Sheth Amit
Publication venue
Publication date: 01/01/2015
Field of study

A standard model for exposing structured provenance metadata of scientific assertions on the Semantic Web would increase interoperability, discoverability, reliability, as well as reproducibility for scientific discourse and evidence-based knowledge discovery. Several Resource Description Framework (RDF) models have been proposed to track provenance. However, provenance metadata may not only be verbose, but also significantly redundant. Therefore, an appropriate RDF provenance model should be efficient for publishing, querying, and reasoning over Linked Data. In the present work, we have collected millions of pairwise relations between chemicals, genes, and diseases from multiple data sources, and demonstrated the extent of redundancy of provenance information in the life science domain. We also evaluated the suitability of several RDF provenance models for this crowdsourced data set, including the N-ary model, the Singleton Property model, and the Nanopublication model. We examined query performance against three commonly used large RDF stores, including Virtuoso, Stardog, and Blazegraph. Our experiments demonstrate that query performance depends on both RDF store as well as the RDF provenance model

arXiv.org e-Print Archive

Maastricht University Research Portal

Theory and Practice of Data Citation

Author: Silvello Gianmaria
Publication venue: 'Wiley'
Publication date: 24/06/2017
Field of study

Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has. The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality of traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle.Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association for Information Science and Technology (JASIST), 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova