Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML
OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a
database of all EC FP7 and H2020 funded research projects, including metadata
of their results (publications and datasets). These data are stored in an HBase
NoSQL database, post-processed, and exposed as HTML for human consumption, and
as XML through a web service interface. As an intermediate format to facilitate
statistical computations, CSV is generated internally. To interlink the
OpenAIRE data with related data on the Web, we aim at exporting them as Linked
Open Data (LOD). The LOD export is required to integrate into the overall data
processing workflow, where derived data are regenerated from the base data
every day. We thus faced the challenge of identifying the best-performing
conversion approach. We evaluated the performance of creating LOD by a
MapReduce job on top of HBase, by mapping the intermediate CSV files, and by
mapping the XML output. Comment: Accepted in 0th Metadata and Semantics Research Conference
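To make the CSV route concrete, here is a minimal sketch of mapping rows of an intermediate CSV file to N-Triples. It is not the authors' implementation. The column names (`id`, `acronym`, `title`), the `http://example.org/openaire/` namespace and the `Project`/`acronym` terms are hypothetical placeholders; only the `rdf:type` and Dublin Core `dcterms:title` IRIs are standard vocabulary.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and vocabulary for illustration only;
# these are NOT OpenAIRE's actual LOD URIs.
BASE = "http://example.org/openaire/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Map each CSV row (assumed columns: id, acronym, title) to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the identifier so it is safe inside an IRI.
        subject = f"<{BASE}project/{quote(row['id'], safe='')}>"
        acronym = escape_literal(row["acronym"])
        title = escape_literal(row["title"])
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}Project> .")
        triples.append(f'{subject} <{BASE}acronym> "{acronym}" .')
        triples.append(f'{subject} <{DCT_TITLE}> "{title}" .')
    return triples
```

Because each row is mapped independently of the others, the same logic could run over a stream or be partitioned across workers. That property matters when the whole export is regenerated daily, though the sketch says nothing about which of the three approaches actually performed best.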
Flexible Integration and Efficient Analysis of Multidimensional Datasets from the Web
If numeric data from the Web are brought together, natural scientists can compare climate measurements with estimations, financial analysts can evaluate companies based on balance sheets and daily stock market values, and citizens can explore GDP per capita from several data sources. However, the heterogeneity and size of the data remain a problem. This work presents methods to query a uniform view, the Global Cube, of the datasets available on the Web, building on Linked Data query approaches.
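The idea of one uniform view over several datasets can be illustrated with a small sketch. This sketch assumes that the datasets already share dimension names and a single measure, such as GDP per capita. It also uses a convention of its own: a dimension that a dataset does not use is filled with a placeholder `ALL`. The names `global_cube`, `select` and `ALL` are hypothetical and do not describe the paper's actual Global Cube definition or query method.

```python
# Placeholder for a dimension a dataset does not use (an assumption of this sketch).
ALL = "ALL"


def global_cube(datasets):
    """Union observations from several datasets into one uniform view.

    Each dataset is a list of dicts mapping dimension names to values, plus a
    'measure' key. Dimensions a dataset lacks are filled with ALL.
    """
    dims = sorted({k for ds in datasets for obs in ds for k in obs if k != "measure"})
    rows = []
    for ds in datasets:
        for obs in ds:
            row = {d: obs.get(d, ALL) for d in dims}
            row["measure"] = obs["measure"]
            rows.append(row)
    return dims, rows


def select(rows, **fixed):
    """Return rows of the global view whose dimensions match all fixed values."""
    return [r for r in rows if all(r.get(d) == v for d, v in fixed.items())]
```

For example, `select(rows, country="DE")` returns matching observations from every source in one call. This is the kind of cross-source comparison the abstract motivates. A real system would also need to reconcile differing identifiers and units first.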
Multidimensional integration of RDF datasets
Data providers have been uploading RDF datasets to the web to help researchers and analysts find insights. These datasets, published by different providers, share common characteristics that enable their integration. However, since each provider has its own data dictionary, identifying common concepts is not trivial, and integration requires costly and complex entity resolution and transformation rules. In this paper, we propose a novel method that, given a set of independent RDF datasets, provides a multidimensional interpretation of these datasets and integrates them based on a common multidimensional space (if one can be identified). To do so, our method first identifies potential dimensional and factual data in the input datasets and performs entity resolution to merge common dimensional and factual concepts. As a result, we generate a common multidimensional space and identify each input dataset as a cuboid of the resulting lattice. With this output, we can exploit open data with OLAP operators in a richer fashion than by dealing with the datasets separately. This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence-Doctoral College (IT4BI-DC) program. Peer reviewed. Postprint (author's final draft).
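The notion of a cuboid lattice can be made concrete with a short sketch. Suppose a common set of dimensions is already known. Then the lattice contains every subset of those dimensions, and a dataset can be placed on the lattice by the dimensions it uses. The sketch assumes that dimensions and the measure are already identified and aligned, which is the hard part the paper's method automates. It also assumes the measure is additive, so that rolling up by summation is valid. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cuboid_lattice(dimensions):
    """All subsets of the dimensions, from the finest cuboid down to the apex (empty set)."""
    dims = sorted(dimensions)
    return [frozenset(c) for k in range(len(dims), -1, -1) for c in combinations(dims, k)]


def place_dataset(observations, measure):
    """Identify a dataset's cuboid as the set of non-measure keys its observations use."""
    keys = set().union(*(obs.keys() for obs in observations))
    return frozenset(keys - {measure})


def roll_up(observations, target, measure):
    """OLAP roll-up: aggregate to a coarser cuboid by summing an (assumed additive) measure."""
    totals = defaultdict(float)
    for obs in observations:
        key = tuple(sorted((d, obs[d]) for d in target))
        totals[key] += obs[measure]
    return dict(totals)
```

Once two datasets sit on the same lattice, one can be rolled up to the other's cuboid and the two compared directly. This is the kind of OLAP-style exploitation the abstract refers to.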
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume but also
noisy and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed.
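Data stream processing, one of the surveyed topics, often relies on windowed computation over continuous, noisy readings. The sketch below keeps a time-based sliding window and returns the running mean. It assumes that readings arrive in timestamp order and that the window keeps readings newer than `window` seconds before the latest one. The class name and these semantics are illustrative choices, not taken from the survey.

```python
from collections import deque


class SlidingWindowAverage:
    """Running mean of readings from the last `window` time units (in-order arrival assumed)."""

    def __init__(self, window):
        self.window = window
        self.buf = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, ts, value):
        """Insert a reading, evict readings that fell out of the window, return the mean."""
        self.buf.append((ts, value))
        self.total += value
        # Keep only readings with timestamp > ts - window.
        while self.buf and self.buf[0][0] <= ts - self.window:
            _, old = self.buf.popleft()
            self.total -= old
        return self.total / len(self.buf)
```

Keeping a running total makes each update amortized O(1), which suits unbounded streams. Handling out-of-order or late readings, common in volatile IoT environments, would need extra buffering that this sketch omits.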
GEORDi: Supporting lightweight end-user authoring and exploration of Linked Data
The US and UK governments have recently made much of the data created by their various departments available on the web as data sets, often as CSV files. Known as "open data", these are valuable assets, yet much of this data remains of little use because it is effectively inaccessible to citizens, for the following reasons: (1) finding data relevant to a query is often a tedious, multi-step process; once a candidate data set is located, it often must be downloaded and opened in a separate application just to check whether it contains data that may satisfy the query. (2) It is difficult to join related data sets to create richer integrated information. (3) It is difficult to query even a single data set, and harder still to query across related data sets. (4) To date, integrating and querying such sources as reusable linked data has required being well versed in Semantic Web technologies such as SPARQL, RDF, and URI formation. Our goal has been to develop tools that let regular, non-programmer web citizens make use of this Web of Data. To this end, we present GEORDi, a set of integrated tools and services that lets citizen users identify, explore, query, and represent these open data sources over the web via Linked Data mechanisms. In this paper we describe the GEORDi process for authoring new open data and translating existing open data into a linkable format, GEORDi's lens mechanism for rendering rich, plain-language descriptions and views of resources, and the GEORDi link-sliding paradigm for data exploration. With these tools we demonstrate that it is possible to make the Web of open (and linked) data accessible to ordinary web citizens.
EAGLE—A Scalable Query Processing Engine for Linked Sensor Data
Recently, many approaches have been proposed to manage sensor data using Semantic Web technologies for effective heterogeneous data integration. However, our empirical observations revealed that these solutions primarily focused on semantic relationships and paid less attention to spatio–temporal correlations. Most semantic approaches lack spatio–temporal support, and those that attempt to provide full spatio–temporal support perform poorly on complex spatio–temporal aggregate queries. In addition, while the volume of sensor data is growing rapidly, the challenge of querying and managing the massive volumes of data generated by sensing devices remains unsolved. In this article, we introduce EAGLE, a spatio–temporal query engine for querying sensor data based on the linked data model. The ultimate goal of EAGLE is to provide an elastic and scalable system that allows fast searching and analysis with respect to the relationships of space, time, and semantics in sensor data. We also extend SPARQL with a set of new query operators to support spatio–temporal computing in the linked sensor data context. EC/H2020/732679/EU/ACTivating InnoVative IoT smart living environments for AGEing well/ACTIVAGE; EC/H2020/661180/EU/A Scalable and Elastic Platform for Near-Realtime Analytics for The Graph of Everything/SMARTE
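To show what a spatio–temporal aggregate query computes, here is a deliberately naive sketch. It keeps observations that fall inside a latitude/longitude bounding box and a half-open time interval, then averages the values per sensor. It does not reproduce EAGLE's SPARQL operators or its storage design. The `Observation` type, the bounding-box layout `(min_lat, min_lon, max_lat, max_lon)` and the function name are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass(frozen=True)
class Observation:
    sensor: str
    lat: float
    lon: float
    time: datetime
    value: float


def spatio_temporal_avg(observations, bbox, start, end):
    """Average value per sensor for observations inside bbox and the interval [start, end).

    bbox is (min_lat, min_lon, max_lat, max_lon); this is a linear scan, not an indexed query.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    groups = {}
    for o in observations:
        inside_space = min_lat <= o.lat <= max_lat and min_lon <= o.lon <= max_lon
        inside_time = start <= o.time < end
        if inside_space and inside_time:
            groups.setdefault(o.sensor, []).append(o.value)
    return {sensor: mean(values) for sensor, values in groups.items()}
```

The linear scan touches every observation, which is exactly what becomes infeasible as sensor data grows. A scalable engine would typically prune candidates with spatial and temporal indexes before aggregating.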
- …