    Efficient RDF Interchange (ERI) format for RDF data streams

    RDF streams are sequences of timestamped RDF statements or graphs, which can be generated by several types of data sources (sensors, social networks, etc.). They may provide data at high volumes and rates and be consumed by applications that require real-time responses. Hence, it is important to publish and interchange them efficiently. In this paper, we exploit a key feature of RDF data streams, the regularity of their structure and data values, and propose a compressed, efficient RDF interchange (ERI) format that can reduce the amount of data transmitted when processing RDF streams. Our experimental evaluation shows that our format achieves state-of-the-art streaming compression while remaining efficient in performance.
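The structural regularity mentioned above can be illustrated with a minimal Python sketch. It is not the ERI format itself. It assumes that each stream element is a set of predicate-object pairs about one subject, with at most one object per predicate. A sender transmits a repeated predicate layout once and then refers to it by a small integer ID. The function names `encode_stream` and `decode_stream` and the `ex:` predicates are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# One stream element: predicate -> object (assumes one object per predicate).
Element = Dict[str, str]
# Encoded element: (structure id, layout if seen for the first time else None, values).
Encoded = Tuple[int, Optional[Tuple[str, ...]], Tuple[str, ...]]


def encode_stream(elements: List[Element]) -> List[Encoded]:
    """Replace each repeated predicate layout with a numeric structure ID."""
    structures: Dict[Tuple[str, ...], int] = {}
    encoded: List[Encoded] = []
    for element in elements:
        layout = tuple(sorted(element))  # canonical predicate order
        values = tuple(element[p] for p in layout)
        if layout in structures:
            # Known structure: only the ID and the values are sent.
            encoded.append((structures[layout], None, values))
        else:
            # New structure: register it and send the layout once.
            sid = len(structures)
            structures[layout] = sid
            encoded.append((sid, layout, values))
    return encoded


def decode_stream(encoded: List[Encoded]) -> List[Element]:
    """Rebuild the elements by replaying the structure dictionary."""
    structures: Dict[int, Tuple[str, ...]] = {}
    elements: List[Element] = []
    for sid, layout, values in encoded:
        if layout is not None:
            structures[sid] = layout
        elements.append(dict(zip(structures[sid], values)))
    return elements


stream = [
    {"ex:sensor": "s1", "ex:value": "21.5", "ex:time": "t1"},
    {"ex:sensor": "s1", "ex:value": "21.7", "ex:time": "t2"},
]
assert decode_stream(encode_stream(stream)) == stream
```

The second element above is sent as `(0, None, ("s1", "t2", "21.7"))`, with no predicates. A real format would also compress the values, for example by encoding consecutive timestamps as deltas. The sketch leaves the values as plain strings.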

    On the road to the evaluation of RDF stream compression techniques

    Proceedings of the RDF Stream Processing Workshop in conjunction with the 12th Extended Semantic Web Conference (ESWC 2015), May 31st, 2015 in Portoroz, Slovenia. The popularization of data streaming applications, such as those related to social networks and the Internet of Things, has fostered the interest of the Semantic Web community in this kind of data. As a result of this interest, the W3C RDF Stream Processing (RSP) community group has recently been started with the goal of defining a common model “for producing, transmitting and continuously querying RDF Streams”. In this EOI, we focus on the transmission model. As pointed out by recent research efforts (e.g., Ztreamy and CQELS Cloud), the efficient transmission of RDF streams is a necessary step to ensure higher throughput in RDF stream processors. This work is partially funded by Ministerio de Economía y Competitividad (Spain) under the projects “HERMES-SMARTDRIVER” (TIN2013-46801-C4-2-R) and “4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos” (TIN2013-46238-C4-2-R), and by the Austrian Science Fund (FWF): M1720-G1

    Towards efficient processing of RDF Data Streams

    In recent years, there has been an increase in the amount of real-time data being generated. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these data streams is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.

    Toward Semantic Sensor Data Archives on the Web

    Sensor datasets on the Web are becoming increasingly available, and there is a need to make them discoverable and accessible so that they can be reused despite their heterogeneity. While RDF and Linked Data provide fundamental principles for sharing data on the Web, there is evidence that they have limitations for efficiently transmitting and archiving sensor data. In this paper, we identify some of the main challenges for engineering semantic sensor data archives, and we present an abstract architecture for such an infrastructure. The proposed approach is based on a mix of RDF metadata and raw sensor archives with RDF mappings, so that data can be RDF-ized on demand. We use a real sensor deployment for air quality monitoring as a motivating use case and running example, and we show preliminary results on RDF transformation, compared with a representative data compression algorithm.
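The "RDF-ized on demand" idea can be sketched in a few lines of Python. The raw archive stays as compact CSV, and triples are produced lazily from a mapping only when a consumer iterates over them. This is an assumption-laden illustration, not the paper's architecture. The `str.format` subject template and the column-to-predicate dictionary are a hypothetical stand-in for a real RDF mapping language, and the column names and `ex:` predicates are invented.

```python
import csv
import io
from typing import Dict, Iterator, Tuple

Triple = Tuple[str, str, str]


def rdfize(raw_csv: str, subject_template: str,
           column_map: Dict[str, str]) -> Iterator[Triple]:
    """Lazily turn raw sensor rows into (subject, predicate, object) triples.

    subject_template: a str.format pattern over column names (hypothetical
    mapping convention, not R2RML/RML).
    column_map: raw column name -> predicate name.
    """
    for row in csv.DictReader(io.StringIO(raw_csv)):
        subject = subject_template.format(**row)
        for column, predicate in column_map.items():
            yield (subject, predicate, row[column])


archive = "station,time,no2\nS1,2015-01-01T00:00,21.5\nS1,2015-01-01T01:00,19.0\n"
mapping = {"no2": "ex:no2", "time": "ex:time"}
triples = rdfize(archive, "ex:obs/{station}/{time}", mapping)
print(next(triples))  # first triple is generated only when requested
```

Because `rdfize` is a generator, only the raw archive and the mapping need to be stored. The more verbose triple form exists only while a consumer is reading it.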

    Compact semantic representations of observational data

    The Internet of Things (IoT) concept has been widely adopted in several domains to enable devices to interact with each other and perform certain tasks. IoT devices encompass different concepts, e.g., sensors, programs, computers, and actuators. IoT devices observe their surroundings to collect information and communicate with each other in order to perform shared tasks. These devices continuously generate observational data streams, which become historical data when these observations are stored. Due to an increase in the number of IoT devices, a large amount of streaming and historical observational data is being produced. Moreover, several ontologies, like the Semantic Sensor Network (SSN) Ontology, have been proposed for the semantic annotation of observational data, either streams or historical. The Resource Description Framework (RDF) is a widely adopted data model for semantically describing datasets. Semantic annotation provides a shared understanding for the processing and analysis of observational data. However, adding semantics further increases the data size, especially when the observation values are redundantly sensed by several devices. For example, several sensors can generate observations indicating the same value for relative humidity at a given timestamp and in a given city. This situation can be represented in an RDF graph using four RDF triples, where observations are represented as triples that describe the observed phenomenon, the unit of measurement, the timestamp, and the coordinates. The RDF triples of an observation are associated with the same subject. Such observations share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. 
If the number of these subject entities or properties in these star patterns is large, both the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. This thesis addresses the problem of identifying frequent star patterns in RDF graphs and develops computational methods to identify frequent star patterns and generate a factorized RDF graph where the number of frequent star patterns is minimized. Furthermore, we apply these factorized RDF representations to historical semantic sensor data described using the SSN ontology and present tabular representations of factorized semantic sensor data in order to exploit Big Data frameworks. In addition, this thesis devises a knowledge-driven approach named DESERT that is able to on-Demand factorizE and Semantically Enrich stReam daTa. We evaluate the performance of our proposed techniques on several RDF graph benchmarks. The outcomes show that our techniques detect frequent star patterns effectively and efficiently, and that RDF graph size can be reduced by up to 66.56% while the data represented in the original RDF graph is preserved. Moreover, the compact representations reduce the number of RDF triples by at least 53.25% in historical observational data and by up to 94.34% in observational data streams. Additionally, query evaluation over historical data shows that query execution time is reduced by up to three orders of magnitude. In observational data streams, the size of the data required to answer the query is reduced by 92.53%, which lowers the memory space required to answer the queries. These results provide evidence that IoT data can be efficiently represented using the proposed compact representations, thus reducing the negative impact that semantic annotations may have on IoT data management.
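A back-of-the-envelope model shows why factorizing frequent star patterns pays off. This model uses our own simplifying assumptions and is not the thesis's exact accounting. Suppose $n$ observations each assert the same $k$ property-object pairs. Suppose also that factorization replaces those pairs with one surrogate entity carrying the $k$ pairs, plus one linking triple per observation. The triple counts are then:

```latex
\begin{aligned}
T_{\text{orig}} &= n\,k, \qquad T_{\text{fact}} = k + n,\\
\text{reduction} &= 1 - \frac{k + n}{n\,k}.
\end{aligned}
```

Take the four-triple humidity example ($k = 4$) repeated by $n = 100$ sensors. This gives $T_{\text{orig}} = 400$ and $T_{\text{fact}} = 104$, a reduction of $1 - 104/400 = 74\%$.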

    Entity spatio-temporal evolution summarization in knowledge graphs

    This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record. Knowledge graphs have been growing in popularity in recent years, with extensive applications such as entity alignment, entity summarization, and question answering. However, most research focuses only on a single snapshot of a knowledge graph and neglects its inherently dynamic nature, which often leads to missing important information contained in other versions of the knowledge graph. Even worse, the incompleteness of knowledge graph data is a challenging issue that hinders further use of the data. Since knowledge graphs can evolve over time as well as with changing locations, it is necessary to summarize and integrate entity temporal and spatial evolution information. To address this challenge, this paper is the first to formulate the problem of entity spatio-temporal evolution summarization, capturing entity evolution with time and location changes and integrating data from two groups of various knowledge graphs. Further, we propose a two-stage approach: 1) generate an entity temporal summarization and a spatial summarization using Triadic Formal Concept Analysis; 2) produce the spatio-temporal evolution summarization of the entity by adopting a fusion strategy. The resulting summarizations can be used for visualizing entity spatio-temporal evolution, data integration, and question answering. Funding: National Natural Science Foundation of China; European Union Horizon 2020; Natural Science Basic Research Plan in Shaanxi Province of China; Fund Program for the Scientific Activities of Selected Returned Overseas Professionals in Shaanxi Province.
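The following Python sketch makes the Triadic Formal Concept Analysis step more concrete by enumerating the triadic concepts of a tiny context by brute force. A triadic context is a set of (object, attribute, condition) incidences, read here as (entity, attribute, snapshot). A triadic concept (A1, A2, A3) is a maximal triple with A1 × A2 × A3 contained in the incidences. Equivalently, each component equals the derivation of the other two. The exhaustive enumeration is exponential and serves only as an illustration. It is not the algorithm used in the paper, and the entity, attribute, and snapshot names are hypothetical.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Incidence = Tuple[str, str, str]  # (entity, attribute, snapshot)
Concept = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def subsets(items: Iterable[str]) -> Iterable[FrozenSet[str]]:
    """All subsets of items, including the empty set."""
    ordered = sorted(items)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(ordered, r) for r in range(len(ordered) + 1)))


def triadic_concepts(G: Set[str], M: Set[str], B: Set[str],
                     Y: Set[Incidence]) -> List[Concept]:
    """Brute-force enumeration of the triadic concepts of (G, M, B, Y)."""
    def der1(a2, a3):  # entities with every attribute of a2 in every snapshot of a3
        return frozenset(g for g in G if all((g, m, b) in Y for m in a2 for b in a3))

    def der2(a1, a3):  # attributes shared by all of a1 in every snapshot of a3
        return frozenset(m for m in M if all((g, m, b) in Y for g in a1 for b in a3))

    def der3(a1, a2):  # snapshots in which all of a1 have every attribute of a2
        return frozenset(b for b in B if all((g, m, b) in Y for g in a1 for m in a2))

    concepts: List[Concept] = []
    for a2 in subsets(M):
        for a3 in subsets(B):
            a1 = der1(a2, a3)
            # Keep the triple only if it is closed in all three directions.
            if der2(a1, a3) == a2 and der3(a1, a2) == a3:
                concepts.append((a1, a2, a3))
    return concepts
```

Consider entities `e1` and `e2`, where only `e1` keeps the attribute `active` in both snapshots `v2010` and `v2020`. The concept `({e1}, {active}, {v2010, v2020})` captures that persistence. The concept `({e1, e2}, {active}, {v2010})` groups both entities in the earlier snapshot.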

    Compacting Frequent Star Patterns in RDF Graphs

    Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. If the number of these entities or properties in these star patterns is large, both the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs where the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of keeping all the entities that match the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is linked to the entities that originally matched the frequent star pattern. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare them with a baseline built on top of gSpan, a state-of-the-art algorithm for detecting frequent patterns. The outcomes provide evidence of the efficiency of the proposed approach and show that our techniques reduce execution time compared with the baseline approach by at least three orders of magnitude, while reducing the RDF graph size by up to 66.56%.
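The surrogate-entity construction described above can be sketched in Python under stated assumptions. The properties of the star pattern are given in advance, whereas the paper detects them. `ex:hasMolecule` and the `_:moleculeN` labels are hypothetical names. Triples are plain string tuples rather than an RDF library's types. The sketch does not reproduce the paper's detection algorithm or the gSpan baseline.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]


def factorize(triples: List[Triple], props: Set[str], min_support: int = 2,
              link: str = "ex:hasMolecule") -> List[Triple]:
    """Replace frequent star patterns over `props` by surrogate entities.

    A subject takes part only if it has at least one triple for every
    property in `props`; subjects with identical (property, object) pairs
    on `props` instantiate the same star pattern.
    """
    # 1. Collect each subject's (property, object) pairs on the chosen properties.
    pairs: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for s, p, o in triples:
        if p in props:
            pairs[s].add((p, o))

    # 2. Group subjects that instantiate exactly the same star pattern.
    groups: Dict[FrozenSet[Tuple[str, str]], List[str]] = defaultdict(list)
    for s, po in pairs.items():
        if {p for p, _ in po} == props:
            groups[frozenset(po)].append(s)

    # 3. Build one compact molecule per pattern with at least min_support subjects.
    out: List[Triple] = []
    factored: Set[str] = set()
    next_id = 0
    for pattern, subjects in sorted(groups.items(), key=lambda kv: sorted(kv[0])):
        if len(subjects) < min_support:
            continue
        surrogate = f"_:molecule{next_id}"
        next_id += 1
        out.extend((surrogate, p, o) for p, o in sorted(pattern))
        out.extend((s, link, surrogate) for s in sorted(subjects))
        factored.update(subjects)

    # 4. Keep every triple that was not absorbed into a molecule.
    out.extend(t for t in triples if not (t[0] in factored and t[1] in props))
    return out
```

Suppose two observations share the same property, unit, and value. Each of their three matching triples is then replaced by a single link to `_:molecule0`, which carries those three pairs once. Other triples, such as distinct timestamps, are left untouched.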

    Strategies for Managing Linked Enterprise Data

    Data, information and knowledge have become key assets of our 21st-century economy. As a result, data and knowledge management have become key tasks with regard to sustainable development and business success. Often, knowledge is not explicitly represented, residing in the minds of people or scattered among a variety of data sources. Knowledge is inherently associated with semantics that convey its meaning to a human or machine agent. The Linked Data concept facilitates the semantic integration of heterogeneous data sources. However, we still lack an effective knowledge integration strategy applicable to enterprise scenarios, one that balances large amounts of data stored in legacy information systems and data lakes against tailored, domain-specific ontologies that formally describe real-world concepts. In this thesis, we investigate strategies for managing linked enterprise data, analyzing how actionable knowledge can be derived from enterprise data by leveraging knowledge graphs. Actionable knowledge provides valuable insights, supports decision makers with clear, interpretable arguments, and keeps its inference processes explainable. The benefits of employing actionable knowledge and a coherent strategy for managing it range from a holistic semantic representation layer of enterprise data, i.e., representing numerous data sources as one consistent and integrated knowledge source, to unified interaction mechanisms with other systems that are able to leverage such actionable knowledge effectively and efficiently. Several challenges have to be addressed at different conceptual levels in pursuing this goal, i.e., means for representing knowledge, semantic data integration of raw data sources and subsequent knowledge extraction, communication interfaces, and implementation. To tackle these challenges, we present the concept of Enterprise Knowledge Graphs (EKGs) and describe their characteristics and advantages compared to existing approaches. 
We study each challenge with regard to using EKGs and demonstrate their efficiency. In particular, EKGs are able to reduce the semantic data integration effort when processing large-scale heterogeneous datasets. Then, having built a consistent logical integration layer that keeps heterogeneity behind the scenes, EKGs unify query processing and enable effective communication interfaces for other enterprise systems. The achieved results allow us to conclude that strategies for managing linked enterprise data based on EKGs exhibit reasonable performance, comply with enterprise requirements, and ensure integrated data and knowledge management throughout their life cycle.
