3,528 research outputs found

    RDF-TR: Exploiting structural redundancies to boost RDF compression

    The number and volume of semantic datasets have grown impressively over the last decade, promoting compression as an essential tool for RDF preservation, sharing and management. In contrast to universal compressors, RDF compression techniques are able to detect and exploit specific forms of redundancy in RDF data. Thus, state-of-the-art RDF compressors excel at exploiting syntactic and semantic redundancies, i.e., repetitions in the serialization format and information that can be inferred implicitly. However, little attention has been paid to the existence of structural patterns within the RDF dataset, i.e., structural redundancy. In this paper, we analyze structural regularities in real-world datasets, and show three schema-based sources of redundancies that underpin the schema-relaxed nature of RDF. Then, we propose RDF-Tr (RDF Triples Reorganizer), a preprocessing technique that discovers and removes this kind of redundancy before the RDF dataset is effectively compressed. In particular, RDF-Tr groups subjects that are described by the same predicates, and locally re-codes the objects related to these predicates. Finally, we integrate RDF-Tr with two RDF compressors, HDT and k2-triples. Our experiments show that using RDF-Tr with these compressors improves their original effectiveness by up to 2.3 times, outperforming the most prominent state-of-the-art techniques.
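The grouping step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `group_by_predicate_family` and the sample triples are hypothetical, and the sketch covers only the grouping of subjects by their predicate set, not the local re-coding of objects.

```python
from collections import defaultdict

def group_by_predicate_family(triples):
    """Group subjects by the exact set of predicates that describe them."""
    # Collect, for every subject, the set of predicates it appears with.
    predicates_of = defaultdict(set)
    for s, p, _ in triples:
        predicates_of[s].add(p)
    # Subjects sharing the same predicate set fall into the same group.
    families = defaultdict(list)
    for s, preds in predicates_of.items():
        families[frozenset(preds)].append(s)
    return {family: sorted(subjects) for family, subjects in families.items()}

# Hypothetical sample data, for illustration only.
triples = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    ("ex:paper1", "dc:title", '"A title"'),
]
families = group_by_predicate_family(triples)
```

Here the two `foaf` subjects end up in one group and the `dc:title` subject in another, so each group's predicate list needs to be stored only once.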

    Compressed k2-Triples for Full-In-Memory RDF Engines

    The current "data deluge" has flooded the Web of Data with very large RDF datasets. They are hosted and queried through SPARQL endpoints which act as nodes of a semantic net built on the principles of the Linked Data project. Although this is a realistic philosophy for global data publishing, its query performance is diminished when the RDF engines (behind the endpoints) manage these huge datasets. Their indexes cannot be fully loaded in main memory, hence these systems need to perform slow disk accesses to solve SPARQL queries. This paper addresses this problem with a compact indexed RDF structure (called k2-triples) that applies compact k2-tree structures to the well-known vertical-partitioning technique. It obtains an ultra-compressed representation of large RDF graphs and allows SPARQL queries to be performed fully in memory without decompression. We show that k2-triples clearly outperforms state-of-the-art compressibility and traditional vertical-partitioning query resolution, remaining very competitive with multi-index solutions. Comment: In Proc. of AMCIS'201
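Vertical partitioning, the technique this abstract builds on, can be sketched as follows. This is a simplified illustration under stated assumptions: plain Python sets stand in for the compact k2-tree structures, subjects and objects share a single ID space for brevity, and the names `vertical_partition` and `objects_of` are hypothetical.

```python
from collections import defaultdict

def vertical_partition(triples):
    """Split triples into one set of (subject_id, object_id) pairs per predicate."""
    # Assumption: one shared integer ID space for subjects and objects.
    terms = sorted({term for s, _, o in triples for term in (s, o)})
    ids = {term: i for i, term in enumerate(terms)}
    tables = defaultdict(set)
    for s, p, o in triples:
        # Each predicate gets its own sparse binary relation; a k2-tree
        # would store this relation compactly, a set is used here instead.
        tables[p].add((ids[s], ids[o]))
    return ids, dict(tables)

def objects_of(ids, tables, subject, predicate):
    """Resolve the pattern (subject, predicate, ?o) against one partition."""
    s = ids.get(subject)
    return sorted(o for row, o in tables.get(predicate, ()) if row == s)

# Hypothetical sample data, for illustration only.
triples = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:knows", "ex:c"),
    ("ex:b", "ex:likes", "ex:c"),
]
ids, tables = vertical_partition(triples)
knows_from_a = objects_of(ids, tables, "ex:a", "ex:knows")
```

A query with a bound predicate touches only that predicate's partition, which is what makes the per-predicate layout attractive for pattern resolution.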

    An Empirical Study of Real-World SPARQL Queries

    Understanding how users tailor their SPARQL queries is crucial when designing query evaluation engines or fine-tuning RDF stores with performance in mind. In this paper we analyze 3 million real-world SPARQL queries extracted from logs of the DBPedia and SWDF public endpoints. We aim to find the most used language elements from both syntactical and structural perspectives, paying special attention to triple patterns and joins, since they are among the most expensive SPARQL operations in the evaluation phase. We have determined that most of the queries are simple and include few triple patterns and joins, with Subject-Subject, Subject-Object and Object-Object being the most common join types. The graph patterns are usually star-shaped and, although triple pattern chains exist, they are generally short. Comment: 1st International Workshop on Usage Analysis and the Web of Data (USEWOD2011) in the 20th International World Wide Web Conference (WWW2011), Hyderabad, India, March 28th, 201
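The join types named in this abstract can be illustrated with a short sketch that classifies the join between two triple patterns by the positions of their shared variable. The function `join_types` and the tuple encoding of patterns are assumptions made for illustration; they are not taken from the study.

```python
def join_types(pattern_a, pattern_b):
    """Return the join types induced by variables shared by two triple patterns.

    Each pattern is a (subject, predicate, object) tuple of strings;
    terms starting with "?" are treated as variables.
    """
    names = ("Subject", "Predicate", "Object")
    found = set()
    for i, term_a in enumerate(pattern_a):
        for j, term_b in enumerate(pattern_b):
            if term_a.startswith("?") and term_a == term_b:
                # Normalize the order so Object-Subject is reported
                # as Subject-Object.
                pair = sorted((names[i], names[j]), key=names.index)
                found.add("-".join(pair))
    return found

# Star-shaped pair: both patterns share the subject variable.
star = join_types(("?x", "foaf:name", "?n"), ("?x", "foaf:mbox", "?m"))
# Chain-shaped pair: the object of the first is the subject of the second.
chain = join_types(("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n"))
```

A star-shaped graph pattern yields only Subject-Subject joins, while a chain yields Subject-Object joins.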

    On the road to the evaluation of RDF stream compression techniques

    Proceedings of RDF Stream Processing Workshop in conjunction with the 12th Extended Semantic Web Conference (ESWC 2015), May 31st, 2015 in Portoroz, Slovenia. The popularization of data streaming applications, such as those related to social networks and the Internet of Things, has fostered the interest of the Semantic Web community for this kind of data. As a result of this interest, the W3C RDF Stream Processing (RSP) community group has recently been started with the goal of defining a common model “for producing, transmitting and continuously querying RDF Streams”. In this EOI we focus on the transmission model. As pointed out by recent research efforts (e.g. Ztreamy and CQELS Cloud), the efficient transmission of RDF streams is a necessary step to ensure higher throughput in RDF stream processors. This work is partially funded by Ministerio de Economía y Competitividad (Spain) under the projects “HERMES-SMARTDRIVER” (TIN2013-46801-C4-2-R) and “4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos” (TIN2013-46238-C4-2-R), and Austrian Science Fund (FWF): M1720-G1

    Towards efficient processing of RDF Data Streams

    In recent years, there has been an increase in the amount of real-time data generated. Sensors attached to things are transforming how we interact with our environment. Extracting meaningful information from these streams of data is essential for some application areas and requires processing systems that scale to varying conditions in data sources, complex queries, and system failures. This paper describes ongoing research on the development of a scalable RDF streaming engine.

    What are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web

    Linked Open Data promises to provide guiding principles to publish interlinked knowledge graphs on the Web in the form of findable, accessible, interoperable and reusable datasets. We argue that while Linked Data as such may be viewed as a basis for instantiating the FAIR principles, there are still a number of open issues that cause significant data quality problems even when knowledge graphs are published as Linked Data. Firstly, in order to define boundaries of single coherent knowledge graphs within Linked Data, a principled notion of what a dataset is, or, respectively, what links within and between datasets are, has been missing. Secondly, we argue that in order to enable FAIR knowledge graphs, Linked Data lacks standardised findability and accessibility mechanisms via a single entry link. In order to address the first issue, we (i) propose a rigorous definition of a naming authority for a Linked Data dataset, (ii) define different link types for data in Linked datasets, (iii) provide an empirical analysis of linkage among the datasets of the Linked Open Data cloud, and (iv) analyse the dereferenceability of those links. We base our analyses and link computations on a scalable mechanism implemented on top of the HDT format, which allows us to analyse quantity and quality of different link types at scale. Series: Working Papers on Information Systems, Information Business and Operation

    Relationships between executive function of children in residential care and caregivers’ discipline style: a pilot study.

    Despite legislative efforts to ensure that Residential Care (RC) guarantees good care for children, there are difficulties inherent to the profile of the foster population. One of the areas affected in this population is executive functions. However, there is a lack of information on how these functions are related to other variables such as affective relationships (affection/communication and criticism/rejection) and the discipline styles of caregivers. Therefore, this research aims to study these relationships. Forty-six boys and girls between 10 and 16 years old and thirty-nine caregivers from seven residential centres participated in the study. This work is a pilot study within a larger research project that includes all the RC centres in the province of Malaga (Spain). The BRIEF-2 Family version was used to assess executive functions, while the Warmth Scale (EA) and the Rules-Demands Scale (ENE) were used to assess the affection and discipline style of caregivers. The results showed that: (a) about 50% of the sample shows scores classified as high or clinically significant on all scales and indexes of executive functions, while the general Spanish population shows only 16-19% in these same categories, (b) higher scores on perceived criticism/rejection show a positive correlation with difficulties in emotional and cognitive control, and (c) an indulgent/permissive discipline style is positively correlated with cognitive control problems. These results are not conclusive, as they correspond to the pilot study phase. Nevertheless, they already point to the need for deeper research into the difficulties presented by the population in RC in terms of executive functions. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    MapReduce-based Solutions for Scalable SPARQL Querying

    The use of RDF to expose semantic data on the Web has seen a dramatic increase over the last few years. Nowadays, RDF datasets are so big and interconnected that classical mono-node solutions present significant scalability problems when trying to manage big semantic data. MapReduce, a standard framework for distributed processing of great quantities of data, is earning a place among the distributed solutions facing RDF scalability issues. In this article, we survey the most important works addressing RDF management and querying through diverse MapReduce approaches, with a focus on their main strategies, optimizations and results.