532 research outputs found

    An Analytical Study of Large SPARQL Query Logs

    With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end-users has become more and more common in SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment on a voluminous query corpus, spanning over several years and covering many representative SPARQL endpoints. Apart from the syntactical structure of the queries, that exhibits already interesting results on this generalized corpus, we drill deeper in the structural characteristics related to the graph- and hypergraph representation of queries. We outline the most common shapes of queries when visually displayed as pseudographs, and characterize their (hyper-)tree width. Moreover, we analyze the evolution of queries over time, by introducing the novel concept of a streak, i.e., a sequence of queries that appear as subsequent modifications of a seed query. Our study offers several fresh insights on the already rich query features of real SPARQL queries formulated by real users, and brings us to draw a number of conclusions and pinpoint future directions for SPARQL query evaluation, query optimization, tuning, and benchmarking.

    An Analytical Study of Large SPARQL Query Logs

    With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end-users has become more and more common in SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date structured query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment on a voluminous query corpus, spanning over several years and covering many representative SPARQL endpoints. Apart from the syntactical structure of the queries, that exhibits already interesting results on this generalized corpus, we drill deeper in the structural characteristics related to the graph and hypergraph representation of queries. We outline the most common shapes of queries when visually displayed as undirected graphs, characterize their tree width, length of their cycles, maximal degree of nodes, and more. For queries that cannot be adequately represented as graphs, we investigate their hypergraphs and hypertree width. Moreover, we analyze the evolution of queries over time, by introducing the novel concept of a streak, i.e., a sequence of queries that appear as subsequent modifications of

    How Many and What Types of SPARQL Queries can be Answered through Zero-Knowledge Link Traversal?

    The current de facto way to query the Web of Data is through the SPARQL protocol, where a client sends queries to a server through a SPARQL endpoint. Contrary to an HTTP server, providing and maintaining a robust and reliable endpoint requires a significant effort that not all publishers are willing or able to make. An alternative query evaluation method is through link traversal, where a query is answered by dereferencing online web resources (URIs) in real time. While several approaches for such a lookup-based query evaluation method have been proposed, there exists no analysis of the types (patterns) of queries that can be directly answered on the live Web, without accessing local or remote endpoints and without a priori knowledge of available data sources. In this paper, we first provide a method for checking if a SPARQL query (to be evaluated on a SPARQL endpoint) can be answered through zero-knowledge link traversal (without accessing the endpoint), and analyse a large corpus of real SPARQL query logs to find the frequency and distribution of answerable and non-answerable query patterns. Subsequently, we provide an algorithm for transforming answerable queries to SPARQL-LD queries that bypass the endpoints. We report experimental results about the efficiency of the transformed queries and discuss the benefits and the limitations of this query evaluation method. Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019)

    Using SPARQL – the practitioners’ viewpoint

    A number of studies have analyzed SPARQL log data to draw conclusions about how SPARQL is being used. To complement this work, a survey of SPARQL users has been undertaken. Whilst confirming some of the conclusions of the previous studies, the current work is able to provide additional insight into how users create SPARQL queries, the difficulties they encounter, and the features they would like to see included in the language. Based on this insight, a number of recommendations are presented to the community. These relate to predicting and avoiding computationally expensive queries; extensions to the language; and extending the search paradigm.

    SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

    While Linked Data (LD) provides standards for publishing (RDF) and (SPARQL) querying Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper we present smart-KG, a novel approach to share the load between servers and clients, while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability towards more cost-effective and balanced hosting of open and decentralized KGs. Series: Working Papers on Information Systems, Information Business and Operation

    PFed: Recommending Plausible Federated SPARQL Queries

    Federated SPARQL queries make it possible to query multiple inter-linked datasets hosted by remote SPARQL endpoints. However, finding federated queries over a growing number of datasets is challenging. In this paper, we propose PFed, an approach to recommend plausible federated queries based on real query logs of different datasets. The problem is not to find similar federated queries, but plausible complementary queries over different datasets. Starting with a real SPARQL query from a given log, PFed stretches the query with real queries from different logs. To reduce the search space, PFed proposes a semantic summary to prune the query logs. Experimental results with real logs of DBpedia and SWDF demonstrate that PFed is able to drastically prune the logs and recommend plausible federated queries.

    Machine Learning-based Query Augmentation for SPARQL Endpoints

    Linked Data repositories have become a popular source of publicly-available data. Users accessing this data through SPARQL endpoints usually launch several restrictive yet similar consecutive queries, either to find the information they need through trial-and-error or to query related resources. However, instead of executing each individual query separately, query augmentation aims at modifying the incoming queries to retrieve more data that is potentially relevant to subsequent requests. In this paper, we propose a novel approach to query augmentation for SPARQL endpoints based on machine learning. Our approach separates the structure of the query from its contents and measures two types of similarity, which are then used to predict the structure and contents of the augmented query. We test the approach on the real-world query logs of the Spanish and English DBpedia and show that our approach yields high-accuracy prediction. We also show that, by caching the results of the predicted … This work has been supported by the European Union's Horizon 2020 research and innovation program (grant H2020-MSCA-ITN-2014-642963), the Spanish Ministry of Science and Innovation (contract TIN2015-65316, project RTC-2016-4952-7 and contract TIN2016-78011-C4-4-R), the Spanish Ministry of Education, Culture and Sports (contract CAS18/00333) and the Generalitat de Catalunya (contract 2014-SGR-1051). The authors would also like to thank Toni Cortes for his feedback. Peer Reviewed. Postprint (author's final draft)

    Semantic data mining and linked data for a recommender system in the AEC industry

    Even though it can provide design teams with valuable performance insights and enhance decision-making, monitored building data is rarely reused in an effective feedback loop from operation to design. Data mining allows users to obtain such insights from the large datasets generated throughout the building life cycle. Furthermore, semantic web technologies make it possible to formally represent the built environment and retrieve knowledge in response to domain-specific requirements. Both approaches have independently established themselves as powerful aids in decision-making. Combining them can enrich data mining processes with domain knowledge and facilitate knowledge discovery, representation and reuse. In this article, we look into the available data mining techniques and investigate to what extent they can be fused with semantic web technologies to provide recommendations to the end user in performance-oriented design. We demonstrate an initial implementation of a linked data-based system for generation of recommendations.