How Many and What Types of SPARQL Queries can be Answered through Zero-Knowledge Link Traversal?
The current de facto way to query the Web of Data is through the SPARQL
protocol, where a client sends queries to a server through a SPARQL endpoint.
Unlike a plain HTTP server, providing and maintaining a robust and reliable
endpoint requires a significant effort that not all publishers are willing or
able to make. An alternative query evaluation method is link traversal,
where a query is answered by dereferencing online web resources (URIs) in real
time. While several approaches for such a lookup-based query evaluation method
have been proposed, no analysis exists of the types (patterns) of queries
that can be answered directly on the live Web, without accessing local or
remote endpoints and without a priori knowledge of available data sources. In
this paper, we first provide a method for checking whether a SPARQL query (to be
evaluated on a SPARQL endpoint) can be answered through zero-knowledge link
traversal (without accessing the endpoint), and analyse a large corpus of real
SPARQL query logs to determine the frequency and distribution of answerable and
non-answerable query patterns. Subsequently, we provide an algorithm for
transforming answerable queries to SPARQL-LD queries that bypass the endpoints.
We report experimental results about the efficiency of the transformed queries
and discuss the benefits and the limitations of this query evaluation method.
Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP
Symposium on Applied Computing (SAC 2019).
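To make the notion of answerability concrete, the Python sketch below checks a basic graph pattern under a simplified rule that is an assumption, not necessarily the authors' exact criterion: a triple pattern can be evaluated by dereferencing if its subject or object is a URI, or a variable already bound by another evaluable pattern. It further assumes that dereferencing a URI returns the triples in which that URI appears. The function names and the `<...>` / `?var` term syntax are illustrative only.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def is_var(term: str) -> bool:
    return term.startswith("?")


def is_uri(term: str) -> bool:
    return term.startswith("<") and term.endswith(">")


def answerable_by_traversal(bgp: List[Triple]) -> bool:
    """Simplified check (assumption): every triple pattern must be reachable
    from a dereferenceable term, i.e. its subject or object is a URI or a
    variable already bound by a pattern evaluated earlier."""
    bound: Set[str] = set()
    remaining = list(bgp)
    progress = True
    while remaining and progress:
        progress = False
        for pattern in list(remaining):
            s, _, o = pattern
            if any(is_uri(t) or (is_var(t) and t in bound) for t in (s, o)):
                # Evaluating this pattern binds all of its variables.
                bound.update(t for t in pattern if is_var(t))
                remaining.remove(pattern)
                progress = True
    return not remaining


query = [
    ("<http://ex.org/alice>", "<http://ex.org/knows>", "?friend"),
    ("?friend", "<http://ex.org/name>", "?name"),
]
print(answerable_by_traversal(query))  # True
print(answerable_by_traversal([("?s", "<http://ex.org/name>", "?name")]))  # False
```

The first query starts from a concrete URI, so each pattern can be reached by following links; the second has no URI to start from and would need an endpoint. A fuller check would also notice when a variable is bound only to literals, which cannot be dereferenced; this sketch does not.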
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey
Graph processing is becoming increasingly prevalent across many application
domains. In spite of this prevalence, there is little research about how graphs
are actually used in practice. We performed an extensive study that consisted
of an online survey of 89 users, a review of the mailing lists, source
repositories, and whitepapers of a large suite of graph software products, and
in-person interviews with 6 users and 2 developers of these products. Our
online survey aimed at understanding: (i) the types of graphs users have; (ii)
the graph computations users run; (iii) the types of graph software users use;
and (iv) the major challenges users face when processing their graphs. We
describe the participants' responses to our questions highlighting common
patterns and challenges. Based on our interviews and survey of the rest of our
sources, we were able to answer some new questions that were raised by
participants' responses to our online survey and understand the specific
applications that use graph data and software. Our study revealed surprising
facts about graph processing in practice. In particular, real-world graphs
represent a very diverse range of entities and are often very large,
scalability and visualization are undeniably the most pressing challenges faced
by participants, and data integration, recommendations, and fraud detection are
very popular applications supported by existing graph software. We hope these
findings can guide future research.
A Prospective Analysis of Security Vulnerabilities within Link Traversal-Based Query Processing (Extended Version)
The societal and economic consequences surrounding Big Data-driven
platforms have increased the call for decentralized solutions. However,
retrieving and querying data in more decentralized environments requires
fundamentally different approaches, whose properties are not yet well
understood. Link Traversal-based Query Processing (LTQP) is a technique for
querying over decentralized data networks, in which a client-side query engine
discovers data by traversing links between documents. Since decentralized
environments are potentially unsafe due to their non-centrally controlled
nature, there is a need for client-side LTQP query engines to be resistant
against security threats aimed at the query engine's host machine or the query
initiator's personal data. As such, we have performed an analysis of potential
security vulnerabilities of LTQP. This article provides an overview of security
threats in related domains, which are used as inspiration for the
identification of 10 LTQP security threats. Each threat is explained, together
with an example, and one or more avenues for mitigations are proposed. We
conclude with several concrete recommendations for LTQP query engine developers
and data publishers as a first step to mitigate some of these issues. With this
work, we start filling the unknowns for enabling querying over decentralized
environments. Aside from future work on security, wider research is needed to
uncover missing building blocks for enabling true decentralization.
Comment: This is an extended version of an article with the same title
published in the proceedings of the QuWeDa workshop at ISWC 2022. Besides
adding detail to the related work and conclusions sections, this extension
introduces concrete mitigations for each vulnerability.
Evaluation of Link Traversal Query Execution over Decentralized Environments with Structural Assumptions
To counter societal and economic problems caused by data silos on the Web,
efforts such as Solid strive to reclaim private data by storing it in
permissioned documents over a large number of personal vaults across the Web.
Building applications on top of such a decentralized Knowledge Graph involves
significant technical challenges: centralized aggregation prior to query
processing is excluded for legal reasons, and current federated querying
techniques cannot handle this large scale of distribution at the expected
performance. We propose an extension to Link Traversal Query Processing (LTQP)
that incorporates structural properties within decentralized environments to
tackle their unprecedented scale. In this article, we analyze the structural
properties of the Solid decentralization ecosystem that are relevant for query
execution, and provide the SolidBench benchmark to simulate Solid environments
representatively. We introduce novel LTQP algorithms leveraging these
structural properties, and evaluate their effectiveness. Our experiments
indicate that these new algorithms obtain accurate results in the order of
seconds for non-complex queries, which existing algorithms cannot achieve.
Furthermore, we discuss limitations with respect to more complex queries. This
work reveals that a traversal-based querying method using structural
assumptions can be effective for large-scale decentralization, but that
advances are needed in the area of query planning for LTQP to handle more
complex queries. These insights open the door to query-driven decentralized
applications, in which declarative queries shield developers from the inherent
complexity of a decentralized landscape.
Comment: Not peer-reviewed.
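The core loop of link traversal query processing can be sketched in Python as a breadth-first crawl that follows every URI it encounters and then matches a pattern over the collected triples. This is a hypothetical illustration: the in-memory `WEB` dictionary stands in for HTTP dereferencing, `max_docs` is an assumed safety bound, and the structural assumptions described above (such as Solid-specific link types) are not modelled, so every link is followed.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical in-memory "Web": document URL -> triples served at that URL.
WEB: Dict[str, List[Triple]] = {
    "http://a.example/alice": [
        ("http://a.example/alice", "knows", "http://b.example/bob"),
        ("http://a.example/alice", "name", "Alice"),
    ],
    "http://b.example/bob": [
        ("http://b.example/bob", "name", "Bob"),
        ("http://b.example/bob", "knows", "http://c.example/carol"),
    ],
    "http://c.example/carol": [
        ("http://c.example/carol", "name", "Carol"),
    ],
}


def dereference(url: str) -> List[Triple]:
    """Stand-in for an HTTP GET; URLs outside WEB yield no data."""
    return WEB.get(url, [])


def traverse(seeds: List[str], max_docs: int = 100) -> List[Triple]:
    """Breadth-first link traversal from seed URLs, following every URI
    that appears in retrieved triples, up to max_docs documents."""
    queue = deque(seeds)
    visited: Set[str] = set()
    collected: List[Triple] = []
    while queue and len(visited) < max_docs:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        for triple in dereference(url):
            collected.append(triple)
            for term in triple:
                # Treat any http(s) term as a link worth dereferencing.
                if term.startswith(("http://", "https://")) and term not in visited:
                    queue.append(term)
    return collected


# Evaluate the pattern (?person, name, ?name) over the discovered triples.
names = sorted(o for _, p, o in traverse(["http://a.example/alice"]) if p == "name")
print(names)  # ['Alice', 'Bob', 'Carol']
```

Starting from a single seed, the engine discovers Bob and Carol only by following `knows` links, which is what distinguishes traversal from querying a fixed set of sources. Following every link is what makes naive traversal expensive at scale; the algorithms described above instead restrict which links are followed.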
RDF Querying
Reactive Web systems, Web services, and Web-based publish/subscribe
systems communicate events as XML messages, and in
many cases require composite event detection: it is not sufficient to react
to single event messages, but events have to be considered in relation to
other events that are received over time.
Emphasizing language design and formal semantics, we describe the
rule-based query language XChangeEQ for detecting composite events.
XChangeEQ is designed to completely cover and integrate the four complementary
querying dimensions: event data, event composition, temporal
relationships, and event accumulation. Semantics are provided as
model and fixpoint theories; while this is an established approach for rule
languages, it has not previously been applied to event queries.
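As a rough illustration of composite event detection, and not of XChangeEQ syntax or semantics, the Python sketch below covers only the event composition and temporal relationship dimensions: it reports each `first` event that is followed by a `second` event with the same correlation key within a time window. The `Event` type, its `key` attribute, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    kind: str    # event type, e.g. "order"
    key: str     # hypothetical correlation attribute, e.g. an order id
    time: float  # occurrence time


def detect_followed_by(stream: List[Event], first: str, second: str,
                       window: float) -> List[Tuple[Event, Event]]:
    """Report pairs (a, b) where a.kind == first, b.kind == second, both share
    the same key, and b occurs after a by at most `window` time units.
    Assumes the stream is ordered by time."""
    pending: List[Event] = []
    matches: List[Tuple[Event, Event]] = []
    for ev in stream:
        # Discard earlier events that can no longer complete within the window.
        pending = [a for a in pending if ev.time - a.time <= window]
        if ev.kind == second:
            matches.extend((a, ev) for a in pending
                           if a.key == ev.key and ev.time > a.time)
        if ev.kind == first:
            pending.append(ev)
    return matches


stream = [
    Event("order", "o1", 0), Event("payment", "o1", 5),
    Event("order", "o2", 10), Event("payment", "o2", 30),
]
for a, b in detect_followed_by(stream, "order", "payment", window=10):
    print(a.key, a.time, b.time)  # o1 0 5
```

The payment for `o2` arrives 20 time units after its order, outside the window, so only `o1` is reported. Event accumulation (for example, aggregating over all events in a window) is not covered here.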
Linked Data - the story so far
The term “Linked Data” refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions: the Web of Data. In this article, the authors present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. They describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.
Survey over Existing Query and Transformation Languages
A widely acknowledged obstacle to realizing the vision of the Semantic Web is the inability
of many current Semantic Web approaches to cope with data available in such divergent
representation formalisms as XML, RDF, or Topic Maps. A common query language is the first
step to allow transparent access to data in any of these formats. To further the understanding
of the requirements and approaches proposed for query languages in the conventional as well
as the Semantic Web, this report surveys a large number of query languages for accessing
XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from
all these areas. From the detailed survey of these query languages, a common classification
scheme is derived that is useful for understanding and differentiating languages within and
among all three areas.
WAQS: a web-based approximate query system
The Web is often viewed as a gigantic database that holds vast stores of information and is ubiquitously accessible to end users. Since its inception, the Internet has experienced explosive growth both in the number of users and in the amount of content available on it. However, searching for information on the Web has become increasingly difficult. Although query languages have long been part of database management systems, the standard query language, Structured Query Language (SQL), is not suitable for retrieving Web content.
In this dissertation, a new technique for document retrieval on the Web is presented. This technique is designed to allow more detailed retrieval and hence reduce the number of matches returned by typical search engines. The main objective of this technique is to allow a query to be based not just on keywords but also on the location of the keywords within the logical structure of a document. In addition, the technique provides approximate search capabilities based on the notion of Distance and Variable Length Don't Cares. The proposed techniques have been implemented in a system called Web-Based Approximate Query System, which contains an SQL-like query language called Web-Based Approximate Query Language.
Web-Based Approximate Query Language has also been integrated with EnviroDaemon, a search engine specific to the environmental domain. It provides EnviroDaemon with more detailed searching capabilities than keyword-based search alone. Implementation details, technical results, and future work are presented in this dissertation.
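The Python sketch below shows one plausible way to combine an edit distance with Variable Length Don't Cares when matching a query path against the logical structure of a document, represented here as a list of element names. It is an assumption-laden illustration, not the WAQS algorithm: the `*` symbol, the unit edit costs, and the function name are hypothetical.

```python
from typing import List, Sequence

VLDC = "*"  # hypothetical symbol for a Variable Length Don't Care


def vldc_distance(pattern: Sequence[str], path: Sequence[str]) -> int:
    """Minimum number of token edits (insert, delete, substitute) needed to
    match `pattern` against `path`, where each VLDC matches any run of
    tokens, including the empty run, at no cost."""
    m, n = len(pattern), len(path)
    # d[i][j]: cost of matching pattern[:i] against path[:j]
    d: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        d[0][j] = j  # an empty pattern needs j insertions to cover path[:j]
    for i in range(1, m + 1):
        tok = pattern[i - 1]
        d[i][0] = d[i - 1][0] + (0 if tok == VLDC else 1)
        for j in range(1, n + 1):
            if tok == VLDC:
                # Either the VLDC matches nothing, or it absorbs path[j - 1].
                d[i][j] = min(d[i - 1][j], d[i][j - 1])
            else:
                sub = 0 if tok == path[j - 1] else 1
                d[i][j] = min(d[i - 1][j - 1] + sub,  # match or substitute
                              d[i - 1][j] + 1,        # delete pattern token
                              d[i][j - 1] + 1)        # insert path token
    return d[m][n]


print(vldc_distance(["html", "*", "title"], ["html", "head", "title"]))  # 0
print(vldc_distance(["html", "*", "titel"], ["html", "head", "title"]))  # 1
```

The VLDC lets a query ignore intermediate structure (here `head`), while the distance tolerates small mismatches such as a misspelled element name. Results could then be ranked by increasing distance.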