View Selection in Semantic Web Databases
We consider the setting of a Semantic Web database, containing both explicit
data encoded in RDF triples, and implicit data, implied by the RDF semantics.
Based on a query workload, we address the problem of selecting a set of views
to be materialized in the database, minimizing a combination of query
processing, view storage, and view maintenance costs. Starting from an existing
relational view selection method, we devise new algorithms for recommending
view sets, and show that they scale significantly beyond the existing
relational ones when adapted to the RDF context. To account for implicit
triples in query answers, we propose a novel RDF query reformulation algorithm
and an innovative way of incorporating it into view selection in order to avoid
a combinatorial explosion in the complexity of the selection process. The
interest of our techniques is demonstrated through a set of experiments.
Comment: VLDB201
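To illustrate why reformulation can blow up the search space, here is a minimal sketch. It is not the algorithm from the paper: the schema, the class names, and the functions `subclasses` and `reformulate` are all hypothetical. It expands each `rdf:type` pattern into one alternative per subclass, so a conjunctive query becomes a union of queries whose size is the product of the per-pattern alternatives.

```python
from itertools import product

# Hypothetical toy RDFS schema: (child, parent) subclass edges.
SUBCLASS_OF = {
    ("Painting", "Artwork"),
    ("Sculpture", "Artwork"),
    ("Fresco", "Painting"),
}

def subclasses(cls, edges=SUBCLASS_OF):
    """Return cls plus every class that is transitively a subclass of it."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for child, parent in edges:
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found

def reformulate(query, edges=SUBCLASS_OF):
    """Expand a conjunctive query (list of triple patterns) into a union of queries.

    Only rdf:type patterns are expanded here; a full RDFS reformulation
    would also handle subproperties, domains, and ranges.
    """
    alternatives = []
    for s, p, o in query:
        if p == "rdf:type":
            alternatives.append([(s, p, c) for c in sorted(subclasses(o, edges))])
        else:
            alternatives.append([(s, p, o)])
    # One reformulated query per combination of alternatives.
    return [list(combo) for combo in product(*alternatives)]

one_type = reformulate([("?x", "rdf:type", "Artwork"), ("?x", "createdBy", "?y")])
two_types = reformulate([("?x", "rdf:type", "Artwork"), ("?y", "rdf:type", "Artwork")])
print(len(one_type), len(two_types))
```

With four classes at or below `Artwork`, one type pattern yields four queries and two type patterns yield sixteen, which is the kind of growth a view selection process has to keep under control.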
RDFViewS: A Storage Tuning Wizard for RDF Applications
In recent years, the significant growth of RDF data used in numerous
applications has made its efficient and scalable manipulation an important
issue. In this paper, we present RDFViewS, a system capable of choosing the
most suitable views to materialize, in order to minimize the query response
time for a specific SPARQL query workload, while taking into account the view
maintenance cost and storage space constraints. Our system employs practical
algorithms and heuristics to navigate through the search space of potential
view configurations, and exploits any available semantic information,
expressed via an RDF Schema, to ensure the completeness of the query
evaluation.
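The trade-off described above can be sketched with a simple greedy heuristic. This is an illustrative stand-in, not the search strategy used by RDFViewS: the `View` fields, the weights, and the sample numbers are all assumptions. The cost function combines query processing, storage, and maintenance costs, and the greedy loop adds the view that lowers the total cost the most while respecting a storage budget.

```python
from dataclasses import dataclass

@dataclass
class View:
    name: str
    storage: float       # space needed to materialize the view (assumed unit)
    maintenance: float   # cost of keeping the view up to date (assumed unit)
    query_cost: dict     # query name -> cost of answering that query from this view

def total_cost(views, workload, base_cost, w_storage=1.0, w_maint=1.0):
    """Weighted sum of query processing, storage, and maintenance costs."""
    processing = 0.0
    for query, freq in workload.items():
        options = [base_cost[query]]
        options += [v.query_cost[query] for v in views if query in v.query_cost]
        processing += freq * min(options)
    storage = sum(v.storage for v in views)
    maintenance = sum(v.maintenance for v in views)
    return processing + w_storage * storage + w_maint * maintenance

def greedy_select(candidates, workload, base_cost, budget):
    """Repeatedly add the view that reduces total cost most, within the budget."""
    chosen, used = [], 0.0
    current = total_cost(chosen, workload, base_cost)
    while True:
        best_view, best_cost = None, current
        for view in candidates:
            if view in chosen or used + view.storage > budget:
                continue
            cost = total_cost(chosen + [view], workload, base_cost)
            if cost < best_cost:
                best_view, best_cost = view, cost
        if best_view is None:
            return chosen, current
        chosen.append(best_view)
        used += best_view.storage
        current = best_cost

# Hypothetical workload: query name -> frequency, plus cost without any view.
workload = {"q1": 10, "q2": 5}
base_cost = {"q1": 100.0, "q2": 80.0}
candidates = [
    View("v1", storage=20, maintenance=5, query_cost={"q1": 10.0}),
    View("v2", storage=30, maintenance=10, query_cost={"q2": 20.0}),
    View("v3", storage=200, maintenance=50, query_cost={"q1": 5.0, "q2": 5.0}),
]
selected, cost = greedy_select(candidates, workload, base_cost, budget=60)
print([v.name for v in selected], cost)
```

In this toy setting `v3` would answer both queries cheaply but does not fit the storage budget, so the heuristic settles on the two smaller views.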
SOFOS: Demonstrating the Challenges of Materialized View Selection on Knowledge Graphs
Analytical queries over RDF data are becoming prominent as a result of the
proliferation of knowledge graphs. Yet, RDF databases are not optimized to
perform such queries efficiently, leading to long processing times. A
well-known technique to improve the performance of analytical queries is to exploit
materialized views. Although popular in relational databases, view
materialization for RDF and SPARQL has not yet transitioned into practice, due
to the non-trivial application to the RDF graph model. Motivated by a lack of
understanding of the impact of view materialization alternatives for RDF data,
we demonstrate SOFOS, a system that implements and compares several cost models
for view materialization. SOFOS is, to the best of our knowledge, the first
attempt to adapt cost models, initially studied in relational data, to the
generic RDF setting, and to propose new ones, analyzing their pitfalls and
merits. SOFOS takes an RDF dataset and an analytical query for some facet in
the data, and compares and evaluates alternative cost models, displaying
statistics and insights about time, memory consumption, and query
characteristics.
A unified view of data-intensive flows in business intelligence systems: a survey
Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that are still to be addressed, and how the current solutions can be applied for addressing these challenges.
Peer Reviewed. Postprint (author's final draft)
Old Techniques for New Join Algorithms: A Case Study in RDF Processing
Recently there has been significant interest around designing specialized RDF
engines, as traditional query processing mechanisms incur orders of magnitude
performance gaps on many RDF workloads. At the same time researchers have
released new worst-case optimal join algorithms which can be asymptotically
better than the join algorithms in traditional engines. In this paper we apply
worst-case optimal join algorithms to a standard RDF workload, the LUBM
benchmark, for the first time. We do so using two worst-case optimal engines:
(1) LogicBlox, a commercial database engine, and (2) EmptyHeaded, our prototype
research engine with enhanced worst-case optimal join algorithms. We show that
without any added optimizations both LogicBlox and EmptyHeaded outperform two
state-of-the-art specialized RDF engines, RDF-3X and TripleBit, by up to 6x on
cyclic join queries, where traditional optimizers are suboptimal. On
the remaining, less complex queries in the LUBM benchmark, we show that three
classic query optimization techniques enable EmptyHeaded to compete with RDF
engines, even when there is no asymptotic advantage to the worst-case optimal
approach. We validate that our design has merit as EmptyHeaded outperforms
MonetDB by three orders of magnitude and LogicBlox by two orders of magnitude,
while remaining within an order of magnitude of RDF-3X and TripleBit.
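As a rough illustration of the attribute-at-a-time style that worst-case optimal joins follow on cyclic queries, the sketch below enumerates triangles over three binary relations. It is not the LogicBlox or EmptyHeaded implementation: the function names and sample relations are hypothetical, and the asymptotic guarantee depends on intersections costing time proportional to the smaller input, which this sketch does not try to enforce.

```python
from collections import defaultdict

def index(edges):
    """Build an adjacency index: first attribute -> set of second attributes."""
    adj = defaultdict(set)
    for left, right in edges:
        adj[left].add(right)
    return adj

def triangles(r, s, t):
    """Enumerate (a, b, c) with (a, b) in r, (b, c) in s, and (a, c) in t.

    Instead of joining two relations pairwise and filtering afterwards,
    each variable is bound in turn by intersecting every relation that
    constrains it.
    """
    r_idx, s_idx, t_idx = index(r), index(s), index(t)
    out = []
    for a in r_idx.keys() & t_idx.keys():      # a must appear in both r and t
        for b in s_idx.keys() & r_idx[a]:      # b must follow a in r and start a pair in s
            for c in s_idx[b] & t_idx[a]:      # c must follow b in s and a in t
                out.append((a, b, c))
    return sorted(out)

# Hypothetical sample relations.
r = {(1, 2), (2, 3)}
s = {(2, 3), (3, 1)}
t = {(1, 3), (2, 1)}
result = triangles(r, s, t)
print(result)
```

A pairwise plan would first materialize the join of `r` and `s`, which can be far larger than the final result; binding one variable at a time avoids that intermediate blow-up.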