12,648 research outputs found
On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark
Querying very large RDF data sets in an efficient manner requires a
sophisticated distribution strategy. Several innovative solutions have recently
been proposed for optimizing data distribution with predefined query workloads.
This paper presents an in-depth analysis and experimental comparison of five
representative and complementary distribution approaches. For achieving fair
experimental results, we are using Apache Spark as a common parallel computing
framework by rewriting the concerned algorithms using the Spark API. Spark
provides guarantees in terms of fault tolerance, high availability and
scalability which are essential in such systems. Our different implementations
aim to highlight the fundamental implementation-independent characteristics of
each approach in terms of data preparation, load balancing, data replication
and to some extent to query answering cost and performance. The presented
measures are obtained by testing each system on one synthetic and one
real-world data set over query workloads with differing characteristics and
different partitioning constraints.Comment: 16 pages, 3 figure
Icarus: Towards a Multistore Database System
The last years have seen a vast diversification on the database market. In contrast to the "one-size-fits-all" paradigm according to which systems have been designed in the past, today's database management systems (DBMSs) are tuned for particular workloads. This has led to DBMSs optimized for high performance, high throughput read/write workload in online transaction processing (OLTP) and systems optimized for complex analytical queries (OLAP). However, this approach reaches a limit when systems have to deal with mixed workloads that are neither pure OLAP nor pure OLTP workloads. In such cases, polystores are increasingly gaining popularity. Rather than supporting one single database paradigm and addressing one particular workload, polystores encompass several DBMSs that store data in different schemas and allow to route requests at a per-query-level to the most appropriate system. In this paper, we introduce the polystore Icarus. In our evaluation based on a workload that combines OLTP and OLAP elements, We show that Icarus is able to speed-up queries up to a factor of 3 by properly routing queries to the best underlying DBMS
ArchiVISTA: A New Horizon in Providing Access to Visual Records of the National Archives of Canada
published or submitted for publicatio
- …