77 research outputs found

    Self-organizing Structured RDF in MonetDB

    The semantic web uses RDF as its data model, providing ultimate flexibility for users to represent and evolve data without the need for a schema. Yet this flexibility poses challenges for implementing efficient RDF stores: queries turn into plans with very many self-joins on a triple table, these plans are difficult to optimize, and data locality is lost, because without a notion of a multi-attribute data structure, clustered indexing opportunities disappear. Apart from performance issues, users of huge RDF graphs often have trouble formulating queries, as they lack any system-supported notion of the structure in the data. In this research, we exploit the observation that real RDF data, while not as regularly structured as relational data, still has the great majority of triples conforming to regular patterns. We conjecture that a system that recognizes this structure automatically would make RDF stores both more efficient and easier to use. Concretely, we propose self-organizing RDF, which stores data in PSO format in such a way that the regular parts of the data physically correspond to relational columnar storage, and we propose RDFscan/RDFjoin algorithms that compute star patterns over these parts without wasting effort on self-joins. These regular parts, i.e. tables, are identified on ingestion by a schema discovery algorithm; as a result, users gain an SQL view of the regular part of the RDF data. As a by-product, this research aims to produce a state-of-the-art SPARQL frontend for MonetDB, and we already present some preliminary results on this platform.
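
A minimal sketch of the idea, assuming that schema discovery works by grouping subjects on their characteristic set (the set of predicates a subject uses). This is a common way to find regular structure in RDF, not necessarily the exact algorithm proposed here. The names `discover_tables`, `star_scan` and the `min_support` threshold are hypothetical, and predicates are assumed to be single-valued to keep the code short. Frequent characteristic sets become column-wise tables, and a star pattern is then answered by scanning the matching tables instead of self-joining a triple table.

```python
from collections import defaultdict


def discover_tables(triples, min_support=2):
    """Group subjects by characteristic set (the set of predicates they use).

    Characteristic sets shared by at least `min_support` subjects become
    column-wise 'tables'; all other triples stay in a residual triple list.
    Assumption: every predicate is single-valued per subject.
    """
    props = defaultdict(dict)  # subject -> {predicate: object}
    for s, p, o in triples:
        props[s][p] = o

    by_cs = defaultdict(list)  # characteristic set -> subjects having it
    for s, po in props.items():
        by_cs[frozenset(po)].append(s)

    tables, residual = {}, []
    for cs, subjects in by_cs.items():
        if len(subjects) >= min_support:
            # One list per predicate, aligned by row position (columnar layout).
            columns = {p: [props[s][p] for s in subjects] for p in sorted(cs)}
            tables[cs] = (subjects, columns)
        else:
            residual.extend((s, p, o) for s in subjects for p, o in props[s].items())
    return tables, residual


def star_scan(tables, wanted):
    """Answer the star pattern ?s p1 ?o1 . ?s p2 ?o2 ... without self-joins.

    Every table whose characteristic set covers all wanted predicates is
    scanned once; residual triples are not consulted in this sketch.
    """
    wanted = list(wanted)
    for cs, (subjects, columns) in tables.items():
        if set(wanted) <= cs:
            for i, s in enumerate(subjects):
                yield (s, *(columns[p][i] for p in wanted))


triples = [
    ("alice", "name", "Alice"), ("alice", "age", 30),
    ("bob", "name", "Bob"), ("bob", "age", 25),
    ("carol", "name", "Carol"),
]
tables, residual = discover_tables(triples)
print(sorted(star_scan(tables, ["name", "age"])))
# [('alice', 'Alice', 30), ('bob', 'Bob', 25)]
```

Subjects whose characteristic set is too rare end up in `residual`; a real store would still need a triple-table path for those, which this sketch omits.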

    Advances in Large-Scale RDF Data Management

    One of the prime goals of the LOD2 project is improving the performance and scalability of RDF storage solutions so that the increasing amount of Linked Open Data (LOD) can be managed efficiently. Virtuoso was chosen as the basic RDF store for the LOD2 project, and during the project it was significantly improved by incorporating advanced relational database techniques from MonetDB and Vectorwise, turning it into a compressed column store with vectored execution. This has reduced the performance gap (“RDF tax”) between Virtuoso’s SQL and SPARQL query performance in a way that still respects the “schema-last” nature of RDF. However, because they lack schema information, RDF database systems such as Virtuoso still cannot use advanced relational storage optimizations such as table partitioning or clustered indexes, and they must execute SPARQL queries with many self-joins on a triple table, which requires more join effort than in SQL systems. In this chapter, we first discuss the new column store techniques applied to Virtuoso and the enhancements in its cluster parallel version, and we show its performance on the popular BSBM benchmark at the unsurpassed scale of 150 billion triples. Finally, we describe ongoing work on deriving an “emergent” relational schema from RDF data, which can help close the performance gap between relational-based and RDF-based storage solutions.
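
To make the cost of "self-joins on a triple table" concrete, here is a toy in-memory sketch that evaluates a SPARQL-style star pattern over a single `(s, p, o)` table with hash joins. It is not Virtuoso's engine, which the chapter describes as a compressed column store with vectored execution; the function name `star_via_self_joins` and the sample data are hypothetical. Each additional predicate in the star costs another pass over the triple table and another join on the subject column, whereas a relational table with one column per predicate could return the same rows from a single scan.

```python
from collections import defaultdict


def star_via_self_joins(triple_table, predicates):
    """Evaluate ?s p1 ?o1 . ?s p2 ?o2 ... over one (s, p, o) triple table.

    Returns the result rows and the number of self-joins performed: a star
    pattern with k predicates needs k - 1 joins on the subject column.
    """
    # First triple pattern: select p = p1 from the triple table.
    rows = [(s, o) for s, p, o in triple_table if p == predicates[0]]
    joins = 0
    for pred in predicates[1:]:
        # Build side: a second selection on the *same* table (the self-join).
        build = defaultdict(list)
        for s, p, o in triple_table:
            if p == pred:
                build[s].append(o)
        # Probe side: extend every partial row with each matching object.
        rows = [(*row, o) for row in rows for o in build.get(row[0], [])]
        joins += 1
    return rows, joins


triples = [
    ("alice", "name", "Alice"), ("alice", "age", 30), ("alice", "city", "Oslo"),
    ("bob", "name", "Bob"), ("bob", "age", 25),
    ("carol", "name", "Carol"),
]
rows, joins = star_via_self_joins(triples, ["name", "age", "city"])
print(rows, joins)  # [('alice', 'Alice', 30, 'Oslo')] 2
```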

    Emergent relational schemas for RDF

    Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project

    Database Management; Artificial Intelligence (incl. Robotics); Information Systems and Communication Service

    Workload Matters: A Robust Approach to Physical RDF Database Design

    Recent advances in Information Extraction, Linked Data Management and the Semantic Web have led to a rapid increase in both the volume and the variety of publicly available graph-structured data. As more and more businesses start to capitalize on graph-structured data, data management systems are being exposed to workloads that are far more diverse and dynamic than those they were designed to handle. In particular, most systems rely on a workload-oblivious physical layout with a fixed schema and adapt only if the changes to the schema are minor. As a result, they cannot perform consistently well across different types of workloads. This thesis introduces fundamental techniques for supporting diverse and dynamic workloads in RDF data management systems. Instead of assuming anything about the workload upfront, these techniques allow systems to adjust their physical designs as queries are executed. This includes changing, all at runtime, (i) how records are clustered in the storage system, (ii) how data are organized and indexed, and (iii) how queries are optimized. The thesis then discusses the challenges encountered in implementing these ideas in a proof-of-concept prototype called chameleon-db, and it concludes with a thorough experimental evaluation.
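
The abstract does not spell out chameleon-db's algorithms, so the following is only a hypothetical sketch of the general idea of workload-driven physical design: the store observes which predicates each query touches and, at runtime, regroups predicates that are frequently accessed together so that their triples could be clustered in the same partition. The class `AdaptiveLayout`, its `threshold` parameter, and the union-find grouping rule are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


class AdaptiveLayout:
    """Hypothetical workload-driven grouping of predicates into partitions.

    Predicates that co-occur in at least `threshold` observed queries are
    merged into one group, whose triples would be stored together.
    """

    def __init__(self, threshold=2):
        self.threshold = threshold   # min co-access count to merge (assumed)
        self.co_access = Counter()   # (p1, p2) -> number of queries using both
        self.predicates = set()

    def observe(self, query_predicates):
        """Record the predicates touched by one executed query."""
        preds = sorted(set(query_predicates))
        self.predicates.update(preds)
        for pair in combinations(preds, 2):
            self.co_access[pair] += 1

    def partitions(self):
        """Recompute the grouping from the workload seen so far."""
        parent = {p: p for p in self.predicates}

        def find(p):
            while parent[p] != p:
                parent[p] = parent[parent[p]]  # path halving
                p = parent[p]
            return p

        for (a, b), n in self.co_access.items():
            if n >= self.threshold:
                parent[find(a)] = find(b)
        groups = {}
        for p in self.predicates:
            groups.setdefault(find(p), set()).add(p)
        return sorted(sorted(g) for g in groups.values())


layout = AdaptiveLayout(threshold=2)
layout.observe(["name", "age"])
layout.observe(["name", "age", "email"])
layout.observe(["title", "author"])
print(layout.partitions())  # [['age', 'name'], ['author'], ['email'], ['title']]
layout.observe(["author", "title"])
print(layout.partitions())  # [['age', 'name'], ['author', 'title'], ['email']]
```

Recomputing `partitions()` as new queries arrive is what makes the layout adaptive; an actual system would also have to migrate the stored triples between partitions, which this sketch leaves out.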

    Application of MonetDB in the Performance Evaluation of Vertical Databases

    Dissertation presented to Universidade Fernando Pessoa in partial fulfillment of the requirements for the degree of Master in Computer Engineering, Information and Multimedia Systems branch. This dissertation analyzes the application of MonetDB to the performance evaluation of vertical databases against traditional systems such as PostgreSQL and CitusDB. In recent years, vertical database systems have attracted great interest both in the scientific community and in commercial and organizational settings. This interest is related to their potential for better performance, to how the databases are stored, to the use of data compression, and to their use in decision support. The growing interest in vertical, or columnar, databases over traditional row-oriented storage lies mainly in how data are stored and in the performance gains this brings in some situations. Traditional database systems store the tuples of a relation sequentially, page by page, while vertical database systems store the values belonging to the same column contiguously, in the same page, which makes it faster to read only a subset of a table's columns. This dissertation describes the main characteristics and advantages of the vertical storage method relative to the traditional storage method, analyzing its architecture and concepts and highlighting the advantages of compression and materialization techniques in query execution. These advantages show that, for queries typical of analytical applications, traditional row-oriented databases perform worse than vertical databases.
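
The claim that column stores read a subset of columns faster can be checked with a back-of-the-envelope page count. The sketch below is a simplified model, assuming fixed-size values, no compression and no indexes; the function `pages_read` and its parameters are hypothetical and are not taken from the dissertation's benchmarks.

```python
import math


def pages_read(n_rows, n_cols, cols_needed, values_per_page, layout):
    """Pages touched when a scan needs `cols_needed` of `n_cols` columns.

    Simplified model (assumption): all values have the same size, a page
    holds `values_per_page` values, and there is no compression or index.
    """
    if layout == "row":
        if values_per_page < n_cols:
            raise ValueError("a page must hold at least one whole row")
        # Tuples are stored one after another, so every page holds whole rows
        # and all pages are read even if only one column is needed.
        rows_per_page = values_per_page // n_cols
        return math.ceil(n_rows / rows_per_page)
    if layout == "column":
        # Each column is stored contiguously in its own pages, so only the
        # requested columns are read.
        return cols_needed * math.ceil(n_rows / values_per_page)
    raise ValueError(f"unknown layout: {layout!r}")


# One million rows, ten columns, a query that projects a single column.
row_pages = pages_read(1_000_000, 10, 1, 1_000, "row")
col_pages = pages_read(1_000_000, 10, 1, 1_000, "column")
print(row_pages, col_pages)  # 10000 1000
```

In this model the row layout reads ten times as many pages for a one-column query, while a query that needs all ten columns reads 10,000 pages under either layout; this is why the advantage applies mainly to analytical queries that touch few columns. Compression, which the dissertation also discusses, typically favors the column layout further, because values from a single column tend to compress well together.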

    Aspects of Data Warehouse Technologies for Complex Web Data

    Efficiently indexing sparse wide tables in community systems

    Master's thesis (Master of Science)