
    Performance Test Automation with Distributed Database Systems

    Our previous research paper, 'A Focus on Testing Issues in Distributed Database Systems', led us to the conclusion that Distributed Database Systems support many good engineering practices but still leave room for refinement. A Distributed Database (DDB) is a collection of multiple databases that are logically inter-related over a computer network. Apart from managing a plethora of complicated tasks, database management systems also need to be efficient in terms of concurrency, reliability, fault tolerance and performance. With the paradigm shift from centralized to distributed databases, a testing process applied to a DDB spans a series of stages in the construction of a DDB project from scratch and is typically employed in homogeneous systems. In this paper, an attempt is made to describe the establishment of performance testing for DDB systems. It focuses on the need for maintaining performance and on some techniques to achieve it in DDB systems. Three sample web-based systems are tested with TestMaker, an open-source tool, in order to highlight the helpful role of performance in the context of testing. The strengths and weaknesses of the chosen performance testing tools, viz. TestMaker, OpenSTA, and httperf, are discussed.
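    As a rough illustration of the kind of measurement such a test collects against a web system, the following Python sketch issues a batch of HTTP requests and reports latency statistics. The URL, request count, and concurrency level are made-up placeholders, and this is not the workflow of TestMaker, OpenSTA, or httperf; it only shows the shape of a response-time measurement.

```python
# Minimal sketch of a response-time measurement against a web system.
# The endpoint, request count, and concurrency are illustrative assumptions.
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://127.0.0.1:8080/"   # hypothetical system under test
REQUESTS = 100                   # total requests to issue
CONCURRENCY = 10                 # simultaneous workers

def timed_request(_):
    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = sorted(pool.map(timed_request, range(REQUESTS)))

print(f"mean   {statistics.mean(latencies) * 1000:.1f} ms")
print(f"median {statistics.median(latencies) * 1000:.1f} ms")
print(f"p95    {latencies[int(0.95 * len(latencies)) - 1] * 1000:.1f} ms")
```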

    Compacting Frequent Star Patterns in RDF Graphs

    Knowledge graphs have become a popular formalism for representing entities and their properties using a graph data model, e.g., the Resource Description Framework (RDF). An RDF graph comprises entities of the same type connected to objects or other entities using labeled edges annotated with properties. RDF graphs usually contain entities that share the same objects in a certain group of properties, i.e., they match star patterns composed of these properties and objects. When the number of such entities or properties is large, the size of the RDF graph and query processing are negatively impacted; we refer to these star patterns as frequent star patterns. We address the problem of identifying frequent star patterns in RDF graphs and devise the concept of factorized RDF graphs, which denote compact representations of RDF graphs in which the number of frequent star patterns is minimized. We also develop computational methods to identify frequent star patterns and generate a factorized RDF graph, where compact RDF molecules replace frequent star patterns. A compact RDF molecule of a frequent star pattern denotes an RDF subgraph that instantiates the corresponding star pattern. Instead of having all the entities match the original frequent star pattern, a surrogate entity is added and related to the properties of the frequent star pattern; it is then linked to the entities that originally matched the frequent star pattern. We evaluate the performance of our factorization techniques on several RDF graph benchmarks and compare them with a baseline built on top of gSpan, a state-of-the-art algorithm for detecting frequent patterns. The outcomes evidence the efficiency of the proposed approach and show that our techniques reduce the execution time of the baseline approach by at least three orders of magnitude while reducing the RDF graph size by up to 66.56%.
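    To make the idea concrete, here is a toy Python sketch (not the paper's algorithm) that groups entities by the (property, object) pairs they share and replaces one shared star pattern with a surrogate entity. The triples, the chosen pattern, and the partOf linking property are illustrative assumptions; a real method would search for the patterns whose factorization minimizes the graph size.

```python
# Toy factorization of a frequent star pattern: entities that share the same
# (property, object) pairs are linked to a single surrogate node that carries
# those pairs once. Names and data are invented for illustration.
from collections import defaultdict

triples = [
    ("e1", "type", "Player"), ("e1", "team", "A"),
    ("e2", "type", "Player"), ("e2", "team", "A"),
    ("e3", "type", "Player"), ("e3", "team", "A"),
    ("e3", "age", "30"),
]

# Group each entity's (property, object) pairs, i.e. the star it matches.
stars = defaultdict(set)
for s, p, o in triples:
    stars[s].add((p, o))

# Factorize one shared pattern (hard-coded here; a real method would mine it).
pattern = frozenset([("type", "Player"), ("team", "A")])
matching = [s for s, po in stars.items() if pattern <= po]

# Keep every triple that is not part of the pattern for a matching entity.
factorized = [(s, p, o) for s, p, o in triples
              if s not in matching or (p, o) not in pattern]

surrogate = "_:star1"   # the compact RDF molecule's surrogate entity
factorized += [(surrogate, p, o) for p, o in sorted(pattern)]
factorized += [(s, "partOf", surrogate) for s in matching]   # hypothetical link

print(len(triples), "->", len(factorized), "triples")
```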

    SPST-Index : a self pruning splay tree index for database cracking

    Advisor: Prof. Dr. Eduardo Cunha de Almeida. Master's dissertation - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 24/02/2017. Includes references: f. 41-43. Area of concentration: Computer Science. Abstract: In database cracking, a database is physically self-organized into cracked partitions, with cracker indices boosting the access to these partitions. The AVL tree is the current data structure of choice to implement cracker indices. However, it is particularly cache-inefficient for range queries, because the nodes accessed only a few times (i.e., "cold data") and the most accessed ones (i.e., "hot data") are spread all over the index. This work presents the Self-Pruning Splay Tree (SPST), a data structure that indexes database cracking and reorganizes hot and cold data to boost the access to the cracked partitions. For every range query, the SPST rotates to the root the nodes pointing to the edges and to the middle value of the predicate interval. Eventually, the most accessed tree nodes remain close to the root, improving CPU and cache activity. The least accessed tree nodes, on the other hand, remain close to the leaves and are pruned to clean up unused data, diminishing the storage footprint and yielding smaller lookup and update costs. Keywords: Database Cracking, Cracker Index, Splay Tree.
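    The following Python sketch illustrates the underlying intuition with plain move-to-root rotations: every access rotates the visited node toward the root, so frequently accessed ("hot") nodes end up near the top. It deliberately omits the full zig-zig/zig-zag splay steps and the self-pruning of cold leaves that the SPST adds, so it is only a didactic approximation.

```python
# Simplified move-to-root tree: each lookup rotates the touched node up to the
# root, keeping hot keys near the top. Not the SPST itself (no true splaying,
# no pruning of cold leaves), just the core intuition.
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(p):
    l = p.left
    p.left, l.right = l.right, p
    return l

def rotate_left(p):
    r = p.right
    p.right, r.left = r.left, p
    return r

def access(root, key):
    """Search for key and rotate the visited node toward the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        root.left = access(root.left, key)
        return rotate_right(root) if root.left else root
    root.right = access(root.right, key)
    return rotate_left(root) if root.right else root

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
root = access(root, 40)
print(root.key)   # 40 has been rotated to the root
```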

    Optimization of Regular Path Queries in Graph Databases

    Regular path queries offer a powerful navigational mechanism in graph databases. Recently, there has been renewed interest in such queries in the context of the Semantic Web. The extension of SPARQL in version 1.1 with property paths offers a type of regular path query for RDF graph databases. While eminently useful, such queries are difficult to optimize and evaluate efficiently. We design and implement a cost-based optimizer, which we call Waveguide, for SPARQL queries with property paths. Waveguide builds a query plan, which we call a waveplan (WP), that guides the query evaluation. There are numerous choices in the construction of a plan, and a number of optimization methods, so the space of plans for a query can be quite large. Execution costs of plans for the same query can vary by orders of magnitude, with the best plan often offering excellent performance. A WP's costs can be estimated, which opens the way to cost-based optimization. We demonstrate that Waveguide properly subsumes existing techniques and that the new plans it adds are relevant. We analyze the effective plan space enabled by Waveguide and design an efficient enumerator for it. We implement a prototype of a Waveguide cost-based optimizer on top of an open-source relational RDF store. Finally, we perform a comprehensive performance study of the state of the art in evaluating SPARQL property paths and demonstrate the significant performance gains that Waveguide offers.
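    For readers unfamiliar with regular path queries, the Python sketch below shows the semantics being optimized: a breadth-first search over the product of an edge-labeled graph and a small automaton for the path expression a/b* (an a edge followed by zero or more b edges, in SPARQL property-path syntax). It is a didactic baseline, not Waveguide's waveplan-driven evaluation, and the graph and automaton are invented examples.

```python
# Evaluate the regular path query a/b* from a start node by BFS over the
# product of the graph and an NFA for the expression. Purely illustrative.
from collections import deque

# Edge-labeled graph: node -> list of (label, target)
graph = {
    1: [("a", 2)],
    2: [("b", 3)],
    3: [("b", 4)],
    4: [],
}

# NFA for a/b*: state 0 --a--> 1, state 1 --b--> 1; state 1 is accepting.
nfa = {(0, "a"): 1, (1, "b"): 1}
accepting = {1}

def rpq(start):
    """Return the graph nodes reachable from `start` via a path matching a/b*."""
    seen = {(start, 0)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, target in graph[node]:
            nxt = nfa.get((state, label))
            if nxt is not None and (target, nxt) not in seen:
                seen.add((target, nxt))
                queue.append((target, nxt))
    return answers

print(sorted(rpq(1)))   # -> [2, 3, 4]
```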

    Computing candidate keys of relational operators for optimizing rewrite-based provenance computation : key property module

    Data provenance provides information about the origin of data, and has long attracted the attention of the database community. It has been proven to be essential for a wide range of use cases, from debugging of data and queries to probabilistic databases. There exist different techniques for computing the data provenance of a query. However, even sophisticated database optimizers are usually incapable of producing an efficient execution plan for provenance computations because of their inherent complexity and unusual structure. In this work, I develop the key property module as part of the heuristic optimization techniques for rewrite-based provenance systems to address this problem, and present an implementation of this module in the GProM provenance middleware system. The key property stores the set of candidate keys for the output relation of a relational algebra operator. This property is important for evaluating the precondition of many heuristic rewrite rules applied by GProM, e.g., rules that reduce the number of duplicate removal operators in a query. To complete this work, I provide an experimental evaluation which confirms that this property is extremely useful for improving the performance of game provenance computations.
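    The Python sketch below illustrates what such a key property can record, using textbook-style inference rules for selection, projection, and cross product; the rules and names are illustrative assumptions and are not claimed to match GProM's actual implementation. The payoff is the kind of rewrite the abstract mentions: if an operator's output is known to have a candidate key, it contains no duplicates, so a duplicate removal placed above it is redundant and can be dropped.

```python
# Toy candidate-key inference for relational operators (textbook-style rules,
# not GProM's implementation). Keys are modeled as tuples of attribute names.
def keys_selection(child_keys):
    # Selection neither merges nor duplicates tuples, so keys are preserved.
    return set(child_keys)

def keys_projection(child_keys, projected_attrs):
    # A key survives only if all of its attributes are still projected.
    return {k for k in child_keys if set(k) <= set(projected_attrs)}

def keys_cross_product(left_keys, right_keys):
    # Every pair of a left key and a right key forms a key of the product.
    return {tuple(sorted(l + r)) for l in left_keys for r in right_keys}

# R(a, b, c) with candidate key {a}; S(d, e) with candidate key {d}.
r_keys = {("a",)}
s_keys = {("d",)}

print(keys_selection(r_keys))               # {('a',)}
print(keys_projection(r_keys, {"b", "c"}))  # set(): key 'a' was projected away
print(keys_cross_product(r_keys, s_keys))   # {('a', 'd')}
```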

    Emergent relational schemas for RDF


    Advancing Urban Mobility with Algorithm Engineering
