7 research outputs found

    On the number of string lookups in BSTs (and related algorithms) with digital access

    Binary search trees and quicksort are examples of a comparison-based data structure and a comparison-based algorithm, respectively. Comparison-based data structures and algorithms can be augmented so that no redundant character comparisons are made. A previously unnoticed benefit is that this approach also avoids looking up the string in some nodes. This paper characterizes analytically the number of string lookups in so-augmented BSTs, quicksort and quickselect. We also characterize a variant, proposed in this paper, that further reduces the number of string lookups. Postprint (published version)
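As a rough illustration of what "no redundant character comparisons" can mean in a BST over strings, the sketch below uses one well-known technique: the search remembers how many leading characters the query shares with the closest smaller and closest larger keys seen so far, and starts each new comparison after the shorter of those two prefixes. This is an assumption made for illustration, not necessarily the exact augmentation analyzed in the paper; the names `Node`, `insert`, `compare_from` and `search` are hypothetical.

```python
class Node:
    """Plain BST node holding a string key (hypothetical helper type)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Ordinary BST insertion using full string comparison."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def compare_from(a, b, start):
    """Compare a and b, assuming they already agree on the first `start` chars.

    Returns (sign, lcp): sign is -1, 0 or 1, and lcp is the length of the
    common prefix found.
    """
    i = start
    n = min(len(a), len(b))
    while i < n and a[i] == b[i]:
        i += 1
    if i < n:
        return (-1 if a[i] < b[i] else 1), i
    return (len(a) > len(b)) - (len(a) < len(b)), i


def search(root, query):
    """BST search that skips characters already known to match."""
    lo = 0  # chars shared with the closest smaller key seen so far
    hi = 0  # chars shared with the closest larger key seen so far
    node = root
    while node is not None:
        # Every key in this subtree lies between the two bounds, so it
        # shares at least min(lo, hi) leading characters with the query.
        sign, lcp = compare_from(query, node.key, min(lo, hi))
        if sign == 0:
            return True
        if sign < 0:
            hi, node = lcp, node.left
        else:
            lo, node = lcp, node.right
    return False
```

The skip is safe because any key stored between two strings that share a prefix must share that prefix as well. Until both bounds exist, `min(lo, hi)` is 0 and the comparison starts from the first character.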

    Burrows-Wheeler transform in secondary memory

    Master’s Thesis in Computer Engineering. A suffix array is an index, a data structure that allows searching for sequences of characters. Such structures are of key importance for a large set of problems related to sequences of characters. An especially important use of suffix arrays is to compute the Burrows-Wheeler Transform (BWT), which can be used for compressing text. This procedure is the basis of the UNIX utility bzip2. The Burrows-Wheeler transform is also a key step in the construction of more sophisticated indexes. For large sequences of characters, such as DNA sequences of about 10 GB, it is not possible to calculate the Burrows-Wheeler transform on an average computer without using secondary memory. In this dissertation we study the state-of-the-art algorithms for constructing the Burrows-Wheeler transform in secondary memory. Based on this research we propose an algorithm and compare it against the previous ones to determine its relative performance. Our algorithm is based on the classical external Heapsort. The novelty lies in a heap that is specially designed for suffix arrays, which we call String Heap. This algorithm aims to be space-conscious, while trying to handle the dominance of disk access over main memory access. We divide our solution into two parts, splitting and merging suffix arrays; the latter is the main application of the String Heap. The merging part produces the BWT as a side effect of merging a set of partial suffix arrays of a text. We also study a second version of the algorithm that accesses secondary memory in blocks
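The relationship between a suffix array and the BWT mentioned in this abstract can be sketched in a few lines. The example below is a naive in-memory illustration only: it is not the external-memory String Heap algorithm proposed in the thesis, and the function names and the `$` sentinel are assumptions chosen for the example.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: suffix start positions in lexicographic order.

    Sorting materialized suffixes is quadratic in space in the worst case,
    so this is only suitable for small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sentinel: str = "$") -> str:
    """Build the BWT by reading the character that precedes each sorted suffix.

    The sentinel is assumed to be absent from the text and smaller than
    every character in it.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    t = text + sentinel
    # For suffix position 0, t[-1] wraps around to the sentinel itself.
    return "".join(t[i - 1] for i in suffix_array(t))


print(bwt_from_suffix_array("banana"))  # annb$aa
```

For inputs that do not fit in main memory, the sort above is exactly the step that has to be replaced by an external-memory construction.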

    Cracking KD-Tree : the first multidimensional adaptive indexing

    Advisor: Prof. Dr. Eduardo Cunha de Almeida. Dissertation (master's) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 03/10/2018. Includes references: p. 48-51. Area of concentration: Computer Science. Abstract: Index creation is one of the most difficult decisions in database schema design. Given a workload, the database administrator has to decide which indexes to create, taking into consideration the costs to build and maintain them. This problem becomes even more difficult when dealing with multidimensional queries in exploratory systems, where there is no workload available and the number of possible indexes is even larger. State-of-the-art adaptive indexing techniques, such as Sideways Cracking and Quasii, are capable of answering multidimensional range queries. In this dissertation we propose an alternative, the Cracking KD-Tree, which is an adaptive data structure used for multidimensional queries. Compared with other adaptive indexing techniques, our data structure was more efficient or comparably efficient with respect to total workload response time. With 2 attributes we were 6.7x faster than Sideways Cracking and 1.4x faster than Quasii. With 16 attributes, the Cracking KD-Tree was 19x faster than Sideways Cracking and 1.7x faster than Quasii. Keywords: Database Cracking. Multidimensional Index. Database Systems

    The Quicksort algorithm and related topics

    Sorting algorithms have attracted a great deal of attention and study, as they have numerous applications to Mathematics, Computer Science and related fields. In this thesis, we first deal with the mathematical analysis of the Quicksort algorithm and its variants. Specifically, we study the time complexity of the algorithm and we provide a complete demonstration of the variance of the number of comparisons required, a known result but one whose detailed proof is not easy to read out of the literature. We also examine variants of Quicksort, where multiple pivots are chosen for the partitioning of the array. The rest of this work is dedicated to the analysis of finding the true order by further pairwise comparisons when a partial order compatible with the true order is given in advance. We discuss a number of cases where the partially ordered sets arise at random. To this end, we employ results from Graph and Information Theory. Finally, we obtain an alternative bound on the number of linear extensions when the partially ordered set arises from a random graph, and discuss the possible application of Shellsort in merging chains
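To make the quantity studied in this abstract concrete, the sketch below counts the key comparisons made by a simple single-pivot Quicksort and checks the average against the standard textbook formula for the expected number of comparisons on a uniformly random permutation of n distinct keys, E[C_n] = 2(n+1)H_n - 4n, where H_n is the n-th harmonic number. The first-element pivot rule, the cost model of n - 1 comparisons per partitioning step, and all function names are assumptions for illustration and may differ from the conventions used in the thesis.

```python
from fractions import Fraction
from itertools import permutations


def quicksort_comparisons(items):
    """Count key comparisons, charging len(items) - 1 per partitioning step."""
    if len(items) <= 1:
        return 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)


def expected_comparisons(n):
    """Exact value of 2(n+1)H_n - 4n as a Fraction."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


def average_over_permutations(n):
    """Exact average comparison count over all n! input orders (small n only)."""
    perms = list(permutations(range(n)))
    total = sum(quicksort_comparisons(list(p)) for p in perms)
    return Fraction(total, len(perms))


for n in range(1, 7):
    assert average_over_permutations(n) == expected_comparisons(n)
```

Using exact fractions avoids floating-point noise, so the brute-force average and the closed form can be compared with equality for small n.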

    Basic algorithms and data structures

    Solving applied problems in the field of information technology requires translating information from a general description into a mathematical formulation, then into an algorithm, and then into a programming language, or vice versa, in the reverse order. Knowledge of how to build efficient algorithms and how to use data structures will be needed by everyone who wants to analyze information effectively and create competitive software products. Alongside its treatment of theoretical questions, the textbook contains theory and laboratory assignments that help the reader understand the basic concepts of algorithms and the practical principles of data structures. The textbook is also intended for students and for information technology professionals who study algorithms and data structures on their own.