5 research outputs found

    Implementation and experimental evaluation of flexible parsing for dynamic dictionary based data compression

    We report on the implementation and performance evaluation of greedy parsing with lookaheads for dynamic dictionary compression. Specifically, we consider greedy parsing with a single-step lookahead, which we call Flexible Parsing (FP), as an alternative to the commonly used greedy parsing scheme with no lookahead. Greedy parsing is the basis of most popular compression programs, including Unix compress and gzip; however, it does not necessarily achieve optimality with regard to the dictionary construction scheme in use. Flexible parsing, by contrast, is optimal, i.e., it partitions any given input into the smallest possible number of phrases, for dictionary construction schemes that satisfy the prefix property throughout their execution. There is an on-line, linear time and space implementation of the FP scheme via the trie-reverse-trie pair data structure [MS98]. In this paper, we introduce a more practical, randomized data structure to implement the FP scheme, whose expected theoretical performance matches the worst-case performance of the trie-reverse-trie pair. We then report on the compression ratios achieved by two FP-based compression programs we implemented. We test our programs against compress and gzip on various types of data, on some of which we obtain up to 35% improvement.
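
The difference between greedy parsing and one-step lookahead can be illustrated with a minimal sketch. It assumes a fixed, prefix-closed dictionary rather than the dynamic dictionaries studied in the paper, and it uses a plain set lookup instead of the trie-reverse-trie pair or the randomized structure the abstract mentions. The names `longest_match`, `greedy_parse`, `flexible_parse`, and the sample dictionary are hypothetical.

```python
def longest_match(text, i, dictionary):
    """Length of the longest dictionary phrase starting at text[i] (0 if none)."""
    max_len = max(len(p) for p in dictionary)
    best = 0
    for length in range(1, min(max_len, len(text) - i) + 1):
        if text[i:i + length] in dictionary:
            best = length
    return best


def greedy_parse(text, dictionary):
    """Always take the longest phrase available at the current position."""
    phrases, i = [], 0
    while i < len(text):
        length = longest_match(text, i, dictionary)
        if length == 0:
            raise ValueError(f"no phrase matches at position {i}")
        phrases.append(text[i:i + length])
        i += length
    return phrases


def flexible_parse(text, dictionary):
    """One-step lookahead: pick the phrase whose *following* longest
    match reaches farthest. Assumes the dictionary is prefix-closed,
    so every prefix of the longest match is itself a valid phrase."""
    phrases, i = [], 0
    while i < len(text):
        longest = longest_match(text, i, dictionary)
        if longest == 0:
            raise ValueError(f"no phrase matches at position {i}")
        # Score each candidate length by how far the next phrase would end;
        # ties are broken in favour of the longer current phrase.
        best = max(
            range(1, longest + 1),
            key=lambda n: (i + n + longest_match(text, i + n, dictionary), n),
        )
        phrases.append(text[i:i + best])
        i += best
    return phrases


# Hypothetical prefix-closed dictionary: every prefix of each phrase is present.
DICTIONARY = {"a", "b", "c", "ab", "bc", "bcc", "bccc"}

greedy_phrases = greedy_parse("abccc", DICTIONARY)
flexible_phrases = flexible_parse("abccc", DICTIONARY)
```

On this input the greedy parser commits to "ab" and is then forced into single-character phrases, while the lookahead parser takes the shorter "a" so that "bccc" can cover the rest, giving two phrases instead of four.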

    Aplicação de MonetDB na avaliação de desempenho de bases de dados verticais (Application of MonetDB to the performance evaluation of vertical databases)

    Dissertation submitted to Universidade Fernando Pessoa in partial fulfilment of the requirements for the degree of Master in Computer Engineering (Engenharia Informática), Information Systems and Multimedia branch. This dissertation analyzes the application of the MonetDB database management system in a performance evaluation of vertical databases, comparing it with PostgreSQL and CitusDB. In recent years, vertical database systems have attracted great interest in the scientific community as well as in business and organizational settings. This interest is related to their potential for better performance, to the way the data is stored, to the possibility of data compression, and to their use in decision support. The growing interest in vertical, or columnar, databases over traditional row-oriented databases stems mainly from the storage layout and from performance gains in some situations. Row-oriented systems store the tuples of a relation sequentially, by page, whereas vertical systems store the values belonging to the same column contiguously, in the same page, which makes it faster to read only a subset of the columns of a table. This dissertation describes the main characteristics and advantages of the columnar storage method relative to the row storage method, analyzing its architecture and concepts and highlighting the advantages of compression and of materialization techniques in query execution. These advantages show that, for the queries typical of analytical applications, row-oriented databases perform worse than vertical databases.
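
The storage difference described above can be sketched in a few lines. This is only an in-memory illustration under stated assumptions, not how MonetDB, PostgreSQL, or CitusDB lay out pages on disk; the sample table, `COLUMN_NAMES`, and the function names are hypothetical.

```python
# Row layout: each record is stored together, one tuple per row.
ROWS = [
    (1, "Ana", 34, 52000.0),
    (2, "Bruno", 41, 61000.0),
    (3, "Carla", 29, 48000.0),
]
COLUMN_NAMES = ("id", "name", "age", "salary")


def to_columns(rows, names):
    """Column layout: the values of each column are stored contiguously."""
    return {name: [row[k] for row in rows] for k, name in enumerate(names)}


def avg_salary_row_store(rows):
    """Every full record is visited even though only one field is needed."""
    total = 0.0
    for record in rows:
        total += record[3]  # index 3 is the salary field
    return total / len(rows)


def avg_salary_column_store(columns):
    """Only the single 'salary' column is read; other columns are untouched."""
    salaries = columns["salary"]
    return sum(salaries) / len(salaries)


COLUMNS = to_columns(ROWS, COLUMN_NAMES)
avg_from_rows = avg_salary_row_store(ROWS)
avg_from_columns = avg_salary_column_store(COLUMNS)
```

Both functions return the same average; the point is that the analytical query in the columnar version touches one list rather than every record, which is the access pattern the abstract credits for faster reads over a subset of columns.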

    Sahinalp. Implementation and experimental evaluation of flexible parsing for dynamic dictionary based data compression. Workshop on Algorithmic Engineering

    We report on the implementation and performance evaluation of greedy parsing with lookaheads for dynamic dictionary compression. Specifically, we consider greedy parsing with a single-step lookahead, which we call Flexible Parsing (FP), as an alternative to the commonly used greedy parsing scheme with no lookahead. Greedy parsing is the basis of most popular compression programs, including Unix compress and gzip; however, it does not necessarily achieve optimality with regard to the dictionary construction scheme in use. Flexible parsing, by contrast, is optimal, i.e., it partitions any given input into the smallest possible number of phrases, for dictionary construction schemes that satisfy the prefix property throughout their execution. There is an on-line, linear time and space implementation of the FP scheme via the trie-reverse-trie pair data structure [MS98]. In this paper, we introduce a more practical, randomized data structure to implement the FP scheme, whose expected theoretical performance matches the worst-case performance of the trie-reverse-trie pair. We then report on the compression ratios achieved by two FP-based compression programs we implemented. We test our programs against compress and gzip on various types of data, on some of which we obtain up to 35% improvement.