3 research outputs found

Indexing relations on the web

Authors: Freire, Juliana; Mergen, Sergio Luis Sardi
Publication venue: EDBT
Publication date: 01/01/2010

Journal Article

There has been a substantial increase in the volume of (semi-)structured data on the Web. This opens new opportunities for exploring and querying these data in ways that go beyond the keyword-based queries traditionally used on the Web. But supporting queries over a very large number of apparently disconnected Web sources is challenging. In this paper we propose index methods that capture both the structure of the sources and the connections between them. The indexes are designed for data that is represented as relations, such as HTML tables, and support queries with predicates. We show how associations between overlapping sources are discovered, captured in the indexes, and used to derive query rewritings that join multiple sources. We demonstrate, through an experimental evaluation

The University of Utah: J. Willard Marriott Digital Library

Compression of Textual Column-Oriented Data

Authors: Garcia, Vinicius Fulber; Mergen, Sergio Luis Sardi
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 03/07/2018

Column-oriented data are well suited for compression. Since values of the same column are stored contiguously on disk, the information entropy is lower than in the physical data organization of conventional databases. There are many useful lightweight compression techniques targeted at specific data types and domains, such as integers and small lists of distinct values, respectively. However, compression of textual values formed by skewed and high-cardinality words is usually restricted to variations of the LZ compression algorithm. So far there are no empirical evaluations that verify how other sophisticated compression methods handle columnar data that store text. In this paper we shed light on this subject by revisiting the concepts behind those algorithms. We also analyse how they behave in terms of compression and speed when dealing with textual columns where values appear in adjacent positions

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Compression of Very Sparse Column Oriented Data

Authors: Garcia, Vinicius Fulber; Mergen, Sergio Luis Sardi
Publication venue: Universidade Federal de Santa Maria
Publication date: 11/10/2016

Column-oriented databases store columns contiguously on disk. The adjacency of values from the same domain leads to reduced information entropy. Consequently, compression algorithms are able to achieve better results. Columns whose values have a high cardinality are usually compressed using variations of the LZ method. In this paper, we consider the usage of simpler methods based on run-length and symbol probabilities in scenarios where datasets are very sparse. Our experiments show in which cases the simple methods evaluated provide promising results
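
As a rough illustration of why run-length methods can suit very sparse columns, the sketch below collapses consecutive equal values into (value, run length) pairs. It is not the method evaluated in the paper; the function names `rle_encode` and `rle_decode` and the sample column are hypothetical, chosen only for this example.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of consecutive equal values into a (value, length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, length) pairs back into the original column."""
    column = []
    for value, length in pairs:
        column.extend([value] * length)
    return column


# Hypothetical sparse column: mostly missing values (None), few populated cells.
sparse_column = [None] * 6 + ["RS"] + [None] * 4 + ["SP", "SP"] + [None] * 3

encoded = rle_encode(sparse_column)
print(encoded)
print(len(sparse_column), "values stored as", len(encoded), "runs")
```

In this toy column the long runs of missing values shrink to a single pair each, which is the intuition behind considering run-length methods when data are sparse. A dense, high-cardinality column would produce almost as many pairs as values and gain little.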

Universidade Federal de Santa Maria: Portal de Periódicos Eletrônicos da UFSM