Search CORE

8,970 research outputs found

Pushdown Compression

Author: Albert Pilar
Mayordomo Elvira
Moser Philippe
Perifel Sylvain
Publication venue
Publication date: 17/09/2007
Field of study

The pressing need for eficient compression schemes for XML documents has recently been focused on stack computation [6, 9], and in particular calls for a formulation of information-lossless stack or pushdown compressors that allows a formal analysis of their performance and a more ambitious use of the stack in XML compression, where so far it is mainly connected to parsing mechanisms. In this paper we introduce the model of pushdown compressor, based on pushdown transducers that compute a single injective function while keeping the widest generality regarding stack computation. The celebrated Lempel-Ziv algorithm LZ78 [10] was introduced as a general purpose compression algorithm that outperforms finite-state compressors on all sequences. We compare the performance of the Lempel-Ziv algorithm with that of the pushdown compressors, or compression algorithms that can be implemented with a pushdown transducer. This comparison is made without any a priori assumption on the data's source and considering the asymptotic compression ratio for infinite sequences. We prove that Lempel-Ziv is incomparable with pushdown compressors

arXiv.org e-Print Archive

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Dagstuhl Research Online Publication Server

Hal-Diderot

On the performance of markup language compression

Author: Kheirkhahzadeh Antonio
Publication venue
Publication date
Field of study

Data compression is used in our everyday life to improve computer interaction or simply for storage purposes. Lossless data compression refers to those techniques that are able to compress a file in such ways that the decompressed format is the replica of the original. These techniques, which differ from the lossy data compression, are necessary and heavily used in order to reduce resource usage and improve storage and transmission speeds. Prior research led to huge improvements in compression performance and efficiency for general purpose tools which are mainly based on statistical and dictionary encoding techniques. Extensible Markup Language (XML) is based on redundant data which is parsed as normal text by general-purpose compressors. Several tools for compressing XML data have been developed, resulting in improvements for compression size and speed using different compression techniques. These tools are mostly based on algorithms that rely on variable length encoding. XML Schema is a language used to define the structure and data types of an XML document. As a result of this, it provides XML compression tools additional information that can be used to improve compression efficiency. In addition, XML Schema is also used for validating XML data. For document compression there is a need to generate the schema dynamically for each XML file. This solution can be applied to improve the efficiency of XML compressors. This research investigates a dynamic approach to compress XML data using a hybrid compression tool. This model allows the compression of XML data using variable and fixed length encoding techniques when their best use cases are triggered. The aim of this research is to investigate the use of fixed length encoding techniques to support general-purpose XML compressors. The results demonstrate the possibility of improving on compression size when a fixed length encoder is used to compressed most XML data types

UWL Repository

Compressed materialised views of semi-structured data

Author: Gourlay Richard
Tripney Brian
Wilson John
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However preserving the conventional representation of such views remains a significant limiting factor especially in the context of mobile devices where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and valuebased predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show advantages of using such structures

Enlighten

Compression of Probabilistic XML documents

Author: Veldman Irma
Publication venue: University of Twente, Centre for Telematics and Information Technology
Publication date: 01/01/2009
Field of study

Probabilistic XML (PXML) files resulting from data integration can become extremely large, which is undesired. For XML there are several techniques available to compress the document and since probabilistic XML is in fact (a special form of) XML, it might benefit from these methods even more. In this research we search for compression mechanisms that are available for XML and implement one of them to customize it with respect to the properties of probabilistic XML. Experiments show that there is no significant improvement for combinations of traditional mechanisms with techniques that are specially designed for probabilistic XML

University of Twente Research Information