Pushdown Compression
The pressing need for efficient compression schemes for XML documents has
recently been focused on stack computation [6, 9], and in particular calls for
a formulation of information-lossless stack or pushdown compressors that allows
a formal analysis of their performance and a more ambitious use of the stack in
XML compression, where so far it is mainly connected to parsing mechanisms. In
this paper we introduce the model of pushdown compressor, based on pushdown
transducers that compute a single injective function while keeping the widest
generality regarding stack computation. The celebrated Lempel-Ziv algorithm
LZ78 [10] was introduced as a general purpose compression algorithm that
outperforms finite-state compressors on all sequences. We compare the
performance of the Lempel-Ziv algorithm with that of the pushdown compressors,
or compression algorithms that can be implemented with a pushdown transducer.
This comparison is made without any a priori assumption on the data's source
and considering the asymptotic compression ratio for infinite sequences. We
prove that Lempel-Ziv is incomparable with pushdown compressors.
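For readers unfamiliar with LZ78, the dictionary-based parsing it relies on can be sketched as follows. This is a minimal illustration of the textbook algorithm, not code from the paper; the function names `lz78_parse` and `lz78_decode` are hypothetical, and the output is left as (index, symbol) pairs rather than a packed bit stream.

```python
def lz78_parse(text):
    """Split text into LZ78 phrases, returned as (prefix_index, symbol) pairs.

    Index 0 denotes the empty phrase; each new phrase is an earlier
    phrase extended by one symbol.
    """
    dictionary = {"": 0}
    pairs = []
    phrase = ""
    for symbol in text:
        candidate = phrase + symbol
        if candidate in dictionary:
            # Keep extending while the phrase is already known.
            phrase = candidate
        else:
            pairs.append((dictionary[phrase], symbol))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as prefix + last symbol.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the original text from (prefix_index, symbol) pairs."""
    phrases = [""]
    out = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)
```

Decoding rebuilds the same dictionary in the same order, so the mapping is injective (information-lossless), which is the property the abstract also requires of pushdown compressors.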
On the performance of markup language compression
Data compression is used in our everyday life to improve computer interaction or simply for storage purposes. Lossless data compression refers to techniques that compress a file in such a way that the decompressed output is an exact replica of the original. These techniques, which differ from lossy data compression, are necessary and heavily used to reduce resource usage and improve storage and transmission speeds. Prior research has led to large improvements in the compression performance and efficiency of general-purpose tools, which are mainly based on statistical and dictionary encoding techniques.
Extensible Markup Language (XML) is based on redundant data which is parsed as normal text by general-purpose compressors. Several tools for compressing XML data have been developed, resulting in improvements in compression size and speed using different compression techniques. These tools are mostly based on algorithms that rely on variable length encoding. XML Schema is a language used to define the structure and data types of an XML document. As a result, it provides XML compression tools with additional information that can be used to improve compression efficiency. In addition, XML Schema is also used for validating XML data. For document compression there is a need to generate the schema dynamically for each XML file. This solution can be applied to improve the efficiency of XML compressors.
This research investigates a dynamic approach to compressing XML data using a hybrid compression tool. This model allows the compression of XML data using variable and fixed length encoding techniques when their best use cases are triggered. The aim of this research is to investigate the use of fixed length encoding techniques to support general-purpose XML compressors. The results demonstrate the possibility of improving on compression size when a fixed length encoder is used to compress most XML data types.
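To illustrate why a fixed length encoder can help with typed XML values, the sketch below packs integers into 4 bytes each and compares that with their size as text. It is an assumption-laden example, not the tool described in the abstract: the helper names are hypothetical, and it assumes the values are already known (for example from a schema) to fit a signed 32-bit integer.

```python
import struct


def encode_ints_fixed(values):
    """Pack each integer into exactly 4 bytes (big-endian, signed 32-bit)."""
    return struct.pack(f">{len(values)}i", *values)


def decode_ints_fixed(blob):
    """Inverse of encode_ints_fixed: unpack consecutive 4-byte integers."""
    count = len(blob) // 4
    return list(struct.unpack(f">{count}i", blob))


# Hypothetical element contents declared as integers in a schema.
values = [1234567890, -42, 2021]
text_size = sum(len(str(v)) for v in values)  # bytes when stored as text
packed = encode_ints_fixed(values)            # always 4 bytes per value
```

Short values such as `-42` take more space in the fixed form than as text, which is one reason a hybrid tool would choose between encoders per data type.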
Compressed materialised views of semi-structured data
Query performance issues over semi-structured data have led to the emergence of materialised XML views as a means of restricting the data structure processed by a query. However, preserving the conventional representation of such views remains a significant limiting factor, especially in the context of mobile devices where processing power, memory usage and bandwidth are significant factors. To explore the concept of a compressed materialised view, we extend our earlier work on structural XML compression to produce a combination of structural summarisation and data compression techniques. These techniques provide a basis for efficiently dealing with both structural queries and value-based predicates. We evaluate the effectiveness of such a scheme, presenting results and performance measures that show the advantages of using such structures.
Compression of Probabilistic XML documents
Probabilistic XML (PXML) files resulting from data integration can become extremely large, which is undesirable. For XML there are several techniques available to compress the document, and since probabilistic XML is in fact (a special form of) XML, it might benefit from these methods even more. In this research we survey compression mechanisms that are available for XML and implement one of them, customizing it with respect to the properties of probabilistic XML. Experiments show that there is no significant improvement for combinations of traditional mechanisms with techniques that are specially designed for probabilistic XML.