Real-time and distributed applications for dictionary-based data compression
The greedy approach to dictionary-based static text compression can be executed by a finite state machine.
When it is applied in parallel to different blocks of data independently, robustness is preserved, even on
standard large-scale distributed systems with input files of arbitrary size. Beyond standard large scale,
however, the very small size of the data blocks degrades compression effectiveness.
This paper presents a robust approach for extreme distributed systems, which fixes this problem by
overlapping adjacent blocks and preprocessing the neighborhoods of the boundaries.
Moreover, we introduce the notion of a pseudo-prefix dictionary, which allows optimal compression by means
of a real-time semi-greedy procedure and yields a slight improvement in the compression ratio obtained by the
distributed implementations.
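To illustrate the block-boundary effect described above, the following Python sketch performs greedy longest-match parsing over a small static dictionary, once on the whole input and once on independent fixed-size blocks, as parallel workers would. The dictionary, block size, and function names (greedy_parse, parse_blocks) are hypothetical. The sketch does not reproduce the paper's finite state machine, its block overlapping, or its pseudo-prefix dictionary.

```python
def greedy_parse(text, dictionary):
    """Greedy longest-match factorization of text over a static dictionary.

    Assumption: every single character of text is itself a dictionary word,
    so parsing always succeeds (otherwise ValueError is raised).
    """
    max_len = max(len(w) for w in dictionary)
    factors = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                factors.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no dictionary word matches at position {i}")
    return factors


def parse_blocks(text, dictionary, block_size):
    """Split text into fixed-size blocks and parse each one independently.

    Matches cannot cross a block boundary, which is what hurts compression
    when blocks become very small.
    """
    factors = []
    for start in range(0, len(text), block_size):
        factors.extend(greedy_parse(text[start:start + block_size], dictionary))
    return factors


if __name__ == "__main__":
    # Hypothetical toy dictionary (contains all single characters used).
    dictionary = {"a", "b", "c", "ab", "abc", "bca"}
    text = "abcabcabc"
    print(greedy_parse(text, dictionary))     # ['abc', 'abc', 'abc']
    print(parse_blocks(text, dictionary, 2))  # ['ab', 'c', 'a', 'b', 'c', 'ab', 'c']
```

Parsing the whole string yields three factors, while 2-character blocks yield seven, because no match may span a block boundary. This is the kind of loss that overlapping adjacent blocks is meant to recover.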
Improving Table Compression with Combinatorial Optimization
We study the problem of compressing massive tables within the
partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which
a table is partitioned by an off-line training procedure into disjoint
intervals of columns, each of which is compressed separately by a standard
on-line compressor such as gzip. We provide a new theory that unifies previous
experimental observations on partitioning and heuristic observations on column
permutation, all of which are used to improve compression rates. Based on the
theory, we devise the first on-line training algorithms for table compression,
which can be applied to individual files, not just continuously operating
sources, as well as a new off-line training algorithm, based on a link to the
asymmetric traveling salesman problem, which improves on prior work by
rearranging columns prior to partitioning. We demonstrate these results
experimentally. On various test files, the on-line algorithms provide 35-55%
improvement over gzip with negligible slowdown; the off-line reordering
provides up to 20% further improvement over partitioning alone. We also show
that a variation of the table compression problem is MAX-SNP-hard.
Comment: 22 pages, 2 figures, 5 tables, 23 references. Extended abstract
appears in Proc. 13th ACM-SIAM SODA, pp. 213-222, 200
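The following is a minimal sketch of partition training in the style the abstract refers to. Each contiguous interval of byte columns of a fixed-width table is serialized row by row and compressed with Python's zlib module (DEFLATE, the method gzip also uses). A dynamic program then picks the partition with the smallest total compressed size. The byte-column table model and the helper names (interval_cost, optimal_partition) are assumptions for illustration. This is neither the paper's on-line training algorithm nor its ATSP-based column reordering.

```python
import zlib


def interval_cost(rows, i, j, level=9):
    """Compressed size in bytes of columns i..j-1, serialized row by row.

    Assumption: rows is a list of equal-length bytes records and each byte
    position is one column.
    """
    data = b"".join(row[i:j] for row in rows)
    return len(zlib.compress(data, level))


def optimal_partition(rows):
    """Dynamic program over contiguous column intervals.

    best[j] = min over i < j of best[i] + interval_cost(rows, i, j),
    i.e. the cheapest way to compress the first j columns.
    """
    n = len(rows[0])
    best = [0] + [float("inf")] * n
    cut = [0] * (n + 1)  # cut[j]: start of the last interval ending at j
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + interval_cost(rows, i, j)
            if c < best[j]:
                best[j], cut[j] = c, i
    # Walk back through the recorded cut points to recover the intervals.
    intervals, j = [], n
    while j > 0:
        intervals.append((cut[j], j))
        j = cut[j]
    return best[n], intervals[::-1]


if __name__ == "__main__":
    # Hypothetical table: a 6-digit counter column followed by a constant field.
    rows = [f"{k:06d}XYZW".encode() for k in range(300)]
    total, parts = optimal_partition(rows)
    print(total, parts)
    print(interval_cost(rows, 0, len(rows[0])))  # whole table as one interval
```

The dynamic program evaluates O(n^2) intervals, each with its own compression pass, so this brute-force version is only practical for tables with few columns. Because a single interval covering all columns is one of the candidates, the result is never worse than compressing the whole table at once.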
PRESS: A Novel Framework of Trajectory Compression in Road Networks
Location data is becoming increasingly important. In this paper, we focus on the
trajectory data, and propose a new framework, namely PRESS (Paralleled
Road-Network-Based Trajectory Compression), to effectively compress trajectory
data under road network constraints. Different from existing work, PRESS
proposes a novel representation for trajectories to separate the spatial
representation of a trajectory from the temporal representation, and proposes a
Hybrid Spatial Compression (HSC) algorithm and an error-Bounded Temporal
Compression (BTC) algorithm to compress the spatial and temporal information of
trajectories respectively. PRESS also supports common spatial-temporal queries
without fully decompressing the data. In an extensive experimental study
on a real trajectory dataset, PRESS significantly outperforms existing approaches
in terms of storage savings for trajectory data with bounded errors.
Comment: 27 pages, 17 figures
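As a hedged illustration of error-bounded temporal compression in general, the sketch below assumes that the temporal part of a trajectory is a sequence of (time, distance travelled) samples. It applies a greedy opening-window simplification that drops a sample only when linear interpolation between the retained neighbours reproduces its distance within eps. This is a generic technique swapped in for illustration, not the BTC algorithm from the paper. The (t, d) representation and the function name compress_time_distance are assumptions.

```python
def compress_time_distance(points, eps):
    """Greedy opening-window simplification of (t, d) samples.

    Assumes t is strictly increasing. A sample is dropped only if linear
    interpolation between the retained neighbours reproduces its d within
    eps, so every original sample stays within the error bound.
    """
    if len(points) <= 2:
        return list(points)
    kept = [points[0]]
    anchor = 0  # index of the last retained sample
    end = 2     # candidate end point of the current segment
    while end < len(points):
        t0, d0 = points[anchor]
        t1, d1 = points[end]
        ok = True
        # Check every sample strictly between anchor and end.
        for k in range(anchor + 1, end):
            tk, dk = points[k]
            interp = d0 + (d1 - d0) * (tk - t0) / (t1 - t0)
            if abs(interp - dk) > eps:
                ok = False
                break
        if ok:
            end += 1  # segment still within bound: try to extend it
        else:
            # Segment anchor..end-1 was the last valid one: keep end-1.
            anchor = end - 1
            kept.append(points[anchor])
            end = anchor + 2
    kept.append(points[-1])
    return kept


if __name__ == "__main__":
    # Hypothetical samples: constant speed, a stop, then a jump.
    pts = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 3), (5, 3), (6, 5)]
    print(compress_time_distance(pts, 0.01))  # [(0, 0), (3, 3), (5, 3), (6, 5)]
```

For the sample input, seven samples shrink to four, and each discarded sample lies on the interpolated line between its retained neighbours. A larger eps would let the window grow further and drop more samples.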