22,138 research outputs found
Reordering Rows for Better Compression: Beyond the Lexicographic Order
Sorting database tables before compressing them improves the compression
rate. Can we do better than the lexicographical order? For minimizing the
number of runs in a run-length encoding compression scheme, the best approaches
to row-ordering are derived from traveling salesman heuristics, although there
is a significant trade-off between running time and compression. A new
heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades
off compression for a major running-time speedup, is a good option for very
large tables. However, for some compression schemes, it is more important to
generate long runs rather than few runs. For this case, another novel
heuristic, Vortex, is promising. We find that we can improve run-length
encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%:
these gains are on top of the gains due to lexicographically sorting the table.
We prove that the new row reordering is optimal (within 10%) at minimizing the
runs of identical values within columns, in a few cases.Comment: to appear in ACM TOD
Emerging standards for still image compression: A software implementation and simulation study
The software implementation is described of an emerging standard for the lossy compression of continuous tone still images. This software program can be used to compress planetary images and other 2-D instrument data. It provides a high compression image coding capability that preserves image fidelity at compression rates competitive or superior to most known techniques. This software implementation confirms the usefulness of such data compression and allows its performance to be compared with other schemes used in deep space missions and for data based storage
Results from computational analysis of a mixed compression supersonic inlet
A numerical study was performed to simulate the critical flow through a supersonic inlet. This flow field has many phenomena such as shock waves, strong viscous effects, turbulent boundary layer development, boundary layer separations, and mass flow suction through the walls, (bleed). The computational tools used were two full Navier-Stokes (FNS) codes. The supersonic inlet that was analyzed is the Variable Diameter Centerbody, (VDC), inlet. This inlet is a candidate concept for the next generation supersonic involved effort in generating an efficient grid geometry and specifying boundary conditions, particularly in the bleed region and at the outflow boundary. Results for a critical inlet operation compare favorably to Method of Characteristics predictions and experimental data
On Optimally Partitioning Variable-Byte Codes
The ubiquitous Variable-Byte encoding is one of the fastest compressed
representation for integer sequences. However, its compression ratio is usually
not competitive with other more sophisticated encoders, especially when the
integers to be compressed are small that is the typical case for inverted
indexes. This paper shows that the compression ratio of Variable-Byte can be
improved by 2x by adopting a partitioned representation of the inverted lists.
This makes Variable-Byte surprisingly competitive in space with the best
bit-aligned encoders, hence disproving the folklore belief that Variable-Byte
is space-inefficient for inverted index compression. Despite the significant
space savings, we show that our optimization almost comes for free, given that:
we introduce an optimal partitioning algorithm that does not affect indexing
time because of its linear-time complexity; we show that the query processing
speed of Variable-Byte is preserved, with an extensive experimental analysis
and comparison with several other state-of-the-art encoders.Comment: Published in IEEE Transactions on Knowledge and Data Engineering
(TKDE), 15 April 201
Test Slice Difference Technique for Low-Transition Test Data Compression
[[notice]]補正完畢[[incitationindex]]EI[[booktype]]電子
- …