Better bitmap performance with Roaring bitmaps
Bitmap indexes are commonly used in databases and search engines. By
exploiting bit-level parallelism, they can significantly accelerate queries.
However, they can use much memory, and thus we might prefer compressed bitmap
indexes. Following Oracle's lead, bitmaps are often compressed using run-length
encoding (RLE). Building on prior work, we introduce the Roaring compressed
bitmap format: it uses packed arrays for compression instead of RLE. We compare
it to two high-performance RLE-based bitmap encoding techniques: WAH (Word
Aligned Hybrid compression scheme) and Concise (Compressed 'n' Composable
Integer Set). On synthetic and real data, we find that Roaring bitmaps (1)
often compress significantly better (e.g., 2 times) and (2) are faster than the
compressed alternatives (up to 900 times faster for intersections). Our results
challenge the view that RLE-based bitmap compression is best.
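The packed-array idea behind Roaring can be illustrated with a minimal sketch. This is a simplified, illustrative Python version, not the actual Roaring implementation: 32-bit values are partitioned by their high 16 bits, and each chunk stores its low 16 bits either as a sorted packed array (sparse chunk) or as a 2^16-bit bitmap (dense chunk). The function names `build` and `contains` and the exact layout are assumptions for illustration; only the chunking scheme and the 4096-element switchover come from the format's design.

```python
# Simplified sketch of Roaring's hybrid container scheme (illustrative only).
ARRAY_LIMIT = 4096  # past this cardinality, a 2^16-bit bitmap is smaller
                    # than a packed array of 16-bit values

def build(values):
    # Group 32-bit values by their high 16 bits.
    chunks = {}
    for v in sorted(set(values)):
        chunks.setdefault(v >> 16, []).append(v & 0xFFFF)
    # Pick a container per chunk: packed array if sparse, bitmap if dense.
    containers = {}
    for hi, lows in chunks.items():
        if len(lows) <= ARRAY_LIMIT:
            containers[hi] = ("array", lows)        # sorted 16-bit array
        else:
            bits = bytearray(8192)                  # 2^16 bits = 8 KiB
            for lo in lows:
                bits[lo >> 3] |= 1 << (lo & 7)
            containers[hi] = ("bitmap", bits)
    return containers

def contains(containers, v):
    entry = containers.get(v >> 16)
    if entry is None:
        return False
    kind, data = entry
    lo = v & 0xFFFF
    if kind == "array":
        return lo in data       # real implementations binary-search here
    return bool(data[lo >> 3] & (1 << (lo & 7)))
```

Dense chunks thus cost a fixed 8 KiB, sparse chunks two bytes per element, which is where the compression advantage over RLE comes from on many distributions.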
CONCISE: Compressed 'n' Composable Integer Set
Bit arrays, or bitmaps, are used to significantly speed up set operations in
several areas, such as data warehousing, information retrieval, and data
mining, to name a few. However, bitmaps usually occupy a large amount of
storage, thus requiring compression. Yet there is a space-time tradeoff among
compression schemes. The Word Aligned Hybrid (WAH) bitmap compression trades
some space to allow for bitwise operations without first decompressing bitmaps.
WAH has been recognized as the most efficient scheme in terms of computation
time. In this paper we present CONCISE (Compressed 'n' Composable Integer Set),
a new scheme that achieves significantly better performance than WAH. In
particular, compared to WAH, our algorithm reduces the required memory by up
to 50% while providing similar or better performance in terms of
computation time. Further, we show that CONCISE can be efficiently used to
manipulate bitmaps representing sets of integral numbers in lieu of well-known
data structures such as arrays, lists, hashtables, and self-balancing binary
search trees. Extensive experiments over synthetic data show the effectiveness
of our approach.
Comment: Preprint submitted to Information Processing Letters, 7 pages
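The word-aligned RLE idea shared by WAH and CONCISE can be sketched as follows. This is a hedged, simplified Python illustration in the spirit of WAH, not the exact CONCISE word format: the bitmap is cut into 31-bit groups; runs of all-zero or all-one groups collapse into a single fill record, while mixed groups are kept verbatim as literals. The record layout and the names `compress`/`decompress` are assumptions for illustration.

```python
# Word-aligned RLE sketch (WAH-style, illustrative only).
GROUP = 31  # WAH packs 31 bitmap bits per 32-bit word

def compress(bits):
    # bits: list of 0/1; length assumed a multiple of GROUP for simplicity.
    words = []
    for i in range(0, len(bits), GROUP):
        g = tuple(bits[i:i + GROUP])
        if set(g) == {0} or set(g) == {1}:
            v = g[0]
            # Merge consecutive identical fills into one run.
            if words and words[-1][0] == "fill" and words[-1][1] == v:
                words[-1] = ("fill", v, words[-1][2] + 1)
            else:
                words.append(("fill", v, 1))
        else:
            words.append(("lit", g))    # mixed group stored verbatim
    return words

def decompress(words):
    bits = []
    for w in words:
        if w[0] == "fill":
            bits.extend([w[1]] * (GROUP * w[2]))
        else:
            bits.extend(w[1])
    return bits
```

Real WAH and CONCISE pack these records into 32-bit machine words and evaluate AND/OR/XOR directly on the compressed words, skipping whole runs at a time, which is the source of the "operations without first decompressing" property the abstract mentions.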
Sparse signal and image recovery from Compressive Samples
In this paper we present an introduction to Compressive Sampling (CS), an
emerging model-based framework for data acquisition and signal recovery based
on the premise that a signal having a sparse representation in one basis can
be reconstructed from a small number of measurements collected in a second
basis that is incoherent with the first. Interestingly, a random noise-like
basis will suffice for the measurement process. We will overview the basic CS
theory, discuss efficient methods for signal reconstruction, and highlight
applications in medical imaging.
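The premise can be demonstrated with a toy sketch, which is not from the paper: a 1-sparse signal of length 16 is recovered exactly from only 6 noise-like random measurements. A brute-force search over 1-sparse candidates stands in here for the practical solvers the paper surveys (matching pursuit, l1 minimization); all sizes and names are illustrative assumptions.

```python
# Toy compressive sampling demo: recover a 1-sparse signal from m < n
# random measurements by searching for the 1-sparse candidate that best
# explains the measurement vector.
import random

random.seed(1)
n, m = 16, 6                       # signal length, number of measurements
x = [0.0] * n
x[5] = 3.0                         # the unknown 1-sparse signal

# Noise-like random measurement matrix (m rows, n columns).
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

# Compressive measurements y = A x: only m numbers are acquired.
y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]

def column(j):
    return [A[i][j] for i in range(m)]

def residual(j):
    # Least-squares fit of y onto column j; return (squared residual, coeff).
    c = column(j)
    coeff = sum(ci * yi for ci, yi in zip(c, y)) / sum(ci * ci for ci in c)
    r2 = sum((yi - coeff * ci) for ci, yi in zip(c, y)) and \
         sum((yi - coeff * ci) ** 2 for ci, yi in zip(c, y))
    return r2, coeff

# Brute-force l0 recovery: the candidate with (near-)zero residual wins.
best = min(range(n), key=lambda j: residual(j)[0])
coeff = residual(best)[1]
```

With a generic random matrix, no column other than the true one can reproduce y, so the search recovers both the support index and the coefficient despite m being well below n.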
Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction
Discovering relevant, but possibly hidden, variables is a key step in
constructing useful and predictive theories about the natural world. This brief
note explains the connections between three approaches to this problem: the
recently introduced information-bottleneck method, the computational mechanics
approach to inferring optimal models, and Salmon's statistical relevance basis.
Comment: 3 pages, no figures, submitted to PRE as a "brief report". Revision:
added an acknowledgements section originally omitted by a LaTeX bug