Strongly universal string hashing is fast
We present fast strongly universal string hashing families: they can process
data at a rate of 0.2 CPU cycles per byte. Perhaps surprisingly, we find that
these families---though they require a large buffer of random numbers---are
often faster than popular hash functions with weaker theoretical guarantees.
Moreover, conventional wisdom holds that hash functions with fewer
multiplications are faster. Yet we find that they may fail to be faster due to operation
pipelining. We present experimental results on several processors including
low-powered processors. Our tests include hash functions designed for
processors with the Carry-Less Multiplication (CLMUL) instruction set. We also
prove, using accessible proofs, the strong universality of our families.
Comment: Software is available at
http://code.google.com/p/variablelengthstringhashing/ and
https://github.com/lemire/StronglyUniversalStringHashin
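As a concrete illustration, here is a minimal scalar sketch of a multilinear strongly universal hash over 32-bit words, in the spirit of the families described above (and showing why a large buffer of random numbers is needed: the random buffer must be at least one word longer than any string hashed). The function name and buffer layout are illustrative assumptions, not the authors' optimized code.

    #include <cstdint>
    #include <cstddef>

    // Sketch of a multilinear strongly universal hash: with uniformly random
    // 64-bit keys random[0..n], the high 32 bits of
    //   random[0] + sum_i random[i+1] * s[i]  (mod 2^64)
    // form a strongly universal 32-bit hash of the words s[0..n-1].
    uint32_t multilinear_hash(const uint64_t* random, const uint32_t* s, size_t n) {
        uint64_t acc = random[0];
        for (size_t i = 0; i < n; ++i) {
            acc += random[i + 1] * static_cast<uint64_t>(s[i]); // wraps mod 2^64
        }
        return static_cast<uint32_t>(acc >> 32); // keep the high 32 bits
    }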
Stream VByte: Faster Byte-Oriented Integer Compression
Arrays of integers are often compressed in search engines. Though there are
many ways to compress integers, we are interested in the popular byte-oriented
integer compression techniques (e.g., VByte or Google's Varint-GB). They are
appealing due to their simplicity and engineering convenience. Amazon's
varint-G8IU is one of the fastest byte-oriented compression techniques published
so far. It makes judicious use of the powerful single-instruction-multiple-data
(SIMD) instructions available in commodity processors. To surpass varint-G8IU,
we present Stream VByte, a novel byte-oriented compression technique that
separates the control stream from the encoded data. Like varint-G8IU, Stream
VByte is well suited for SIMD instructions. We show that Stream VByte decoding
can be up to twice as fast as varint-G8IU decoding over real data sets. In this
sense, Stream VByte establishes new speed records for byte-oriented integer
compression, at times exceeding the speed of the memcpy function. On a 3.4 GHz
Haswell processor, it decodes more than 4 billion differentially-coded integers
per second from RAM to L1 cache.
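To make the separated layout concrete, here is a scalar sketch (function names are ours): 2-bit length codes for four integers share one control byte in a dedicated control stream, while the variable-length value bytes go to a separate data stream. The published decoder uses per-control-byte SIMD shuffles rather than this byte-at-a-time loop; the decode below also assumes a little-endian machine.

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Encode: 2-bit length codes (0..3 <=> 1..4 bytes) for four integers
    // share one control byte; the value bytes go to a separate data stream.
    void streamvbyte_encode(const uint32_t* in, size_t n,
                            std::vector<uint8_t>& control,
                            std::vector<uint8_t>& data) {
        uint8_t key = 0;
        int shift = 0;
        for (size_t i = 0; i < n; ++i) {
            uint32_t v = in[i];
            uint8_t code = (v > 0xFFFFFF) ? 3 : (v > 0xFFFF) ? 2 : (v > 0xFF) ? 1 : 0;
            for (int b = 0; b <= code; ++b)
                data.push_back(static_cast<uint8_t>(v >> (8 * b)));
            key |= code << shift;
            shift += 2;
            if (shift == 8) { control.push_back(key); key = 0; shift = 0; }
        }
        if (shift != 0) control.push_back(key); // flush a partial control byte
    }

    // Decode n integers from the two streams.
    void streamvbyte_decode(const uint8_t* control, const uint8_t* data,
                            uint32_t* out, size_t n) {
        for (size_t i = 0; i < n; ++i) {
            uint8_t code = (control[i / 4] >> (2 * (i % 4))) & 3;
            uint32_t v = 0;
            std::memcpy(&v, data, code + 1u); // little-endian load of 1..4 bytes
            data += code + 1u;
            out[i] = v;
        }
    }

Because all control bytes are contiguous, a vectorized decoder can turn each control byte into a shuffle mask and emit four decoded integers at a time, which is where the speedup over varint-G8IU comes from.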
Streaming Maximum-Minimum Filter Using No More than Three Comparisons per Element
The running maximum-minimum (max-min) filter computes the maxima and minima
over running windows of size w. This filter has numerous applications in signal
processing and time series analysis. We present an easy-to-implement online
algorithm requiring no more than 3 comparisons per element, in the worst case.
Comparatively, no algorithm is known to compute the running maximum (or
minimum) filter in 1.5 comparisons per element, in the worst case. Our
algorithm has reduced latency and memory usage.
Comment: to appear in Nordic Journal of Computing
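For intuition, the sketch below computes windowed maxima and minima with two monotonic double-ended queues, a standard technique that the paper's algorithm refines to guarantee at most 3 comparisons per element in the worst case; the names and deque-based structure are illustrative, not the paper's exact pseudocode.

    #include <cstddef>
    #include <deque>
    #include <vector>

    // For each window a[i-w+1..i] (i >= w-1), emit its maximum and minimum.
    void running_maxmin(const std::vector<double>& a, size_t w,
                        std::vector<double>& maxout, std::vector<double>& minout) {
        std::deque<size_t> U; // indices with decreasing values; front = window max
        std::deque<size_t> L; // indices with increasing values; front = window min
        for (size_t i = 0; i < a.size(); ++i) {
            while (!U.empty() && a[U.back()] <= a[i]) U.pop_back();
            while (!L.empty() && a[L.back()] >= a[i]) L.pop_back();
            U.push_back(i);
            L.push_back(i);
            if (U.front() + w <= i) U.pop_front(); // index left the window
            if (L.front() + w <= i) L.pop_front();
            if (i + 1 >= w) {
                maxout.push_back(a[U.front()]);
                minout.push_back(a[L.front()]);
            }
        }
    }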
An Optimal Linear Time Algorithm for Quasi-Monotonic Segmentation
Monotonicity is a simple yet significant qualitative characteristic. We
consider the problem of segmenting a sequence into up to K segments. We want
segments to be as monotonic as possible and to alternate signs. We propose a
quality metric for this problem using the l_inf norm, and we present an optimal
linear time algorithm based on a novel formalism. Moreover, given a
precomputation in time O(n log n) consisting of a labeling of all extrema, we
compute any optimal segmentation in constant time. We compare experimentally
its performance to two piecewise linear segmentation heuristics (top-down and
bottom-up). We show that our algorithm is faster and more accurate.
Applications include pattern recognition and qualitative modeling.
Comment: This is the extended version of our ICDM'05 paper (arXiv:cs/0702142)
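For intuition about the l_inf quality metric: the smallest l_inf error achievable when approximating a segment by a nondecreasing sequence equals half of the segment's largest drop, which the sketch below computes in linear time. This is a simplified stand-in for the paper's formalism, and the function name is ours; for nonincreasing segments, negate the data first.

    #include <algorithm>
    #include <limits>
    #include <vector>

    // Smallest e such that some nondecreasing sequence stays within e of x,
    // i.e., half of the largest "drop" max over i < j of (x[i] - x[j]).
    double nondecreasing_linf_error(const std::vector<double>& x) {
        double drop = 0.0;
        double prefix_max = -std::numeric_limits<double>::infinity();
        for (double v : x) {
            prefix_max = std::max(prefix_max, v);
            drop = std::max(drop, prefix_max - v);
        }
        return drop / 2.0;
    }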
Attribute Value Reordering For Efficient Hybrid OLAP
The normalization of a data cube is the ordering of the attribute values. For
large multidimensional arrays where dense and sparse chunks are stored
differently, proper normalization can lead to improved storage efficiency. We
show that it is NP-hard to compute an optimal normalization even for 1x3
chunks, although we find an exact algorithm for 1x2 chunks. When dimensions are
nearly statistically independent, we show that dimension-wise attribute
frequency sorting is an optimal normalization and takes time O(d n log(n)) for
data cubes of size n^d. When dimensions are not independent, we propose and
evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is
already 19%-30% more efficient than ROLAP, but normalization can improve it
further by 9%-13% for a total gain of 29%-44% over ROLAP.
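A sketch of the dimension-wise attribute frequency sort (our illustrative representation stores only the allocated cells, one coordinate per dimension; names are ours): counting is linear per dimension and the sort dominates, matching the O(d n log(n)) bound quoted above.

    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <vector>

    // Reorder the attribute values of one dimension by decreasing frequency,
    // so that heavily used values (and hence dense chunks) cluster together.
    // Returns order[k] = the old attribute value placed at new position k.
    std::vector<size_t> frequency_order(const std::vector<std::vector<size_t>>& cells,
                                        size_t dim, size_t cardinality) {
        std::vector<size_t> freq(cardinality, 0);
        for (const auto& cell : cells) ++freq[cell[dim]];
        std::vector<size_t> order(cardinality);
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(),
                  [&](size_t a, size_t b) { return freq[a] > freq[b]; });
        return order;
    }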
A Better Alternative to Piecewise Linear Time Series Segmentation
Time series are difficult to monitor, summarize and predict. Segmentation
organizes time series into a few intervals having uniform characteristics
(flatness, linearity, modality, monotonicity and so on). For scalability, we
require fast linear time algorithms. The popular piecewise linear model can
determine where the data goes up or down and at what rate. Unfortunately, when
the data does not follow a linear model, the computation of the local slope
creates overfitting. We propose an adaptive time series model where the
polynomial degree of each interval varies (constant, linear and so on). Given a
number of regressors, the cost of each interval is its polynomial degree:
constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so
on. Our goal is to minimize the Euclidean (l_2) error for a given model
complexity. Experimentally, we investigate the model where intervals can be
either constant or linear. Over synthetic random walks, historical stock market
prices, and electrocardiograms, the adaptive model provides a more accurate
segmentation than the piecewise linear model without increasing the
cross-validation error or the running time, while providing a richer vocabulary
to applications. Implementation issues, such as numerical stability and
real-world performance, are discussed.
Comment: to appear in SIAM Data Mining 2007
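To make the cost model concrete, the sketch below computes the squared (l_2) error of the best 1-regressor (constant) and 2-regressor (linear) fits on the half-open interval [i, j); a dynamic program over interval boundaries (not shown) can then minimize total error within a regressor budget. Function names are ours.

    #include <cstddef>
    #include <vector>

    // Error of the best 1-regressor (constant) model on y[i..j-1].
    double constant_fit_sse(const std::vector<double>& y, size_t i, size_t j) {
        double mean = 0.0;
        for (size_t k = i; k < j; ++k) mean += y[k];
        mean /= double(j - i);
        double sse = 0.0;
        for (size_t k = i; k < j; ++k) sse += (y[k] - mean) * (y[k] - mean);
        return sse;
    }

    // Error of the best 2-regressor (linear) model on y[i..j-1],
    // by ordinary least squares of y against the positions i..j-1.
    double linear_fit_sse(const std::vector<double>& y, size_t i, size_t j) {
        double n = double(j - i), sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (size_t k = i; k < j; ++k) {
            sx += double(k); sy += y[k];
            sxx += double(k) * double(k); sxy += double(k) * y[k];
        }
        double denom = n * sxx - sx * sx;
        double b = (denom != 0) ? (n * sxy - sx * sy) / denom : 0.0; // slope
        double a = (sy - b * sx) / n;                                // intercept
        double sse = 0.0;
        for (size_t k = i; k < j; ++k) {
            double r = y[k] - (a + b * double(k));
            sse += r * r;
        }
        return sse;
    }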
Decoding billions of integers per second through vectorization
In many important applications -- such as search engines and relational
database systems -- data is stored in the form of arrays of integers. Encoding
and, most importantly, decoding of these arrays consumes considerable CPU time.
Therefore, substantial effort has been made to reduce costs associated with
compression and decompression. In particular, researchers have exploited the
superscalar nature of modern processors and SIMD instructions. In this work, we
introduce a novel vectorized scheme called SIMD-BP128 that improves over
previously proposed vectorized approaches. It is nearly twice as fast as the
previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the
same time, SIMD-BP128 saves up to 2 bits per integer. For even better
compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has
a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while
being two times faster during decoding.
Comment: For software, see https://github.com/lemire/FastPFor; for data, see
http://boytsov.info/datasets/clueweb09gap
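For intuition about binary packing, here is a scalar sketch of the layout that schemes like SIMD-BP128 vectorize: a block of integers is stored using b bits per value, where b covers the widest value in the block. Assume 1 <= b <= 32; the function names are ours, and the real scheme packs 128 integers at a time with 128-bit vector registers.

    #include <cstddef>
    #include <cstdint>

    // Pack n values of b bits each into 64-bit words; the caller must provide
    // ceil(n * b / 64.0) output words.
    void pack_block(const uint32_t* in, size_t n, unsigned b, uint64_t* out) {
        uint64_t buffer = 0;
        unsigned bits = 0; // bits currently held in buffer
        size_t w = 0;
        for (size_t i = 0; i < n; ++i) {
            buffer |= uint64_t(in[i]) << bits;
            bits += b;
            if (bits >= 64) {
                out[w++] = buffer;
                bits -= 64;
                buffer = (bits > 0) ? uint64_t(in[i]) >> (b - bits) : 0; // spill
            }
        }
        if (bits > 0) out[w] = buffer; // flush the last partial word
    }

    void unpack_block(const uint64_t* in, size_t n, unsigned b, uint32_t* out) {
        const uint64_t mask = (uint64_t(1) << b) - 1;
        unsigned bits = 0;
        size_t r = 0;
        for (size_t i = 0; i < n; ++i) {
            uint64_t v = in[r] >> bits;
            if (bits + b > 64) v |= in[r + 1] << (64 - bits); // value spans words
            out[i] = uint32_t(v & mask);
            bits += b;
            if (bits >= 64) { bits -= 64; ++r; }
        }
    }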
Effect of disciplinary content of a simulation on open-ended problem-solving strategies
There is a long-standing debate about so-called generic vs. domain- or
context-specific problem-solving skills and strategies. To investigate this
issue, we developed a pair of structurally and functionally similar computer
simulations and used them in an experimental study with undergraduate students
from all fields of study. One, described to users as a teaching tool based on
current entomology research, featured three species of ants whose tasks could
change as a result of encounters with each other. The other, which we call a
non-disciplinary simulation, featured three types of abstract moving shapes
whose color could change after a collision. Users received no information about
the latter, not even that it was a simulation. Both simulations, which students
used in succession, allowed users to change the number and types of objects,
and to move them freely on the screen. We told students that the problem was to
describe and explain what occurred on the screen, and asked them to explain,
while they were working with the simulation, what they observed, what they did
and why. We conducted a first, quantitative analysis of various "surfac