Using Hashing to Solve the Dictionary Problem (In External Memory)
We consider the dictionary problem in external memory and improve the update
time of the well-known buffer tree by roughly a logarithmic factor. For any
\lambda >= max {lg lg n, log_{M/B} (n/B)}, we can support updates in time
O(\lambda / B) and queries in sublogarithmic time, O(log_\lambda n). We also
present a lower bound in the cell-probe model showing that our data structure
is optimal.
In the RAM, hash tables have been used to solve the dictionary problem faster
than binary search for more than half a century. By contrast, our data
structure is the first to beat the comparison barrier in external memory. Ours
is also the first data structure to depart convincingly from the indivisibility
paradigm.
Search Efficient Binary Network Embedding
Traditional network embedding primarily focuses on learning a dense vector
representation for each node, which encodes network structure and/or node
content information, such that off-the-shelf machine learning algorithms can be
easily applied to the vector-format node representations for network analysis.
However, the learned dense vector representations are inefficient for
large-scale similarity search, which requires finding the nearest neighbor
measured by Euclidean distance in a continuous vector space. In this paper, we
propose a search efficient binary network embedding algorithm called BinaryNE
to learn a sparse binary code for each node, by simultaneously modeling node
context relations and node attribute relations through a three-layer neural
network. BinaryNE learns binary node representations efficiently through a
stochastic gradient descent based online learning algorithm. The learned binary
encoding not only reduces memory usage to represent each node, but also allows
fast bit-wise comparisons to support much quicker network node search compared
to Euclidean distance or other distance measures. Our experiments and
comparisons show that BinaryNE not only delivers more than 23 times faster
search speed, but also provides comparable or better search quality than
traditional continuous vector based network embedding methods.
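
The bit-wise comparison that the abstract credits for faster search can be illustrated with a minimal sketch. It assumes each node's binary code is stored as a Python integer and that similarity is measured by Hamming distance; the function names and the 8-bit codes are hypothetical, and this is not the BinaryNE learning algorithm itself, only the lookup step over codes that have already been learned.

```python
from typing import Dict


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    # XOR leaves a 1 exactly where the codes disagree; count those 1s.
    return bin(a ^ b).count("1")


def nearest_by_hamming(query: int, codes: Dict[str, int]) -> str:
    """Return the node whose code has the smallest Hamming distance to query."""
    return min(codes, key=lambda node: hamming_distance(query, codes[node]))


# Hypothetical 8-bit codes for three nodes (illustrative values only).
codes = {"a": 0b10110010, "b": 0b10110110, "c": 0b01001101}
closest = nearest_by_hamming(0b10110111, codes)
```

A linear scan is used here for clarity; the point is only that each comparison reduces to one XOR and a bit count instead of a floating-point distance computation.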
On the robustness of bucket brigade quantum RAM
We study the robustness of the bucket brigade quantum random access memory
model introduced by Giovannetti, Lloyd, and Maccone [Phys. Rev. Lett. 100,
160501 (2008)]. Due to a result of Regev and Schiff [ICALP '08 pp. 773], we
show that for a class of error models the error rate per gate in the bucket
brigade quantum memory has to be of order o(2^{-n/2}) (where N = 2^n is the
size of the memory) whenever the memory is used as an oracle for the quantum
searching problem. We conjecture that this is the case for any realistic error
model that will be encountered in practice, and that for algorithms with
super-polynomially many oracle queries the error rate must be
super-polynomially small, which further motivates the need for quantum error
correction. By contrast, for algorithms such as matrix inversion [Phys. Rev.
Lett. 103, 150502 (2009)] or quantum machine learning [Phys. Rev. Lett. 113,
130503 (2014)] that only require a polynomial number of queries, the error rate
only needs to be polynomially small and quantum error correction may not be
required. We introduce a circuit model for the quantum bucket brigade
architecture and argue that quantum error correction for the circuit causes the
quantum bucket brigade architecture to lose its primary advantage of a small
number of "active" gates, since all components have to be actively error
corrected.
Comment: Replaced with the published version. 13 pages, 9 figures
Histogram-Aware Sorting for Enhanced Word-Aligned Compression in Bitmap Indexes
Bitmap indexes must be compressed to reduce input/output costs and minimize
CPU usage. To accelerate logical operations (AND, OR, XOR) over bitmaps, we use
techniques based on run-length encoding (RLE), such as Word-Aligned Hybrid
(WAH) compression. These techniques are sensitive to the order of the rows: a
simple lexicographical sort can divide the index size by 9 and make indexes
several times faster. We investigate reordering heuristics based on computed
attribute-value histograms. Simply permuting the columns of the table based on
these histograms can increase the sorting efficiency by 40%.
Comment: To appear in proceedings of DOLAP 200
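
The sensitivity of run-length-based compression to row order can be seen in a small sketch. It assumes a toy two-column table (the `rows` data and the function names are hypothetical) and uses the total number of runs across all bitmaps as a rough proxy for compressed size; it does not implement WAH or the histogram-aware heuristics described in the abstract.

```python
from itertools import groupby


def bitmap_columns(rows):
    """Build one bitmap (list of 0/1) per distinct (column, value) pair."""
    bitmaps = {}
    for col in range(len(rows[0])):
        for value in sorted({row[col] for row in rows}):
            bitmaps[(col, value)] = [int(row[col] == value) for row in rows]
    return bitmaps


def total_runs(rows):
    """Total number of runs of identical bits across all bitmaps."""
    return sum(
        sum(1 for _ in groupby(bits))
        for bits in bitmap_columns(rows).values()
    )


# Hypothetical table with two attributes per row.
rows = [("b", "x"), ("a", "y"), ("b", "y"), ("a", "x"), ("b", "x"), ("a", "y")]
unsorted_runs = total_runs(rows)
sorted_runs = total_runs(sorted(rows))  # lexicographic row sort
```

On this toy table the lexicographic sort lowers the run count, mostly in the bitmaps of the first column, which is why the choice of column order matters for the sort.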
On Optimally Partitioning Variable-Byte Codes
The ubiquitous Variable-Byte encoding is one of the fastest compressed
representations for integer sequences. However, its compression ratio is usually
not competitive with other more sophisticated encoders, especially when the
integers to be compressed are small, which is the typical case for inverted
indexes. This paper shows that the compression ratio of Variable-Byte can be
improved by 2x by adopting a partitioned representation of the inverted lists.
This makes Variable-Byte surprisingly competitive in space with the best
bit-aligned encoders, hence disproving the folklore belief that Variable-Byte
is space-inefficient for inverted index compression. Despite the significant
space savings, we show that our optimization comes almost for free, given that:
(i) we introduce an optimal partitioning algorithm that does not affect indexing
time because of its linear-time complexity; and (ii) we show that the query
processing speed of Variable-Byte is preserved, with an extensive experimental analysis
and comparison with several other state-of-the-art encoders.
Comment: Published in IEEE Transactions on Knowledge and Data Engineering
(TKDE), 15 April 201
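
For readers unfamiliar with the baseline format, plain Variable-Byte coding can be sketched as follows. The sketch assumes one common convention (7 data bits per byte, least-significant group first, high bit set when more bytes follow); other conventions exist, and the function names are hypothetical. It shows only the unpartitioned encoding, not the partitioned representation or the optimal partitioning algorithm proposed in the paper.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers with 7 data bits per byte."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("Variable-Byte encodes non-negative integers only")
        while n >= 128:
            # Emit the low 7 bits and set the high bit: more bytes follow.
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)  # final byte of this integer, high bit clear
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers, value, shift = [], 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation bit set: keep accumulating
        else:
            numbers.append(value)
            value, shift = 0, 0
    return numbers


# Hypothetical gap values, as might appear in an inverted list.
gaps = [5, 130, 16384]
encoded = vbyte_encode(gaps)
decoded = vbyte_decode(encoded)
```

Every integer costs at least one full byte under this scheme, which is the space overhead on small values that the abstract refers to.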