Near-optimal labeling schemes for nearest common ancestors
We consider NCA labeling schemes: given a rooted tree T, label the nodes of T
with binary strings such that, given the labels of any two nodes, one can
determine, by looking only at the labels, the label of their nearest common
ancestor.
For trees with n nodes we present upper and lower bounds establishing that
labels of size (2 ± ε) log n, ε < 1, are both sufficient and
necessary. (All logarithms in this paper are in base 2.)
Alstrup, Bille, and Rauhe (SIDMA'05) showed that ancestor and NCA labeling
schemes have labels of size log n + Ω(log log n). Our lower bound
increases this to log n + Ω(log n) for NCA labeling schemes. Since
Fraigniaud and Korman (STOC'10) established that labels in ancestor labeling
schemes have size log n + Θ(log log n), our new lower bound separates
ancestor and NCA labeling schemes. Our upper bound improves the 10 log n
upper bound by Alstrup, Gavoille, Kaplan and Rauhe (TOCS'04), and our
theoretical result even outperforms some recent experimental studies by Fischer
(ESA'09) where variants of the same NCA labeling scheme are shown to all have
labels of size approximately 8 log n.
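
To make the NCA labeling interface concrete, here is a minimal sketch of a deliberately naive scheme; it is not the scheme from the paper. It assumes each label is the tuple of child indices on the root-to-node path, so the label of the nearest common ancestor is the longest common prefix of the two labels. The names `assign_labels` and `nca_label` and the example tree are illustrative assumptions, and these labels can be far longer than the near-optimal bounds discussed above.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, ...]  # assumption: a label is a path of child indices


def assign_labels(children: Dict[str, List[str]], root: str) -> Dict[str, Label]:
    """Label every node with the child indices on its root-to-node path."""
    labels: Dict[str, Label] = {root: ()}
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(children.get(node, [])):
            labels[child] = labels[node] + (i,)
            stack.append(child)
    return labels


def nca_label(a: Label, b: Label) -> Label:
    """Compute the NCA's label from the two labels alone (longest common prefix)."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return a[:k]


# Hypothetical example tree: r has children a and b; a has c and d; b has e.
tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
labels = assign_labels(tree, "r")
```

With this tree, `nca_label(labels["c"], labels["d"])` equals `labels["a"]`. The query never touches the tree itself, which is the defining property of a labeling scheme.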
Evaluation Measures for Hierarchical Classification: a unified view and novel approaches
Hierarchical classification addresses the problem of classifying items into a
hierarchy of classes. An important issue in hierarchical classification is the
evaluation of different classification algorithms, which is complicated by the
hierarchical relations among the classes. Several evaluation measures have been
proposed for hierarchical classification using the hierarchy in different ways.
This paper studies the problem of evaluation in hierarchical classification by
analyzing and abstracting the key components of the existing performance
measures. It also proposes two alternative generic views of hierarchical
evaluation and introduces two corresponding novel measures. The proposed
measures, along with the state-of-the-art ones, are empirically tested on three
large datasets from the domain of text classification. The empirical results
illustrate the undesirable behavior of existing approaches and how the proposed
methods overcome most of these problems across a range of cases.
Comment: Submitted to journa
Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic
Countless variants of the Lempel-Ziv compression are widely used in many
real-life applications. This paper is concerned with a natural modification of
the classical pattern matching problem inspired by the popularity of such
compression methods: given an uncompressed pattern s[1..m] and a Lempel-Ziv
representation of a string t[1..N], does s occur in t? Farach and Thorup gave a
randomized O(n log^2(N/n) + m) time solution for this problem, where n is the size
of the compressed representation of t. We improve their result by developing a
faster and fully deterministic O(n log(N/n) + m) time algorithm with the same
space complexity. Note that for highly compressible texts, log(N/n) might be of
order n, so for such inputs the improvement is very significant. A (tiny)
fragment of our method can be used to give an asymptotically optimal solution
for the substring hashing problem considered by Farach and Muthukrishnan.
Comment: submitte
Simple and Efficient Fully-Functional Succinct Trees
The fully-functional succinct tree representation of Navarro and Sadakane
(ACM Transactions on Algorithms, 2014) supports a large number of operations in
constant time using 2n + o(n) bits. However, the full idea is hard to
implement. Only a simplified version with O(log n) operation time has been
implemented and shown to be practical and competitive. We describe a new
variant of the original idea that is much simpler to implement and has
worst-case time O(log log n) for the operations. An implementation based on
this version is experimentally shown to be superior to existing
implementations.
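
As background for readers, succinct tree representations of this kind are commonly built on a balanced-parentheses encoding, in which a tree with n nodes is written as 2n parentheses. The sketch below assumes that encoding and answers queries by a naive linear scan; it does not implement the constant-time or O(log log n) structures from either paper. The helper names `to_parens`, `find_close` and `subtree_size` and the example tree are illustrative assumptions.

```python
from typing import Dict, List


def to_parens(children: Dict[str, List[str]], root: str) -> str:
    """Encode a rooted ordered tree as balanced parentheses (2 symbols per node)."""
    out: List[str] = []

    def visit(node: str) -> None:
        out.append("(")            # open when entering the node
        for child in children.get(node, []):
            visit(child)
        out.append(")")            # close when leaving the node

    visit(root)
    return "".join(out)


def find_close(bp: str, i: int) -> int:
    """Return the index of the ')' matching the '(' at index i (naive scan)."""
    if bp[i] != "(":
        raise ValueError("position i must hold an opening parenthesis")
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced parentheses")


def subtree_size(bp: str, i: int) -> int:
    """Number of nodes in the subtree whose opening parenthesis is at index i."""
    return (find_close(bp, i) - i + 1) // 2


# Hypothetical example tree: r has children a and b; a has c and d; b has e.
tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"]}
bp = to_parens(tree, "r")
```

The scan in `find_close` takes linear time. The point of the succinct structures is to answer the same kind of query much faster while adding only o(n) extra bits.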
A Minimal Periods Algorithm with Applications
Kosaraju, in "Computation of squares in a string", briefly described a
linear-time algorithm for computing the minimal squares starting at each
position in a word. Using the same construction of suffix trees, we generalize
his result and describe in detail how to compute in O(k|w|)-time the minimal
k-th power, with period of length larger than s, starting at each position in a
word w for arbitrary exponent k ≥ 2 and integer s ≥ 0. We provide the
complete proof of correctness of the algorithm, which is not entirely
clear in Kosaraju's original paper. The algorithm can be used as a subroutine
to detect certain types of pseudo-patterns in words, which was our original
motivation for studying the generalization.
Comment: 14 page
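
The problem statement can be illustrated with a brute-force sketch. This is not the suffix-tree algorithm from the paper and it does not run in O(k|w|) time. For each start position it tries periods in increasing length and reports the first one, longer than s, that repeats k times. The function name `minimal_powers` and the choice to return period lengths (or `None` when no such power exists) are assumptions made for illustration.

```python
from typing import List, Optional


def minimal_powers(w: str, k: int = 2, s: int = 0) -> List[Optional[int]]:
    """For each position i, return the shortest period length p > s such that
    w[i:i+p] repeated k times occurs at position i, or None if there is none."""
    n = len(w)
    result: List[Optional[int]] = []
    for i in range(n):
        found: Optional[int] = None
        # A k-th power with period p needs k * p characters starting at i.
        for p in range(s + 1, (n - i) // k + 1):
            if w[i:i + p] * k == w[i:i + k * p]:
                found = p
                break
        result.append(found)
    return result
```

For example, `minimal_powers("aabab")` reports the square "aa" at position 0 (period 1) and the square "abab" at position 1 (period 2). With `s=1`, the square "aa" is excluded because its period is not longer than s.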