18,082 research outputs found

    Near-optimal labeling schemes for nearest common ancestors

    Full text link
    We consider NCA labeling schemes: given a rooted tree TT, label the nodes of TT with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the label of their nearest common ancestor. For trees with nn nodes we present upper and lower bounds establishing that labels of size (2±ϵ)logn(2\pm \epsilon)\log n, ϵ<1\epsilon<1 are both sufficient and necessary. (All logarithms in this paper are in base 2.) Alstrup, Bille, and Rauhe (SIDMA'05) showed that ancestor and NCA labeling schemes have labels of size logn+Ω(loglogn)\log n +\Omega(\log \log n). Our lower bound increases this to logn+Ω(logn)\log n + \Omega(\log n) for NCA labeling schemes. Since Fraigniaud and Korman (STOC'10) established that labels in ancestor labeling schemes have size logn+Θ(loglogn)\log n +\Theta(\log \log n), our new lower bound separates ancestor and NCA labeling schemes. Our upper bound improves the 10logn10 \log n upper bound by Alstrup, Gavoille, Kaplan and Rauhe (TOCS'04), and our theoretical result even outperforms some recent experimental studies by Fischer (ESA'09) where variants of the same NCA labeling scheme are shown to all have labels of size approximately 8logn8 \log n

    Evaluation Measures for Hierarchical Classification: a unified view and novel approaches

    Full text link
    Hierarchical classification addresses the problem of classifying items into a hierarchy of classes. An important issue in hierarchical classification is the evaluation of different classification algorithms, which is complicated by the hierarchical relations among the classes. Several evaluation measures have been proposed for hierarchical classification using the hierarchy in different ways. This paper studies the problem of evaluation in hierarchical classification by analyzing and abstracting the key components of the existing performance measures. It also proposes two alternative generic views of hierarchical evaluation and introduces two corresponding novel measures. The proposed measures, along with the state-of-the art ones, are empirically tested on three large datasets from the domain of text classification. The empirical results illustrate the undesirable behavior of existing approaches and how the proposed methods overcome most of these methods across a range of cases.Comment: Submitted to journa

    Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic

    Full text link
    Countless variants of the Lempel-Ziv compression are widely used in many real-life applications. This paper is concerned with a natural modification of the classical pattern matching problem inspired by the popularity of such compression methods: given an uncompressed pattern s[1..m] and a Lempel-Ziv representation of a string t[1..N], does s occur in t? Farach and Thorup gave a randomized O(nlog^2(N/n)+m) time solution for this problem, where n is the size of the compressed representation of t. We improve their result by developing a faster and fully deterministic O(nlog(N/n)+m) time algorithm with the same space complexity. Note that for highly compressible texts, log(N/n) might be of order n, so for such inputs the improvement is very significant. A (tiny) fragment of our method can be used to give an asymptotically optimal solution for the substring hashing problem considered by Farach and Muthukrishnan.Comment: submitte

    Simple and Efficient Fully-Functional Succinct Trees

    Full text link
    The fully-functional succinct tree representation of Navarro and Sadakane (ACM Transactions on Algorithms, 2014) supports a large number of operations in constant time using 2n+o(n)2n+o(n) bits. However, the full idea is hard to implement. Only a simplified version with O(logn)O(\log n) operation time has been implemented and shown to be practical and competitive. We describe a new variant of the original idea that is much simpler to implement and has worst-case time O(loglogn)O(\log\log n) for the operations. An implementation based on this version is experimentally shown to be superior to existing implementations

    A Minimal Periods Algorithm with Applications

    Full text link
    Kosaraju in ``Computation of squares in a string'' briefly described a linear-time algorithm for computing the minimal squares starting at each position in a word. Using the same construction of suffix trees, we generalize his result and describe in detail how to compute in O(k|w|)-time the minimal k-th power, with period of length larger than s, starting at each position in a word w for arbitrary exponent k2k\geq2 and integer s0s\geq0. We provide the complete proof of correctness of the algorithm, which is somehow not completely clear in Kosaraju's original paper. The algorithm can be used as a sub-routine to detect certain types of pseudo-patterns in words, which is our original intention to study the generalization.Comment: 14 page
    corecore