2,956 research outputs found

    Rank, select and access in grammar-compressed strings

    Full text link
    Given a string SS of length NN on a fixed alphabet of σ\sigma symbols, a grammar compressor produces a context-free grammar GG of size nn that generates SS and only SS. In this paper we describe data structures to support the following operations on a grammar-compressed string: \mbox{rank}_c(S,i) (return the number of occurrences of symbol cc before position ii in SS); \mbox{select}_c(S,i) (return the position of the iith occurrence of cc in SS); and \mbox{access}(S,i,j) (return substring S[i,j]S[i,j]). For rank and select we describe data structures of size O(nσlogN)O(n\sigma\log N) bits that support the two operations in O(logN)O(\log N) time. We propose another structure that uses O(nσlog(N/n)(logN)1+ϵ)O(n\sigma\log (N/n)(\log N)^{1+\epsilon}) bits and that supports the two queries in O(logN/loglogN)O(\log N/\log\log N), where ϵ>0\epsilon>0 is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires O(nlogN)O(n\log N) bits of space and O(logN+m/logσN)O(\log N+m/\log_\sigma N) time to extract m=ji+1m=j-i+1 consecutive symbols from SS. Alternatively, we can achieve O(logN/loglogN+m/logσN)O(\log N/\log\log N+m/\log_\sigma N) query time using O(nlog(N/n)(logN)1+ϵ)O(n\log (N/n)(\log N)^{1+\epsilon}) bits of space. This matches a lower bound stated by Verbin and Yu for strings where NN is polynomially related to nn.Comment: 16 page

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    Full text link
    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices

    Fully dynamic data structure for LCE queries in compressed space

    Get PDF
    A Longest Common Extension (LCE) query on a text TT of length NN asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding G\mathcal{G} of size w=O(min(zlogNlogM,N))w = O(\min(z \log N \log^* M, N)) [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of TT, which can be seen as a compressed representation of TT, has a capability to support LCE queries in O(logN+loglogM)O(\log N + \log \ell \log^* M) time, where \ell is the answer to the query, zz is the size of the Lempel-Ziv77 (LZ77) factorization of TT, and M4NM \geq 4N is an integer that can be handled in constant time under word RAM model. In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, G\mathcal{G} can be enhanced to support efficient update operations: After processing G\mathcal{G} in O(wfA)O(w f_{\mathcal{A}}) time, we can insert/delete any (sub)string of length yy into/from an arbitrary position of TT in O((y+logNlogM)fA)O((y+ \log N\log^* M) f_{\mathcal{A}}) time, where fA=O(min{loglogMloglogwlogloglogM,logwloglogw})f_{\mathcal{A}} = O(\min \{ \frac{\log\log M \log\log w}{\log\log\log M}, \sqrt{\frac{\log w}{\log\log w}} \}). This yields the first fully dynamic LCE data structure. We also present efficient construction algorithms from various types of inputs: We can construct G\mathcal{G} in O(NfA)O(N f_{\mathcal{A}}) time from uncompressed string TT; in O(nloglognlogNlogM)O(n \log\log n \log N \log^* M) time from grammar-compressed string TT represented by a straight-line program of size nn; and in O(zfAlogNlogM)O(z f_{\mathcal{A}} \log N \log^* M) time from LZ77-compressed string TT with zz factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing.Comment: arXiv admin note: text overlap with arXiv:1504.0695
    corecore