4 research outputs found

    Distributed search based on self-indexed compressed text

    Get PDF
    Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting lists caching. We show that a text self-index (i.e., an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out the posting lists generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.Fil: Arroyuelo, Diego. No especifĂ­ca;Fil: Gil Costa, Graciela VerĂłnica. Universidad Nacional de San Luis; Argentina. Consejo Nacional de Investigaciones CientĂ­ficas y TĂ©cnicas. Centro CientĂ­fico TecnolĂłgico Conicet - San Luis; ArgentinaFil: GonzĂĄlez, SenĂ©n. No especifĂ­ca;Fil: Marin, Mauricio. Universidad de Santiago de Chile; ChileFil: OyarzĂșn, Mauricio. Universidad de Santiago de Chile; Chil

    Parallel Wavelet Tree Construction

    Full text link
    We present parallel algorithms for wavelet tree construction with polylogarithmic depth, improving upon the linear depth of the recent parallel algorithms by Fuentes-Sepulveda et al. We experimentally show on a 40-core machine with two-way hyper-threading that we outperform the existing parallel algorithms by 1.3--5.6x and achieve up to 27x speedup over the sequential algorithm on a variety of real-world and artificial inputs. Our algorithms show good scalability with increasing thread count, input size and alphabet size. We also discuss extensions to variants of the standard wavelet tree.Comment: This is a longer version of the paper that appears in the Proceedings of the IEEE Data Compression Conference, 201

    Parallel Construction of Wavelet Trees on Multicore Architectures

    Get PDF
    The wavelet tree has become a very useful data structure to efficiently represent and query large volumes of data in many different domains, from bioinformatics to geographic information systems. One problem with wavelet trees is their construction time. In this paper, we introduce two algorithms that reduce the time complexity of a wavelet tree's construction by taking advantage of nowadays ubiquitous multicore machines. Our first algorithm constructs all the levels of the wavelet in parallel in O(n)O(n) time and O(nlgâĄÏƒ+σlg⁥n)O(n\lg\sigma + \sigma\lg n) bits of working space, where nn is the size of the input sequence and σ\sigma is the size of the alphabet. Our second algorithm constructs the wavelet tree in a domain-decomposition fashion, using our first algorithm in each segment, reaching O(lg⁥n)O(\lg n) time and O(nlgâĄÏƒ+pσlg⁥n/lgâĄÏƒ)O(n\lg\sigma + p\sigma\lg n/\lg\sigma) bits of extra space, where pp is the number of available cores. Both algorithms are practical and report good speedup for large real datasets.Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sk{\l}odowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094

    Grammar compressed sequences with rank/select support

    Get PDF
    An early partial version of this paper appeared in Proc. SPIRE 2014: G. Navarro, A. Ordóñez Grammar compressed sequences with rank/select support, Proc. 21st International Symposium on String Processing and Information Retrieval, LNCS, SPIRE, vol. 8799 (2014), pp. 31–44The final publication is available at Springer via http://dx.doi.org/10.1016/j.jda.2016.10.001[Abstract] Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. Several recent applications need to represent highly repetitive sequences, and classical statistical compression proves ineffective. We introduce, instead, grammar-based representations for repetitive sequences, which use up to 6% of the space needed by statistically compressed representations, and support direct access and rank/select operations within tens of microseconds. We demonstrate the impact of our structures in text indexing applications.Chile. Fondo Nacional de Desarrollo CientĂ­fico y TecnolĂłgico; 140796Ministerio de EconomĂ­a, Industria y Competitividad; 00645663/ITC-20133062Ministerio de EconomĂ­a, Industria y Competitividad; TIN2009-14560-C03-02Ministerio de EconomĂ­a, Industria y Competitividad; TIN2010-21246-C02-01Ministerio de EconomĂ­a, Industria y Competitividad; TIN2013-46238-C4-3-RMinisterio de EconomĂ­a, Industria y Competitividad; TIN2013-47090-C3-3-PMinisterio de EconomĂ­a, Industria y Competitividad; AP2010-6038Xunta de Galicia; GRC2013/05
    corecore