5 research outputs found

    Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes

    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46049-9_5
    [Abstract] For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of σ characters, we can store a nearly optimal alphabetic prefix-free code in o(σ) bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords’ lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in O(σ log L + 2^{εL}) bits, where L is the maximum codeword length and ε is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of ℓ bits in time O(ℓ) using O(σ log L) bits of space.
    Ministerio de Economía, Industria y Competitividad; TIN2013-47090-C3-3-P. Ministerio de Economía, Industria y Competitividad; TIN2015-69951-R. Ministerio de Economía, Industria y Competitividad; ITC-20151305. Ministerio de Economía, Industria y Competitividad; ITC-20151247. Xunta de Galicia; GRC2013/053. Chile. Núcleo Milenio Información y Coordinación en Redes; ICM/FIC.P10-024F. COST. IC1302. Academy of Finland; 268324. Academy of Finland; 25034
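
    The abstract's remark that canonical representations fix the order in which codewords are assigned can be illustrated with a minimal sketch. This is not the paper's construction: the function names `canonical_code` and `is_prefix_free` and the example lengths are hypothetical, and the lengths are assumed to satisfy Kraft's inequality.

```python
def canonical_code(lengths):
    """Assign canonical codewords given {symbol: codeword length}.

    Assumption: the lengths satisfy Kraft's inequality, so a prefix-free
    code with these lengths exists.
    """
    # Canonical form: shorter codewords first, ties broken by symbol.
    order = sorted(lengths, key=lambda s: (lengths[s], s))
    code = {}
    value = 0
    prev_len = lengths[order[0]]
    for sym in order:
        # Append zeros when the codeword length grows.
        value <<= lengths[sym] - prev_len
        prev_len = lengths[sym]
        code[sym] = format(value, "0{}b".format(lengths[sym]))
        value += 1
    return code


def is_prefix_free(codewords):
    """Check that no codeword is a prefix of another."""
    words = sorted(codewords)
    # In sorted order, a prefix is always immediately followed by a word
    # that extends it, so checking adjacent pairs is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


# Hypothetical example lengths for a four-character alphabet.
lengths = {"a": 2, "b": 1, "c": 3, "d": 3}
code = canonical_code(lengths)
print(code)
print(is_prefix_free(code.values()))
```

    In this example "b" receives the codeword 0 and "a" receives 10, so the codewords no longer follow the alphabetical order of the characters. Canonical representations therefore do not apply directly to alphabetic codes, which is the case the paper addresses.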

    About BIRDS project (Bioinformatics and Information Retrieval Data Structures Analysis and Design)

    BIRDS stands for "Bioinformatics and Information Retrieval Data Structures analysis and design" and is a 4-year project (2016-2019) that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 690941. The overall goal of BIRDS is to establish a long term international network involving leading researchers in the development of efficient data structures in the fields of Bioinformatics and Information Retrieval, to strengthen the partnership through the exchange of knowledge and expertise, and to develop integrated approaches to improve current approaches in both fields. The research will address challenges in storing, processing, indexing, searching and navigating genome-scale data by designing new algorithms and data structures for sequence analysis, networks representation or compressing and indexing repetitive data. BIRDS project is carried out by 7 research institutions from Australia (University of Melbourne), Chile (University of Chile and University of Concepción), Finland (University of Helsinki), Japan (Kyushu University), Portugal (Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa, INESC-ID), and Spain (University of A Coruña), and a Spanish SME (Enxenio S.L.). It is coordinated by the University of A Coruña (Spain).
    Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. CERI 201

    Efficient and compact representations of some non-canonical prefix-free codes

    For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to symbols. In this paper we first show how, given a probability distribution over an alphabet of σ symbols, we can store an optimal alphabetic prefix-free code in O(σ lg L) bits such that we can encode and decode any codeword of length ℓ in O(min(ℓ, lg L)) time, where L is the maximum codeword length. With O(2^{L^ε}) further bits, for any constant ε > 0, we can encode and decode in O(lg ℓ) time. We then show how to store a nearly optimal alphabetic prefix-free code in o(σ) bits such that we can encode and decode in constant time. We also consider a kind of optimal prefix-free code introduced recently where the codewords' lengths are non-decreasing if arranged in lexicographic order of their reverses. We reduce their storage space to O(σ lg L) while maintaining encoding and decoding times in O(ℓ). We also show how, with O(2^{εL}) further bits, we can encode and decode in constant time. All of our results hold in the word-RAM model.
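
    The ordering property mentioned in the abstract (codeword lengths non-decreasing when the codewords are arranged in lexicographic order of their reverses) can be tested on small examples with a sketch like the following. The function name `lengths_nondecreasing_by_reverse` and both example codes are hypothetical. The sketch only checks the property and does not build or store such a code.

```python
def lengths_nondecreasing_by_reverse(codewords):
    """Return True if codeword lengths are non-decreasing once the
    codewords are sorted by the lexicographic order of their reverses."""
    by_reverse = sorted(codewords, key=lambda w: w[::-1])
    lens = [len(w) for w in by_reverse]
    return all(x <= y for x, y in zip(lens, lens[1:]))


# Two hypothetical prefix-free codes over the same set of lengths.
good = ["0", "10", "110", "111"]   # reverses: 0, 01, 011, 111
bad = ["1", "00", "010", "011"]    # reverses: 1, 00, 010, 110

print(lengths_nondecreasing_by_reverse(good))
print(lengths_nondecreasing_by_reverse(bad))
```

    Both example codes use codeword lengths 1, 2, 3, 3, yet only the first has the property, so it depends on which bit strings are chosen and not just on the lengths.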