Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46049-9_5
[Abstract] For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of $\sigma$ characters, we can store a nearly optimal alphabetic prefix-free code in $o(\sigma)$ bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords' lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in $O(\sigma \log L + 2^{\epsilon L})$ bits, where $L$ is the maximum codeword length and $\epsilon$ is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of $\ell$ bits in time $O(\ell)$ using $O(\sigma \log L)$ bits of space.
Ministerio de Economía, Industria y Competitividad; TIN2013-47090-C3-3-P
Ministerio de Economía, Industria y Competitividad; TIN2015-69951-R
Ministerio de Economía, Industria y Competitividad; ITC-20151305
Ministerio de Economía, Industria y Competitividad; ITC-20151247
Xunta de Galicia; GRC2013/053
Chile. Núcleo Milenio Información y Coordinación en Redes; ICM/FIC.P10-024F
COST. IC1302
Academy of Finland; 268324
Academy of Finland; 25034
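The ordering property described in the abstract above (codeword lengths non-decreasing once the codewords are sorted by their reverses) can be checked directly on a small code. The following is a minimal sketch for illustration only: the function names and the example codes are hypothetical, codewords are assumed to be given as strings of '0' and '1', and this is not the compact representation proposed in the paper.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of (or equal to) another."""
    ordered = sorted(codewords)
    # In lexicographic order, a word that prefixes any other word
    # also prefixes its immediate successor, so adjacent pairs suffice.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def lengths_nondecreasing_in_reverse_lex_order(codewords):
    """Sort codewords by their reversed bit strings, then check that
    the codeword lengths never decrease along that order."""
    by_reverse = sorted(codewords, key=lambda w: w[::-1])
    lengths = [len(w) for w in by_reverse]
    return all(x <= y for x, y in zip(lengths, lengths[1:]))


# Hypothetical example codes, both prefix-free with lengths 1, 2, 3, 3.
good = ["0", "10", "110", "111"]   # reverses: 0, 01, 011, 111
bad = ["1", "01", "000", "001"]    # reverses: 1, 10, 000, 100

print(is_prefix_free(good), lengths_nondecreasing_in_reverse_lex_order(good))
print(is_prefix_free(bad), lengths_nondecreasing_in_reverse_lex_order(bad))
```

Both example codes are prefix-free, but only the first satisfies the ordering property; in the second, the reverse "000" sorts before the reverse "1", so a length-3 codeword precedes a length-1 codeword.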
About BIRDS project (Bioinformatics and Information Retrieval Data Structures Analysis and Design)
BIRDS stands for "Bioinformatics and Information Retrieval Data Structures
analysis and design" and is a 4-year project (2016-2019) that has received
funding from the European Union's Horizon 2020 research and innovation
programme under the Marie Sklodowska-Curie grant agreement No 690941.
The overall goal of BIRDS is to establish a long term international network
involving leading researchers in the development of efficient data structures
in the fields of Bioinformatics and Information Retrieval, to strengthen the
partnership through the exchange of knowledge and expertise, and to develop
integrated approaches that improve on current methods in both fields. The
research will address challenges in storing, processing, indexing, searching
and navigating genome-scale data by designing new algorithms and data
structures for sequence analysis, network representation, or compressing and
indexing repetitive data.
The BIRDS project is carried out by 7 research institutions from Australia
(University of Melbourne), Chile (University of Chile and University of
Concepción), Finland (University of Helsinki), Japan (Kyushu University),
Portugal (Instituto de Engenharia de Sistemas e Computadores,
Investigação e Desenvolvimento em Lisboa, INESC-ID), and Spain
(University of A Coruña), and a Spanish SME (Enxenio S.L.). It is coordinated
by the University of A Coruña (Spain).
Comment: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Sklodowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941. CERI 201
Efficient and compact representations of some non-canonical prefix-free codes
For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to symbols. In this paper we first show how, given a probability distribution over an alphabet of $\sigma$ symbols, we can store an optimal alphabetic prefix-free code in $O(\sigma \lg L)$ bits such that we can encode and decode any codeword of length $\ell$ in $O(\min(\ell, \lg L))$ time, where $L$ is the maximum codeword length. With $O(2^{L^{\epsilon}})$ further bits, for any constant $\epsilon > 0$, we can encode and decode in $O(\lg \ell)$ time. We then show how to store a nearly optimal alphabetic prefix-free code in $o(\sigma)$ bits such that we can encode and decode in constant time. We also consider a kind of optimal prefix-free code introduced recently where the codewords' lengths are non-decreasing if arranged in lexicographic order of their reverses. We reduce their storage space to $O(\sigma \lg L)$ while maintaining encoding and decoding times in $O(\ell)$. We also show how, with $O(2^{\epsilon L})$ further bits, we can encode and decode in constant time. All of our results hold in the word-RAM model.
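To see why canonical form restricts the order in which codewords are assigned to symbols, the sketch below builds a canonical code from codeword lengths using the textbook procedure (sort by length, assign consecutive binary values) and then tests whether the result is alphabetic, meaning the codewords are in lexicographic order when the symbols are in alphabet order. The function names and the example lengths are hypothetical, and this is an illustration of the general idea rather than a construction taken from the paper.

```python
def canonical_code(lengths):
    """Assign canonical codewords given a dict symbol -> codeword length.

    Symbols are processed by increasing length (ties broken by symbol),
    and each receives the next binary value, left-shifted whenever the
    length grows. The lengths are assumed to satisfy Kraft's inequality.
    """
    order = sorted(lengths, key=lambda s: (lengths[s], s))
    codes = {}
    value = 0
    prev_len = lengths[order[0]]
    for sym in order:
        length = lengths[sym]
        value <<= length - prev_len          # pad with zeros on the right
        codes[sym] = format(value, "0{}b".format(length))
        value += 1
        prev_len = length
    return codes


def is_alphabetic(codes):
    """True if codewords increase lexicographically in symbol order."""
    words = [codes[s] for s in sorted(codes)]
    return all(a < b for a, b in zip(words, words[1:]))


# Hypothetical lengths: 'b' is the most probable symbol, so it is shortest.
codes = canonical_code({"a": 2, "b": 1, "c": 3, "d": 3})
print(codes)
print(is_alphabetic(codes))
print(is_alphabetic({"a": "0", "b": "10", "c": "110", "d": "111"}))
```

Here the canonical assignment gives "b" the codeword "0" and "a" the codeword "10", so the code is not alphabetic even though it is prefix-free. This is the situation the abstract refers to: when the symbol order is fixed in advance, the canonical shortcut is not available.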