737 research outputs found

    Optimal Prefix Codes with Fewer Distinct Codeword Lengths are Faster to Construct

    Full text link
    A new method for constructing minimum-redundancy binary prefix codes is described. Our method does not explicitly build a Huffman tree; instead it uses a property of optimal prefix codes to compute the codeword lengths corresponding to the input weights. Let $n$ be the number of weights and $k$ be the number of distinct codeword lengths in the optimal code produced by the algorithm. The running time of our algorithm is $O(k \cdot n)$. Following our previous work in \cite{be}, no algorithm can possibly construct optimal prefix codes in $o(k \cdot n)$ time. When the given weights are presorted, our algorithm performs $O(9^k \cdot \log^{2k} n)$ comparisons. Comment: 23 pages, a preliminary version appeared in STACS 200
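
    For context, the sketch below shows the classical heap-based construction, which computes optimal codeword lengths by building a Huffman tree explicitly in $O(n \log n)$ time; it is only a baseline for comparison and is not the $O(k \cdot n)$ algorithm described in the paper.

```python
import heapq

def huffman_codeword_lengths(weights):
    """Classical baseline: build a Huffman tree with a heap and return
    the depth (codeword length) assigned to each input weight."""
    if len(weights) == 1:
        return [1]
    # Each heap entry: (weight, tie_breaker, indices of leaves under this node)
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    depth = [0] * len(weights)
    tie = len(weights)
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        merged = leaves1 + leaves2
        for leaf in merged:          # every leaf under the merged node sinks one level
            depth[leaf] += 1
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return depth

print(huffman_codeword_lengths([1, 1, 2, 3, 5, 8]))  # -> [5, 5, 4, 3, 2, 1]
```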

    Optimal Prefix Free Codes with Partial Sorting

    Get PDF
    We describe an algorithm computing an optimal prefix free code for n unsorted positive weights in less time than is required to sort them on many large classes of instances, identified by a new measure of difficulty for this problem, the alternation alpha. This asymptotic complexity is within a constant factor of optimal in the algebraic decision tree computational model, in the worst case over all instances of fixed size n and alternation alpha. Such results refine the state-of-the-art complexity in the worst case over instances of size n in the same computational model, a landmark in compression and coding since 1952, by the mere combination of van Leeuwen's algorithm to compute optimal prefix free codes from sorted weights (known since 1976) with Deferred Data Structures to partially sort multisets (known since 1988).
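
    Van Leeuwen's construction from sorted weights is the linear-time ingredient referred to above; the sketch below illustrates its two-queue idea (the variable names and the list-of-leaf-indices bookkeeping are ours, added for illustration).

```python
from collections import deque

def codeword_lengths_from_sorted(weights):
    """Two-queue (van Leeuwen-style) computation of optimal codeword
    lengths from weights given in nondecreasing order, in O(n) time."""
    n = len(weights)
    if n == 1:
        return [1]
    leaves = deque((w, [i]) for i, w in enumerate(weights))  # sorted leaf queue
    internal = deque()            # merged nodes, created in nondecreasing weight order
    depth = [0] * n

    def pop_min():
        # Take the smaller front element of the two queues.
        if not internal or (leaves and leaves[0][0] <= internal[0][0]):
            return leaves.popleft()
        return internal.popleft()

    while len(leaves) + len(internal) > 1:
        w1, l1 = pop_min()
        w2, l2 = pop_min()
        for i in l1 + l2:         # leaves below the new node gain one level of depth
            depth[i] += 1
        internal.append((w1 + w2, l1 + l2))
    return depth

print(codeword_lengths_from_sorted([1, 1, 2, 3, 5, 8]))  # -> [5, 5, 4, 3, 2, 1]
```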

    Critical Data Compression

    Full text link
    A new approach to data compression is developed and applied to multimedia content. This method separates messages into components suitable for both lossless coding and 'lossy' or statistical coding techniques, compressing complex objects by separately encoding signals and noise. This is demonstrated by compressing the most significant bits of data exactly, since they are typically redundant and compressible, and either fitting a maximally likely noise function to the residual bits or compressing them using lossy methods. Upon decompression, the significant bits are decoded and added to a noise function, whether sampled from a noise model or decompressed from a lossy code. This results in compressed data similar to the original. For many test images, a two-part image code using JPEG2000 for lossy coding and PAQ8l for lossless coding produces less mean-squared error than a JPEG2000 code of equal length. Computer-generated images typically compress better using this method than through direct lossy coding, as do many black and white photographs and most color photographs at sufficiently high quality levels. Examples applying the method to audio and video coding are also demonstrated. Since two-part codes are efficient for both periodic and chaotic data, concatenations of roughly similar objects may be encoded efficiently, which leads to improved inference. Applications to artificial intelligence are demonstrated, showing that signals using an economical lossless code have a critical level of redundancy which leads to better description-based inference than signals which encode either insufficient data or too much detail. Comment: 99 pages, 31 figures
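
    The core separation idea can be illustrated with a toy bit-plane split: code the high bits exactly and model the low bits as noise. The 4-bit split, uniform noise model and function names below are assumptions for illustration only; the paper's actual pipeline uses codecs such as JPEG2000 and PAQ8l.

```python
import random

def split_planes(samples, keep_bits=4, total_bits=8):
    """Split 8-bit samples into significant high bits (coded losslessly)
    and residual low bits (modeled as noise or coded lossily)."""
    shift = total_bits - keep_bits
    high = [s >> shift for s in samples]              # structured, compressible part
    low = [s & ((1 << shift) - 1) for s in samples]   # noise-like residual
    return high, low

def recombine(high, low_model, keep_bits=4, total_bits=8):
    """Reconstruct samples from the exact high bits plus a (possibly sampled) residual."""
    shift = total_bits - keep_bits
    return [(h << shift) | r for h, r in zip(high, low_model)]

samples = [137, 140, 139, 142, 138, 141]
high, low = split_planes(samples)
# Replace the true residual with samples from a simple noise model (uniform here).
low_model = [random.randrange(1 << 4) for _ in low]
print(recombine(high, low_model))  # close to the originals, exact in the top 4 bits
```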

    PRACTICAL APPLICATION OF IRREDUCIBLE CODINGS: VIRTUAL EXTENSION OF STORAGE CAPACITY

    Get PDF

    A Comparison of Compressed and Uncompressed Transmission Modes

    Get PDF
    In this paper we address the problem of host to host communication. In particular, we discuss the issue of efficient and adaptive transmission mechanisms over possible physical links. We develop a tool for making decisions regarding the flow of control sequences and data from and to a host. The issue of compression is discussed in detail, and a decision box and an optimizing tool for finding the appropriate thresholds for a decision are developed. Physical parameters such as the data rate, bandwidth of the communication medium, distance between the hosts, baud rate, levels of discretization, signal-to-noise ratio and propagation speed of the signal are taken into consideration while developing our decision system. Theoretical analysis is performed to develop mathematical models for the optimization algorithm. Simulation models are also developed for testing both the optimization and the decision tool box.
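
    A simplified version of such a compress-or-not decision can be written as a latency comparison. The rule, rates and parameter names below are illustrative assumptions, not the paper's actual decision box or thresholds.

```python
def should_compress(size_bits, ratio, link_rate_bps, comp_rate_bps, decomp_rate_bps):
    """Simplified decision rule: compress only if total end-to-end latency drops.

    size_bits        uncompressed message size
    ratio            expected compressed size / original size
    link_rate_bps    effective channel rate
    comp_rate_bps    compression throughput at the sender
    decomp_rate_bps  decompression throughput at the receiver
    """
    t_raw = size_bits / link_rate_bps
    t_comp = (size_bits / comp_rate_bps
              + ratio * size_bits / link_rate_bps
              + ratio * size_bits / decomp_rate_bps)
    return t_comp < t_raw

# A slow serial link with a fast CPU favours compression; a fast LAN may not.
print(should_compress(8_000_000, 0.4, 56_000, 50_000_000, 80_000_000))         # True
print(should_compress(8_000_000, 0.4, 1_000_000_000, 50_000_000, 80_000_000))  # False
```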

    Operator/System Communication : An Optimizing Decision Tool

    Get PDF
    Copyright 1990 Society of Photo-Optical Instrumentation Engineers. One print or electronic copy may be made for personal use only. Systematic reproduction and distribution, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited. In this paper we address the problem of operator/system communication. In particular, we discuss the issue of efficient and adaptive transmission mechanisms over possible physical links. We develop a tool for making decisions regarding the flow of control sequences and data from and to the operator. The issue of compression is discussed in detail, and a decision box and an optimizing tool for finding the appropriate thresholds for a decision are developed. Physical parameters such as the data rate, bandwidth of the communication medium, distance between the operator and the system, baud rate, levels of discretization, signal-to-noise ratio and propagation speed of the signal are taken into consideration while developing our decision system. Theoretical analysis is performed to develop mathematical models for the optimization algorithm. Simulation models are also developed for testing both the optimization and the decision tool box. http://dx.doi.org/10.1117/12.2549

    Performance of different strategies for error detection and correction in arithmetic encoding

    Get PDF
    Lossless source encoding is used in some data compression applications. One such scheme is arithmetic encoding. When data is transmitted over a communication channel, noise and impurities introduced by the channel cause errors. To reduce the effect of errors, a channel encoder is added prior to transmission through the channel. The channel encoder inserts bits that help the channel decoder at the receiver end detect and correct errors. These added error detection and correction bits are redundancy that reduces the compression ratio and hence increases the data rate through the channel; the higher the detection and correction capability, the larger the added redundancy needed. A different approach to error detection and correction is used in this work. It is suitable for lossless data compression where errors are assumed to occur at a low rate but to propagate severely; that is, an error in one data symbol causes all the following symbols to be in error with high probability. This has been shown to be the case for arithmetic encoding and for Lempel-Ziv data compression algorithms. With this approach, redundancy, in the form of a marker, is added to the data before it is compressed by the source encoder. The decoder examines the data for errors and corrects them. Different choices of redundancy marker are examined and compared. As measures for comparison, we use misdetection, tested at one or more marker locations, as well as miscorrection. These performance measures are calculated analytically and by computer simulation. The results are also compared to those obtained with channel codes such as Hamming codes. We found that our approach performs as well as a channel encoder; however, while Hamming codes yield erroneous data when more than one error occurs, our approach gives a clear indication of this situation
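
    The marker idea can be sketched as follows: a reserved symbol is inserted at fixed positions before source coding, and the positions are checked after decoding; a displaced or corrupted marker signals that an error has occurred and propagated. The marker value, period and helper names here are illustrative assumptions, not the specific schemes evaluated in the paper.

```python
MARKER = 0x00   # assumed reserved symbol used as the marker
PERIOD = 16     # one marker before every block of PERIOD data symbols

def insert_markers(symbols):
    """Interleave a known marker before every block of PERIOD symbols,
    prior to source coding (e.g., arithmetic coding)."""
    out = []
    for i in range(0, len(symbols), PERIOD):
        out.append(MARKER)
        out.extend(symbols[i:i + PERIOD])
    return out

def strip_and_check(decoded):
    """Remove markers after decoding; return (payload, error_position).
    error_position is the offset of the first corrupted marker, or None."""
    payload = []
    pos = 0
    while pos < len(decoded):
        if decoded[pos] != MARKER:
            return payload, pos      # error propagation detected around this offset
        block = decoded[pos + 1:pos + 1 + PERIOD]
        payload.extend(block)
        pos += 1 + len(block)
    return payload, None

data = list(range(1, 41))                          # 40 data symbols, none equal to MARKER
coded = insert_markers(data)
corrupted = coded[:20] + [7] * len(coded[20:])     # simulate error propagation from offset 20
print(strip_and_check(coded)[1])                   # None: all markers intact
print(strip_and_check(corrupted)[1])               # 34: first marker position after the corruption
```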

    A tutorial on the characterisation and modelling of low layer functional splits for flexible radio access networks in 5G and beyond

    Get PDF
    The centralization of baseband (BB) functions in a radio access network (RAN) towards data processing centres is receiving increasing interest as it enables the exploitation of resource pooling and statistical multiplexing gains among multiple cells, facilitates the introduction of collaborative techniques for different functions (e.g., interference coordination), and more efficiently handles the complex requirements of advanced features of the fifth generation (5G) new radio (NR) physical layer, such as the use of massive multiple input multiple output (MIMO). However, deciding the functional split (i.e., which BB functions are kept close to the radio units and which BB functions are centralized) embraces a trade-off between the centralization benefits and the fronthaul costs for carrying data between distributed antennas and data processing centres. Substantial research efforts have been made in standardization fora, research projects and studies to resolve this trade-off, which becomes more complicated when the choice of functional splits is dynamically achieved depending on the current conditions in the RAN. This paper presents a comprehensive tutorial on the characterisation, modelling and assessment of functional splits in a flexible RAN to establish a solid basis for the future development of algorithmic solutions of dynamic functional split optimisation in 5G and beyond systems. First, the paper explores the functional split approaches considered by different industrial fora, analysing their equivalences and differences in terminology. Second, the paper presents a harmonized analysis of the different BB functions at the physical layer and associated algorithmic solutions presented in the literature, assessing both the computational complexity and the associated performance. Based on this analysis, the paper presents a model for assessing the computational requirements and fronthaul bandwidth requirements of different functional splits. Last, the model is used to derive illustrative results that identify the major trade-offs that arise when selecting a functional split and the key elements that impact the requirements. This work has been partially funded by Huawei Technologies. Work by X. Gelabert and B. Klaiqi is partially funded by the European Union's Horizon Europe research and innovation programme (HORIZON-MSCA-2021-DN-0) under the Marie Skłodowska-Curie grant agreement No 101073265. Work by J. Perez-Romero and O. Sallent is also partially funded by the Smart Networks and Services Joint Undertaking (SNS JU) under the European Union's Horizon Europe research and innovation programme under Grant Agreements No. 101096034 (VERGE project) and No. 101097083 (BeGREEN project) and by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 under ARTIST project (ref. PID2020-115104RB-I00). This last project has also funded the work by D. Campoy. Peer reviewed. Postprint (author's final draft).
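
    As a rough illustration of why the choice of split dominates fronthaul dimensioning, the sketch below compares a time-domain I/Q split (CPRI-like split 8) with a frequency-domain per-layer split (eCPRI-like split 7.2) using simplified, textbook-style rate formulas. The constants (sampling rate, bit widths, overhead, PRB count) are assumptions for a 100 MHz NR carrier at 30 kHz subcarrier spacing, not values taken from the paper's model.

```python
def split8_rate_gbps(n_ant, sample_rate_hz, iq_bits=30, overhead=16 / 15):
    """CPRI-like split 8: time-domain I/Q per antenna, load-independent."""
    return n_ant * sample_rate_hz * iq_bits * overhead / 1e9

def split72_rate_gbps(n_layers, n_prb, symbols_per_sec=28_000, iq_bits=18, sc_per_prb=12):
    """eCPRI-like split 7.2: frequency-domain I/Q per spatial layer, over used PRBs."""
    return n_layers * n_prb * sc_per_prb * symbols_per_sec * iq_bits / 1e9

# Assumed example: 100 MHz NR carrier (273 PRBs, ~122.88 Msps),
# 64 antenna ports but only 4 spatial layers actually transmitted.
print(f"split 8  : {split8_rate_gbps(64, 122.88e6):.1f} Gbit/s")   # ~252
print(f"split 7.2: {split72_rate_gbps(4, 273):.1f} Gbit/s")        # ~6.6
```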

    Transform Based And Search Aware Text Compression Schemes And Compressed Domain Text Retrieval

    Get PDF
    In recent times, we have witnessed an unprecedented growth of textual information via the Internet, digital libraries and archival text in many applications. While a good fraction of this information is of transient interest, useful information of archival value will continue to accumulate. We need ways to manage, organize and transport this data from one point to another over data communication links with limited bandwidth. We must also have means to speedily find the information we need from this huge mass of data. Sometimes, a single site may also contain large collections of data such as a library database, thereby requiring an efficient search mechanism even to search within the local data. To facilitate information retrieval, an emerging ad hoc standard for uncompressed text is XML, which preprocesses the text by adding user-defined metadata such as a DTD or hyperlinks to enable searching with better efficiency and effectiveness. This increases the file size considerably, underscoring the importance of applying text compression. For reasons of efficiency (in terms of both space and time), there is a need to keep the data in compressed form for as long as possible. Text compression is concerned with techniques for representing digital text data in alternative representations that take less space. Not only does it help conserve storage space for archival and online data, it also helps system performance by requiring fewer secondary storage (disk or CD-ROM) accesses and improves network transmission bandwidth utilization by reducing transmission time. Unlike static images or video, there is no international standard for text compression, although compressed formats like .zip, .gz and .Z files are increasingly being used. In general, data compression methods are classified as lossless or lossy. Lossless compression allows the original data to be recovered exactly. Although used primarily for text data, lossless compression algorithms are useful in special classes of images such as medical imaging, fingerprint data, astronomical images and databases containing mostly vital numerical data, tables and text information. Many lossy algorithms use lossless methods at the final stage of encoding, underscoring the importance of lossless methods for both lossy and lossless compression applications. In order to effectively utilize the full potential of compression techniques for future retrieval systems, we need efficient information retrieval in the compressed domain. This means that techniques must be developed to search the compressed text without decompression, or with only partial decompression, independent of whether the search is done on the text or on some inversion table corresponding to a set of keywords for the text. In this dissertation, we make the following contributions:

    (1) Star family compression algorithms: We propose an approach to develop a reversible transformation that can be applied to a source text and that improves existing algorithms' ability to compress it. We use a static dictionary to convert the English words into predefined symbol sequences. These transformed sequences create additional context information that is superior to the original text, so some compression is already achieved at the preprocessing stage. We present a series of transforms that improve performance. The star transform requires a static dictionary of a certain size. To avoid the considerable complexity of conversion, we employ a ternary tree data structure that efficiently converts the words in the text to the words in the star dictionary in linear time.

    (2) Exact and approximate pattern matching in Burrows-Wheeler transformed (BWT) files: We propose a method to extract useful context information in linear time from the BWT-transformed text. The auxiliary arrays obtained from the BWT inverse transform bring logarithmic search time. Approximate pattern matching can then be performed based on the results of exact pattern matching to extract candidates, and a fast verification algorithm can be applied to those candidates, which are only small parts of the original text. We present algorithms for both k-mismatch and k-approximate pattern matching in BWT-compressed text. A typical compression system based on BWT has Move-to-Front and Huffman coding stages after the transformation. We propose a novel approach that replaces the Move-to-Front stage in order to extend compressed-domain search capability all the way to the entropy coding stage; this modification makes it possible to randomly access any part of the compressed text without referring to the part before the access point.

    (3) Modified LZW algorithm that allows random access and partial decoding for compressed text retrieval: Although many compression algorithms provide good compression ratios and/or time complexity, LZW was the first studied for compressed pattern matching because of its simplicity and efficiency. Our modifications to the LZW algorithm provide fast random access and partial decoding, which are especially useful for text retrieval systems. Based on this algorithm, we can provide a dynamic hierarchical semantic structure for the text, so that search can be performed at the desired level of granularity: for example, the user can choose to retrieve a single line, a paragraph, or a file that contains the keywords. More importantly, we show that parallel encoding and decoding are straightforward with the modified LZW; both can be performed easily with multiple processors, and the encoding and decoding processes are independent of the number of processors.
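
    The exact-matching part of contribution (2) builds on standard BWT machinery; the sketch below shows the general idea of counting pattern occurrences directly from a BWT string via backward search (an FM-index-style illustration, not the dissertation's specific auxiliary-array construction).

```python
from bisect import bisect_left

def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for small inputs)."""
    text += "\0"                                 # unique end-of-string sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def backward_search(bwt_text, pattern):
    """FM-index style backward search: count occurrences of pattern
    using only the BWT string and simple rank/count tables."""
    # C[c] = number of characters in the text strictly smaller than c
    sorted_bwt = sorted(bwt_text)
    C = {c: bisect_left(sorted_bwt, c) for c in set(bwt_text)}

    def rank(c, i):                              # occurrences of c in bwt_text[:i]
        return bwt_text[:i].count(c)

    lo, hi = 0, len(bwt_text)                    # current suffix-array interval
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo                               # number of matches

b = bwt("mississippi")
print(backward_search(b, "ssi"))   # 2
print(backward_search(b, "sip"))   # 1
```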