
    File compression using probabilistic grammars and LR parsing

    Data compression, the reduction in size of the physical representation of data being stored or transmitted, has long been of interest both as a research topic and as a practical technique. Different methods are used for encoding different classes of data files. The purpose of this research is to compress a class of highly redundant data files whose contents are partially described by a context-free grammar (i.e. text files containing computer programs). An encoding technique is developed for the removal of structural dependency due to the context-free structure of such files. The technique depends on a type of LR parsing method called LALR(k) (Look-Ahead LR). The encoder also pays particular attention to the encoding of editing characters, comments, names and constants. The encoded data maintains the exact information content of the original data; hence a decoding technique (depending on the same parsing method) is developed to recover the original information from its compressed representation. The technique is demonstrated by compressing Pascal programs. An optimal coding scheme (based on Huffman codes) is used to encode the parsing alternatives in each parsing state, and the decoder uses these codes during the decoding phase. Huffman codes, based on the probabilities of the symbols concerned, are also used when coding editing characters, comments, names and constants. The sizes of the parsing tables (and consequently the encoding tables) were considerably reduced by splitting them into a number of sub-tables. The minimum and the average code length of the average program are derived from two matrices constructed from a probabilistic grammar and the language it generates. Finally, various comparisons are made with a related encoding method using a simple context-free language.
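    As a rough illustration of the core idea (a simplified stand-in, not the thesis's LALR(k) encoder), the sketch below Huffman-codes the choice of production taken at each nonterminal of a tiny, made-up probabilistic grammar, so that a derivation is stored as a short bit string; the grammar, probabilities and toy derivation are invented for the example, and the separate handling of editing characters, comments, names and constants is omitted.

```python
# Minimal sketch: per-nonterminal Huffman codes for the production chosen at
# each step of a leftmost derivation. Toy grammar and probabilities, assumed.
import heapq
from itertools import count

def huffman(weights):
    """Map each alternative index to a prefix-free bit string."""
    if len(weights) == 1:
        return {0: "0"}
    tie = count()                       # tie-breaker so dicts are never compared
    heap = [(w, next(tie), {i: ""}) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {i: "0" + b for i, b in c0.items()}
        merged.update({i: "1" + b for i, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]

# Toy probabilistic grammar: nonterminal -> list of (probability, right-hand side).
GRAMMAR = {
    "Stmt": [(0.6, ["id", ":=", "Expr"]), (0.4, ["begin", "Stmt", "end"])],
    "Expr": [(0.7, ["id"]), (0.3, ["Expr", "+", "id"])],
}
CODES = {nt: huffman([p for p, _ in alts]) for nt, alts in GRAMMAR.items()}

def encode(derivation):
    """derivation: (nonterminal, alternative index) pairs in leftmost order."""
    return "".join(CODES[nt][alt] for nt, alt in derivation)

def decode(bits):
    """Re-derive the production choices by expanding the grammar against bits."""
    out, stack, i = [], ["Stmt"], 0
    while stack:
        sym = stack.pop()
        if sym not in GRAMMAR:              # terminals contribute no bits
            continue
        inv = {code: alt for alt, code in CODES[sym].items()}
        j = i + 1
        while bits[i:j] not in inv:         # prefix-free: extend until a match
            j += 1
        alt = inv[bits[i:j]]
        i = j
        out.append((sym, alt))
        stack.extend(reversed(GRAMMAR[sym][alt][1]))
    return out

# Leftmost derivation of "begin id := id + id end" in the toy grammar.
derivation = [("Stmt", 1), ("Stmt", 0), ("Expr", 1), ("Expr", 0)]
bits = encode(derivation)
assert decode(bits) == derivation
print(bits)                                 # '0101' with the codes built above
```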

    Error detection and correction in compressed data

    Encoded data is very sensitive to channel errors; in particular, if the data is compressed by an arithmetic encoding procedure, error propagation is very high. The error propagation in arithmetic coding is studied. Exploiting this high error propagation, two different algorithms are proposed for error detection and correction. Under certain conditions these algorithms detect, and with very high probability correct, the errors introduced into the compressed data.
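    The propagation effect can be reproduced with a deliberately simplified coder. The sketch below is an exact, non-streaming arithmetic coder built on Python fractions; the three-symbol alphabet, its probabilities and the sample message are assumptions made for the example, not material from the thesis. Flipping one bit near the front of the code value typically garbles nearly everything decoded after that point, which is the property the proposed detection and correction algorithms exploit.

```python
# Exact (Fraction-based) arithmetic coding of a short message, to show how a
# single bit error corrupts most of the decoded output. Toy model, assumed.
from fractions import Fraction as F
from math import ceil, log2

PROBS = {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4)}   # assumed source model
CUM, run = {}, F(0)
for sym, p in PROBS.items():                          # cumulative probabilities
    CUM[sym] = run
    run += p

def encode(msg):
    """Narrow [0, 1) symbol by symbol, then pick an n-bit fraction inside it."""
    low, width = F(0), F(1)
    for s in msg:
        low += width * CUM[s]
        width *= PROBS[s]
    nbits = ceil(-log2(width)) + 1            # enough bits to land in the interval
    return ceil(low * 2 ** nbits), nbits

def decode(code, nbits, length):
    """Recover `length` symbols from the binary fraction code / 2**nbits."""
    value, out = F(code, 2 ** nbits), []
    for _ in range(length):
        for s, p in PROBS.items():            # subinterval that contains value
            if CUM[s] <= value < CUM[s] + p:
                out.append(s)
                value = (value - CUM[s]) / p  # rescale and continue
                break
    return "".join(out)

msg = "abacabaaacabbaaacaba"
code, nbits = encode(msg)
assert decode(code, nbits, len(msg)) == msg   # error-free round trip

corrupted = code ^ (1 << (nbits - 3))         # flip a single bit near the front
print(msg)
print(decode(corrupted, nbits, len(msg)))     # diverges after the first symbols
```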

    Comparison of Methods for Text Compression

    One purpose of this research was to introduce several well-known text compression methods and to test them in order to compare their performance, not only with each other but also with results from previous studies. Welch's implementation of the Ziv-Lempel method was found to outperform any other single method introduced in this thesis, or any combination of methods. A further purpose of this research was to calculate the average distance from any one bit to the next synchronization point in static Huffman decoding, following a decoding error. The average distance in words decoded was predicted to be the average length of a codeword in bits, and tests on resynchronization showed that this was a good prediction.
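    One plausible way to run the resynchronization experiment is sketched below: build a static Huffman code for an arbitrary sample text, flip one bit of the encoded stream, and count how many codewords the corrupted decoder emits before its codeword boundaries fall back into step with those of the clean stream. The sample text, number of trials and the exact counting convention are choices made for this example rather than details taken from the thesis.

```python
# Measure, by simulation, how many codewords a static Huffman decoder emits
# after a single bit error before it resynchronises. Sample text is arbitrary.
import heapq
import random
from collections import Counter
from itertools import count

def huffman_code(text):
    """Return {symbol: bit string} for a static Huffman code of the text."""
    tie = count()
    heap = [(n, next(tie), {s: ""}) for s, n in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]

def codeword_ends(bits, code):
    """Decode a bit string and return the bit positions where codewords end."""
    inv = {v: k for k, v in code.items()}
    ends, cur = [], ""
    for i, b in enumerate(bits):
        cur += b
        if cur in inv:
            ends.append(i + 1)
            cur = ""
    return ends

text = "in the beginning was the word and the word was with the text " * 20
code = huffman_code(text)
bits = "".join(code[s] for s in text)
good_ends = set(codeword_ends(bits, code))

random.seed(1)
samples = []
for _ in range(500):
    pos = random.randrange(len(bits) // 2)    # flip one bit in the first half
    bad = bits[:pos] + ("1" if bits[pos] == "0" else "0") + bits[pos + 1:]
    bad_ends = codeword_ends(bad, code)
    # First boundary after the error shared with the clean stream: from there
    # on the two decoders see identical bits from a fresh codeword start.
    resync = next((e for e in bad_ends if e > pos and e in good_ends), None)
    if resync is not None:
        samples.append(sum(1 for e in bad_ends if pos < e <= resync))

print(f"average codeword length: {len(bits) / len(text):.2f} bits")
print(f"average codewords decoded before resync: {sum(samples) / len(samples):.2f}")
```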

    Performance of different strategies for error detection and correction in arithmetic encoding

    Lossless source encoding is used in some data compression applications; one such scheme is arithmetic encoding. When data is transmitted over a communication channel, noise and impurities in the channel cause errors. To reduce the effect of errors, a channel encoder is added prior to transmission; it inserts bits that help the channel decoder at the receiver end detect and correct errors. These added error detection and correction bits are redundancy that reduces the compression ratio and hence increases the data rate through the channel. The higher the detection and correction capability, the larger the added redundancy needed. A different approach to error detection and correction is used in this work. It is suitable for lossless data compression in which errors are assumed to occur at a low rate but cause very high propagation; that is, an error in one data symbol causes all the following symbols to be in error with high probability. This was shown to be the case in arithmetic encoding and in Lempel-Ziv algorithms for data compression. With this approach, redundancy, in the form of a marker, is added to the data before it is compressed by the source encoder. The decoder examines the data for errors and corrects them. Different approaches to the redundancy marker are examined and compared. As measures for comparison, we used misdetection, tested at one or more marker locations, as well as miscorrection. These performance measures are calculated analytically and by computer simulation. The results are also compared with those obtained using channel encoding such as Hamming codes. We found that our approach performs as well as a channel encoder; however, while Hamming codes produce erroneous data when more than one error occurs, this approach gives a clear indication of that situation.
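    The marker mechanism can be sketched independently of any particular source coder. Below, a known marker symbol is inserted after every few source symbols before compression; after decompression the decoder checks whether the markers are intact, which both detects an error and localizes roughly where the propagation began (since an arithmetic-coding error tends to corrupt everything that follows it). The marker value, its spacing and the simulated garbled suffix are assumptions for the example; the correction step described in the abstract, re-decoding trial versions of the compressed stream until the markers check out, is only indicated in a comment.

```python
# Marker framing around an (unshown) source coder: insert a marker every BLOCK
# symbols, then check the markers after decoding. Values here are assumed.
import random

MARKER = 0xA5          # assumed marker value
BLOCK = 16             # assumed spacing, in source symbols

def add_markers(symbols):
    """Encoder side: insert MARKER after every BLOCK source symbols."""
    out = []
    for i, s in enumerate(symbols, 1):
        out.append(s)
        if i % BLOCK == 0:
            out.append(MARKER)
    return out

def strip_and_check(decoded):
    """Decoder side: remove markers and report the first damaged block.

    Returns (payload, first_bad_block); first_bad_block is None if every
    marker is intact."""
    payload, first_bad = [], None
    for i, s in enumerate(decoded, 1):
        if i % (BLOCK + 1) == 0:               # this slot should hold a marker
            if s != MARKER and first_bad is None:
                first_bad = i // (BLOCK + 1) - 1
        else:
            payload.append(s)
    return payload, first_bad

# Demo: pretend the source decoder garbled everything from position 40 onward,
# which is how a single channel-bit error typically behaves with arithmetic
# coding. The markers after that point are then almost certainly damaged.
random.seed(0)
framed = add_markers(list(range(100)))
garbled = framed[:40] + [random.randrange(256) for _ in framed[40:]]

_, bad_block = strip_and_check(garbled)
print(f"first damaged block: {bad_block}")     # with high probability this is
                                               # the block where garbling began
# A correction pass would now re-decode candidate repairs of the compressed
# stream (e.g. single bit flips near the located block) until strip_and_check
# reports no damage.
```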