Search CORE

461 research outputs found

Image Characterization and Classification by Physical Complexity

Author: Delahaye Jean-Paul
Gaucherel Cedric
Zenil Hector
Publication venue
Publication date: 03/07/2011
Field of study

We present a method for estimating the complexity of an image based on Bennett's concept of logical depth. Bennett identified logical depth as the appropriate measure of organized complexity, and hence as being better suited to the evaluation of the complexity of objects in the physical world. Its use results in a different, and in some sense a finer characterization than is obtained through the application of the concept of Kolmogorov complexity alone. We use this measure to classify images by their information content. The method provides a means for classifying and evaluating the complexity of objects by way of their visual representations. To the authors' knowledge, the method and application inspired by the concept of logical depth presented herein are being proposed and implemented for the first time.Comment: 30 pages, 21 figure

arXiv.org e-Print Archive

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Oxford University Research Archive

HAL-CIRAD

Hal-Diderot

Reordering Rows for Better Compression: Beyond the Lexicographic Order

Author: Gutarra Eduardo
Kaser Owen
Lemire Daniel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2012
Field of study

Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row-ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades off compression for a major running-time speedup, is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%: these gains are on top of the gains due to lexicographically sorting the table. We prove that the new row reordering is optimal (within 10%) at minimizing the runs of identical values within columns, in a few cases.Comment: to appear in ACM TOD

arXiv.org e-Print Archive

R-libre

Crossref

Efficiently Extracting Randomness from Imperfect Stochastic Processes

Author: Bruck Jehoshua
Zhou Hongchao
Publication venue
Publication date: 04/09/2012
Field of study

We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from non-ideal stochastic processes. The related interval algorithm proposed by Han and Hoshi has asymptotically optimal performance; however, it assumes that the distribution of the input stochastic process is known. The motivation for our work is the fact that, in practice, sources of randomness have inherent correlations and are affected by measurement's noise. Namely, it is hard to obtain an accurate estimation of the distribution. This challenge was addressed by the concepts of seeded and seedless extractors that can handle general random sources with unknown distributions. However, known seeded and seedless extractors provide extraction efficiencies that are substantially smaller than Shannon's entropy limit. Our main contribution is the design of extractors that have a variable input-length and a fixed output length, are efficient in the consumption of symbols from the source, are capable of generating random bits from general stochastic processes and approach the information theoretic upper bound on efficiency.Comment: 2 columns, 16 page

arXiv.org e-Print Archive

Caltech Authors

Computational Intelligence and Complexity Measures for Chaotic Information Processing

Author: Arasteh Davoud
Publication venue: ScholarWorks@UNO
Publication date: 16/05/2008
Field of study

This dissertation investigates the application of computational intelligence methods in the analysis of nonlinear chaotic systems in the framework of many known and newly designed complex systems. Parallel comparisons are made between these methods. This provides insight into the difficult challenges facing nonlinear systems characterization and aids in developing a generalized algorithm in computing algorithmic complexity measures, Lyapunov exponents, information dimension and topological entropy. These metrics are implemented to characterize the dynamic patterns of discrete and continuous systems. These metrics make it possible to distinguish order from disorder in these systems. Steps required for computing Lyapunov exponents with a reorthonormalization method and a group theory approach are formalized. Procedures for implementing computational algorithms are designed and numerical results for each system are presented. The advance-time sampling technique is designed to overcome the scarcity of phase space samples and the buffer overflow problem in algorithmic complexity measure estimation in slow dynamics feedback-controlled systems. It is proved analytically and tested numerically that for a quasiperiodic system like a Fibonacci map, complexity grows logarithmically with the evolutionary length of the data block. It is concluded that a normalized algorithmic complexity measure can be used as a system classifier. This quantity turns out to be one for random sequences and a non-zero value less than one for chaotic sequences. For periodic and quasi-periodic responses, as data strings grow their normalized complexity approaches zero, while a faster deceasing rate is observed for periodic responses. Algorithmic complexity analysis is performed on a class of certain rate convolutional encoders. The degree of diffusion in random-like patterns is measured. Simulation evidence indicates that algorithmic complexity associated with a particular class of 1/n-rate code increases with the increase of the encoder constraint length. This occurs in parallel with the increase of error correcting capacity of the decoder. Comparing groups of rate-1/n convolutional encoders, it is observed that as the encoder rate decreases from 1/2 to 1/7, the encoded data sequence manifests smaller algorithmic complexity with a larger free distance value

University of New Orleans

Lempel Ziv Welch data compression using associative processing as an enabling technology for real time application

Author: Narang Manish
Publication venue: RIT Scholar Works
Publication date: 01/03/1998
Field of study

Data compression is a term that refers to the reduction of data representation requirements either in storage and/or in transmission. A commonly used algorithm for compression is the Lempel-Ziv-Welch (LZW) method proposed by Terry A. Welch[l]. LZW is an adaptive, dictionary based, lossless algorithm. This provides for a general compression mechanism that is applicable to a broad range of inputs. Furthermore, the lossless nature of LZW implies that it is a reversible process which results in the original file/message being fully recoverable from compression. A variant of this algorithm is currently the foundation of the UNIX compress program. Additionally, LZW is one of the compression schemes defined in the TIFF standard[12], as well as in the CCITT V.42bis standard. One of the challenges in designing an efficient compression mechanism, such as LZW, which can be used in real time applications, is the speed of the search into the data dictionary. In this paper an Associative Processing implementation of the LZW algorithm is presented. This approach provides an efficient solution to this requirement. Additionally, it is shown that Associative Processing (ASP) allows for rapid and elegant development of the LZW algorithm that will generally outperform standard approaches in complexity, readability, and performance

RIT Scholar Works

New Algorithms and Lower Bounds for Sequential-Access Data Compression

Author: Gagie Travis
Publication venue
Publication date: 01/01/2009
Field of study

This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows us passes and memory both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds.Comment: draft of PhD thesi

arXiv.org e-Print Archive

Publications at Bielefeld University

Huffman-based Code Compression Techniques for Embedded Systems

Author: Bonny Talal
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009
Field of study

KITopen

Compressing DNA sequence databases with coil

Author: Hendy Michael D.
White W. Timothy J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2008
Field of study

Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work

Massey Research Online

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Hierarchical Relative Lempel-Ziv Compression

Author: Bille Philip
Puglisi Simon J.
Tarnow Simon R.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 21st International Symposium on Experimental Algorithms (SEA 2023)
Publication date: 01/01/2023
Field of study

Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string S is compressed relative to a second string R (called the reference) by parsing S into a sequence of substrings that occur in R. RLZ is particularly effective at compressing sets of strings that have a high degree of similarity to the reference string, such as a set of genomes of individuals from the same species. With the now cheap cost of DNA sequencing, such datasets have become extremely abundant and are rapidly growing. In this paper, instead of using a single reference string for the entire collection, we investigate the use of different reference strings for subsets of the collection, with the aim of improving compression. In particular, we propose a new compression scheme hierarchical relative Lempel-Ziv (HRLZ) which form a rooted tree (or hierarchy) on the strings and then compress each string using RLZ with parent as reference, storing only the root of the tree in plain text. To decompress, we traverse the tree in BFS order starting at the root, decompressing children with respect to their parent. We show that this approach leads to a twofold improvement in compression on bacterial genome datasets, with negligible effect on decompression time compared to the standard single reference approach. We show that an effective hierarchy for a given set of strings can be constructed by computing the optimal arborescence of a completed weighted digraph of the strings, with weights as the number of phrases in the RLZ parsing of the source and destination vertices. We further show that instead of computing the complete graph, a sparse graph derived using locality-sensitive hashing can significantly reduce the cost of computing a good hierarchy, without adversely effecting compression performance

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto