85 research outputs found

On the ACB compressor

Author: MASCI JONATAN
Publication venue: Pisa University Press
Publication date: 03/06/2009

Context-based compression methods are the most powerful approaches for compressing arbitrary textual data. They offer a good predictive model for upcoming data based on the data already seen, without assuming any probability distribution for the input source. In this thesis we analyze the adaptive ACB method (Buyanovsky, 94), which is mostly unexplored in the literature, although preliminary results showed compression ratios comparable (or even superior) to those of the best-known data compression utilities. The novel feature of ACB is that it uses both the previous context and the subsequent content to find a succinct encoding for the latter. We perform a large set of experiments to study the behavior of ACB and to compare it with known compressors, and from these we devise variations of the basic ACB scheme that look promising for future developments.

Electronic Thesis and Dissertation Archive - Università di Pisa

Lossless Differential Compression for Synchronizing Arbitrary Single-Dimensional Strings

Author: Karppanen Jari
Publication venue: Helsingin yliopisto
Publication date: 01/01/2012

Differential compression allows expressing a modified document as differences relative to another version of the document. A compressed string requires space proportional to the amount of changes, irrespective of the original document sizes. The purpose of this study was to determine which algorithms are suitable for universal lossless differential compression for synchronizing two arbitrary documents, either locally or remotely. The two main problems in differential compression are finding the differences (differencing) and compactly communicating the differences (encoding). We discussed local differencing algorithms based on subsequence searching, hashtable lookups, suffix searching, and projection. We also discussed probabilistic remote algorithms based on both recursive comparison and characteristic polynomial interpolation of hashes computed from variable-length content-defined substrings. We described various heuristics for approximating optimal algorithms, as arbitrarily long strings and memory limitations force discarding information. The discussion also included compact delta encoding and in-place reconstruction. We presented results from empirical testing of the discussed algorithms. The conclusions were that multiple algorithms need to be integrated into a hybrid implementation, which heuristically chooses algorithms based on an evaluation of the input data. Algorithms based on hashtable lookups are faster on average and require less memory, but algorithms based on suffix searching find the fewest differences. Interpolating characteristic polynomials was found to be too slow for general use. With remote hash comparison, content-defined chunks and recursive comparison can reduce protocol overhead. A differential compressor should be merged with a state-of-the-art non-differential compressor to enable more compact delta encoding. Input should be processed multiple times to allow a constant space bound without significant reduction in compression efficiency. The compression efficiency of current popular synchronizers could be improved, as our empirical testing showed that a non-differential compressor produced smaller files without having access to one of the two strings.

Helsingin yliopiston digitaalinen arkisto

High-throughput DNA sequence data compression

Authors: He Shan; Ji Zhen; Yang Xiao; Zhang Yongpeng; Zhu Zexuan
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

University of Birmingham Research Portal

Algorithms and Data Structures for In-Memory Text Search Engines

Author: Transier Frederik
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2010

KITopen

Binary Differencing for Media Files

Author: Komsiyski V.
Publication venue: CWI
Publication date: 01/07/2013

CWI's Institutional Repository

Fast Packet Processing on High Performance Architectures

Author: ANTICHI GIANNI
Publication venue: Pisa University Press
Publication date: 05/05/2011

The rapid growth of the Internet and the fast emergence of new network applications have brought great challenges and complex issues in deploying high-speed, QoS-guaranteed IP networks. For this reason, packet classification and network intrusion detection have assumed a key role in modern communication networks in order to provide QoS and security. In this thesis we describe a number of the most advanced solutions to these tasks. We introduce NetFPGA and Network Processors as reference platforms for both the design and the implementation of the solutions and algorithms described in this thesis. The rise in link capacity reduces the time available to network devices for packet processing. For this reason, we show different solutions which, either through heuristics and randomization or through smart construction of state machines, allow IP lookup, packet classification and deep packet inspection to be fast in real devices based on high-speed platforms such as NetFPGA or Network Processors.

Electronic Thesis and Dissertation Archive - Università di Pisa

Front Matter - Soft Computing for Data Mining Applications

Authors: Patnaik L.M.; Srinivasa K.G.; Venugopal K.R.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data might moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, to make the information mining process more robust, or, say, more human-like, methods for searching and learning require tolerance towards imprecision, uncertainty and exceptions. Thus, they have approximate reasoning capabilities and are capable of handling partial truth. Properties of the aforementioned kind are typical of soft computing. Soft computing techniques like Genetic

ePrints@Bangalore University

Digital Image Access & Retrieval

Authors: Heidorn P. Bryan; Sandore Beth
Publication venue: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Publication date: 01/01/1997

The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval.

Illinois Digital Environment for Access to Learning and Scholarship Repository