Search CORE

160 research outputs found

New Algorithms and Lower Bounds for Sequential-Access Data Compression

Author: Gagie Travis
Publication venue
Publication date: 01/01/2009
Field of study

This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows us passes and memory both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds.Comment: draft of PhD thesi

arXiv.org e-Print Archive

Publications at Bielefeld University

Recommended from our members

Streaming Algorithms Via Reductions

Author: Crouch Michael S
Publication venue: ScholarWorks@UMass Amherst
Publication date: 12/11/2014
Field of study

In the streaming algorithms model of computation we must process data in order and without enough memory to remember the entire input. We study reductions between problems in the streaming model with an eye to using reductions as an algorithm design technique. Our contributions include: * Linear Transformation reductions, which compose with existing linear sketch techniques. We use these for small-space algorithms for numeric measurements of distance-from-periodicity, finding the period of a numeric stream, and detecting cyclic shifts. * The first streaming graph algorithms in the sliding window\u27 model, where we must consider only the most recent L elements for some fixed threshold L. We develop basic algorithms for connectivity and unweighted maximum matching, then develop a variety of other algorithms via reductions to these problems. * A new reduction from maximum weighted matching to maximum unweighted matching. This reduction immediately yields improved approximation guarantees for maximum weighted matching in the semistreaming, sliding window, and MapReduce models, and extends to the more general problem of finding maximum independent sets in p-systems. * Algorithms in a stream-of-samples model which exhibit clear sample vs. space tradeoffs. These algorithms are also inspired by examining reductions. We provide algorithms for calculating F_k frequency moments and graph connectivity

ScholarWorks@UMass Amherst

Estimating the weight of metric minimum spanning trees in sublinear time

Author: Artur Czumaj
Christian Sohler
Czumaj A.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 05/02/2008
Field of study

In this paper we present a sublinear-time

(1+\varepsilon)

-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an

n

-point metric space. The running time of the algorithm is

\widetilde{\mathcal{O}}(n/\varepsilon^{\mathcal{O}(1)})

. Since the full description of an

n

-point metric space is of size

\Theta(n^2)

, the complexity of our algorithm is sublinear with respect to the input size. Our algorithm is almost optimal as it is not possible to approximate in

o(n)

time the weight of the minimum spanning tree to within any factor. We also show that no deterministic algorithm can achieve a

B

-approximation in

o(n^2/B^3)

time. Furthermore, it has been previously shown that no

o(n^2)

algorithm exists that returns a spanning tree whose weight is within a constant times the optimum

CiteSeerX

Crossref

Warwick Research Archives Portal Repository

Learning structure and schemas from heterogeneous domains in networked systems: a survey

Author: Biba Marenglen
Xhafa Xhafa Fatos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

The rapidly growing amount of available digital documents of various formats and the possibility to access these through internet-based technologies in distributed environments, have led to the necessity to develop solid methods to properly organize and structure documents in large digital libraries and repositories. Specifically, the extremely large size of document collections make it impossible to manually organize such documents. Additionally, most of the document sexist in an unstructured form and do not follow any schemas. Therefore, research efforts in this direction are being dedicated to automatically infer structure and schemas. This is essential in order to better organize huge collections as well as to effectively and efficiently retrieve documents in heterogeneous domains in networked system. This paper presents a survey of the state-of-the-art methods for inferring structure from documents and schemas in networked environments. The survey is organized around the most important application domains, namely, bio-informatics, sensor networks, social networks, P2Psystems, automation and control, transportation and privacy preserving for which we analyze the recent developments on dealing with unstructured data in such domains.Peer ReviewedPostprint (published version

Crossref

UPCommons. Portal del coneixement obert de la UPC

State-of-the-art in data stream mining

Author: Gaber M.
Gama J.
Publication venue
Publication date: 17/09/2007
Field of study

Portsmouth University Research Portal (Pure)