11,344 research outputs found

New Algorithms and Lower Bounds for Sequential-Access Data Compression

Author: Gagie Travis
Publication date: 01/01/2009

This thesis concerns sequential-access data compression, i.e., compression by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows a number of passes and an amount of memory that are both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds. Comment: draft of PhD thesis

arXiv.org e-Print Archive

Publications at Bielefeld University

Record-Linkage from a Technical Point of View

Author: Rainer Schnell

Record linkage is used for preparing sampling frames, deduplicating lists, and combining information on the same object from two different databases. If the same objects in two different databases share error-free unique identifiers, such as personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed. Keywords: Record-Linkage, Data-mining, Privacy preserving protocols
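The contrast the abstract draws between an error-free merge and linkage on noisy identifiers can be sketched as follows. This is only an illustration, not a method from the paper: the record layout, the field names `pid` and `name`, the function names, and the 0.85 threshold are all assumptions, and the fuzzy comparison uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure.

```python
from difflib import SequenceMatcher


def link_exact(survey, register, key="pid"):
    """Simple file merge: join two record lists on an error-free unique key."""
    index = {rec[key]: rec for rec in register}
    return [(rec, index[rec[key]]) for rec in survey if rec[key] in index]


def link_fuzzy(survey, register, field="name", threshold=0.85):
    """Naive linkage on an error-prone field (hypothetical threshold).

    Every survey record is compared with every register record, and the
    best-scoring candidate is kept if it reaches the threshold.
    """
    pairs = []
    for rec in survey:
        best, best_score = None, 0.0
        for cand in register:
            score = SequenceMatcher(
                None, rec[field].lower(), cand[field].lower()
            ).ratio()
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            pairs.append((rec, best, best_score))
    return pairs


# Hypothetical toy data: one small survey file, one larger register.
register = [
    {"pid": 1, "name": "Anna Schmidt"},
    {"pid": 2, "name": "Bernd Maier"},
]
survey_clean = [{"pid": 1, "name": "Anna Schmidt"}]
survey_noisy = [{"pid": None, "name": "Anna Schmitd"}]  # typo, no usable PID

exact_pairs = link_exact(survey_clean, register)
fuzzy_pairs = link_fuzzy(survey_noisy, register)
```

The all-pairs loop in `link_fuzzy` is quadratic, which would be impractical for the file sizes mentioned above (thousands of records against millions); practical systems typically restrict comparisons to candidate groups first.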

Research Papers in Economics

Finding approximate palindromes in strings

Author: Alexandre H.L. Porto
Valmir C. Barbosa
Publication venue: 'Elsevier BV'
Publication date: 01/01/2002

We introduce a novel definition of approximate palindromes in strings, and provide an algorithm to find all maximal approximate palindromes in a string with up to k errors. Our definition is based on the usual edit operations of approximate pattern matching, and the algorithm we give, for a string of size n on a fixed alphabet, runs in O(k^2 n) time. We also discuss two implementation-related improvements to the algorithm, and demonstrate their efficacy in practice by means of both experiments and an average-case analysis.
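To make the idea of a palindrome "with up to k errors" concrete, here is a minimal sketch under one possible reading: a string counts as an approximate palindrome if at most k edit operations (insertion, deletion, substitution) turn it into an exact palindrome. This reading, the function names, and the quadratic dynamic program are assumptions made for illustration; this is not the paper's definition verbatim and not its O(k^2 n) algorithm.

```python
def edits_to_palindrome(s: str) -> int:
    """Fewest insertions, deletions or substitutions that make s a palindrome."""
    n = len(s)
    if n == 0:
        return 0
    # cost[i][j] holds the answer for the substring s[i..j] (inclusive).
    # Substrings of length 0 or 1 are already palindromes, so they cost 0.
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            inner = cost[i + 1][j - 1] if i + 1 <= j - 1 else 0
            if s[i] == s[j]:
                # Matching ends can be kept; only the inside needs fixing.
                cost[i][j] = inner
            else:
                cost[i][j] = 1 + min(
                    cost[i + 1][j],  # edit away the mismatch at the left end
                    cost[i][j - 1],  # edit away the mismatch at the right end
                    inner,           # substitute one end to match the other
                )
    return cost[0][n - 1]


def is_approximate_palindrome(s: str, k: int) -> bool:
    """True if s is within k edit operations of an exact palindrome."""
    return edits_to_palindrome(s) <= k
```

This check takes quadratic time and space for a single string, so it only illustrates the notion of error tolerance; enumerating all maximal approximate palindromes efficiently is the harder problem the abstract addresses.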

arXiv.org e-Print Archive

CiteSeerX

Crossref

Graph isomorphism and genotypical houses

Author: Dalton Ruth
Kirsan Ciler
Publication date: 01/06/2005

This paper will introduce a new method, known as small graph matching, and demonstrate how it may be used to determine the genotype signature of a sample of buildings. First, the origins of the method and its relationship to other "similarity" testing techniques will be discussed. Then the range of possible actions and transformations will be established through the creation of a set of rules. Next, in order to fully explain this method, a technique of normalizing the similarity measure is presented in order to permit the comparison of graphs of differing magnitude. The last stage of this method is presented, this being the comparison of all possible graph-pairs within a given sample and the mean-distance calculated for all individual graphs. This results in the identification of a genotype signature. Finally, this paper presents an empirical application of this method and shows how effective it is, not only for the identification of a building genotype, but also for assessing the homogeneity of a sample or sub-samples.

Northumbria University Research Portal

UCL Discovery

Lancaster E-Prints

A Survey of Software-based String Matching Algorithms for Forensic Analysis

Author: Liao Yi-Ching
Publication venue: Scholarly Commons
Publication date: 19/05/2015

Employing a fast string matching algorithm is essential for minimizing the overhead of extracting structured files from a raw disk image. In this paper, we summarize the concept, implementation, and main features of ten software-based string matching algorithms, and evaluate their applicability for forensic analysis. We compare the selected algorithms from the perspective of forensic analysis by evaluating their performance for file carving. According to the experimental results, the Shift-Or algorithm (R. Baeza-Yates & Gonnet, 1992) and the Karp-Rabin algorithm (Karp & Rabin, 1987) have the shortest search times for identifying the locations of specified headers and footers in the target disk. Keywords: string matching algorithm, forensic analysis, file carving, Scalpel, data recovery
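As an illustration of one of the algorithms named above, the following is a minimal sketch of Shift-Or (bit-parallel) exact matching over bytes, applied to locating a file header in a buffer. The function name, the toy buffer, and the choice of the JPEG start-of-image bytes as the header are assumptions for the example; this is not the implementation evaluated in the paper.

```python
def shift_or_search(text: bytes, pattern: bytes) -> list[int]:
    """Return the start offset of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    all_ones = (1 << m) - 1
    # masks[c] has bit i cleared exactly when pattern[i] == c.
    masks = [all_ones] * 256
    for i, byte in enumerate(pattern):
        masks[byte] &= ~(1 << i)
    # In the state word, bit i == 0 means pattern[0..i] matches the text
    # ending at the current position.
    state = all_ones
    match_bit = 1 << (m - 1)
    hits = []
    for pos, byte in enumerate(text):
        # Shifting in a 0 extends every partial match by one character;
        # OR-ing the mask discards extensions whose character does not fit.
        state = ((state << 1) | masks[byte]) & all_ones
        if state & match_bit == 0:
            hits.append(pos - m + 1)
    return hits


# Hypothetical carving-style use: find JPEG start-of-image markers.
JPEG_HEADER = b"\xff\xd8\xff"
buffer = b"\x00\x00" + JPEG_HEADER + b"data" + JPEG_HEADER + b"\x00"
header_offsets = shift_or_search(buffer, JPEG_HEADER)
```

Python integers are unbounded, so this sketch works for patterns of any length; implementations in C usually keep the state in a single machine word, which limits the pattern length to the word size but makes each step a few register operations.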

Embry-Riddle Aeronautical University