Universal Compressed Text Indexing
The rise of repetitive datasets has lately generated a lot of interest in
compressed self-indexes based on dictionary compression, a rich and
heterogeneous family that exploits text repetitions in different ways. For each
such compression scheme, several different indexing solutions have been
proposed in the last two decades. To date, the fastest indexes for repetitive
texts are based on the run-length compressed Burrows-Wheeler transform and on
the Compact Directed Acyclic Word Graph. The most space-efficient indexes, on
the other hand, are based on the Lempel-Ziv parsing and on grammar compression.
Indexes for more universal schemes such as collage systems and macro schemes
have not yet been proposed. Very recently, Kempa and Prezza [STOC 2018] showed
that all dictionary compressors can be interpreted as approximation algorithms
for the smallest string attractor, that is, a set of text positions capturing
all distinct substrings. Starting from this observation, in this paper we
develop the first universal compressed self-index, that is, the first indexing
data structure based on string attractors, which can therefore be built on top
of any dictionary-compressed text representation. Let $\gamma$ be the size of a
string attractor for a text of length $n$. Our index takes
$O(\gamma\log(n/\gamma))$ words of space and supports locating the $occ$
occurrences of any pattern of length $m$ in $O(m\log n + occ\log^{\epsilon} n)$
time, for any constant $\epsilon > 0$. This is, in particular, the first index
for general macro schemes and collage systems. Our result shows that the
relation between indexing and compression is much deeper than what was
previously thought: the simple property standing at the core of all dictionary
compressors is sufficient to support fast indexed queries.
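The definition of a string attractor given in the abstract (a set of text positions such that every distinct substring has an occurrence crossing at least one of them) can be illustrated with a brute-force check. This is a minimal sketch for small inputs only, not the index described in the paper; the function name `is_string_attractor` and the 0-based position convention are assumptions made for illustration.

```python
def is_string_attractor(text, positions):
    """Return True if every distinct substring of `text` has at least one
    occurrence that covers a position in `positions` (0-based indices).

    Brute force, roughly cubic time: intended only to illustrate the definition.
    """
    n = len(text)
    positions = set(positions)
    for length in range(1, n + 1):
        seen = set()     # all distinct substrings of this length
        covered = set()  # those with an occurrence crossing an attractor position
        for start in range(n - length + 1):
            sub = text[start:start + length]
            seen.add(sub)
            if any(start <= p < start + length for p in positions):
                covered.add(sub)
        if seen != covered:
            return False
    return True


print(is_string_attractor("abab", {0, 1}))  # both letters and all longer substrings are covered
print(is_string_attractor("abab", {0}))     # no occurrence of "b" crosses position 0
```

On a highly repetitive text a small set of positions can suffice, which is the property the abstract says dictionary compressors approximate.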
A Model-Based Approach for Compression of Fingerprint Images
We propose a new fingerprint image compression scheme based on the hybrid model of an image. Our scheme uses the essential steps of a typical automated fingerprint identification system (AFIS), such as enhancement, binarization and thinning, to encode fingerprint images. The decoding process is based on reconstructing a hybrid surface by using the gray values on ridges and valleys. In this compression scheme, the ridge skeleton is coded efficiently by using differential chain codes. The valley skeleton is derived from the ridge skeleton, and the gray values along the ridge and valley skeletons are encoded using the discrete cosine transform. The error between the original and the replica is also encoded to increase the quality. One advantage of our approach is that original features such as end points and bifurcation points can be extracted directly from the compressed image, even at a very high compression ratio. Another advantage is that the proposed scheme can be integrated easily into a typical AFIS. The algorithm has been applied to various fingerprint images, and high compression ratios, such as 63:1, have been obtained. A comparison to wavelet/scalar quantization (WSQ) has also been made.
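The abstract does not say which chain-code variant is used for the ridge skeleton. As a rough illustration of the general idea, the sketch below assumes the common 8-direction Freeman chain code and stores each direction as its difference from the previous one (mod 8). The names `DIRECTIONS`, `chain_code`, `differential` and `undo_differential` are hypothetical.

```python
# Assumed 8-connected Freeman directions: code k corresponds to the step (dx, dy).
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_code(path):
    """Convert a list of 8-connected pixel coordinates into direction codes 0..7."""
    return [DIRECTIONS.index((x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(path, path[1:])]


def differential(codes):
    """Keep the first code as-is; store every later code as a change of direction mod 8."""
    if not codes:
        return []
    return [codes[0]] + [(b - a) % 8 for a, b in zip(codes, codes[1:])]


def undo_differential(diff):
    """Invert `differential` by accumulating the direction changes mod 8."""
    codes = []
    for d in diff:
        codes.append(d if not codes else (codes[-1] + d) % 8)
    return codes


skeleton = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (4, 3)]
codes = chain_code(skeleton)   # [0, 0, 1, 1, 2]
diff = differential(codes)     # [0, 0, 1, 0, 1]
assert undo_differential(diff) == codes
```

Because thinned ridges change direction slowly, the differential values cluster around 0, which is what makes them cheap to entropy-code.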
Panako: a scalable acoustic fingerprinting system handling time-scale and pitch modification
In this paper, a scalable granular acoustic fingerprinting system robust against time and pitch scale modification is presented. The aim of acoustic fingerprinting is to identify identical, or recognize similar, audio fragments in a large set using condensed representations of audio signals, i.e. fingerprints. A robust fingerprinting system generates similar fingerprints for perceptually similar audio signals. The new system presented here handles a variety of distortions well. It is designed to be robust against pitch shifting, time stretching and tempo changes, while remaining scalable. After a query, the system returns the start time in the reference audio, and the amount of pitch shift and tempo change that has been applied. The design of the system that offers this unique combination of features is the main contribution of this research. The fingerprint itself consists of a combination of key points in a Constant-Q spectrogram. The system is evaluated on commodity hardware using a freely available reference database with fingerprints of over 30,000 songs. The results show that the system responds quickly and reliably to queries, while handling time and pitch scale modifications of up to ten percent.
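One way to see why combining key points in a Constant-Q spectrogram can tolerate these modifications: Constant-Q bins are spaced logarithmically in frequency, so a pitch shift moves every key point by the same number of bins, and a time stretch scales every time difference by the same factor. The sketch below is not Panako's actual fingerprint; it only demonstrates that invariance for a hypothetical triplet of key points, and `cq_bin`, `triplet_features`, `modify`, the 55 Hz minimum frequency and the 36 bins per octave are all assumptions.

```python
import math


def cq_bin(freq_hz, fmin_hz=55.0, bins_per_octave=36):
    """Fractional Constant-Q bin index of a frequency (logarithmic spacing)."""
    return bins_per_octave * math.log2(freq_hz / fmin_hz)


def triplet_features(points):
    """Features of three (time_s, freq_hz) key points:
    two bin differences and one ratio of time differences."""
    (t1, f1), (t2, f2), (t3, f3) = points
    b1, b2, b3 = cq_bin(f1), cq_bin(f2), cq_bin(f3)
    return (b2 - b1, b3 - b1, (t2 - t1) / (t3 - t1))


def modify(points, pitch_factor, time_factor):
    """Simulate pitch shifting and time stretching of the key points."""
    return [(t * time_factor, f * pitch_factor) for t, f in points]


original = [(1.0, 220.0), (1.5, 330.0), (2.2, 440.0)]
shifted = modify(original, pitch_factor=1.06, time_factor=0.95)

# The features agree up to floating-point rounding despite the modification.
for a, b in zip(triplet_features(original), triplet_features(shifted)):
    assert math.isclose(a, b, abs_tol=1e-9)
```

Recovering the applied pitch shift and tempo change, as the abstract describes, would additionally require comparing absolute bins and time differences between the query and the reference.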