Address-Event Variable-Length Compression for Time-Encoded Data
Time-encoded signals, such as social network update logs and spiking traces
in neuromorphic processors, are defined by multiple traces carrying information
in the timing of events, or spikes. When time-encoded data is processed at a
site remote from where it is produced, the occurrence of events must be
encoded and transmitted in a timely fashion. The standard
Address-Event Representation (AER) protocol for neuromorphic chips encodes the
indices of the "spiking" traces in the payload of a packet produced at the same
time the events are recorded, hence implicitly encoding the events' timing in
the timing of the packet. This paper investigates the potential bandwidth
saving that can be obtained by carrying out variable-length compression of
packets' payloads. Compression leverages both intra-trace and inter-trace
correlations over time that are typical in applications such as social networks
or neuromorphic computing. The approach is based on discrete-time Hawkes
processes and entropy coding with conditional codebooks. Results from an
experiment based on a real-world retweet dataset are also provided.
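As a rough illustration of the kind of variable-length payload compression the abstract describes, the sketch below assigns shorter codewords to more frequently spiking addresses via Huffman coding. The packet stream, the 8-bit fixed-length baseline, and the per-address (rather than conditional) codebook are simplifying assumptions; the paper's Hawkes-process model and conditional codebooks are not reproduced here.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a prefix-free code (address -> bitstring) from frequencies."""
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)  # tiebreaker so dicts are never compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# AER-style stream: each packet lists the indices of traces that spiked
packets = [[0, 3], [3], [0, 3, 7], [3, 7], [3]]
freqs = Counter(addr for p in packets for addr in p)
code = huffman_code(freqs)

fixed_bits = sum(len(p) for p in packets) * 8  # 8-bit addresses as baseline
var_bits = sum(len(code[a]) for p in packets for a in p)
```

Since trace 3 spikes in every packet, it receives a one-bit codeword, and the variable-length payload is far smaller than the fixed-length baseline.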
An Information-Theoretic Analysis of Deduplication
Deduplication finds and removes long-range data duplicates. It is commonly
used in cloud and enterprise server settings and has been successfully applied
to primary, backup, and archival storage. Despite its practical importance as a
source-coding technique, its analysis from the point of view of information
theory is missing. This paper provides such an information-theoretic analysis
of data deduplication. It introduces a new source model adapted to the
deduplication setting. It formalizes the two standard fixed-length and
variable-length deduplication schemes, and it introduces a novel multi-chunk
deduplication scheme. It then provides an analysis of these three deduplication
variants, emphasizing the importance of boundary synchronization between source
blocks and deduplication chunks. In particular, under fairly mild assumptions,
the proposed multi-chunk deduplication scheme is shown to be order optimal.
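A minimal sketch of the fixed-length scheme the abstract formalizes: split the source into equal-size chunks, fingerprint each one, store each distinct chunk once, and keep a recipe of references. The chunk size and sample data are invented for illustration; real systems use far larger chunks.

```python
import hashlib

def dedup_fixed(data: bytes, chunk_size: int = 4):
    """Fixed-length deduplication: store each distinct chunk once."""
    store = {}    # fingerprint -> chunk bytes (stored once)
    recipe = []   # sequence of fingerprints reconstructing the data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(store[fp] for fp in recipe)

data = b"abcdabcdabcdxyz!"
store, recipe = dedup_fixed(data)   # "abcd" repeats but is stored once
```

This also illustrates the boundary-synchronization issue the paper emphasizes: prepending a single byte shifts every chunk boundary, so the same repeated content is no longer detected, which is what motivates variable-length (content-defined) chunking.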
Real-time data compression of broadcast video signals
A non-adaptive predictor, a nonuniform quantizer, and a multi-level Huffman coder are incorporated into a differential pulse code modulation system for coding and decoding broadcast video signals in real time.
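The DPCM loop described above can be sketched as follows. For brevity this toy version uses a previous-sample predictor, a uniform quantizer with an assumed step of 4, and omits the Huffman stage; the described system uses a nonuniform quantizer and multi-level Huffman coding.

```python
STEP = 4  # quantizer step size (illustrative choice)

def dpcm_encode(samples, levels=7):
    """DPCM: quantize the prediction error, not the sample itself.
    The encoder tracks the decoder's reconstruction so quantization
    errors do not accumulate."""
    pred, out = 0, []
    for s in samples:
        diff = s - pred
        q = max(-levels, min(levels, round(diff / STEP)))
        out.append(q)
        pred += q * STEP  # mirror the decoder's reconstruction
    return out

def dpcm_decode(codes):
    """Rebuild samples by integrating the dequantized differences."""
    pred, rec = 0, []
    for q in codes:
        pred += q * STEP
        rec.append(pred)
    return rec

samples = [10, 12, 15, 14, 20]
rec = dpcm_decode(dpcm_encode(samples))
```

As long as the quantizer does not clip, the reconstruction error stays within half a quantizer step per sample.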
Artificial Sequences and Complexity Measures
In this paper we exploit concepts of information theory to address the
fundamental problem of identifying and defining the most suitable tools to
extract, in an automatic and agnostic way, information from a generic string
of characters. In particular, we introduce a class of methods that make
crucial use of data compression techniques to define a measure of remoteness,
or distance, between pairs of character sequences (e.g. texts) based on their
relative information content. We also discuss in detail how specific features
of data compression techniques can be used to introduce the notions of the
dictionary of a given sequence and of an Artificial Text, and we show how
these new tools can be used for information extraction. We point out the
versatility and generality of our method, which applies to any kind of corpus
of character strings independently of the underlying coding.
We consider linguistically motivated problems as a case study and present
results for automatic language recognition, authorship attribution, and
self-consistent classification. Comment: Revised version, with major changes,
of the previous "Data Compression approach to Information Extraction and
Classification" by A. Baronchelli and V. Loreto. 15 pages; 5 figures
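A compression-based distance of the kind the abstract describes can be sketched in the style of the normalized compression distance, using zlib as a stand-in for the paper's specific compressor and dictionary machinery. The sample strings are invented; texts that share vocabulary compress better together than unrelated byte sequences.

```python
import zlib

def C(s: bytes) -> int:
    """Compressed length: a computable proxy for information content."""
    return len(zlib.compress(s, 9))

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two sequences:
    how much extra description b needs once a is known (and vice versa),
    normalized by the larger compressed length."""
    return (C(a + b) - min(C(a), C(b))) / max(C(a), C(b))

english = b"the quick brown fox jumps over the lazy dog " * 20
english2 = b"a lazy dog sleeps while the quick fox runs by " * 20
unrelated = bytes(range(256)) * 4

# related texts end up closer than unrelated byte sequences
close = ncd(english, english2)
far = ncd(english, unrelated)
```

Because the measure is defined purely on compressed lengths, it applies to any corpus of character strings regardless of the coding behind them, which is the generality the abstract emphasizes.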