New Algorithms and Lower Bounds for Sequential-Access Data Compression
This thesis concerns sequential-access data compression, i.e., compression by algorithms
that read the input one or more times from beginning to end. In one chapter we
consider adaptive prefix coding, for which we must read the input character by
character, outputting each character's self-delimiting codeword before reading
the next one. We show how to encode and decode each character in constant
worst-case time while producing an encoding whose length is worst-case optimal.
In another chapter we consider one-pass compression with memory bounded in
terms of the alphabet size and context length, and prove a nearly tight
tradeoff between the amount of memory we can use and the quality of the
compression we can achieve. In a third chapter we consider compression in the
read/write streams model, which allows a number of passes and an amount of memory both
polylogarithmic in the size of the input. We first show how to achieve
universal compression using only one pass over one stream. We then show that
one stream is not sufficient for achieving good grammar-based compression.
Finally, we show that two streams are necessary and sufficient for achieving
entropy-only bounds.
Comment: draft of PhD thesis
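To make the adaptive prefix-coding setting concrete, here is a minimal Python sketch of a sequential coder built from a move-to-front list plus Elias gamma codes, assuming a byte alphabet. It illustrates only the character-by-character protocol the abstract describes (emit a self-delimiting codeword for each character before reading the next), not the constant-worst-case-time, length-optimal scheme that is the thesis's contribution.

```python
# Minimal adaptive prefix coder: move-to-front + Elias gamma.
# Illustrative only; NOT the thesis's constant-time optimal algorithm.

def elias_gamma(n: int) -> str:
    """Self-delimiting Elias gamma code of a positive integer, as a bit string."""
    b = bin(n)[2:]                       # e.g. 5 -> '101'
    return '0' * (len(b) - 1) + b        # 5 -> '00101'

def elias_gamma_decode(bits: str, pos: int) -> tuple[int, int]:
    """Decode one gamma codeword starting at pos; return (value, new_pos)."""
    zeros = 0
    while bits[pos] == '0':
        zeros += 1
        pos += 1
    value = int(bits[pos:pos + zeros + 1], 2)
    return value, pos + zeros + 1

def encode(text: bytes) -> str:
    mtf = list(range(256))               # known byte alphabet, most recent first
    out = []
    for c in text:                       # one pass: emit before reading on
        i = mtf.index(c)
        out.append(elias_gamma(i + 1))   # gamma needs positive integers
        mtf.insert(0, mtf.pop(i))        # adapt: move symbol to front
    return ''.join(out)

def decode(bits: str) -> bytes:
    mtf = list(range(256))
    out, pos = bytearray(), 0
    while pos < len(bits):
        i, pos = elias_gamma_decode(bits, pos)
        out.append(mtf[i - 1])
        mtf.insert(0, mtf.pop(i - 1))    # mirror the encoder's update
    return bytes(out)

assert decode(encode(b'abracadabra')) == b'abracadabra'
```

Because the decoder applies the same model update after each decoded character, encoder and decoder stay synchronized without any side information, which is the defining constraint of the adaptive (one-pass) setting.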
Lempel-Ziv-like Parsing in Small Space
Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and
widely-used compressors for repetitive texts. However, the existing efficient
methods computing the exact LZ parsing have to use linear or close to linear
space to index the input text during the construction of the parsing, which is
prohibitive for long inputs. An alternative is Relative Lempel-Ziv (RLZ), which
indexes only a fixed reference sequence, whose size can be controlled. Deriving
the reference sequence by sampling the text yields reasonable compression
ratios for RLZ, but performance is not always competitive with that of LZ and
depends heavily on the similarity of the reference to the text. In this paper
we introduce ReLZ, a technique that uses RLZ as a preprocessor to approximate
the LZ parsing using little memory. RLZ is first used to produce a sequence of
phrases, and these are regarded as metasymbols that are input to LZ for a
second-level parsing on a (most often) drastically shorter sequence. This
parsing is finally translated into one on the original sequence.
We analyze the new scheme and prove that, like LZ, it achieves the $k$th
order empirical entropy compression with $k = o(\log_\sigma n)$, where $n$ is the input length and $\sigma$ is the alphabet
size. In fact, we prove this entropy bound not only for ReLZ but for a wide
class of LZ-like encodings. Then, we establish a lower bound on the ReLZ
approximation ratio, showing that the number of phrases in it can be
$\Omega(\log n)$ times larger than the number of phrases in LZ. Our experiments
show that ReLZ is faster than existing alternatives to compute the (exact or
approximate) LZ parsing, at the reasonable price of an approximation factor
below $2.0$ in all tested scenarios, and sometimes below $1.05$, with respect
to the size of LZ.
Comment: 21 pages, 6 figures, 2 tables
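For concreteness, the toy Python sketch below computes a greedy LZ parse in quadratic time, just to make the "number of phrases" measure tangible. It is not ReLZ itself: practical exact parsers replace the inner scan with a suffix-array-style index, and that (near-)linear space cost is precisely what ReLZ is designed to avoid.

```python
# Toy greedy LZ parser, quadratic time, for illustration only.

def lz_parse(s: str):
    """Greedy LZ parse: at each position, take the longest prefix of the
    remaining suffix that also starts at some earlier position (self-overlap
    allowed, as in classic LZ77); otherwise emit a single literal."""
    phrases, i = [], 0
    while i < len(s):
        best_len, best_src = 0, -1
        for j in range(i):                      # candidate earlier start
            l = 0
            while i + l < len(s) and s[j + l] == s[i + l]:
                l += 1
            if l > best_len:
                best_len, best_src = l, j
        if best_len == 0:
            phrases.append(('lit', s[i]))       # fresh character
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases

print(lz_parse('abababab'))   # [('lit', 'a'), ('lit', 'b'), (0, 6)]
```

ReLZ's two-level idea can be read against this baseline: RLZ first cuts the text into phrases against a small fixed reference, and a parser like the one above is then run over the (much shorter) sequence of phrase identifiers before translating the result back to text positions.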
A fuzzy approach to similarity in Case-Based Reasoning suitable to SQL implementation
The aim of this paper is to formally introduce a notion of acceptance and similarity,
based on fuzzy logic, among case features in a case retrieval system. This is pursued
by first reviewing the relationships between distance-based similarity (i.e. the
standard approach in CBR) and fuzzy-based similarity, with particular attention
to the formalization of a case retrieval process based on fuzzy query specification.
In particular, we present an approach where local acceptance relative to a feature
can be expressed through fuzzy distributions on its domain, abstracting the actual
values to linguistic terms. Furthermore, global acceptance is completely grounded
in fuzzy logic, by means of the usual combinations of local distributions through
specifically defined norms. We propose a retrieval architecture, based on the above notions
and realized through a fuzzy extension of SQL, directly implemented on a
standard relational DBMS. The advantage of this approach is that the whole power
of an SQL engine can be fully exploited, with no need of implementing specific
retrieval algorithms. The approach is illustrated by means of some examples from
a recommender system called MyWine, aimed at recommending suitable wine
bottles to a customer who provides her requirements in both crisp and fuzzy form.
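As a rough illustration of local and global acceptance, the following Python sketch expresses linguistic terms as trapezoidal membership functions and combines them with the min t-norm. The feature names, membership shapes, and choice of norm are assumptions made for illustration, not MyWine's actual definitions or the paper's SQL extension.

```python
# Hedged sketch of local/global fuzzy acceptance over case features.

def trapezoid(a, b, c, d):
    """Fuzzy membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return mu

# Local acceptance: linguistic terms on individual features (assumed shapes).
medium_priced = trapezoid(8, 12, 20, 30)      # price in currency units
full_bodied   = trapezoid(0.5, 0.7, 1.0, 1.01)  # body score in [0, 1]

wines = [
    {'name': 'A', 'price': 10, 'body': 0.8},
    {'name': 'B', 'price': 25, 'body': 0.9},
    {'name': 'C', 'price': 15, 'body': 0.4},
]

# Global acceptance via the min t-norm, then rank, which is what a fuzzy
# "ORDER BY acceptance DESC" would compute inside the DBMS.
def acceptance(w):
    return min(medium_priced(w['price']), full_bodied(w['body']))

for w in sorted(wines, key=acceptance, reverse=True):
    print(w['name'], round(acceptance(w), 2))
```

The point of the paper's architecture is that this ranking is pushed down into standard SQL, so the membership evaluation and combination run inside the relational engine rather than in a dedicated retrieval component.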
SAN models of communication scenarios inside the Electrical Power System
This report provides all the details of the models and the quantitative results presented in [1], concerning the simulation of communication scenarios inside the Electrical Power System. In particular, the scenarios deal with the communication between one area control centre and a set of substations in a distribution grid, which exchange commands and signals by means of a redundant communication network. The communication may be affected by threats such as communication network failures or intrusions, causing the loss of commands or signals. The scenarios have been modeled and simulated in the form of Stochastic Activity Networks, with the purpose of evaluating the effects of such threats on the communication reliability.
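The kind of question such models answer can be illustrated with a crude Monte Carlo sketch in Python: estimating the probability that a command is lost because every redundant channel is failed or compromised. The failure and intrusion probabilities and the two-channel structure are invented assumptions for illustration; the report's actual SAN models capture far richer timing and repair behavior.

```python
# Hedged Monte Carlo sketch, not the report's SAN models: probability that a
# command is lost when both redundant channels are unavailable.
import random

def command_lost(p_fail=0.05, p_intrusion=0.01):
    """A command is lost if every redundant channel fails or is compromised."""
    def channel_down():
        return random.random() < p_fail or random.random() < p_intrusion
    return channel_down() and channel_down()   # two independent channels

trials = 100_000
losses = sum(command_lost() for _ in range(trials))
print(f"estimated command-loss probability: {losses / trials:.4f}")
```

With these made-up parameters each channel is down with probability about 0.06, so redundancy drives the loss probability down to roughly 0.0035, which is the intuition behind evaluating the redundant network as a whole rather than each link in isolation.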
A GSPN semantics for Continuous Time Bayesian Networks with Immediate Nodes
In this report we present an extension to Continuous Time Bayesian Networks (CTBN) called Generalized Continuous Time Bayesian Networks (GCTBN). The formalism allows one to model, in addition to continuous-time delayed variables (with exponentially distributed transition rates), also non-delayed or "immediate" variables, which act as standard chance nodes in a Bayesian Network. This allows the modeling of processes having both a continuous-time temporal component and an immediate (i.e. non-delayed) component capturing the logical/probabilistic interactions among the model's variables. The usefulness of this kind of model is discussed through an example concerning the reliability of a simple component-based system. A semantic model of GCTBNs, based on the formalism of Generalized Stochastic Petri Nets (GSPN), is outlined; its purpose is twofold: to provide a well-defined semantics for GCTBNs in terms of the underlying stochastic process, and to provide an actual means to perform inference (both prediction and smoothing) on GCTBNs. The example case study is then used to highlight the exploitation of GSPN analysis for posterior probability computation on the GCTBN model.
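The flavor of the combination can be conveyed by a small Python simulation: two components whose failures and repairs are exponentially distributed (the delayed variables), and a system-level variable that is determined instantaneously by a logical condition on them (the immediate variable). The rates and the AND structure are made-up assumptions; the paper's semantics is given via GSPNs and exact analysis, not via this simulation.

```python
# Hedged sketch of the GCTBN flavor: delayed (exponential) component variables
# plus an immediate system variable defined logically from them.
import random

RATES = {'fail': 0.1, 'repair': 1.0}   # assumed exponential transition rates

def simulate(t_end=10_000.0):
    state = {'A': 'up', 'B': 'up'}
    t, down_time = 0.0, 0.0
    while t < t_end:
        # Race semantics: sample each delayed variable's next transition;
        # resampling each step is valid by memorylessness of the exponential.
        delays = {c: random.expovariate(RATES['fail' if s == 'up' else 'repair'])
                  for c, s in state.items()}
        c = min(delays, key=delays.get)
        dt = min(delays[c], t_end - t)
        # Immediate variable: the system is down iff BOTH components are down.
        if state['A'] == 'down' and state['B'] == 'down':
            down_time += dt
        t += dt
        state[c] = 'down' if state[c] == 'up' else 'up'
    return down_time / t_end

print(f"fraction of time system down: {simulate():.5f}")
```

With these rates each component is down about 9% of the time, so the simulated system unavailability should hover near 0.008, the kind of posterior quantity the GSPN analysis computes exactly.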
Non deterministic Repairable Fault Trees for computing optimal repair strategy
In this paper, the Non deterministic Repairable Fault Tree (NdRFT) formalism is proposed: it allows one to model the failure modes of complex systems as well as their repair processes. The originality of this formalism with respect to other Fault Tree extensions is that it allows one to address repair strategy optimization problems: in an NdRFT model, the decision on whether or not to start a given repair action is non-deterministic, so that all the possibilities are left open. The formalism is rather powerful, allowing one to specify which failure events are observable, whether local or global repair can be applied, and the resources needed to start a repair action. The optimal repair strategy can then be computed by solving an optimization problem on a Markov Decision Process (MDP) derived from the NdRFT. A software framework is proposed in order to perform, in an automatic way, the derivation of an MDP from an NdRFT model, and to deal with the solution of the MDP.
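The final optimization step can be illustrated with a toy value iteration in Python on a two-state repair MDP, where the non-deterministic choice is whether to start a repair. The states, transition probabilities, and costs are invented assumptions; the paper's MDPs are derived automatically from the NdRFT structure and are far larger.

```python
# Hedged toy sketch, not the paper's NdRFT-to-MDP translation: value iteration
# on a two-state MDP where the choice is whether to start a repair action.

# States: 'up', 'down'. In 'down' we may 'repair' (costly, likely fixes it)
# or 'wait' (free, stays down). Being down costs 10 per step (assumed numbers).
P = {('down', 'repair'): [(0.9, 'up'), (0.1, 'down')],
     ('down', 'wait'):   [(1.0, 'down')],
     ('up',   'wait'):   [(0.95, 'up'), (0.05, 'down')]}
COST = {('down', 'repair'): 10 + 2, ('down', 'wait'): 10, ('up', 'wait'): 0}
GAMMA = 0.95

def q(s, a, V):
    """Expected discounted cost of taking action a in state s under values V."""
    return COST[(s, a)] + GAMMA * sum(p * V[t] for p, t in P[(s, a)])

V = {'up': 0.0, 'down': 0.0}
for _ in range(500):   # value iteration converges to the optimal cost-to-go
    V = {s: min(q(s, a, V) for a in ('repair', 'wait') if (s, a) in P)
         for s in ('up', 'down')}

policy = {s: min((a for a in ('repair', 'wait') if (s, a) in P),
                 key=lambda a: q(s, a, V))
          for s in ('up', 'down')}
print(V, policy)   # with these numbers, repairing when down is optimal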