
    New Algorithms and Lower Bounds for Sequential-Access Data Compression

    This thesis concerns sequential-access data compression, i.e., compression by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows a number of passes and an amount of memory both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds. Comment: draft of PhD thesis.
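    For intuition only, here is a minimal sketch of adaptive prefix coding: each character is encoded as the Elias gamma code of its current rank in a frequency-ordered list, so every codeword is self-delimiting and the model adapts after each character. The rank-based scheme, the sorting step, and the toy alphabet are illustrative assumptions; the thesis's actual algorithm additionally achieves constant worst-case time per character and worst-case optimal encoding length, which this sketch does not.

    ```python
    # Toy adaptive prefix coder: gamma code of each symbol's current
    # frequency rank. Illustrative only -- not the thesis's algorithm.

    def gamma_encode(n: int) -> str:
        """Elias gamma code of n >= 1: (len(bin)-1) zeros, then bin(n)."""
        b = bin(n)[2:]
        return "0" * (len(b) - 1) + b

    def gamma_decode(bits: str, i: int) -> tuple[int, int]:
        """Decode one gamma codeword starting at index i; return (value, next index)."""
        z = 0
        while bits[i + z] == "0":
            z += 1
        return int(bits[i + z : i + 2 * z + 1], 2), i + 2 * z + 1

    def encode(text: str, alphabet: list[str]) -> str:
        ranks = list(alphabet)              # current rank order, most frequent first
        freq = {c: 0 for c in alphabet}
        out = []
        for c in text:
            out.append(gamma_encode(ranks.index(c) + 1))  # self-delimiting codeword
            freq[c] += 1                                  # then update the model
            ranks.sort(key=lambda s: -freq[s])            # stable sort keeps ties in sync
        return "".join(out)

    def decode(bits: str, n: int, alphabet: list[str]) -> str:
        ranks = list(alphabet)              # decoder mirrors the encoder's state
        freq = {c: 0 for c in alphabet}
        out, i = [], 0
        for _ in range(n):
            r, i = gamma_decode(bits, i)
            c = ranks[r - 1]
            out.append(c)
            freq[c] += 1
            ranks.sort(key=lambda s: -freq[s])
        return "".join(out)

    code = encode("abracadabra", list("abcdr"))
    assert decode(code, 11, list("abcdr")) == "abracadabra"
    ```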

    Lempel-Ziv-like Parsing in Small Space

    Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely used compressors for repetitive texts. However, the existing efficient methods for computing the exact LZ parsing have to use linear or close-to-linear space to index the input text during the construction of the parsing, which is prohibitive for long inputs. An alternative is Relative Lempel-Ziv (RLZ), which indexes only a fixed reference sequence, whose size can be controlled. Deriving the reference sequence by sampling the text yields reasonable compression ratios for RLZ, but performance is not always competitive with that of LZ and depends heavily on the similarity of the reference to the text. In this paper we introduce ReLZ, a technique that uses RLZ as a preprocessor to approximate the LZ parsing using little memory. RLZ is first used to produce a sequence of phrases, and these are regarded as metasymbols that are input to LZ for a second-level parsing on a (most often) drastically shorter sequence. This parsing is finally translated into one on the original sequence. We analyze the new scheme and prove that, like LZ, it achieves the k-th order empirical entropy compression n H_k + o(n log σ) with k = o(log_σ n), where n is the input length and σ is the alphabet size. In fact, we prove this entropy bound not only for ReLZ but for a wide class of LZ-like encodings. Then, we establish a lower bound on the ReLZ approximation ratio, showing that the number of phrases in it can be Ω(log n) times larger than the number of phrases in LZ. Our experiments show that ReLZ is faster than existing alternatives for computing the (exact or approximate) LZ parsing, at the reasonable price of an approximation factor, relative to the size of LZ, below 2.0 in all tested scenarios and sometimes below 1.05. Comment: 21 pages, 6 figures, 2 tables.
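    A small sketch of the two-level idea, under toy assumptions: a greedy RLZ-style pass parses the text against a fixed reference, the resulting phrases become metasymbols, and a second-level LZ parse runs on the (shorter) metasymbol sequence. The quadratic-time parsers and the example strings below are illustrative only; the paper's point is achieving this in small memory at scale.

    ```python
    # Two-level ReLZ-style parsing on toy inputs. Parsers are naive and
    # quadratic; they only demonstrate the pipeline, not the paper's method.

    def rlz_parse(text, ref):
        """Greedy parse of text into longest substrings of ref (or single chars)."""
        phrases, i = [], 0
        while i < len(text):
            best = 1
            for j in range(len(text), i + 1, -1):   # try longest ref match first
                if text[i:j] in ref:
                    best = j - i
                    break
            phrases.append(text[i:i + best])
            i += best
        return phrases

    def lz_parse(seq):
        """Greedy LZ77-style parse of any sequence of hashable symbols."""
        phrases, i = [], 0
        while i < len(seq):
            best = 1
            for j in range(len(seq), i + 1, -1):
                # does seq[i:j] occur starting strictly before position i?
                if tuple(seq[i:j]) in {tuple(seq[k:k + j - i]) for k in range(i)}:
                    best = j - i
                    break
            phrases.append(tuple(seq[i:i + best]))
            i += best
        return phrases

    text = "abracadabra_abracadabra_abracadabra"
    ref = "abracadabra"                    # fixed reference sequence
    meta = rlz_parse(text, ref)            # first level: RLZ phrases as metasymbols
    second = lz_parse(meta)                # second level: LZ on the shorter sequence
    print(len(meta), len(second))          # phrase counts at each level
    ```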

    A fuzzy approach to similarity in Case-Based Reasoning suitable to SQL implementation

    The aim of this paper is to formally introduce a notion of acceptance and similarity, based on fuzzy logic, among case features in a case retrieval system. This is pursued by first reviewing the relationships between distance-based similarity (i.e., the standard approach in CBR) and fuzzy-based similarity, with particular attention to the formalization of a case retrieval process based on fuzzy query specification. In particular, we present an approach where local acceptance relative to a feature can be expressed through fuzzy distributions on its domain, abstracting the actual values to linguistic terms. Furthermore, global acceptance is completely grounded in fuzzy logic, by means of the usual combinations of local distributions through specifically defined norms. We propose a retrieval architecture, based on the above notions and realized through a fuzzy extension of SQL, directly implemented on a standard relational DBMS. The advantage of this approach is that the whole power of an SQL engine can be fully exploited, with no need to implement specific retrieval algorithms. The approach is illustrated by means of some examples from a recommender system called MyWine, aimed at recommending suitable wine bottles to a customer who provides her requirements in both crisp and fuzzy form.
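    A minimal sketch of the retrieval idea, with hypothetical features and terms: local acceptance of each feature value is its fuzzy membership in a linguistic term, and global acceptance combines the local degrees through a t-norm (min here). In the paper the same computation is pushed into a fuzzy extension of SQL; an engine could, for instance, combine per-feature scores with LEAST() and rank with ORDER BY.

    ```python
    # Fuzzy local/global acceptance on invented MyWine-style features.
    # Terms, shapes, and data are illustrative assumptions, not the paper's.

    def triangular(a, b, c):
        """Triangular membership function peaking at b."""
        def mu(x):
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
        return mu

    # Hypothetical linguistic terms on two wine features.
    medium_price = triangular(10.0, 20.0, 35.0)   # euros
    full_body = triangular(3.0, 4.5, 5.5)         # body on a 1..5 scale

    def global_acceptance(case):
        # Fuzzy AND of the local acceptance degrees via the min t-norm;
        # an SQL engine can express the same with LEAST() over score columns.
        return min(medium_price(case["price"]), full_body(case["body"]))

    cases = [
        {"name": "wine A", "price": 18.0, "body": 4.5},
        {"name": "wine B", "price": 40.0, "body": 5.0},
    ]
    # Rank cases by global acceptance, mimicking an ORDER BY ... DESC query.
    for c in sorted(cases, key=global_acceptance, reverse=True):
        print(c["name"], round(global_acceptance(c), 2))
    ```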

    SAN models of communication scenarios inside the Electrical Power System

    This report provides full details of the models and quantitative results presented in [1] concerning the simulation of communication scenarios inside the Electrical Power System. In particular, the scenarios deal with the communication between one area control centre and a set of substations in a distribution grid, which exchange commands and signals by means of a redundant communication network. The communication may be affected by threats such as communication network failures or intrusions into the communication, causing the loss of commands or signals. The scenarios have been modeled and simulated in the form of Stochastic Activity Networks, with the purpose of evaluating the effects of such threats on communication reliability.
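    The report evaluates these scenarios with Stochastic Activity Network simulation; the toy Monte Carlo below only conveys the flavor of the measure being estimated, namely the probability that a command from the control centre reaches a substation over a redundant two-channel network. The failure and intrusion probabilities are made-up illustrative numbers, not values from the report.

    ```python
    # Toy Monte Carlo estimate of command-delivery reliability over a
    # redundant network. Probabilities are invented for illustration.

    import random

    P_CHANNEL_FAIL = 0.05   # assumed per-transmission channel failure probability
    P_INTRUSION = 0.01      # assumed probability an intrusion drops the command

    def command_delivered(rng):
        # The command is lost if an intrusion suppresses it outright,
        # or if both redundant channels fail on this transmission.
        if rng.random() < P_INTRUSION:
            return False
        both_failed = all(rng.random() < P_CHANNEL_FAIL for _ in range(2))
        return not both_failed

    rng = random.Random(0)
    trials = 100_000
    ok = sum(command_delivered(rng) for _ in range(trials))
    print(f"estimated delivery reliability: {ok / trials:.4f}")
    ```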

    A GSPN semantics for Continuous Time Bayesian Networks with Immediate Nodes

    In this report we present an extension of Continuous Time Bayesian Networks (CTBN) called Generalized Continuous Time Bayesian Networks (GCTBN). The formalism allows one to model, in addition to continuous-time delayed variables (with exponentially distributed transition rates), non-delayed or "immediate" variables, which act as standard chance nodes in a Bayesian Network. This allows the modeling of processes having both a continuous-time temporal component and an immediate (i.e., non-delayed) component capturing the logical/probabilistic interactions among the model's variables. The usefulness of this kind of model is discussed through an example concerning the reliability of a simple component-based system. A semantic model of GCTBNs, based on the formalism of Generalized Stochastic Petri Nets (GSPN), is outlined; its purpose is twofold: to provide a well-defined semantics for GCTBNs in terms of the underlying stochastic process, and to provide an actual means of performing inference (both prediction and smoothing) on GCTBNs. The example case study is then used to highlight the exploitation of GSPN analysis for posterior probability computation on the GCTBN model.
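    A rough sketch of the GCTBN semantics in simulation form, under assumed rates and structure: the delayed nodes change state after exponentially distributed sojourn times (a Gillespie-style step), while the immediate node is re-evaluated with zero delay whenever its parents change, mirroring an immediate transition in a GSPN. The 1-out-of-2 system and its rates are illustrative, not the report's case study.

    ```python
    # Delayed nodes: two components failing/repairing at exponential rates.
    # Immediate node: system status, recomputed instantly from the parents.

    import random

    FAIL_RATE, REPAIR_RATE = 0.1, 1.0   # assumed rates per component

    def simulate(horizon, rng):
        comp = [True, True]             # delayed nodes, True = component up
        system_up = True                # immediate node: comp[0] or comp[1]
        t, down_time, last = 0.0, 0.0, 0.0
        while t < horizon:
            # competing exponential clocks, one per component (Gillespie step)
            rates = [FAIL_RATE if up else REPAIR_RATE for up in comp]
            total = sum(rates)
            t = min(t + rng.expovariate(total), horizon)
            if not system_up:
                down_time += t - last   # system was down over [last, t]
            last = t
            # pick which component fires, proportionally to its rate
            u, acc, k = rng.random() * total, 0.0, 0
            for k, r in enumerate(rates):
                acc += r
                if u <= acc:
                    break
            comp[k] = not comp[k]
            system_up = comp[0] or comp[1]   # immediate node: zero-delay update
        return down_time / horizon

    rng = random.Random(1)
    est = sum(simulate(1000.0, rng) for _ in range(20)) / 20
    print(f"estimated unavailability of the 1-out-of-2 system: {est:.4f}")
    ```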

    Non deterministic Repairable Fault Trees for computing optimal repair strategy

    In this paper, the Non-deterministic Repairable Fault Tree (NdRFT) formalism is proposed: it allows one to model the failure modes of complex systems as well as their repair processes. The originality of this formalism with respect to other Fault Tree extensions is that it addresses repair strategy optimization problems: in an NdRFT model, the decision on whether or not to start a given repair action is non-deterministic, so that all the possibilities are left open. The formalism is rather expressive, allowing one to specify which failure events are observable, whether local or global repair can be applied, and the resources needed to start a repair action. The optimal repair strategy can then be computed by solving an optimization problem on a Markov Decision Process (MDP) derived from the NdRFT. A software framework is proposed to automatically derive the MDP from an NdRFT model and to solve the MDP.
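    To illustrate only the final step of this pipeline, the sketch below solves a hand-built two-state MDP (system up/down, actions wait/repair) with value iteration; in the paper's framework the MDP is derived automatically from the NdRFT model, and the states, costs, and probabilities here are invented for the example.

    ```python
    # Tiny hand-built MDP solved by value iteration. Numbers are invented;
    # the paper derives the MDP automatically from the NdRFT.

    # transition probabilities P[s][a] = {s': p} and per-step costs C[s][a]
    P = {
        "up":   {"wait":   {"up": 0.95, "down": 0.05},
                 "repair": {"up": 0.95, "down": 0.05}},  # repairing a working system changes nothing
        "down": {"wait":   {"down": 1.0},
                 "repair": {"up": 0.9, "down": 0.1}},    # repair succeeds w.p. 0.9
    }
    C = {"up":   {"wait": 0.0,  "repair": 1.0},          # repair action costs 1
         "down": {"wait": 10.0, "repair": 11.0}}         # downtime costs 10 per step

    GAMMA = 0.9
    V = {s: 0.0 for s in P}
    for _ in range(200):                                 # value iteration to convergence
        V = {s: min(C[s][a] + GAMMA * sum(p * V[t] for t, p in P[s][a].items())
                    for a in P[s])
             for s in P}
    # greedy policy w.r.t. the converged values
    policy = {s: min(P[s], key=lambda a: C[s][a] + GAMMA *
                     sum(p * V[t] for t, p in P[s][a].items()))
              for s in P}
    print(policy)   # expected: repair when down, wait when up
    ```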