453 research outputs found

    Learning Non-Parametric and High-Dimensional Distributions via Information-Theoretic Methods

    Learning the distributions that govern data generation and estimating related functionals are the foundations of many classical statistical problems. In this dissertation we investigate such topics when either the hypothesized model is non-parametric or the number of free parameters in the model grows with the sample size. Specifically, we study these scenarios for the following classes of problems, with the goal of obtaining minimax rate-optimal methods for learning the target distributions when the sample size is finite. Our techniques are based on information-theoretic divergences and related mutual-information-based methods. (i) Estimation in compound decision and empirical Bayes settings: To estimate the data-generating distribution, one often takes the following two-step approach. In the first step the statistician estimates the distribution of the parameters, either the empirical distribution or the postulated prior, and in the second step plugs in the estimate to approximate the target of interest. In the literature, the estimation of the empirical distribution is known as the compound decision problem and the estimation of the prior is known as the problem of empirical Bayes. In our work we use the method of minimum-distance estimation to approximate these distributions. Considering certain discrete data setups, we show that the minimum-distance-based method provides theoretically and practically sound choices for estimation. The computational and algorithmic aspects of the estimators are also analyzed. (ii) Prediction with Markov chains: Given observations from an unknown Markov chain, we study the problem of predicting the next entry in the trajectory. Existing analysis for such a dependent setup usually centers around concentration inequalities that rely on various extraneous conditions on the mixing properties. This makes it difficult to achieve results independent of such restrictions. We introduce information-theoretic techniques to bypass such issues and obtain fundamental limits for the related minimax problems. We also analyze conditions on the mixing properties that produce a parametric rate of prediction errors.
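
The prediction task in (ii) can be illustrated with a minimal sketch. It assumes a finite state space labeled 0, ..., k-1 and uses an add-one smoothed estimate of the transition row for the last observed state. The estimator and the names `predict_next` and `num_states` are illustrative assumptions; the abstract does not say which predictor the dissertation analyzes.

```python
from collections import Counter


def predict_next(trajectory, num_states):
    """Estimate the distribution of the next state of a Markov chain.

    Assumption (not from the abstract): states are integers in
    range(num_states) and the transition row is estimated by
    add-one (Laplace) smoothing of observed transition counts.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one state")
    # Count observed transitions (i -> j) along the trajectory.
    counts = Counter(zip(trajectory, trajectory[1:]))
    last = trajectory[-1]
    row_total = sum(counts[(last, j)] for j in range(num_states))
    # Smoothed estimate of P(next = j | current = last).
    return [
        (counts[(last, j)] + 1) / (row_total + num_states)
        for j in range(num_states)
    ]


probs = predict_next([0, 1, 0, 1, 0], num_states=2)
```

Here the trajectory ends in state 0, which was followed by state 1 both times it was seen earlier, so the sketch puts most of the predicted mass on state 1.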

    Design, Implementation and Experiments for Moving Target Defense Framework

    The traditional defensive security strategy for distributed systems employs well-established techniques such as redundancy/replication, firewalls, and encryption to prevent attackers from taking control of the system. However, given sufficient time and resources, all of these methods can be defeated, especially by sophisticated attacks from advanced adversaries that leverage zero-day exploits.

    Online failure prediction in air traffic control systems

    This thesis introduces a novel approach to online failure prediction for mission-critical distributed systems whose distinctive features are that it is black-box, non-intrusive and online. The approach combines Complex Event Processing (CEP) and Hidden Markov Models (HMM) to analyze symptoms of failures, which may appear as anomalous conditions in performance metrics identified for this purpose. The thesis presents an architecture named CASPER, based on CEP and HMM, that relies only on information sniffed from the communication network of a mission-critical system to predict anomalies that can lead to software failures. An instance of CASPER has been implemented, trained and tuned to monitor a real Air Traffic Control (ATC) system developed by Selex ES, a Finmeccanica Company. An extensive experimental evaluation of CASPER is presented. The results show (i) a very low percentage of false positives under both normal and stress conditions, and (ii) a sufficiently high failure prediction time that allows the system to apply appropriate recovery procedures.

    Coding against synchronisation and related errors

    In this thesis, we study aspects of coding against synchronisation errors, such as deletions and replications, and related errors. Synchronisation errors are a source of fundamental open problems in information theory, because they introduce correlations between output symbols even when input symbols are independently distributed. We focus on random errors, and consider two complementary problems: We study the optimal rate of reliable information transmission through channels with synchronisation and related errors (the channel capacity). Unlike simpler error models, the capacity of such channels is unknown. We first consider the geometric sticky channel, which replicates input bits according to a geometric distribution. Previously, bounds on its capacity were known only via numerical methods, which do not aid our conceptual understanding of this quantity. We derive sharp analytical capacity upper bounds which approach, and sometimes surpass, numerical bounds. This opens the door to a mathematical treatment of its capacity. We consider also the geometric deletion channel, combining deletions and geometric replications. We derive analytical capacity upper bounds, and notably prove that the capacity is bounded away from the maximum when the deletion probability is small, meaning that this channel behaves differently than related well-studied channels in this regime. Finally, we adapt techniques developed to handle synchronisation errors to derive improved upper bounds and structural results on the capacity of the discrete-time Poisson channel, a model of optical communication. Motivated by portable DNA-based storage and trace reconstruction, we introduce and study the coded trace reconstruction problem, where the goal is to design efficiently encodable high-rate codes whose codewords can be efficiently reconstructed from few reads corrupted by deletions. 
Remarkably, we design such n-bit codes with rate 1-O(1/log n) that require exponentially fewer reads than average-case trace reconstruction algorithms.
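
To make the geometric sticky channel concrete, the following is a minimal simulation sketch. It assumes each input bit is repeated N >= 1 times, with P(N = k) = (1 - p)^(k - 1) p; this parameterization and the names `geometric_sticky_channel` and `run_symbols` are illustrative assumptions, not taken from the thesis.

```python
import random
from itertools import groupby


def geometric_sticky_channel(bits, p, rng):
    """Repeat each input bit a geometric number of times (at least once).

    Assumption (not from the abstract): the number of copies N of each
    bit satisfies P(N = k) = (1 - p) ** (k - 1) * p for k = 1, 2, ...
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    out = []
    for b in bits:
        copies = 1
        # Add another copy with probability 1 - p.
        while rng.random() >= p:
            copies += 1
        out.extend([b] * copies)
    return out


def run_symbols(bits):
    """Return the symbol of each maximal run, e.g. [0, 0, 1] -> [0, 1]."""
    return [symbol for symbol, _ in groupby(bits)]


rng = random.Random(0)
x = [0, 1, 1, 0, 1]
y = geometric_sticky_channel(x, 0.5, rng)
```

Because the channel only lengthens runs, `run_symbols(y)` always equals `run_symbols(x)`. Only the run lengths are corrupted, which is why the receiver cannot tell how many input bits produced a given output run.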