
    Learning Poisson Binomial Distributions

    We consider a basic problem in unsupervised learning: learning an unknown Poisson Binomial Distribution. A Poisson Binomial Distribution (PBD) over {0, 1, ..., n} is the distribution of a sum of n independent Bernoulli random variables which may have arbitrary, potentially non-equal, expectations. These distributions were first studied by S. Poisson in 1837 and are a natural n-parameter generalization of the familiar Binomial Distribution. Surprisingly, prior to our work this basic learning problem was poorly understood, and known results for it were far from optimal. We essentially settle the complexity of the learning problem for this basic class of distributions. As our first main result we give a highly efficient algorithm which learns to ε-accuracy (with respect to the total variation distance) using Õ(1/ε³) samples, independent of n. The running time of the algorithm is quasilinear in the size of its input data, i.e., Õ(log(n)/ε³) bit-operations. (Observe that each draw from the distribution is a log(n)-bit string.) Our second main result is a proper learning algorithm that learns to ε-accuracy using Õ(1/ε²) samples, and runs in time (1/ε)^poly(log(1/ε)) · log(n). This is nearly optimal, since any algorithm for this problem must use Ω(1/ε²) samples. We also give positive and negative results for some extensions of this learning problem to weighted sums of independent Bernoulli random variables. Comment: Revised full version. Improved sample complexity bound of Õ(1/ε²).
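
    The following Python sketch is not the paper's algorithm; it only illustrates the learning setting under assumptions made here for exposition (the parameters true_ps, the number of samples, and the moment-matching fit are all illustrative choices). It draws samples from a PBD and fits a single Binomial(m, q) by matching the empirical mean and variance, the kind of "binomial-form" hypothesis a PBD learner may output when the target is close to a binomial.

        import numpy as np

        rng = np.random.default_rng(0)

        # Unknown PBD: a sum of n independent Bernoulli(p_i) variables (illustrative parameters).
        true_ps = rng.uniform(0.1, 0.9, size=50)

        def sample_pbd(ps, n_samples, rng):
            # Each draw is the sum of independent Bernoulli(p_i) outcomes.
            return (rng.random((n_samples, len(ps))) < ps).sum(axis=1)

        samples = sample_pbd(true_ps, n_samples=10_000, rng=rng)

        # Moment-match a Binomial(m, q): mean = m*q and variance = m*q*(1-q),
        # so var/mean = 1 - q and m = mean/q.
        mean, var = samples.mean(), samples.var()
        q = 1.0 - var / mean
        m = int(round(mean / q))
        print(f"fitted Binomial(m={m}, q={q:.3f}); true mean = {true_ps.sum():.2f}")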

    On Graphical Models via Univariate Exponential Family Distributions

    Undirected graphical models, or Markov networks, are a popular class of statistical models, used in a wide variety of applications. Popular instances of this class include Gaussian graphical models and Ising models. In many settings, however, it might not be clear which subclass of graphical models to use, particularly for non-Gaussian and non-categorical data. In this paper, we consider a general sub-class of graphical models where the node-wise conditional distributions arise from exponential families. This allows us to derive multivariate graphical model distributions from univariate exponential family distributions, such as the Poisson, negative binomial, and exponential distributions. Our key contributions include a class of M-estimators to fit these graphical model distributions, and a rigorous statistical analysis showing that these M-estimators recover the true graphical model structure exactly, with high probability. We provide examples of genomic and proteomic networks learned via instances of our class of graphical models derived from Poisson and exponential distributions. Comment: Journal of Machine Learning Research.
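
    A minimal sketch (not the paper's exact estimator) of the node-wise idea under assumed settings: regress each variable on all the others with an l1-penalized Poisson GLM and read estimated neighbors off the nonzero coefficients. The function name, step size lr, penalty lam, iteration count, and the toy count data are illustrative assumptions; the intercept is omitted to keep the sketch short.

        import numpy as np

        def l1_poisson_neighborhood(X, node, lam, lr=0.01, n_iter=2000):
            # Proximal-gradient (ISTA) fit of an l1-penalized Poisson GLM of one
            # node on the remaining nodes; nonzero coefficients mark estimated neighbors.
            y = X[:, node]
            Z = np.delete(X, node, axis=1)
            n, d = Z.shape
            w = np.zeros(d)
            for _ in range(n_iter):
                eta = np.clip(Z @ w, -20, 20)          # linear predictor, clipped for stability
                grad = Z.T @ (np.exp(eta) - y) / n     # gradient of the Poisson negative log-likelihood
                w = w - lr * grad
                w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-thresholding (l1 prox)
            return w

        # Toy count matrix (hypothetical); in the paper's applications X would hold, e.g., genomic counts.
        rng = np.random.default_rng(0)
        X = rng.poisson(2.0, size=(500, 10)).astype(float)
        for s in range(X.shape[1]):
            w = l1_poisson_neighborhood(X, s, lam=0.1)
            print(s, np.flatnonzero(np.abs(w) > 1e-3))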

    Learning Poisson Binomial Distributions with Differential Privacy

    This thesis attempts to bring together two research areas: distribution learning and differential privacy. More specifically, given a highly efficient algorithm that learns a Poisson Binomial Distribution to ε-accuracy, we study whether the algorithm is differentially private. We show that the algorithm achieves differential privacy under specific assumptions: if the distribution is close to an (n,k)-Binomial form, the algorithm remains differentially private; if the PBD is close to a k-sparse form, the differential privacy property depends on the cardinality of the PBD.
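
    The sketch below is not the thesis's analysis; it only illustrates a standard Laplace-mechanism route to a differentially private distribution estimate over {0, ..., n}, under assumptions made here (the function name, the privacy budget eps_dp, and the synthetic PBD are all illustrative). Changing one sample alters two histogram counts by 1 each, so the count vector has l1-sensitivity 2 and Laplace noise of scale 2/eps_dp yields eps_dp-differential privacy.

        import numpy as np

        def dp_histogram_estimate(samples, n, eps_dp, rng):
            # Laplace mechanism on the empirical histogram over {0, ..., n}:
            # the l1-sensitivity of the counts is 2, so scale 2/eps_dp gives eps_dp-DP.
            counts = np.bincount(samples, minlength=n + 1).astype(float)
            noisy = counts + rng.laplace(scale=2.0 / eps_dp, size=n + 1)
            noisy = np.maximum(noisy, 0.0)   # clip negative noisy counts
            return noisy / noisy.sum()       # renormalize to a probability vector

        rng = np.random.default_rng(1)
        n = 50
        ps = rng.uniform(0.2, 0.8, size=n)
        samples = (rng.random((5000, n)) < ps).sum(axis=1)   # draws from a synthetic PBD
        p_hat = dp_histogram_estimate(samples, n, eps_dp=1.0, rng=rng)
        print(p_hat.round(3))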