    Learning mixtures of structured distributions over discrete domains

    Let $\mathfrak{C}$ be a class of probability distributions over the discrete domain $[n] = \{1,\ldots,n\}$. We show that if $\mathfrak{C}$ satisfies a rather general condition -- essentially, that each distribution in $\mathfrak{C}$ can be well-approximated by a variable-width histogram with few bins -- then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of $k$ unknown distributions from $\mathfrak{C}$. We analyze several natural types of distributions over $[n]$, including log-concave, monotone hazard rate, and unimodal distributions, and show that they have the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all these mixture learning problems. Comment: preliminary full version of SODA'13 paper.
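
    As a quick illustration of the structural condition (not the paper's learning algorithm), the sketch below flattens a monotone distribution over $[n]$ on Birgé's oblivious geometric intervals and reports the bin count and total variation error. It assumes NumPy; the function names are ours.

```python
import numpy as np

def birge_intervals(n, eps):
    """Oblivious Birge decomposition of {0, ..., n-1} into
    O(log(n)/eps) intervals of geometrically growing length."""
    intervals, start, length = [], 0, 1.0
    while start < n:
        end = min(n, start + max(1, int(length)))
        intervals.append((start, end))
        start, length = end, length * (1 + eps)
    return intervals

def flatten(p, intervals):
    """Replace p by its flattened histogram version: the average
    probability on each interval."""
    q = np.empty_like(p)
    for a, b in intervals:
        q[a:b] = p[a:b].sum() / (b - a)
    return q

# A monotone (geometrically decaying) distribution over [n]
n, eps = 1000, 0.1
p = np.exp(-0.01 * np.arange(n))
p /= p.sum()
intervals = birge_intervals(n, eps)
q = flatten(p, intervals)
print(len(intervals))             # few bins: roughly log(n)/eps
print(0.5 * np.abs(p - q).sum())  # small total variation error
```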

    On Extracting Common Random Bits From Correlated Sources on Large Alphabets

    Suppose Alice and Bob receive strings X = (X_1, ..., X_n) and Y = (Y_1, ..., Y_n), each uniformly random in [s]^n, but so that X and Y are correlated. For each symbol i, we have that Y_i = X_i with probability 1-ε, and otherwise Y_i is chosen independently and uniformly from [s]. Alice and Bob wish to use their respective strings to extract a uniformly chosen common sequence from [s]^k, but without communicating. How well can they do? The trivial strategy of outputting the first k symbols yields an agreement probability of (1-ε+ε/s)^k. In a recent work, Bogdanov and Mossel showed that in the binary case, where s = 2 and k = k(ε) is large enough, it is possible to extract k bits with a better agreement probability rate. In particular, it is possible to achieve agreement probability (kε)^{-1/2} · 2^{-kε/(2(1-ε/2))} using a random construction based on Hamming balls, and this is optimal up to lower-order terms. In this paper, we consider the same problem over larger alphabet sizes s, and we show that the agreement probability rate changes dramatically as the alphabet grows. In particular, we show that no strategy can achieve agreement probability better than (1-ε)^k (1+δ(s))^k, where δ(s) → 0 as s → ∞. We also show that Hamming-ball-based constructions have a much lower agreement probability rate than the trivial algorithm as s → ∞. Our proofs and results are intimately related to subtle properties of hypercontractive inequalities.
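
    The trivial strategy's closed-form agreement probability (1-ε+ε/s)^k is easy to sanity-check by simulation. A minimal Monte Carlo sketch, assuming NumPy; the function name and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def trivial_agreement(s, eps, k, trials=100_000):
    """Monte Carlo estimate of the trivial strategy's success probability:
    Alice and Bob each output the first k symbols of their own string."""
    x = rng.integers(0, s, size=(trials, k))
    noisy = rng.random((trials, k)) < eps        # coordinates hit by noise
    y = np.where(noisy, rng.integers(0, s, size=(trials, k)), x)
    return np.mean((x == y).all(axis=1))         # fraction of exact matches

s, eps, k = 4, 0.1, 8
print(trivial_agreement(s, eps, k))  # empirical estimate
print((1 - eps + eps / s) ** k)      # closed form from the abstract
```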

    Convergence, unanimity and disagreement in majority dynamics on unimodular graphs and random graphs

    In majority dynamics, agents located at the vertices of an undirected simple graph update their binary opinions synchronously by adopting those of the majority of their neighbors. On infinite unimodular transitive graphs (e.g., Cayley graphs), when initial opinions are chosen from a distribution that is invariant with respect to the graph automorphism group, we show that the opinion of each agent almost surely either converges, or else eventually oscillates with period two; this is known to hold for finite graphs, but not for all infinite graphs. On Erdős-Rényi random graphs with degrees Ω(√n), we show that when initial opinions are chosen i.i.d., then with constant probability all agents converge to the initial majority opinion. Conversely, on random 4-regular finite graphs, we show that with high probability different agents converge to different opinions.
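
    A minimal simulation sketch of synchronous majority dynamics, assuming NumPy and NetworkX (graph size, seeds, and round count are arbitrary illustrative choices), which can be used to observe the disagreement phenomenon on random 4-regular graphs:

```python
import numpy as np
import networkx as nx

def majority_dynamics(G, opinions, rounds=50):
    """Synchronous majority dynamics with +1/-1 opinions; on a tie,
    a vertex keeps its current opinion."""
    A = nx.to_numpy_array(G)
    x = opinions.astype(float)
    for _ in range(rounds):
        s = A @ x                                 # neighbor opinion sums
        x = np.where(s > 0, 1.0, np.where(s < 0, -1.0, x))
    return x

rng = np.random.default_rng(1)
n = 1000
G = nx.random_regular_graph(4, n, seed=1)
x0 = rng.choice([-1.0, 1.0], size=n)              # i.i.d. initial opinions
xT = majority_dynamics(G, x0)
maj = np.sign(x0.sum())
print(np.mean(xT == maj))  # typically well below 1: disagreement persists
```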

    Optimal Algorithms for Testing Closeness of Discrete Distributions

    We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions $p$ and $q$ over an $n$-element set, we wish to distinguish whether $p = q$ versus $p$ is at least $\epsilon$-far from $q$, in either $\ell_1$ or $\ell_2$ distance. Batu et al. gave the first sub-linear time algorithms for these problems, which matched the lower bounds of Valiant up to a logarithmic factor in $n$ and a polynomial factor of $\epsilon$. In this work, we present simple (and new) testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal up to constant factors, both in the dependence on $n$ and in the dependence on $\epsilon$; for the $\ell_1$ testing problem we establish that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3},\, n^{1/2}/\epsilon^2\})$.
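
    A natural statistic for this task is a Poissonized collision count whose expectation is $m^2 \cdot \|p-q\|_2^2$. The sketch below (assuming NumPy; the paper's testers involve normalizations not shown here, so treat this as an illustration of the idea rather than the paper's exact procedure) shows how the statistic separates $p = q$ from far pairs:

```python
import numpy as np

rng = np.random.default_rng(2)

def collision_statistic(x, y):
    """For independent Poisson counts x_i ~ Poi(m*p_i), y_i ~ Poi(m*q_i):
    E[sum((x_i - y_i)^2 - x_i - y_i)] = m^2 * ||p - q||_2^2."""
    return np.sum((x - y) ** 2 - x - y)

def poisson_counts(p, m):
    # Poissonized sampling: per-element counts are independent Poi(m * p_i)
    return rng.poisson(m * p)

n, m = 1000, 5000
p = np.full(n, 1.0 / n)
q = p.copy()
q[: n // 2] *= 1.2
q[n // 2:] *= 0.8                       # still sums to 1; ||p - q||_1 = 0.2
for other in (p, q):
    z = collision_statistic(poisson_counts(p, m), poisson_counts(other, m))
    print(z, m ** 2 * np.sum((p - other) ** 2))  # near 0 iff p = other
```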

    Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

    Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem of {\em density estimation}, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a hypothesis distribution that is close to $p$. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any $k$ and $\epsilon$, we give an algorithm that makes $\tilde{O}(k/\epsilon^2)$ draws from $p$, runs in $\tilde{O}(k/\epsilon^2)$ time, and outputs a hypothesis distribution $h$ that is piecewise constant with $O(k \log^2(1/\epsilon))$ pieces. With high probability the hypothesis $h$ satisfies $d_{\mathrm{TV}}(p,h) \leq C \cdot \mathrm{opt}_k(p) + \epsilon$, where $d_{\mathrm{TV}}$ denotes the total variation distance (statistical distance), $C$ is a universal constant, and $\mathrm{opt}_k(p)$ is the smallest total variation distance between $p$ and any $k$-piecewise constant distribution. The sample size and running time of our algorithm are optimal up to logarithmic factors. The "approximation factor" $C$ in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of $k$ and $\epsilon$ can achieve $C < 2$ regardless of what kind of hypothesis distribution it uses. Comment: conference version appears in NIPS 2014.
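
    To make the hypothesis class concrete, here is a minimal sketch (not the paper's algorithm, which finds a near-best $k$-piecewise-constant fit) that builds a variable-width histogram from samples by placing bin edges at empirical quantiles; NumPy and the names are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def equal_mass_histogram(samples, k):
    """A k-piece variable-width histogram over [0,1): bin edges at empirical
    quantiles, so each piece carries ~1/k of the sample mass; the resulting
    piecewise-constant density integrates to 1 by construction."""
    edges = np.quantile(samples, np.linspace(0.0, 1.0, k + 1))
    edges[0], edges[-1] = 0.0, 1.0
    density = (1.0 / k) / np.diff(edges)
    return edges, density

samples = rng.beta(2, 5, size=20_000)  # stand-in for i.i.d. draws from unknown p
edges, density = equal_mass_histogram(samples, k=10)
print(np.round(edges, 3))    # narrow bins where p is dense, wide where sparse
print(np.round(density, 2))
```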

    The management of municipal solid waste in Hong Kong : a study of civic engagement strategies

    Published or final version. Politics and Public Administration. Master of Public Administration.