
    Finding Skewed Subcubes Under a Distribution

    Say that we are given samples from a distribution ψ over an n-dimensional space. We expect or desire ψ to behave like a product distribution (or a k-wise independent distribution over its marginals for small k). We propose the problem of enumerating/list-decoding all large subcubes where the distribution ψ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in ψ. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection. In this work we focus on the special but important case where the space is the Boolean hypercube, and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition, that of a minimal skewed subcube: a subcube whose skew cannot be attributed to a larger subcube that contains it. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality. On the lower bounds side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved on substantially without a breakthrough on this problem, which is believed to be intractable. Motivated by this, we study alternate models allowing query access to ψ where finding skewed subcubes might be easier.
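
To make the setting concrete, here is a minimal brute-force sketch for the Boolean hypercube with uniform expected marginals. It is not the Fourier-analytic algorithm from the abstract. The skew measure used here (empirical probability of a subcube minus its uniform probability 2^-k) is one plausible reading of the "obvious definition" and is an assumption, as are the names `subcube_skew` and `skewed_subcubes` and the dictionary representation of a subcube.

```python
from itertools import combinations, product

def subcube_skew(samples, fixed):
    """Empirical probability of a subcube minus its uniform probability 2^-k.

    `fixed` maps a coordinate index to its required bit (hypothetical
    representation); k = len(fixed) is the number of fixed coordinates.
    """
    hits = sum(all(x[i] == b for i, b in fixed.items()) for x in samples)
    return hits / len(samples) - 2.0 ** (-len(fixed))

def skewed_subcubes(samples, n, k, threshold):
    """Enumerate every subcube with k fixed coordinates and |skew| >= threshold.

    Brute force: tries all C(n, k) * 2^k subcubes, so it only suits tiny n, k.
    """
    found = []
    for coords in combinations(range(n), k):
        for bits in product((0, 1), repeat=k):
            fixed = dict(zip(coords, bits))
            skew = subcube_skew(samples, fixed)
            if abs(skew) >= threshold:
                found.append((fixed, skew))
    return found

# Toy data: coordinates 0 and 1 always agree, coordinate 2 is free.
samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
result = skewed_subcubes(samples, n=3, k=2, threshold=0.2)
```

On this toy data every reported subcube fixes coordinates 0 and 1, which is the dependent pair; subcubes involving the free coordinate 2 show no skew.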

    List decoding Reed-Muller codes over small fields

    The list decoding problem for a code asks for the maximal radius up to which any ball of that radius contains only a constant number of codewords. The list decoding radius is not well understood even for well studied codes, like Reed-Solomon or Reed-Muller codes. Fix a finite field $\mathbb{F}$. The Reed-Muller code $\mathrm{RM}_{\mathbb{F}}(n,d)$ is defined by $n$-variate degree-$d$ polynomials over $\mathbb{F}$. In this work, we study the list decoding radius of Reed-Muller codes over a constant prime field $\mathbb{F}=\mathbb{F}_p$, constant degree $d$ and large $n$. We show that the list decoding radius is equal to the minimal distance of the code. That is, if we denote by $\delta(d)$ the normalized minimal distance of $\mathrm{RM}_{\mathbb{F}}(n,d)$, then the number of codewords in any ball of radius $\delta(d)-\varepsilon$ is bounded by $c=c(p,d,\varepsilon)$ independent of $n$. This resolves a conjecture of Gopalan-Klivans-Zuckerman [STOC 2008], who among other results proved it in the special case of $\mathbb{F}=\mathbb{F}_2$; and extends the work of Gopalan [FOCS 2010] who proved the conjecture in the case of $d=2$. We also analyse the number of codewords in balls of radius exceeding the minimal distance of the code. For $e \leq d$, we show that the number of codewords of $\mathrm{RM}_{\mathbb{F}}(n,d)$ in a ball of radius $\delta(e)-\varepsilon$ is bounded by $\exp(c \cdot n^{d-e})$, where $c=c(p,d,\varepsilon)$ is independent of $n$. The dependence on $n$ is tight. This extends the work of Kaufman-Lovett-Porat [IEEE Inf. Theory 2012] who proved similar bounds over $\mathbb{F}_2$. The proof relies on several new ingredients: an extension of the Frieze-Kannan weak regularity to general function spaces, higher-order Fourier analysis, and an extension of the Schwartz-Zippel lemma to compositions of polynomials. Comment: fixed a bug in the proof of claim 5.6 (now lemma 5.5)
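
The abstract does not spell out the normalized minimal distance $\delta(d)$. As a hedged illustration, the sketch below uses the standard formula for generalized Reed-Muller codes over a prime field: writing $d = a(p-1) + b$ with $0 \leq b < p-1$, one has $\delta(d) = p^{-a}(1 - b/p)$, valid when $d \leq n(p-1)$. The function name `rm_normalized_min_distance` is hypothetical, and the formula is taken from the general theory of these codes rather than from the abstract.

```python
from fractions import Fraction

def rm_normalized_min_distance(p, d):
    """Normalized minimum distance of RM_{F_p}(n, d), assuming d <= n * (p - 1).

    Write d = a * (p - 1) + b with 0 <= b < p - 1; then
    delta(d) = p^(-a) * (1 - b / p) = (p - b) / p^(a + 1).
    """
    a, b = divmod(d, p - 1)
    return Fraction(p - b, p ** (a + 1))

# Over F_2 this reduces to the familiar 2^(-d).
binary_distances = [rm_normalized_min_distance(2, d) for d in range(1, 4)]
# Over F_3, degree 1 gives 2/3 and degree 2 gives 1/3.
ternary_distances = [rm_normalized_min_distance(3, d) for d in range(1, 3)]
```

With this in hand, the first result reads: any ball of radius slightly below `rm_normalized_min_distance(p, d)` contains a number of codewords bounded independently of $n$.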

    Low Density Lattice Codes

    Low density lattice codes (LDLC) are novel lattice codes that can be decoded efficiently and approach the capacity of the additive white Gaussian noise (AWGN) channel. In LDLC a codeword x is generated directly in the n-dimensional Euclidean space as a linear transformation of a corresponding integer message vector b, i.e., x = Gb, where H, the inverse of G, is restricted to be sparse. The sparsity of H is used to develop a linear-time iterative decoding scheme which attains, as demonstrated by simulations, good error performance within ~0.5 dB from capacity at a block length of n = 100,000 symbols. The paper also discusses convergence results and implementation considerations. Comment: 24 pages, 4 figures. Submitted for publication in IEEE Transactions on Information Theory
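
The relation x = Gb with H = G^{-1} means a codeword is any real vector x for which Hx is an integer vector. The toy sketch below illustrates only that encoding relation; it does not implement the iterative decoder. The 3x3 matrix `H`, the message `b`, and the helper names `solve` and `matvec` are all assumptions chosen for illustration, and a practical LDLC would use a much larger sparse H.

```python
from fractions import Fraction

def solve(H, b):
    """Solve H x = b by Gauss-Jordan elimination over exact rationals."""
    n = len(H)
    # Augmented matrix [H | b] with exact rational entries.
    A = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(H, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [row[n] for row in A]

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

half = Fraction(1, 2)
# Hypothetical sparse parity-check matrix: two nonzeros per row.
H = [[1, 0, half],
     [half, 1, 0],
     [0, half, 1]]
b = [1, -2, 3]        # integer message vector
x = solve(H, b)       # codeword x = G b, computed without forming G = H^{-1}
```

Solving Hx = b directly avoids forming G, which is generally dense even when H is sparse.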