
    MCMC Learning

    The theory of learning under the uniform distribution is rich and deep, with connections to cryptography, computational complexity, and the analysis of Boolean functions, to name a few areas. This theory, however, is of limited applicability because the uniform distribution and the corresponding Fourier basis are rarely encountered as a statistical model. A family of distributions that vastly generalizes the uniform distribution on the Boolean cube is that of distributions represented by Markov Random Fields (MRFs). Markov Random Fields are one of the main tools for modeling high-dimensional data in many areas of statistics and machine learning. In this paper we initiate the investigation of extending central ideas, methods, and algorithms from the theory of learning under the uniform distribution to the setting of learning concepts given examples from MRF distributions. In particular, our results establish a novel connection between properties of MCMC sampling of MRFs and learning under the MRF distribution. (Comment: 28 pages, 1 figure)
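
    To make the "MCMC sampling of MRFs" ingredient concrete, the sketch below shows a single-site Gibbs sampler for an Ising-model MRF over {-1, +1}^n. This is only an illustrative example of sampling from an MRF distribution; the function name and parameterization are mine and it is not the algorithm analyzed in the paper.

```python
import numpy as np

def gibbs_sample_ising(J, h, n_steps=10_000, rng=None):
    """Single-site Gibbs sampler for an Ising MRF over {-1, +1}^n.

    Illustrative sketch: J is a symmetric coupling matrix with zero diagonal,
    h a field vector, and the stationary distribution is
    p(x) proportional to exp(sum_{i<j} J_ij x_i x_j + sum_i h_i x_i).
    """
    rng = np.random.default_rng(rng)
    n = len(h)
    x = rng.choice([-1, 1], size=n)
    for _ in range(n_steps):
        i = rng.integers(n)
        # Local field seen by coordinate i: sum_{j != i} J_ij x_j + h_i.
        field = J[i] @ x - J[i, i] * x[i] + h[i]
        # Conditional probability that x_i = +1 given all other coordinates.
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
        x[i] = 1 if rng.random() < p_plus else -1
    return x
```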

    Learning with the Weighted Trace-norm under Arbitrary Sampling Distributions

    We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution (i.e., when row and column indices are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or the empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.
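
    As a rough illustration of what "weighting by the empirical sampling distribution" means, the sketch below computes a weighted trace-norm of a matrix using row/column marginals estimated from the observed entry indices, smoothed toward the uniform distribution. The function name and the particular smoothing scheme are my own illustrative choices, not the paper's exact corrected variant.

```python
import numpy as np

def empirical_weighted_trace_norm(X, rows, cols, smoothing=0.5):
    """Weighted trace-norm ||diag(sqrt(p)) X diag(sqrt(q))||_* where the
    row weights p and column weights q are estimated from the observed
    index pairs (rows, cols) and mixed with the uniform distribution.
    """
    n_rows, n_cols = X.shape
    # Empirical marginals of the sampling distribution over rows and columns.
    p = np.bincount(rows, minlength=n_rows) / len(rows)
    q = np.bincount(cols, minlength=n_cols) / len(cols)
    # Smooth toward uniform so unobserved rows/columns keep nonzero weight.
    p = (1 - smoothing) * p + smoothing / n_rows
    q = (1 - smoothing) * q + smoothing / n_cols
    weighted = np.sqrt(p)[:, None] * X * np.sqrt(q)[None, :]
    return np.linalg.norm(weighted, ord='nuc')  # sum of singular values
```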

    Efficient Learning of Linear Separators under Bounded Noise

    We study the learnability of linear separators in $\Re^d$ in the presence of bounded (a.k.a. Massart) noise. This is a realistic generalization of the random classification noise model, where the adversary can flip each example $x$ with probability $\eta(x) \leq \eta$. We provide the first polynomial-time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the uniform distribution over the unit ball in $\Re^d$, for some constant value of $\eta$. While widely studied in the statistical learning theory community in the context of getting faster convergence rates, computationally efficient algorithms in this model had remained elusive. Our work provides the first evidence that one can indeed design algorithms achieving arbitrarily small excess error in polynomial time under this realistic noise model, and thus opens up a new and exciting line of research. We additionally provide lower bounds showing that popular algorithms such as hinge loss minimization and averaging cannot lead to arbitrarily small excess error under Massart noise, even under the uniform distribution. Our work instead makes use of a margin-based technique developed in the context of active learning. As a result, our algorithm is also an active learning algorithm with label complexity that is only logarithmic in the desired excess error $\epsilon$.
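
    The sketch below generates labeled examples under the bounded (Massart) noise model described above: points uniform on the unit ball, with each label flipped with some probability $\eta(x) \leq \eta$. The specific flip rule (flipping at the full rate $\eta$ only near the decision boundary) is just one admissible adversary used for illustration; the function name and parameters are hypothetical.

```python
import numpy as np

def sample_massart_examples(w, m, eta=0.1, boundary_width=0.1, rng=None):
    """Draw m examples for the linear separator sign(<w, x>) under
    bounded (Massart) noise, with x uniform on the unit ball in R^d.
    """
    rng = np.random.default_rng(rng)
    d = len(w)
    # Uniform points in the unit ball: random direction, radius ~ u^(1/d).
    directions = rng.standard_normal((m, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.random(m) ** (1.0 / d)
    x = directions * radii[:, None]
    margin = x @ w
    y = np.where(margin >= 0, 1.0, -1.0)
    # One admissible adversary: flip rate eta(x) = eta near the boundary,
    # 0 elsewhere, so eta(x) <= eta everywhere.
    flip_prob = eta * (np.abs(margin) < boundary_width)
    flips = rng.random(m) < flip_prob
    y[flips] *= -1
    return x, y
```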

    Embedding Hard Learning Problems Into Gaussian Space

    We give the first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard problem in theoretical computer science, and show that any algorithm for agnostically learning halfspaces requires $n^{\Omega(\log(1/\epsilon))}$ time under the assumption that $k$-sparse LPN requires $n^{\Omega(k)}$ time, ruling out a polynomial-time algorithm for the problem. As far as we are aware, this is the first representation-independent hardness result for supervised learning when the underlying distribution is restricted to be a Gaussian. We also show that the problem of agnostically learning sparse polynomials with respect to the Gaussian distribution in polynomial time is as hard as PAC learning DNFs on the uniform distribution in polynomial time. This complements the surprising result of Andoni et al. (2013), who show that sparse polynomials are learnable under random Gaussian noise in polynomial time. Taken together, these results show the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions. Our results use a novel embedding of random labeled examples from the uniform distribution on the Boolean hypercube into random labeled examples from the Gaussian distribution that allows us to relate the hardness of learning problems on two different domains and distributions.
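
    For intuition about mapping hypercube examples to Gaussian examples, the sketch below shows one natural sign-preserving embedding: give each coordinate of $x \in \{-1,+1\}^n$ an independent half-normal magnitude, so a uniform Boolean point becomes a standard Gaussian point. This is only a simple illustrative construction under my own assumptions; the paper's actual reduction is more delicate and is not reproduced here.

```python
import numpy as np

def embed_boolean_into_gaussian(x, rng=None):
    """Map x in {-1, +1}^n to a point in R^n whose coordinates have the
    signs of x and independent half-normal magnitudes. If x is uniform on
    the hypercube, the output is distributed as N(0, I_n).
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    magnitudes = np.abs(rng.standard_normal(x.shape))
    return x * magnitudes
```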

    Learning using Local Membership Queries

    We introduce a new model of membership query (MQ) learning, where the learning algorithm is restricted to query points that are \emph{close} to random examples drawn from the underlying distribution. The learning model is intermediate between the PAC model (Valiant, 1984) and the PAC+MQ model (where the queries are allowed to be arbitrary points). Membership query algorithms are not popular among machine learning practitioners. Apart from the obvious difficulty of adaptively querying labelers, it has also been observed that querying \emph{unnatural} points leads to increased noise from human labelers (Lang and Baum, 1992). This motivates our study of learning algorithms that make queries that are close to examples generated from the data distribution. We restrict our attention to functions defined on the $n$-dimensional Boolean hypercube and say that a membership query is local if its Hamming distance from some example in the (random) training data is at most $O(\log(n))$. We show the following results in this model: (i) The class of sparse polynomials (with coefficients in $\mathbb{R}$) over $\{0,1\}^n$ is polynomial time learnable under a large class of \emph{locally smooth} distributions using $O(\log(n))$-local queries. This class also includes the class of $O(\log(n))$-depth decision trees. (ii) The class of polynomial-sized decision trees is polynomial time learnable under product distributions using $O(\log(n))$-local queries. (iii) The class of polynomial size DNF formulas is learnable under the uniform distribution using $O(\log(n))$-local queries in time $n^{O(\log(\log(n)))}$. (iv) In addition, we prove a number of results relating the proposed model to the traditional PAC model and the PAC+MQ model.
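
    As a minimal illustration of the locality constraint, the sketch below generates membership queries within Hamming distance $O(\log n)$ of a given Boolean training example. The query-selection strategy (uniformly random bit flips) and the function name are my own placeholders; an actual learner in this model would choose which coordinates to flip adaptively.

```python
import numpy as np

def local_queries(example, n_queries, max_flips=None, rng=None):
    """Generate membership queries at Hamming distance at most max_flips
    (default: ceil(log2 n)) from a 0/1 integer training example.
    """
    rng = np.random.default_rng(rng)
    example = np.asarray(example)
    n = len(example)
    if max_flips is None:
        max_flips = max(1, int(np.ceil(np.log2(n))))
    queries = []
    for _ in range(n_queries):
        q = example.copy()
        k = rng.integers(1, max_flips + 1)
        idx = rng.choice(n, size=k, replace=False)
        q[idx] ^= 1  # flip at most O(log n) bits of the training example
        queries.append(q)
    return queries
```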