
    Agnostic Learning by Refuting

    The sample complexity of learning a Boolean-valued function class is precisely characterized by its Rademacher complexity. This has little bearing, however, on the sample complexity of \emph{efficient} agnostic learning. We introduce \emph{refutation complexity}, a natural computational analog of the Rademacher complexity of a Boolean concept class, and show that it exactly characterizes the sample complexity of \emph{efficient} agnostic learning. Informally, the refutation complexity of a class $\mathcal{C}$ is the minimum number of example-label pairs required to efficiently distinguish between the case where the labels correlate with the evaluation of some member of $\mathcal{C}$ (\emph{structure}) and the case where the labels are i.i.d. Rademacher random variables (\emph{noise}). The easy direction of this relationship was implicitly used in the recent framework of Daniely and co-authors for improper PAC learning lower bounds, via connections to the hardness of refuting random constraint satisfaction problems. Our work can be seen as turning the relationship between agnostic learning and refutation that is implicit in their work into an explicit equivalence. In recent, independent work, Salil Vadhan discovered a similar relationship between refutation and PAC learning in the realizable (i.e., noiseless) case.
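    To make the structure-versus-noise distinction concrete, here is a minimal Python sketch of a refuter for a deliberately tiny, hypothetical class: the 2^d parity functions on {-1, +1}^d. It declares "structure" when the best empirical correlation between some concept and the labels exceeds a Hoeffding-plus-union-bound threshold that i.i.d. Rademacher labels cross with probability at most delta. The names parity_class, brute_force_refuter and make_samples, and the default delta = 1e-6, are illustrative assumptions. Brute-force search over the class is only efficient here because the class has 2^d members; the abstract is about efficient refuters for rich classes.

```python
import math
import random
from itertools import combinations

def parity_class(d):
    """All 2^d parities on {-1, +1}^d: chi_S(x) = prod_{i in S} x_i (toy class)."""
    subsets = [S for r in range(d + 1) for S in combinations(range(d), r)]
    # Default argument S=S binds each subset at definition time.
    return [lambda x, S=S: math.prod(x[i] for i in S) for S in subsets]

def make_samples(d, m, label_fn, seed=0):
    """Draw m uniform points of {-1, +1}^d and label each with label_fn(x, rng)."""
    rng = random.Random(seed)
    xs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(m)]
    return [(x, label_fn(x, rng)) for x in xs]

def brute_force_refuter(samples, concept_class, delta=1e-6):
    """Return True ("structure") if some concept correlates with the labels more
    than i.i.d. Rademacher labels plausibly could; False ("noise") otherwise."""
    m = len(samples)
    # Hoeffding plus a union bound over the class: under pure noise, every
    # empirical correlation stays below tau with probability >= 1 - delta.
    tau = math.sqrt(2 * math.log(2 * len(concept_class) / delta) / m)
    best = max(abs(sum(y * c(x) for x, y in samples)) / m for c in concept_class)
    return best > tau
```

With d = 6 and m = 2000 samples, labels produced by one parity are flagged as structure, while Rademacher labels stay below the threshold except with probability at most delta. The threshold scales like sqrt(log|C| / m), mirroring the familiar Rademacher-complexity bound for a finite class.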

    Better Agnostic Clustering Via Relaxed Tensor Norms

    We develop a new family of convex relaxations for $k$-means clustering based on sum-of-squares norms, a relaxation of the injective tensor norm that is efficiently computable using the Sum-of-Squares algorithm. We give an algorithm based on this relaxation that recovers a faithful approximation to the true means in the given data whenever the low-degree moments of the points in each cluster have bounded sum-of-squares norms. We then prove a sharp upper bound on the sum-of-squares norms for moment tensors of any distribution that satisfies the \emph{Poincaré inequality}. The Poincaré inequality is a central inequality in probability theory, and a large class of distributions satisfy it, including Gaussians, product distributions, strongly log-concave distributions, and any sum or uniformly continuous transformation of such distributions. As an immediate corollary, for any $\gamma > 0$, we obtain an efficient algorithm for learning the means of a mixture of $k$ arbitrary Poincaré distributions in $\mathbb{R}^d$ in time $d^{O(1/\gamma)}$ so long as the means have separation $\Omega(k^{\gamma})$. In particular, this yields an algorithm for learning Gaussian mixtures with separation $\Omega(k^{\gamma})$, partially resolving an open problem of Regev and Vijayaraghavan \cite{regev2017learning}. Our algorithm works even in the outlier-robust setting where an $\epsilon$ fraction of arbitrary outliers is added to the data, as long as the fraction of outliers is smaller than the smallest cluster. We therefore obtain results in the strong agnostic setting where, in addition to not knowing the distribution family, the data itself may be arbitrarily corrupted.

    Efficient Algorithms for Outlier-Robust Regression

    We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from a distribution $D$ and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to $D$, assuming that the marginal distribution of $D$ over the input space is \emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions, such as Gaussians, strongly log-concave distributions, and the uniform distribution on the hypercube, among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting. These results are the first of their kind and were not known to be even information-theoretically possible prior to our work. Our approach is based on the sum-of-squares (SoS) method and is inspired by recent applications of the method to parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the corrupted input sample such that the least-squares loss of the function over the subset is minimized over all possible large subsets. Comment: 27 pages. Appeared in COLT 2018. This update removes Lemma 6.2, which erroneously claimed an information-theoretic lower bound on the error rate as a function of the fraction of outliers.
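    The non-convex problem stated above can be written down directly in one dimension, for a model y = a*x + b. The sketch below is a hypothetical illustration, not the authors' SoS relaxation: it tries every subset of the sample of a fixed size `keep`, fits ordinary least squares on each, and returns the fit whose subset loss is smallest. It takes exponential time, which is exactly why a convex relaxation is needed. The function names are illustrative.

```python
from itertools import combinations

def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx

def sq_loss(points, a, b):
    """Sum of squared residuals of the line y = a*x + b on the points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)

def trimmed_least_squares(points, keep):
    """Exhaustive search over all subsets of size `keep` (exponential time)."""
    best = None
    for subset in combinations(points, keep):
        a, b = fit_line(subset)
        loss = sq_loss(subset, a, b)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]
```

On ten points from y = 2x + 1 plus two gross outliers, keep = 10 recovers slope 2 and intercept 1, whereas ordinary least squares on all twelve points returns a negative slope.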

    Surprise in Elections

    Elections involving a very large voter population often lead to outcomes that surprise many. This is particularly important for elections whose results affect the economy of a sizable population. A better prediction of the true outcome helps reduce the surprise and keeps voters prepared. This paper starts from the basic observation that individuals in the underlying population build estimates of the distribution of preferences of the whole population based on their local neighborhoods. The outcome of the election leads to a surprise if these local estimates contradict the outcome of the election for some fixed voting rule. To get a quantitative understanding, we propose a simple mathematical model of the setting where the individuals in the population and their connections (through geographical proximity, social networks, etc.) are described by a random graph with connection probabilities that are biased based on the preferences of the individuals. Each individual also has some estimate of the bias in their connections. We show that the election outcome leads to a surprise if the discrepancy between the estimated bias and the true bias in the local connections exceeds a certain threshold, and confirm the phenomenon that surprising outcomes are associated only with {\em closely contested elections}. We compare standard voting rules based on their performance on surprise and show that they behave differently for different parts of the population. This also hints at an impossibility: no single voting rule may be less surprising for {\em all} parts of a population. Finally, we experiment with the UK-EU referendum (a.k.a. Brexit) dataset, which attests to some of our theoretical predictions. Comment: 18 pages, 6 figures
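    The mechanism can be explored with a small simulation. The sketch below is a hypothetical simplification of the model described above, not the authors' code. It assumes two candidates and majority rule. Same-preference pairs connect with probability q_in and other pairs with q_out. Every voter takes the share of A-supporters among their neighbors at face value, that is, they estimate the bias in their connections as zero. The function returns the fraction of voters whose local estimate predicts the wrong winner.

```python
import random

def surprise_fraction(n, p_a, q_in, q_out, seed=0):
    """Fraction of voters whose local estimate predicts the wrong majority winner.

    Hypothetical simplified model: two candidates, majority rule, and voters
    who ignore the preference bias in their own connections.
    """
    rng = random.Random(seed)
    n_a = round(p_a * n)
    prefs = [1] * n_a + [0] * (n - n_a)  # 1 = prefers A, 0 = prefers B
    a_wins = n_a > n - n_a               # actual outcome under majority rule
    # Biased random graph: same-preference pairs connect with prob q_in, others q_out.
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            q = q_in if prefs[i] == prefs[j] else q_out
            if rng.random() < q:
                neighbors[i].append(j)
                neighbors[j].append(i)
    surprised = 0
    for i in range(n):
        if not neighbors[i]:
            continue  # no local information, so no prediction to contradict
        local_share_a = sum(prefs[j] for j in neighbors[i]) / len(neighbors[i])
        if (local_share_a > 0.5) != a_wins:
            surprised += 1
    return surprised / n
```

With unbiased connections and a 70/30 split almost nobody is surprised. With a 55/45 split and strongly biased connections that voters ignore (q_in = 0.2, q_out = 0.02), essentially every supporter of the losing candidate is surprised.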

    An Analysis of the t-SNE Algorithm for Data Visualization

    A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable. Standard Johnson-Lindenstrauss dimensionality reduction does not produce data visualizations. The t-SNE heuristic of van der Maaten and Hinton, which is based on non-convex optimization, has become the de facto standard for visualization in a wide range of applications. This work gives a formal framework for the problem of data visualization: finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. We then give a rigorous analysis of the performance of t-SNE under a natural, deterministic condition on the "ground-truth" clusters in the underlying data (similar to conditions assumed in earlier analyses of clustering). These are the first provable guarantees on t-SNE for constructing good data visualizations. We show that our deterministic condition is satisfied by rather general probabilistic generative models for clusterable data, such as mixtures of well-separated log-concave distributions. Finally, we give theoretical evidence that t-SNE provably succeeds in partially recovering cluster structure even when the above deterministic condition is not met. Comment: In Conference on Learning Theory (COLT) 201
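    The gradient-descent dynamics that such an analysis studies can be reproduced at toy scale. The sketch below is a hypothetical minimal t-SNE, not van der Maaten and Hinton's reference implementation. It uses a single fixed Gaussian bandwidth `sigma` instead of a perplexity search, keeps the input affinities multiplied by an early-exaggeration factor for every iteration, and uses plain gradient descent without momentum. Each step applies the standard t-SNE gradient 4 Σ_j (α p_ij - q_ij)(1 + ‖y_i - y_j‖²)^(-1)(y_i - y_j). The default parameter values are illustrative assumptions.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def tsne_2d(X, sigma=1.0, exaggeration=12.0, lr=0.1, iters=300, seed=0):
    """Toy t-SNE: fixed Gaussian bandwidth (no perplexity search),
    exaggerated affinities on every iteration, plain gradient descent."""
    n = len(X)
    # Input affinities: Gaussian kernel, normalized per row, then symmetrized
    # so that all entries of P sum to 1.
    rows = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2 * sigma ** 2))
             for j in range(n)]
        s = sum(w)
        rows.append([wj / s for wj in w])
    P = [[(rows[i][j] + rows[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

    # Small random initialization in the plane.
    rng = random.Random(seed)
    Y = [[rng.uniform(-0.01, 0.01), rng.uniform(-0.01, 0.01)] for _ in range(n)]
    for _ in range(iters):
        # Student-t kernel W; output affinities are q_ij = W[i][j] / Z.
        W = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
             for i in range(n)]
        Z = sum(map(sum, W))
        new_Y = []
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                # Gradient term: 4 (alpha p_ij - q_ij) w_ij (y_i - y_j)
                coef = 4.0 * (exaggeration * P[i][j] - W[i][j] / Z) * W[i][j]
                gx += coef * (Y[i][0] - Y[j][0])
                gy += coef * (Y[i][1] - Y[j][1])
            new_Y.append([Y[i][0] - lr * gx, Y[i][1] - lr * gy])
        Y = new_Y
    return Y
```

On two tight, well-separated clusters of five points in 10 dimensions, each cluster contracts to nearly a single point in the plane while the two clusters drift apart. Every within-cluster distance therefore ends up smaller than every between-cluster distance.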

    Cosmological Power Spectrum in Non-commutative Space-time

    We propose a generalized star product that deviates from the standard product when the fields are evaluated at different space-time points. This produces no change in the standard Lagrangian density in non-commutative space-time but does change the cosmological power spectrum. We show that the generalized star product leads to physically consistent results and can fit the observed data on hemispherical anisotropy in the cosmic microwave background radiation. Comment: 5 pages, no figures, major changes
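    For reference, the standard Moyal star product for a constant antisymmetric $\theta^{\mu\nu}$, together with the two-point form commonly used for fields at different space-time points, is given below. This is textbook background stated under that assumption, not the generalized star product proposed in the paper.

```latex
% Standard Moyal star product (constant, antisymmetric \theta^{\mu\nu})
(f \star g)(x) = f(x)\,
  \exp\!\Big(\tfrac{i}{2}\,\theta^{\mu\nu}\,
  \overleftarrow{\partial_\mu}\,\overrightarrow{\partial_\nu}\Big)\, g(x)
  = f(x)\,g(x) + \tfrac{i}{2}\,\theta^{\mu\nu}\,\partial_\mu f(x)\,\partial_\nu g(x)
  + \mathcal{O}(\theta^2),
\qquad [x^\mu, x^\nu]_\star = i\,\theta^{\mu\nu}.

% Common extension to fields at two different points x and y
f(x) \star g(y) = \exp\!\Big(\tfrac{i}{2}\,\theta^{\mu\nu}\,
  \frac{\partial}{\partial x^\mu}\,\frac{\partial}{\partial y^\nu}\Big)\, f(x)\,g(y).
```

Setting y = x in the second expression recovers the first.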

    Proton albedo spectrum observation in low latitude region at Hyderabad, India

    The flux and the energy spectrum of low-energy (30-100 MeV) proton albedos have been observed for the first time in a low-latitude region, over Hyderabad, India. Preliminary results, based on the quick-look data acquisition and display system, are presented. A charged particle telescope, capable of distinguishing singly charged particles such as electrons, muons, and protons in the low-energy region, records data for both upward- and downward-moving particles. Thus, spectra of splash and re-entrant albedo protons were recorded simultaneously during a high-altitude balloon flight carried out on 8 December 1985 over Hyderabad, India. The balloon floated at an altitude of approximately 37 km (4 mb).
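    As a quick consistency check between the two quoted numbers, an isothermal atmosphere gives p = p0 exp(-h/H), so h = H ln(p0/p). The sketch below assumes a round scale height H = 7 km and sea-level pressure p0 = 1013.25 mb; both are illustrative assumptions, and the real stratosphere is not isothermal. With these values it yields roughly 39 km for 4 mb, in line with the quoted float altitude of about 37 km.

```python
import math

def isothermal_altitude_km(p_mb, p0_mb=1013.25, scale_height_km=7.0):
    """Altitude at which pressure falls to p_mb, assuming an isothermal
    atmosphere: p = p0 * exp(-h / H), so h = H * ln(p0 / p)."""
    return scale_height_km * math.log(p0_mb / p_mb)
```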

    List-Decodable Linear Regression

    We give the first polynomial-time algorithm for robust regression in the list-decodable setting, where an adversary can corrupt more than a $1/2$ fraction of examples. For any $\alpha < 1$, our algorithm takes as input a sample $\{(x_i,y_i)\}_{i \leq n}$ of $n$ linear equations where $\alpha n$ of the equations satisfy $y_i = \langle x_i,\ell^*\rangle + \zeta$ for some small noise $\zeta$ and $(1-\alpha)n$ of the equations are {\em arbitrarily} chosen. It outputs a list $L$ of size $O(1/\alpha)$, a fixed constant, that contains an $\ell$ that is close to $\ell^*$. Our algorithm succeeds whenever the inliers are chosen from a \emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/\alpha)^{O(1/\alpha^8)}$-time algorithm to find a list of size $O(1/\alpha)$ when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in \emph{regular} directions, we give an algorithm that achieves a similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the "identifiability to algorithms" paradigm based on the sum-of-squares method. In independent and concurrent work, Raghavendra and Yau also used the Sum-of-Squares method to give a similar result for list-decodable regression. Comment: 28 pages
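    Why a list is unavoidable, and why its length stays $O(1/\alpha)$, can be seen in one dimension. The sketch below uses a simple voting heuristic, not the paper's sum-of-squares algorithm. Each sample proposes the slope y/x, and a proposal is kept when at least $\alpha n/2$ samples agree with it to within `tol` and it lies more than 2·tol from every slope already kept. Kept slopes therefore have disjoint supports of size at least $\alpha n/2$, so at most $2/\alpha$ slopes are returned. The function name and `tol` are hypothetical. The heuristic also relies on inlier x values being bounded away from 0, a crude stand-in for the anti-concentration assumption.

```python
def list_decode_slopes(samples, alpha, tol):
    """Toy 1-D list-decodable regression for y ~= slope * x.

    Every sample with x != 0 proposes the slope y / x. A proposal is kept if
    at least alpha * n / 2 proposals lie within tol of it and it is more than
    2 * tol away from every slope already kept (so supports are disjoint).
    """
    n = len(samples)
    ratios = [y / x for x, y in samples if x != 0]
    kept = []
    for c in sorted(ratios):
        support = sum(1 for r in ratios if abs(r - c) <= tol)
        if support >= alpha * n / 2 and all(abs(c - k) > 2 * tol for k in kept):
            kept.append(c)
    return kept
```

Suppose 30% of the points lie near slope 3, an adversary places 35% exactly on slope -1, and the remaining 35% are scattered. Then the output contains both slopes, because the data alone cannot rule out either line.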

    SoS and Planted Clique: Tight Analysis of MPW Moments at all Degrees and an Optimal Lower Bound at Degree Four

    The problem of finding large cliques in random graphs and its "planted" variant, where one wants to recover a clique of size $\omega \gg \log n$ added to an Erdős-Rényi graph $G \sim G(n,\frac{1}{2})$, have been intensely studied. Nevertheless, existing polynomial-time algorithms can only recover planted cliques of size $\omega = \Omega(\sqrt{n})$. By contrast, information-theoretically, one can recover planted cliques so long as $\omega \gg \log n$. In this work, we continue the investigation of algorithms from the sum-of-squares hierarchy for solving the planted clique problem, begun by Meka, Potechin, and Wigderson (MPW, 2015) and Deshpande and Montanari (DM, 2015). Our main results improve upon both these previous works by showing: 1. Degree four SoS does not recover the planted clique unless $\omega \gg \sqrt{n}\,\mathrm{poly}\log n$, improving upon the bound $\omega \gg n^{1/3}$ due to DM. A similar result was obtained independently by Raghavendra and Schramm (2015). 2. For $2 < d = o(\sqrt{\log n})$, degree $2d$ SoS does not recover the planted clique unless $\omega \gg n^{1/(d+1)}/(2^d\,\mathrm{poly}\log n)$, improving upon the bound due to MPW. Our proof of the second result is based on a fine spectral analysis of the certificate used in the prior works MPW, DM, and Feige and Krauthgamer (2003), obtained by decomposing it along an appropriately chosen basis. Along the way, we develop combinatorial tools to analyze the spectrum of random matrices with dependent entries and to understand the symmetries in the eigenspaces of set-symmetric matrices, inspired by the work of Grigoriev (2001). An argument of Kelner shows that the first result cannot be proved using the same certificate. Rather, our proof involves constructing and analyzing a new certificate that yields the nearly tight lower bound by "correcting" the certificate of previous works. Comment: 67 pages, 2 figures
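    For intuition about the sizes involved, the following sketch samples $G(n,\frac{1}{2})$ with a planted clique and applies the classical degree-counting baseline. Clique vertices have expected degree about $(n+\omega)/2$ rather than $n/2$, so the $\omega$ highest-degree vertices reveal the clique once $\omega$ is on the order of $\sqrt{n \log n}$. This is not an SoS algorithm but a simpler polynomial-time baseline, and the function names are hypothetical.

```python
import random

def planted_clique(n, k, seed=0):
    """Sample G(n, 1/2) and plant a clique on k uniformly chosen vertices."""
    rng = random.Random(seed)
    clique = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Clique pairs are always joined; every other pair with prob 1/2.
            if (i in clique and j in clique) or rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj, clique

def degree_heuristic(adj, k):
    """Guess the clique as the k highest-degree vertices."""
    order = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    return set(order[:k])
```

With n = 300 and ω = 100, the top 100 vertices by degree overlap the planted clique almost entirely. As ω shrinks toward sqrt(n), the degree gap ω/2 becomes comparable to the degree fluctuations of order sqrt(n), and the baseline breaks down.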

    Imprint of Inhomogeneous and Anisotropic Primordial Power Spectrum on CMB Polarization

    We consider an inhomogeneous model and, independently, an anisotropic model of the primordial power spectrum in order to describe the observed hemispherical anisotropy in the cosmic microwave background radiation (CMBR). This anisotropy can be parametrized in terms of the dipole modulation model of the temperature field. Both models lead to correlations between spherical harmonic coefficients corresponding to multipoles $l$ and $l \pm 1$. We obtain the model parameters by fitting the TT correlations in CMBR data. Using these parameters, we predict the signature of our models for correlations among different multipoles for the TE and EE modes. These predictions can be used to test whether the observed hemispherical anisotropy can be correctly described in terms of a primordial power spectrum. Furthermore, they may also allow us to distinguish between an inhomogeneous and an anisotropic model. Comment: 9 pages, 5 figures
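    To see why a dipole modulation couples $l$ to $l \pm 1$, here is the standard first-order calculation for the phenomenological dipole modulation model, with the modulation axis taken along $\hat z$ and an amplitude $A$. This is background for the parametrization only, not the paper's derivation for its primordial models. The coupling comes from the spherical-harmonic recurrence for $\cos\theta\,Y_{lm}$.

```latex
% Dipole modulation of a statistically isotropic field, axis along \hat z
\Delta T(\hat n) = \Delta T_{\rm iso}(\hat n)\,\big(1 + A\,\hat\lambda\cdot\hat n\big),
\qquad \hat\lambda = \hat z \;\Rightarrow\; \hat\lambda\cdot\hat n = \cos\theta .

% Recurrence that couples neighbouring multipoles
\cos\theta\, Y_{lm} = G_{l}^{m}\, Y_{l+1,m} + G_{l-1}^{m}\, Y_{l-1,m},
\qquad G_{l}^{m} = \sqrt{\frac{(l+1)^2 - m^2}{(2l+1)(2l+3)}} .

% To first order in A, with
% \langle a^{\rm iso}_{lm} a^{{\rm iso}*}_{l'm'} \rangle = C_l\,\delta_{ll'}\delta_{mm'}:
a_{lm} = a^{\rm iso}_{lm} + A\big(G_{l-1}^{m}\, a^{\rm iso}_{l-1,m} + G_{l}^{m}\, a^{\rm iso}_{l+1,m}\big),
\qquad
\langle a_{lm}\, a^{*}_{l+1,m} \rangle = A\, G_{l}^{m}\,(C_l + C_{l+1}).
```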