Agnostic Learning by Refuting
The sample complexity of learning a Boolean-valued function class is
precisely characterized by its Rademacher complexity. This has little bearing,
however, on the sample complexity of \emph{efficient} agnostic learning.
We introduce \emph{refutation complexity}, a natural computational analog of
Rademacher complexity of a Boolean concept class and show that it exactly
characterizes the sample complexity of \emph{efficient} agnostic learning.
Informally, refutation complexity of a class is the minimum
number of example-label pairs required to efficiently distinguish between the
case that the labels correlate with the evaluation of some member of the class
(\emph{structure}) and the case where the labels are i.i.d.
Rademacher random variables (\emph{noise}). The easy direction of this
relationship was implicitly used in the recent framework for improper PAC
learning lower bounds of Daniely and co-authors via connections to the hardness
of refuting random constraint satisfaction problems. Our work can be seen as
making the relationship between agnostic learning and refutation implicit in
their work into an explicit equivalence. In a recent, independent work, Salil
Vadhan discovered a similar relationship between refutation and PAC-learning in
the realizable (i.e. noiseless) case.
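
The structure-versus-noise test described in this abstract can be illustrated with a minimal sketch. It is not the paper's efficient refuter: it enumerates a small finite class by brute force, and the class of coordinate ("dictator") functions, the helper names, and the threshold parameter are all assumptions made for illustration.

```python
import random

def max_correlation(hypotheses, xs, ys):
    """Largest empirical correlation |(1/m) * sum_i y_i * h(x_i)| over the class."""
    m = len(xs)
    return max(abs(sum(y * h(x) for x, y in zip(xs, ys))) / m for h in hypotheses)

def looks_structured(hypotheses, xs, ys, threshold):
    """Declare 'structure' when some hypothesis correlates with the labels."""
    return max_correlation(hypotheses, xs, ys) >= threshold

# Hypothetical toy class: the n coordinate functions on {-1, +1}^n.
n, m = 5, 200
rng = random.Random(0)
hypotheses = [lambda x, i=i: x[i] for i in range(n)]
xs = [tuple(rng.choice((-1, 1)) for _ in range(n)) for _ in range(m)]

structured_labels = [x[0] for x in xs]                   # labels follow h_0 exactly
noise_labels = [rng.choice((-1, 1)) for _ in range(m)]   # i.i.d. Rademacher labels

structured_score = max_correlation(hypotheses, xs, structured_labels)
noise_score = max_correlation(hypotheses, xs, noise_labels)
```

With labels generated by a member of the class the score is exactly 1, while for Rademacher labels it concentrates near 0 as the sample grows; the sample size at which an efficient test can separate the two cases is what the abstract calls refutation complexity.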
Better Agnostic Clustering Via Relaxed Tensor Norms
We develop a new family of convex relaxations for k-means clustering based
on sum-of-squares norms, a relaxation of the injective tensor norm that is
efficiently computable using the Sum-of-Squares algorithm. We give an algorithm
based on this relaxation that recovers a faithful approximation to the true
means in the given data whenever the low-degree moments of the points in each
cluster have bounded sum-of-squares norms.
We then prove a sharp upper bound on the sum-of-squares norms for moment
tensors of any distribution that satisfies the \emph{Poincaré inequality}. The
Poincaré inequality is a central inequality in probability theory, and a large
class of distributions satisfy it including Gaussians, product distributions,
strongly log-concave distributions, and any sum or uniformly continuous
transformation of such distributions.
As an immediate corollary, for any , we obtain an efficient
algorithm for learning the means of a mixture of arbitrary Poincaré
distributions in in time so long as the means
have separation . This in particular yields an algorithm
for learning Gaussian mixtures with separation , thus
partially resolving an open problem of Regev and Vijayaraghavan (2017).
Our algorithm works even in the outlier-robust setting where an
fraction of arbitrary outliers are added to the data, as long as the fraction
of outliers is smaller than the smallest cluster. We, therefore, obtain results
in the strong agnostic setting where, in addition to not knowing the
distribution family, the data itself may be arbitrarily corrupted.
Efficient Algorithms for Outlier-Robust Regression
We give the first polynomial-time algorithm for performing linear or
polynomial regression resilient to adversarial corruptions in both examples and
labels.
Given a sufficiently large (polynomial-size) training set drawn i.i.d. from
distribution D and subsequently corrupted on some fraction of points, our
algorithm outputs a linear function whose squared error is close to the squared
error of the best-fitting linear function with respect to D, assuming that the
marginal distribution of D over the input space is \emph{certifiably
hypercontractive}. This natural property is satisfied by many well-studied
distributions such as Gaussians, strongly log-concave distributions, and the uniform
distribution on the hypercube, among others. We also give a simple statistical
lower bound showing that some distributional assumption is necessary to succeed
in this setting.
These results are the first of their kind and were not known to be even
information-theoretically possible prior to our work.
Our approach is based on the sum-of-squares (SoS) method and is inspired by
the recent applications of the method for parameter recovery problems in
unsupervised learning. Our algorithm can be seen as a natural convex relaxation
of the following conceptually simple non-convex optimization problem: find a
linear function and a large subset of the input corrupted sample such that the
least squares loss of the function over the subset is minimized over all
possible large subsets.
Comment: 27 pages. Appeared in COLT 2018. This update removes Lemma 6.2 that
erroneously claimed an information-theoretic lower bound on error rate as a
function of fraction of outliers
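
The non-convex problem stated in this abstract (choose a linear function and a large subset of the sample that minimize least-squares loss) can be sketched with a simple alternating heuristic. This is not the paper's sum-of-squares relaxation: it is an illustrative local search that can get stuck in poor local optima, which is the difficulty a convex relaxation is meant to avoid. The one-dimensional no-intercept model and the function names are assumptions.

```python
def fit_slope(points):
    """Least-squares slope w for the model y = w * x (no intercept)."""
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    return sxy / sxx if sxx else 0.0

def trimmed_least_squares(points, keep, rounds=20):
    """Alternate between fitting on a subset and re-selecting the
    `keep` points with the smallest squared residuals."""
    subset = list(points)
    w = fit_slope(subset)
    for _ in range(rounds):
        ranked = sorted(points, key=lambda p: (p[1] - w * p[0]) ** 2)
        new_subset = ranked[:keep]
        if new_subset == subset:
            break  # the selected subset stopped changing
        subset, w = new_subset, fit_slope(new_subset)
    return w, subset

# Eight inliers on the line y = 2x plus two corrupted points.
inliers = [(x, 2 * x) for x in range(1, 9)]
outliers = [(9, -50), (10, -60)]
w, kept = trimmed_least_squares(inliers + outliers, keep=8)
```

On this toy input the first fit is pulled far from the true slope by the two corrupted points, but the re-selection step drops them and the second fit recovers the slope 2.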
Surprise in Elections
Elections involving a very large voter population often lead to outcomes that
surprise many. This is particularly important for the elections in which
results affect the economy of a sizable population. A better prediction of the
true outcome helps reduce the surprise and keeps the voters prepared. This
paper starts from the basic observation that individuals in the underlying
population build estimates of the distribution of preferences of the whole
population based on their local neighborhoods. The outcome of the election
leads to a surprise if these local estimates contradict the outcome of the
election for some fixed voting rule. To get a quantitative understanding, we
propose a simple mathematical model of the setting where the individuals in the
population and their connections (through geographical proximity, social
networks etc.) are described by a random graph with connection probabilities
that are biased based on the preferences of the individuals. Each individual
also has some estimate of the bias in their connections.
We show that the election outcome leads to a surprise if the discrepancy
between the estimated bias and the true bias in the local connections exceeds a
certain threshold, and confirm the phenomenon that surprising outcomes are
associated only with {\em closely contested elections}. We compare standard
voting rules based on their performance on surprise and show that they have
different behavior for different parts of the population. It also hints at an
impossibility that a single voting rule will be less surprising for {\em all}
parts of a population. Finally, we experiment with the UK-EU referendum
(a.k.a.\ Brexit) dataset, which attests to some of our theoretical predictions.
Comment: 18 pages, 6 figures
An Analysis of the t-SNE Algorithm for Data Visualization
A first line of attack in exploratory data analysis is data visualization,
i.e., generating a 2-dimensional representation of data that makes clusters of
similar points visually identifiable. Standard Johnson-Lindenstrauss
dimensionality reduction does not produce data visualizations. The t-SNE
heuristic of van der Maaten and Hinton, which is based on non-convex
optimization, has become the de facto standard for visualization in a wide
range of applications.
This work gives a formal framework for the problem of data visualization -
finding a 2-dimensional embedding of clusterable data that correctly separates
individual clusters to make them visually identifiable. We then give a rigorous
analysis of the performance of t-SNE under a natural, deterministic condition
on the "ground-truth" clusters (similar to conditions assumed in earlier
analyses of clustering) in the underlying data. These are the first provable
guarantees on t-SNE for constructing good data visualizations.
We show that our deterministic condition is satisfied by considerably general
probabilistic generative models for clusterable data such as mixtures of
well-separated log-concave distributions. Finally, we give theoretical evidence
that t-SNE provably succeeds in partially recovering cluster structure even
when the above deterministic condition is not met.
Comment: In Conference on Learning Theory (COLT) 201
Cosmological Power Spectrum in Non-commutative Space-time
We propose a generalized star product which deviates from the standard
product when the fields are evaluated at different space-time points. This
produces no changes in the standard Lagrangian density in non-commutative
space-time but produces a change in the cosmological power spectrum. We show
that the generalized star product leads to physically consistent results and
can fit the observed data on hemispherical anisotropy in the cosmic microwave
background radiation.
Comment: 5 pages, no figures, major changes
Proton albedo spectrum observation in low latitude region at Hyderabad, India
The flux and the energy spectrum of low-energy (30-100 MeV) albedo protons have been observed for the first time in a low-latitude region, over Hyderabad, India. Preliminary results, based on the quick-look data acquisition and display system, are presented. A charged-particle telescope, capable of distinguishing singly charged particles such as electrons, muons, and protons in the low-energy region, records data for both upward- and downward-moving particles. Spectra of splash and re-entrant albedo protons were thus recorded simultaneously in a high-altitude balloon flight carried out on 8th December, 1985, over Hyderabad, India. The balloon floated at an altitude of approx. 37 km (4 mb).
List-Decodable Linear Regression
We give the first polynomial-time algorithm for robust regression in the
list-decodable setting where an adversary can corrupt a greater than
fraction of examples.
For any , our algorithm takes as input a sample of linear equations where of the equations satisfy for some small noise and
of the equations are {\em arbitrarily} chosen. It outputs a list
of size - a fixed constant - that contains an that is
close to .
Our algorithm succeeds whenever the inliers are chosen from a
\emph{certifiably} anti-concentrated distribution . In particular, this
gives a time algorithm to find a
size list when the inlier distribution is standard Gaussian. For discrete
product distributions that are anti-concentrated only in \emph{regular}
directions, we give an algorithm that achieves similar guarantee under the
promise that has all coordinates of the same magnitude. To complement
our result, we prove that the anti-concentration assumption on the inliers is
information-theoretically necessary.
Our algorithm is based on a new framework for list-decodable learning that
strengthens the `identifiability to algorithms' paradigm based on the
sum-of-squares method.
In an independent and concurrent work, Raghavendra and Yau also used the
Sum-of-Squares method to give a similar result for list-decodable regression.
Comment: 28 pages
SoS and Planted Clique: Tight Analysis of MPW Moments at all Degrees and an Optimal Lower Bound at Degree Four
The problem of finding large cliques in random graphs and its "planted"
variant, where one wants to recover a clique of size
added to an Erdős-Rényi graph , have been intensely
studied. Nevertheless, existing polynomial time algorithms can only recover
planted cliques of size . By contrast, information
theoretically, one can recover planted cliques so long as . In this work, we continue the investigation of algorithms from the
sum of squares hierarchy for solving the planted clique problem begun by Meka,
Potechin, and Wigderson (MPW, 2015) and Deshpande and Montanari (DM, 2015). Our
main results improve upon both these previous works by showing:
1. Degree four SoS does not recover the planted clique unless , improving upon the bound due to DM.
A similar result was obtained independently by Raghavendra and Schramm (2015).
2. For , degree SoS does not recover the
planted clique unless , improving
upon the bound due to MPW.
Our proof for the second result is based on a fine spectral analysis of the
certificate used in the prior works MPW, DM, and Feige and Krauthgamer (2003) by
decomposing it along an appropriately chosen basis. Along the way, we develop
combinatorial tools to analyze the spectrum of random matrices with dependent
entries and to understand the symmetries in the eigenspaces of the set
symmetric matrices inspired by work of Grigoriev (2001).
An argument of Kelner shows that the first result cannot be proved using the
same certificate. Rather, our proof involves constructing and analyzing a new
certificate that yields the nearly tight lower bound by "correcting" the
certificate of previous works.
Comment: 67 pages, 2 figures
Imprint of Inhomogeneous and Anisotropic Primordial Power Spectrum on CMB Polarization
We consider an inhomogeneous model and independently an anisotropic model of
primordial power spectrum in order to describe the observed hemispherical
anisotropy in Cosmic Microwave Background Radiation. This anisotropy can be
parametrized in terms of the dipole modulation model of the temperature field.
Both the models lead to correlations between spherical harmonic coefficients
corresponding to multipoles $l$ and $l \pm 1$. We obtain the model parameters by
making a fit to TT correlations in CMBR data. Using these parameters we predict
the signature of our models for correlations among different multipoles for the
case of the TE and EE modes. These predictions can be used to test whether the
observed hemispherical anisotropy can be correctly described in terms of a
primordial power spectrum. Furthermore these may also allow us to distinguish
between an inhomogeneous and an anisotropic model.
Comment: 9 pages, 5 figures
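
For orientation, the dipole modulation model named in this abstract is commonly written in the following form; the symbols $A$ (modulation amplitude), $\hat{\lambda}$ (preferred direction), and $\Delta T_{\mathrm{iso}}$ (statistically isotropic field) are the conventional notation and are assumed here, not taken from the abstract.

\[
\Delta T(\hat{n}) = \Delta T_{\mathrm{iso}}(\hat{n})\,\bigl(1 + A\,\hat{\lambda}\cdot\hat{n}\bigr)
\]

Choosing the $z$-axis along $\hat{\lambda}$ gives $\hat{\lambda}\cdot\hat{n} = \cos\theta$, and since multiplying $Y_{lm}$ by $\cos\theta$ yields a combination of $Y_{l+1,m}$ and $Y_{l-1,m}$, the modulation couples the coefficients $a_{lm}$ and $a_{l\pm1,m}$, which is the origin of the $l$, $l \pm 1$ correlations mentioned above.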