3,814 research outputs found
A Combinatorial Approach to Robust PCA
We study the problem of recovering Gaussian data under adversarial
corruptions when the noises are low-rank and the corruptions are at the
coordinate level. Concretely, we assume that the Gaussian noises lie in an
unknown low-dimensional subspace, and that a random subset of the coordinates
of each data point falls under the control of an adversary. This setting
models the scenario of learning from high-dimensional yet structured data
that are transmitted through a highly noisy channel, so that the data points
are unlikely to be entirely clean.
Our main result is an efficient algorithm that, under a mild condition
relating the subspace dimension and the corruption level to the ambient
dimension, recovers every single data point up to a nearly-optimal error in
expectation. At the core of our proof is a new analysis of the well-known
Basis Pursuit (BP) method for recovering a sparse signal, which is known to
succeed under additional assumptions (e.g., incoherence or the restricted
isometry property) on the underlying subspace. In contrast, we present a
novel approach via studying a natural combinatorial problem and show that,
over the randomness in the support of the sparse signal, a high-probability
error bound is possible even if the subspace is arbitrary.
Comment: To appear at ITCS 202
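Basis Pursuit itself is the classical convex program min ||x||_1 subject to
Ax = b. As a generic illustration of that subroutine only (not the paper's
algorithm, whose contribution is the combinatorial analysis), a minimal
sketch that solves BP as a linear program via SciPy; all names here are ours:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, b):
        # Solve  min ||x||_1  s.t.  A x = b  as a linear program by
        # splitting x = u - v with u, v >= 0, so ||x||_1 = sum(u) + sum(v).
        m, n = A.shape
        c = np.ones(2 * n)                    # objective: sum(u) + sum(v)
        A_eq = np.hstack([A, -A])             # A (u - v) = b
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
        if not res.success:
            raise RuntimeError(res.message)
        u, v = res.x[:n], res.x[n:]
        return u - v

Given a sparse x0 and b = A @ x0 with far fewer rows than columns,
basis_pursuit(A, b) recovers x0 whenever standard recovery conditions hold;
the paper's point is that randomness in the signal's support can substitute
for such conditions.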
Trimmed Maximum Likelihood Estimation for Robust Learning in Generalized Linear Models
We study the problem of learning generalized linear models under adversarial
corruptions. We analyze a classical heuristic called the iterative trimmed
maximum likelihood estimator which is known to be effective against label
corruptions in practice. Under label corruptions, we prove that this simple
estimator achieves minimax near-optimal risk on a wide range of generalized
linear models, including Gaussian regression, Poisson regression and Binomial
regression. Finally, we extend the estimator to the more challenging setting of
label and covariate corruptions and demonstrate its robustness and optimality
in that setting as well.
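As a concrete illustration of the heuristic being analyzed, a minimal sketch
of iterative trimmed maximum likelihood for the Gaussian-regression case,
where trimming by negative log-likelihood reduces to trimming by squared
residual; the trim fraction eps and the iteration count are illustrative
choices of ours, not the paper's:

    import numpy as np

    def trimmed_mle_gaussian(X, y, eps=0.1, n_iter=20):
        # Iterative trimmed MLE, Gaussian case: alternate between fitting
        # on the kept samples and re-selecting the (1 - eps) fraction of
        # samples with the smallest negative log-likelihood (here, the
        # smallest squared residual) under the current fit.
        n = len(y)
        keep = np.arange(n)                   # start from all samples
        for _ in range(n_iter):
            beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            resid = (y - X @ beta) ** 2
            keep = np.argsort(resid)[: int((1 - eps) * n)]
        return beta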
Transformers can optimally learn regression mixture models
Mixture models arise in many regression problems, but most methods have seen
limited adoption, partly due to these algorithms' highly tailored and
model-specific nature. On the other hand, transformers are flexible neural
sequence models that present the intriguing possibility of providing
general-purpose prediction methods, even in this mixture setting. In this work,
we investigate the hypothesis that transformers can learn an optimal predictor
for mixtures of regressions. We construct a generative process for a mixture of
linear regressions for which the decision-theoretic optimal procedure is given
by data-driven exponential weights on a finite set of parameters. We observe
that transformers achieve low mean-squared error on data generated via this
process. By probing the transformer's output at inference time, we also show
that transformers typically make predictions that are close to the optimal
predictor. Our experiments also demonstrate that transformers can learn
mixtures of regressions in a sample-efficient fashion and are somewhat robust
to distribution shifts. We complement our experimental observations by proving
constructively that the decision-theoretic optimal procedure is indeed
implementable by a transformer.
Comment: 24 pages, 9 figures
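The optimal procedure referenced above, data-driven exponential weights over
a finite set of regression parameters, can be sketched in a few lines.
Assuming Gaussian noise with known variance sigma and a known finite
candidate set betas (both simplifications of ours, for illustration):

    import numpy as np

    def exp_weights_predict(betas, X, y, x_new, sigma=1.0):
        # Exponential-weights predictor for a mixture of linear
        # regressions: weight each candidate parameter by its Gaussian
        # likelihood on the observed (X, y) prefix, then return the
        # weighted average of the candidates' predictions at x_new.
        resid = y[None, :] - betas @ X.T          # (k, n) residual matrix
        log_w = -0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2
        w = np.exp(log_w - log_w.max())           # numerically stable softmax
        w /= w.sum()
        return w @ (betas @ x_new)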
Linear Regression using Heterogeneous Data Batches
In many learning applications, data are collected from multiple sources, each
providing a \emph{batch} of samples that by itself is insufficient to learn its
input-output relationship. A common approach assumes that the sources fall in
one of several unknown subgroups, each with an unknown input distribution and
input-output relationship. We consider one of this setup's most fundamental and
important manifestations where the output is a noisy linear combination of the
inputs, and there are $k$ subgroups, each with its own regression vector. Prior
work~\cite{kong2020meta} showed that with abundant small batches, the
regression vectors can be learned from only a few medium-size batches with a
modest number of samples each. However, the
paper requires that the input distribution for all subgroups be isotropic
Gaussian, and states that removing this assumption is an ``interesting and
challenging problem''. We propose a novel gradient-based algorithm that improves
on the existing results in several ways. It extends the applicability of the
algorithm by: (1) allowing the subgroups' underlying input distributions to be
different, unknown, and heavy-tailed; (2) recovering all subgroups that are
followed by a significant proportion of batches, even for infinite $k$; (3)
removing the separation requirement between the regression vectors; (4)
reducing the number of batches and allowing smaller batch sizes.
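For concreteness, a minimal sketch of the data-generating setup being studied
(the heterogeneous-batch model, not the paper's recovery algorithm); all
names and defaults here are illustrative:

    import numpy as np

    def make_batches(n_batches, batch_size, d, k, noise=0.1, seed=0):
        # Each batch comes from one of k latent subgroups, and each
        # subgroup has its own regression vector. The inputs are drawn
        # isotropic Gaussian here purely for simplicity; the paper allows
        # different, unknown, heavy-tailed input distributions.
        rng = np.random.default_rng(seed)
        betas = rng.normal(size=(k, d))       # one regression vector per subgroup
        batches = []
        for _ in range(n_batches):
            j = rng.integers(k)               # latent subgroup of this source
            X = rng.normal(size=(batch_size, d))
            y = X @ betas[j] + noise * rng.normal(size=batch_size)
            batches.append((X, y))
        return batches, betas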
Long-term Forecasting with TiDE: Time-series Dense Encoder
Recent work has shown that simple linear models can outperform several
Transformer-based approaches in long-term time-series forecasting. Motivated by
this, we propose a Multi-layer Perceptron (MLP) based encoder-decoder model,
Time-series Dense Encoder (TiDE), for long-term time-series forecasting that
enjoys the simplicity and speed of linear models while also being able to
handle covariates and non-linear dependencies. Theoretically, we prove that the
simplest linear analogue of our model can achieve near optimal error rate for
linear dynamical systems (LDS) under some assumptions. Empirically, we show
that our method can match or outperform prior approaches on popular long-term
time-series forecasting benchmarks while being 5-10x faster than the best
Transformer-based model.
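As a purely structural illustration (TiDE's actual architecture adds residual
blocks and covariate projections that we omit), a minimal dense
encoder-decoder forward pass with untrained random weights:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def dense_encoder_decoder(past, hidden=64, horizon=24, seed=0):
        # Flatten the look-back window, encode it with one dense layer,
        # and decode a fixed-length forecast with another dense layer.
        rng = np.random.default_rng(seed)
        x = past.ravel()
        W_enc = rng.normal(size=(hidden, x.size)) / np.sqrt(x.size)
        W_dec = rng.normal(size=(horizon, hidden)) / np.sqrt(hidden)
        z = relu(W_enc @ x)                   # dense encoding of the history
        return W_dec @ z                      # dense decoding of the forecast

    forecast = dense_encoder_decoder(np.sin(np.linspace(0, 12, 96)))
    print(forecast.shape)                     # (24,)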
Asymptotic results for fitting semiparametric transformation models to failure time data from case-cohort studies
Semiparametric transformation models are considered for failure time data from case-cohort studies, where the covariates are assembled only for a randomly selected subcohort from the entire cohort and additional cases outside the subcohort. We present estimating procedures for the regression parameters and the survival probability. The asymptotic properties of the resulting estimators are developed based on asymptotic results for U-statistics, martingales, stochastic processes and finite population sampling.
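For context, this model class links an unknown monotone transform of the
failure time T to a linear predictor in the covariates Z; in standard
notation (ours, not necessarily the paper's):

    % Semiparametric transformation model: H is an unspecified increasing
    % function, \beta the regression parameters, and \epsilon an error with
    % a known distribution (extreme-value errors give proportional hazards,
    % logistic errors give proportional odds).
    H(T) = -\beta^{\top} Z + \epsilon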
Affine equivariant rank-weighted L-estimation of multivariate location
In the multivariate one-sample location model, we propose a class of flexible,
robust, affine-equivariant L-estimators of location for distributions, invoking
the affine invariance of the Mahalanobis distances of individual observations.
An involved iteration process for their computation is numerically illustrated.
Comment: 16 pages, 4 figures, 6 tables
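One plausible form of such an iteration (a hedged sketch of ours, not
necessarily the authors' scheme): weight each observation by the rank of its
Mahalanobis distance, with closer points weighted more, and iterate weighted
location and scatter updates:

    import numpy as np

    def rank_weighted_location(X, n_iter=10):
        # Rank-weighted L-estimation sketch: the weights depend on the
        # data only through Mahalanobis-distance ranks, which is what
        # makes the resulting location estimate affine-equivariant.
        n, d = X.shape
        mu = np.median(X, axis=0)             # robust starting point
        cov = np.cov(X.T)
        for _ in range(n_iter):
            diff = X - mu
            dist = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
            ranks = np.argsort(np.argsort(dist))      # 0 = closest point
            w = (n - ranks).astype(float)             # decreasing in distance
            w /= w.sum()
            mu = w @ X                                # weighted mean update
            diff = X - mu
            cov = (w[:, None] * diff).T @ diff        # weighted scatter update
        return mu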
Phenomenology of a three-family model with gauge symmetry SU(3)_c X SU(4)_L X U(1)_X
We study an extension of the gauge group SU(3)_c X SU(2)_L X U(1)_Y of the
standard model to the symmetry group SU(3)_c X SU(4)_L X U(1)_X (3-4-1 for
short). This extension provides an interesting attempt to answer the question
of family replication in the sense that models for the electroweak interaction
can be constructed so that anomaly cancellation is achieved by an interplay
between generations, all of them under the condition that the number of
families must be divisible by the number of colours of SU(3)_c. This method of
anomaly cancellation requires a family of quarks transforming differently from
the other two, thus leading to tree-level flavour changing neutral currents
(FCNC) transmitted by the two extra neutral gauge bosons predicted by the
model. In a version of the 3-4-1 extension, which does not
contain particles with exotic electric charges, we study the fermion mass
spectrum and some aspects of the phenomenology of the neutral gauge boson
sector. In particular, we impose limits on the mixing angle and on the mass
scale of the corresponding physical new neutral gauge boson, and establish a
lower bound on the mass of the additional new neutral gauge boson. For the
analysis we use updated precision electroweak data at the Z-pole from the CERN
LEP and SLAC Linear Collider, and atomic parity violation data. The mass scale
of the additional new neutral gauge boson is constrained by using updated
experimental inputs from neutral meson mixing in the analysis of the sources
of FCNC in the model. The data constrain the mixing angle to a very small
value of O(0.001), and the lower bounds on the two new gauge boson masses are
found to be of O(1 TeV) and O(7 TeV), respectively.
Comment: 22 pages, 6 tables, 1 figure. To appear in J. Phys. G: Nuclear and
Particle Physics