
    A Combinatorial Approach to Robust PCA

    We study the problem of recovering Gaussian data under adversarial corruptions when the noises are low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noises lie in an unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$, and $s$ randomly chosen coordinates of each data point fall into the control of an adversary. This setting models the scenario of learning from high-dimensional yet structured data that are transmitted through a highly noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when $ks^2 = O(d)$, recovers every single data point up to a nearly optimal $\ell_1$ error of $\tilde O(ks/d)$ in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace $U$. In contrast, we present a novel approach via studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace $U$ is arbitrary.
    Comment: To appear at ITCS 202
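
    The Basis Pursuit program that the analysis above revisits has a standard linear-programming formulation: minimize $\|x\|_1$ subject to $Ax = b$. The sketch below shows this classical formulation via scipy; the dimensions, random data, and the `basis_pursuit` helper name are illustrative, not taken from the paper.

```python
# A minimal sketch of the classical Basis Pursuit program:
#   minimize ||x||_1  subject to  A x = b,
# solved as a linear program by splitting x = u - v with u, v >= 0.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])                # A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -2.0, 0.5]       # sparse signal to recover
x_hat = basis_pursuit(A, A @ x_true)
print(np.linalg.norm(x_hat - x_true))        # small for sufficiently sparse signals
```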

    Trimmed Maximum Likelihood Estimation for Robust Learning in Generalized Linear Models

    We study the problem of learning generalized linear models under adversarial corruptions. We analyze a classical heuristic called the iterative trimmed maximum likelihood estimator, which is known to be effective against label corruptions in practice. Under label corruptions, we prove that this simple estimator achieves minimax near-optimal risk on a wide range of generalized linear models, including Gaussian regression, Poisson regression and Binomial regression. Finally, we extend the estimator to the more challenging setting of label and covariate corruptions and demonstrate its robustness and optimality in that setting as well.
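
    For intuition, here is a minimal sketch of the iterative trimming heuristic the abstract analyzes, specialized to Gaussian (least-squares) regression, where the MLE is ordinary least squares. The trimming fraction `eps` and iteration count are hypothetical tuning choices, not values from the paper.

```python
# Iterative trimmed maximum likelihood, sketched for Gaussian regression:
# fit on the currently kept points, then keep the (1 - eps) fraction of
# points with the smallest losses, and repeat.
import numpy as np

def iterative_trimmed_ls(X, y, eps=0.1, n_iters=20):
    n = len(y)
    keep = np.arange(n)                                    # start from all points
    for _ in range(n_iters):
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)  # MLE on kept set
        losses = (y - X @ beta) ** 2                       # per-sample loss (up to constants)
        keep = np.argsort(losses)[: int((1 - eps) * n)]    # trim the largest losses
    return beta
```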

    Transformers can optimally learn regression mixture models

    Mixture models arise in many regression problems, but most methods have seen limited adoption, partly due to these algorithms' highly tailored and model-specific nature. On the other hand, transformers are flexible neural sequence models that present the intriguing possibility of providing general-purpose prediction methods, even in this mixture setting. In this work, we investigate the hypothesis that transformers can learn an optimal predictor for mixtures of regressions. We construct a generative process for a mixture of linear regressions for which the decision-theoretic optimal procedure is given by data-driven exponential weights on a finite set of parameters. We observe that transformers achieve low mean-squared error on data generated via this process. By probing the transformer's output at inference time, we also show that transformers typically make predictions that are close to the optimal predictor. Our experiments also demonstrate that transformers can learn mixtures of regressions in a sample-efficient fashion and are somewhat robust to distribution shifts. We complement our experimental observations by proving constructively that the decision-theoretic optimal procedure is indeed implementable by a transformer.
    Comment: 24 pages, 9 figures
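
    The decision-theoretic optimal procedure described above, exponential weights over a finite parameter set, can be sketched in a few lines. The component parameters `betas` and the noise scale `sigma` below are illustrative assumptions.

```python
# Exponential-weights prediction for a mixture of linear regressions:
# weight each candidate parameter by its likelihood on the observed
# (X, y) pairs, then predict with the weighted average.
import numpy as np

def exp_weights_predict(betas, X, y, x_query, sigma=1.0):
    # log-weight of component j: -sum_i (y_i - <beta_j, x_i>)^2 / (2 sigma^2)
    log_w = np.array([-np.sum((y - X @ b) ** 2) / (2 * sigma**2) for b in betas])
    w = np.exp(log_w - log_w.max())          # stabilize before normalizing
    w /= w.sum()
    preds = np.array([x_query @ b for b in betas])
    return w @ preds                         # exponentially weighted prediction
```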

    Linear Regression using Heterogeneous Data Batches

    In many learning applications, data are collected from multiple sources, each providing a batch of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall into one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are $k$ subgroups, each with its own regression vector. Prior work \cite{kong2020meta} showed that with abundant small batches, the regression vectors can be learned with only a few, $\tilde\Omega(k^{3/2})$, medium-size batches of $\tilde\Omega(\sqrt k)$ samples each. However, that work requires the input distribution for all $k$ subgroups to be isotropic Gaussian, and states that removing this assumption is an "interesting and challenging problem". We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches, even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.
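
    The abstract does not detail the gradient-based algorithm, so the following is only a naive alternating-minimization baseline for the same batch setting (assign each batch to its best-fitting regression vector, refit, repeat); `k`, the batch format, and all names are illustrative.

```python
# Naive baseline for mixed linear regression from batches: alternate
# between assigning each batch to the component with the smallest
# residual and refitting each component on its assigned batches.
import numpy as np

def alt_min_batches(batches, k, n_iters=30, seed=0):
    d = batches[0][0].shape[1]
    betas = np.random.default_rng(seed).standard_normal((k, d))
    for _ in range(n_iters):
        groups = [[] for _ in range(k)]
        for X, y in batches:                               # assign batch to best component
            errs = [np.sum((y - X @ b) ** 2) for b in betas]
            groups[int(np.argmin(errs))].append((X, y))
        for j, g in enumerate(groups):                     # refit each component
            if g:
                Xj = np.vstack([X for X, _ in g])
                yj = np.concatenate([y for _, y in g])
                betas[j], *_ = np.linalg.lstsq(Xj, yj, rcond=None)
    return betas
```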

    Long-term Forecasting with TiDE: Time-series Dense Encoder

    Recent work has shown that simple linear models can outperform several Transformer-based approaches in long-term time-series forecasting. Motivated by this, we propose a Multi-layer Perceptron (MLP)-based encoder-decoder model, Time-series Dense Encoder (TiDE), for long-term time-series forecasting that enjoys the simplicity and speed of linear models while also being able to handle covariates and non-linear dependencies. Theoretically, we prove that the simplest linear analogue of our model can achieve a near-optimal error rate for linear dynamical systems (LDS) under some assumptions. Empirically, we show that our method can match or outperform prior approaches on popular long-term time-series forecasting benchmarks while being 5-10x faster than the best Transformer-based model.
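
    A rough structural sketch of such an MLP encoder-decoder forecaster, inferred only from the abstract's description: a look-back window plus covariates is densely encoded and then decoded into a multi-step forecast. The layer sizes and residual block below are assumptions, not the paper's exact TiDE architecture.

```python
# MLP encoder-decoder for long-horizon forecasting (structural sketch).
import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
    def forward(self, x):
        return x + self.net(x)                 # skip connection around the MLP

class DenseEncoderDecoder(nn.Module):
    def __init__(self, lookback, covariate_dim, horizon, hidden=256):
        super().__init__()
        in_dim = lookback + covariate_dim
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), ResidualMLP(hidden, hidden))
        self.decoder = nn.Sequential(ResidualMLP(hidden, hidden), nn.Linear(hidden, horizon))
    def forward(self, past, covariates):
        z = self.encoder(torch.cat([past, covariates], dim=-1))  # dense encoding
        return self.decoder(z)                                   # multi-step forecast

model = DenseEncoderDecoder(lookback=96, covariate_dim=8, horizon=24)
print(model(torch.randn(4, 96), torch.randn(4, 8)).shape)        # torch.Size([4, 24])
```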

    Asymptotic results for fitting semiparametric transformation models to failure time data from case-cohort studies

    Semiparametric transformation models are considered for failure time data from case-cohort studies, where the covariates are assembled only for a randomly selected subcohort from the entire cohort and for additional cases outside the subcohort. We present estimating procedures for the regression parameters and the survival probability. The asymptotic properties of the resulting estimators are developed based on asymptotic results for U-statistics, martingales, stochastic processes and finite population sampling.
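
    One standard (Horvitz-Thompson style) form of a case-cohort weighted estimating equation is sketched below for orientation; the authors' exact construction for transformation models may differ. Here $\Delta_i$ is the failure indicator, $\xi_i$ indicates subcohort membership, and $\alpha$ is the subcohort sampling fraction.

```latex
% Cases get weight 1; subcohort non-cases are weighted by the inverse
% sampling fraction. This is a generic sketch, not the paper's equation.
U(\beta) \;=\; \sum_{i=1}^{n} w_i \, \psi(T_i, \Delta_i, Z_i; \beta) \;=\; 0,
\qquad
w_i \;=\; \Delta_i + (1 - \Delta_i)\,\frac{\xi_i}{\alpha}.
```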

    Affine equivariant rank-weighted L-estimation of multivariate location

    In the multivariate one-sample location model, we propose a class of flexible, robust, affine-equivariant L-estimators of location for distributions, invoking affine invariance of the Mahalanobis distances of individual observations. The iteration process involved in their computation is numerically illustrated.
    Comment: 16 pages, 4 figures, 6 tables
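
    A generic sketch of one rank-weighted, Mahalanobis-based iteration for a location estimate follows; the specific weight scheme (linearly decreasing in the distance rank) is an illustrative choice, not the paper's estimator. Because Mahalanobis distances, and hence the ranks, are affine invariant, the weighted mean updates in an affine-equivariant way.

```python
# Rank-weighted location estimate: downweight observations by the rank
# of their Mahalanobis distance and iterate location/scatter updates.
import numpy as np

def rank_weighted_location(X, n_iters=10):
    mu, S = X.mean(axis=0), np.cov(X.T)
    n = len(X)
    for _ in range(n_iters):
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)  # squared Mahalanobis distances
        ranks = np.argsort(np.argsort(d2))                           # 0 = closest observation
        w = 1.0 - ranks / n                                          # downweight distant points
        mu = (w[:, None] * X).sum(axis=0) / w.sum()                  # weighted location update
        S = np.cov(X.T, aweights=w)                                  # reweighted scatter
    return mu
```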

    Phenomenology of a three-family model with gauge symmetry SU(3)_c X SU(4)_L X U(1)_X

    We study an extension of the gauge group SU(3)_c X SU(2)_L X U(1)_Y of the standard model to the symmetry group SU(3)_c X SU(4)_L X U(1)_X (3-4-1 for short). This extension provides an interesting attempt to answer the question of family replication, in the sense that models for the electroweak interaction can be constructed so that anomaly cancellation is achieved by an interplay between generations, all of them under the condition that the number of families must be divisible by the number of colours of SU(3)_c. This method of anomaly cancellation requires a family of quarks transforming differently from the other two, thus leading to tree-level flavour changing neutral currents (FCNC) transmitted by the two extra neutral gauge bosons $Z'$ and $Z''$ predicted by the model. In a version of the 3-4-1 extension which does not contain particles with exotic electric charges, we study the fermion mass spectrum and some aspects of the phenomenology of the neutral gauge boson sector. In particular, we impose limits on the $Z$-$Z'$ mixing angle and on the mass scale of the corresponding physical new neutral gauge boson $Z_2$, and establish a lower bound on the mass of the additional new neutral gauge boson $Z'' \equiv Z_3$. For the analysis we use updated precision electroweak data at the $Z$-pole from the CERN LEP and SLAC Linear Collider, and atomic parity violation data. The mass scale of the additional new neutral gauge boson $Z_3$ is constrained by using updated experimental inputs from neutral meson mixing in the analysis of the sources of FCNC in the model. The data constrain the $Z$-$Z'$ mixing angle to a very small value of O(0.001), and the lower bounds on $M_{Z_2}$ and $M_{Z_3}$ are found to be of O(1 TeV) and O(7 TeV), respectively.
    Comment: 22 pages, 6 tables, 1 figure. To appear in J. Phys. G: Nuclear and Particle Physics
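
    For reference, the $Z$-$Z'$ mixing constrained above is usually parametrized by a small angle rotating the gauge eigenstates into the mass eigenstates $Z_1$ and $Z_2$; the sketch below is the textbook two-state form (with $Z'' \equiv Z_3$ taken as approximately unmixed), not a derivation within the 3-4-1 model.

```latex
% Generic two-state mixing parametrization, not the model's derivation.
Z_1 = Z\cos\theta + Z'\sin\theta, \qquad
Z_2 = -Z\sin\theta + Z'\cos\theta, \qquad |\theta| \sim O(0.001).
```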