
    Linear Regression using Heterogeneous Data Batches

    In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are $k$ subgroups, each with its own regression vector. Prior work~\cite{kong2020meta} showed that with abundant small batches, the regression vectors can be learned with only few, $\tilde\Omega(k^{3/2})$, medium-size batches with $\tilde\Omega(\sqrt k)$ samples each. However, that work requires the input distribution for all $k$ subgroups to be isotropic Gaussian, and states that removing this assumption is an ``interesting and challenging problem''. We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.
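    The data model behind this setting can be sketched in a few lines (a toy illustration of the batch structure only, not the paper's algorithm; the names `W` and `make_batch` are ours, and for simplicity we draw isotropic Gaussian inputs, the very assumption the paper relaxes):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    d, k = 10, 3                  # input dimension, number of subgroups
    W = rng.normal(size=(k, d))   # one unknown regression vector per subgroup

    def make_batch(size):
        """A batch drawn from a random subgroup.

        With size < d the batch's design matrix is rank-deficient, so
        its own input-output relationship is unidentifiable in isolation;
        the regression vectors must be recovered by pooling many batches.
        """
        j = rng.integers(k)
        X = rng.normal(size=(size, d))
        y = X @ W[j] + 0.1 * rng.normal(size=size)
        return X, y, j

    batches = [make_batch(size=4) for _ in range(500)]  # 4 samples << d = 10
    ```

    Each batch alone admits infinitely many exact linear fits; the learning problem is to cluster the batches by subgroup and estimate all $k$ vectors jointly.
    
    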

    Outlier-robust sparse/low-rank least-squares regression and robust matrix completion

    We consider high-dimensional least-squares regression when a fraction $\epsilon$ of the labels are contaminated by an arbitrary adversary. We analyze this problem in the statistical learning framework with a subgaussian distribution and linear hypothesis class on the space of $d_1\times d_2$ matrices. As such, we allow the noise to be heterogeneous. This framework includes sparse linear regression and low-rank trace-regression. For a $p$-dimensional $s$-sparse parameter, we show that a convex regularized $M$-estimator using a sorted Huber-type loss achieves the near-optimal subgaussian rate $\sqrt{s\log(ep/s)}+\sqrt{\log(1/\delta)/n}+\epsilon\log(1/\epsilon)$, with probability at least $1-\delta$. For a $(d_1\times d_2)$-dimensional parameter with rank $r$, a nuclear-norm regularized $M$-estimator using the same sorted Huber-type loss achieves the subgaussian rate $\sqrt{rd_1/n}+\sqrt{rd_2/n}+\sqrt{\log(1/\delta)/n}+\epsilon\log(1/\epsilon)$, again optimal up to a log factor. In a second part, we study the trace-regression problem when the parameter is the sum of a matrix with rank $r$ plus an $s$-sparse matrix, assuming the ``low-spikeness'' condition. Unlike multivariate regression studied in previous work, the design in trace-regression lacks positive-definiteness in high dimensions. Still, we show that a regularized least-squares estimator achieves the subgaussian rate $\sqrt{rd_1/n}+\sqrt{rd_2/n}+\sqrt{s\log(d_1d_2)/n}+\sqrt{\log(1/\delta)/n}$. Lastly, we consider noisy matrix completion with non-uniform sampling when a fraction $\epsilon$ of the sampled low-rank matrix is corrupted by outliers. If only the low-rank matrix is of interest, we show that a nuclear-norm regularized Huber-type estimator achieves, up to log factors, the optimal rate adaptively to the corruption level. The above-mentioned rates require no information on $(s,r,\epsilon)$.
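    The core mechanism, bounding each sample's influence via a Huber-type loss, can be illustrated with a simplified sketch (plain unsorted Huber loss on a dense low-dimensional parameter, not the paper's sorted, regularized estimator; the setup and names are ours):

    ```python
    import numpy as np

    def huber_grad_step(theta, X, y, delta=1.0, lr=0.5):
        """One gradient step on the mean Huber loss of residuals y - X @ theta.

        The Huber loss is quadratic for residuals within delta and linear
        beyond, so its derivative psi is clipped: gross label outliers exert
        only bounded pull on the estimate, unlike squared loss.
        """
        r = y - X @ theta
        psi = np.clip(r, -delta, delta)   # rho'(r), bounded influence
        grad = -(X.T @ psi) / len(y)
        return theta - lr * grad

    rng = np.random.default_rng(0)
    n, d = 200, 3
    X = rng.normal(size=(n, d))
    theta_star = np.array([1.0, -2.0, 0.5])
    y = X @ theta_star + 0.1 * rng.normal(size=n)
    y[:10] += 50.0                        # adversarially corrupt 5% of labels

    theta = np.zeros(d)
    for _ in range(500):
        theta = huber_grad_step(theta, X, y)
    ```

    On this toy data the Huber estimate stays close to `theta_star`, while ordinary least squares on the same corrupted labels is pulled far off by the outliers.
    
    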

    On Tilted Losses in Machine Learning: Theory and Applications

    Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: we show that TERM can increase or decrease the influence of outliers to enable fairness or robustness, respectively; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to the tail probability of losses. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
    Comment: arXiv admin note: substantial text overlap with arXiv:2007.0116
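    The tilted risk is a one-line modification of the empirical average: $\tilde R(t;\theta)=\frac{1}{t}\log\big(\frac{1}{N}\sum_i e^{t\,\ell_i(\theta)}\big)$, recovering ERM as $t\to 0$. A minimal sketch of the objective (our own illustration with a log-sum-exp shift for numerical stability, not the authors' code):

    ```python
    import numpy as np

    def tilted_loss(losses, t):
        """Tilted empirical risk: (1/t) * log(mean(exp(t * losses))).

        t > 0 amplifies the largest losses (toward max-loss / fairness),
        t < 0 damps them (toward min-loss / outlier robustness),
        and t -> 0 recovers the ordinary empirical mean (ERM).
        """
        losses = np.asarray(losses, dtype=float)
        if t == 0.0:
            return losses.mean()
        m = (t * losses).max()  # shift so the exponentials cannot overflow
        return (m + np.log(np.mean(np.exp(t * losses - m)))) / t

    losses = np.array([0.1, 0.2, 0.15, 5.0])  # one outlier loss

    erm = tilted_loss(losses, 0.0)      # plain average
    robust = tilted_loss(losses, -2.0)  # negative tilt suppresses the outlier
    fair = tilted_loss(losses, 2.0)     # positive tilt emphasizes worst losses
    ```

    The ordering `robust < erm < fair` makes the tunable-influence property concrete: a single scalar $t$ interpolates between outlier-robust and worst-case-sensitive aggregation of the same losses.
    
    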