Linear Regression using Heterogeneous Data Batches
In many learning applications, data are collected from multiple sources, each
providing a \emph{batch} of samples that by itself is insufficient to learn its
input-output relationship. A common approach assumes that the sources fall in
one of several unknown subgroups, each with an unknown input distribution and
input-output relationship. We consider one of this setup's most fundamental and
important manifestations, where the output is a noisy linear combination of the
inputs and there are several subgroups, each with its own regression vector.
Prior work~\cite{kong2020meta} showed that, given abundant small batches, the
regression vectors can be learned using only a few additional medium-size
batches. However, the
paper requires that the input distribution for all subgroups be isotropic
Gaussian, and states that removing this assumption is an ``interesting and
challenging problem''. We propose a novel gradient-based algorithm that improves
on the existing results in several ways. It extends the applicability of the
algorithm by: (1) allowing the subgroups' underlying input distributions to be
different, unknown, and heavy-tailed; (2) recovering all subgroups followed by
a significant proportion of batches, even when the number of subgroups is
infinite; (3) removing the separation requirement between the regression
vectors; and (4) reducing the number of batches and allowing smaller batch
sizes.
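As a toy illustration of the batch setting (a simplification, not the proposed gradient-based algorithm, which also handles batches too small for per-batch regression; all sizes and vectors below are made-up): when batches are large enough, one can fit per-batch least squares and cluster the resulting estimates to recover the subgroup regression vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d, batches_per_group, batch_size = 5, 30, 25

# Two hidden subgroups, each with its own regression vector (toy values).
w_groups = [np.array([2.0, -1.0, 0.0, 0.5, 1.0]),
            np.array([-1.0, 2.0, 1.0, 0.0, -0.5])]

# Each batch draws its own inputs and labels from one subgroup's vector.
batch_ws = []
for w in w_groups:
    for _ in range(batches_per_group):
        X = rng.standard_normal((batch_size, d))
        y = X @ w + 0.1 * rng.standard_normal(batch_size)
        batch_ws.append(np.linalg.lstsq(X, y, rcond=None)[0])  # per-batch OLS
batch_ws = np.array(batch_ws)

# 2-means on the per-batch estimates; the cluster means estimate the vectors.
centers = batch_ws[[0, -1]]
for _ in range(20):
    dists = ((batch_ws[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    centers = np.array([batch_ws[assign == k].mean(0) for k in range(2)])
```

With medium-size batches the per-batch estimates concentrate around their subgroup's vector, so the clusters separate cleanly; the regime studied in the abstract is harder precisely because most batches are too small for per-batch regression to work.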
Outlier-robust sparse/low-rank least-squares regression and robust matrix completion
We consider high-dimensional least-squares regression when a fraction
of the labels is contaminated by an arbitrary adversary. We analyze
this problem in the statistical learning framework, with a subgaussian
distribution and a linear hypothesis class on a space of
matrices. In particular, we allow the noise to be heterogeneous. This framework
includes sparse linear regression and low-rank trace-regression. For a
sparse high-dimensional parameter, we show that a convex regularized
M-estimator using a sorted Huber-type loss achieves the near-optimal
subgaussian rate with high probability. For a matrix parameter of low rank,
a nuclear-norm regularized M-estimator using the same sorted Huber-type loss
achieves a subgaussian rate that is again optimal up to a log factor. In a
second part, we study the
trace-regression problem when the parameter is the sum of a low-rank matrix
plus a sparse matrix, assuming the ``low-spikeness'' condition. Unlike
multivariate regression studied in previous work, the design in
trace-regression lacks positive-definiteness in high dimensions. Still, we show
that a regularized least-squares estimator achieves the subgaussian rate.
Lastly, we consider noisy matrix completion with non-uniform sampling, when a
fraction of the sampled entries of the low-rank matrix is corrupted by outliers. If
only the low-rank matrix is of interest, we show that a nuclear-norm
regularized Huber-type estimator achieves, up to log factors, the optimal rate
adaptively to the corruption level. The above-mentioned rates require no prior
information on the corruption level.
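As a minimal sketch of the robust-estimator family discussed here (using a plain Huber loss with an l1 penalty rather than the paper's sorted Huber-type loss; all dimensions and constants below are illustrative): proximal-gradient steps clip large residuals, so a handful of adversarial labels cannot drag the sparse estimate far.

```python
import numpy as np

def huber_l1_step(X, y, w, delta=1.0, lam=0.01, lr=0.1):
    """One proximal-gradient (ISTA) step for Huber loss + l1 penalty."""
    r = X @ w - y
    psi = np.clip(r, -delta, delta)      # Huber influence: clips big residuals
    w = w - lr * (X.T @ psi) / len(y)    # gradient step on the Huber loss
    # Soft-thresholding is the proximal map of the l1 penalty (gives sparsity).
    return np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]            # 3-sparse ground truth
y = X @ w_true + 0.1 * rng.standard_normal(n)
y[:20] += 10.0                            # 10% adversarial label corruption

w = np.zeros(d)
for _ in range(500):
    w = huber_l1_step(X, y, w)
```

Because the clipped influence function bounds each outlier's contribution to the gradient, the estimate lands near the true sparse vector despite the gross label corruption, whereas plain least squares would absorb the outliers into its fit.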
On Tilted Losses in Machine Learning: Theory and Applications
Exponential tilting is a technique commonly used in fields such as
statistics, probability, information theory, and optimization to create
parametric distribution shifts. Despite its prevalence in related fields,
tilting has not seen widespread use in machine learning. In this work, we aim
to bridge this gap by exploring the use of tilting in risk minimization. We
study a simple extension to ERM -- tilted empirical risk minimization (TERM) --
which uses exponential tilting to flexibly tune the impact of individual
losses. The resulting framework has several useful properties: We show that
TERM can increase or decrease the influence of outliers to enable fairness or
robustness, respectively; has variance-reduction properties that can
benefit generalization; and can be viewed as a smooth approximation to the tail
probability of losses. Our work makes rigorous connections between TERM and
related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and
distributionally robust optimization (DRO). We develop batch and stochastic
first-order optimization methods for solving TERM, provide convergence
guarantees for the solvers, and show that the framework can be efficiently
solved relative to common alternatives. Finally, we demonstrate that TERM can
be used for a multitude of applications in machine learning, such as enforcing
fairness between subgroups, mitigating the effect of outliers, and handling
class imbalance. Despite the straightforward modification TERM makes to
traditional ERM objectives, we find that the framework can consistently
outperform ERM and deliver competitive performance with state-of-the-art,
problem-specific approaches.
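The tilted objective itself is simple enough to sketch numerically. Below is a small sketch of the standard tilted risk, R_t = (1/t) * log((1/n) * sum_i exp(t * l_i)), computed with a log-sum-exp shift for numerical stability; the sample losses are made-up values.

```python
import numpy as np

def tilted_risk(losses, t):
    """Tilted empirical risk: (1/t) * log(mean(exp(t * losses)))."""
    losses = np.asarray(losses, dtype=float)
    if t == 0.0:
        return float(losses.mean())      # t = 0 recovers ordinary ERM
    m = (t * losses).max()               # log-sum-exp shift for stability
    return float(m + np.log(np.mean(np.exp(t * losses - m)))) / t

losses = np.array([0.1, 0.2, 0.15, 5.0])  # one outlier loss (toy values)
# t > 0 tilts the objective toward the worst losses (fairness / max-loss),
# t < 0 suppresses them (robustness / min-loss); t -> 0 gives the mean.
```

Sweeping t from negative to positive moves the objective continuously from near the minimum loss, through the ordinary mean, toward the maximum loss, which is the mechanism by which TERM trades robustness against fairness.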