
    Linear Regression using Heterogeneous Data Batches

    In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are $k$ subgroups, each with its own regression vector. Prior work~\cite{kong2020meta} showed that with abundant small batches, the regression vectors can be learned with only few, $\tilde\Omega(k^{3/2})$, medium-size batches with $\tilde\Omega(\sqrt k)$ samples each. However, that work requires the input distribution for all $k$ subgroups to be isotropic Gaussian, and states that removing this assumption is an ``interesting and challenging problem''. We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.
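    The data model behind this setting can be sketched in a few lines (a toy illustration of the batch structure only, not the paper's algorithm; the names `W` and `make_batch` are ours, and for simplicity we draw isotropic Gaussian inputs, the very assumption the paper relaxes):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    d, k = 10, 3                  # input dimension, number of subgroups
    W = rng.normal(size=(k, d))   # one unknown regression vector per subgroup

    def make_batch(size):
        """A batch drawn from a random subgroup.

        With size < d the batch's design matrix is rank-deficient, so
        its own input-output relationship is unidentifiable in isolation;
        the regression vectors must be recovered by pooling many batches.
        """
        j = rng.integers(k)
        X = rng.normal(size=(size, d))
        y = X @ W[j] + 0.1 * rng.normal(size=size)
        return X, y, j

    batches = [make_batch(size=4) for _ in range(500)]  # 4 samples << d = 10
    ```

    Each batch alone admits infinitely many exact linear fits; the learning problem is to cluster the batches by subgroup and estimate all $k$ vectors jointly.
    
    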

    Outlier-robust sparse/low-rank least-squares regression and robust matrix completion

    We consider high-dimensional least-squares regression when a fraction $\epsilon$ of the labels are contaminated by an arbitrary adversary. We analyze this problem in the statistical learning framework with a subgaussian distribution and linear hypothesis class on the space of $d_1\times d_2$ matrices. As such, we allow the noise to be heterogeneous. This framework includes sparse linear regression and low-rank trace-regression. For a $p$-dimensional $s$-sparse parameter, we show that a convex regularized $M$-estimator using a sorted Huber-type loss achieves the near-optimal subgaussian rate $\sqrt{s\log(ep/s)}+\sqrt{\log(1/\delta)/n}+\epsilon\log(1/\epsilon)$, with probability at least $1-\delta$. For a $(d_1\times d_2)$-dimensional parameter with rank $r$, a nuclear-norm regularized $M$-estimator using the same sorted Huber-type loss achieves the subgaussian rate $\sqrt{rd_1/n}+\sqrt{rd_2/n}+\sqrt{\log(1/\delta)/n}+\epsilon\log(1/\epsilon)$, again optimal up to a log factor. In a second part, we study the trace-regression problem when the parameter is the sum of a matrix with rank $r$ plus an $s$-sparse matrix, assuming the ``low-spikeness'' condition. Unlike multivariate regression studied in previous work, the design in trace-regression lacks positive-definiteness in high dimensions. Still, we show that a regularized least-squares estimator achieves the subgaussian rate $\sqrt{rd_1/n}+\sqrt{rd_2/n}+\sqrt{s\log(d_1d_2)/n}+\sqrt{\log(1/\delta)/n}$. Lastly, we consider noisy matrix completion with non-uniform sampling when a fraction $\epsilon$ of the sampled low-rank matrix is corrupted by outliers. If only the low-rank matrix is of interest, we show that a nuclear-norm regularized Huber-type estimator achieves, up to log factors, the optimal rate adaptively to the corruption level. The above-mentioned rates require no information on $(s,r,\epsilon)$.
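    The core mechanism, bounding each sample's influence via a Huber-type loss, can be illustrated with a simplified sketch (plain unsorted Huber loss on a dense low-dimensional parameter, not the paper's sorted, regularized estimator; the setup and names are ours):

    ```python
    import numpy as np

    def huber_grad_step(theta, X, y, delta=1.0, lr=0.5):
        """One gradient step on the mean Huber loss of residuals y - X @ theta.

        The Huber loss is quadratic for residuals within delta and linear
        beyond, so its derivative psi is clipped: gross label outliers exert
        only bounded pull on the estimate, unlike squared loss.
        """
        r = y - X @ theta
        psi = np.clip(r, -delta, delta)   # rho'(r), bounded influence
        grad = -(X.T @ psi) / len(y)
        return theta - lr * grad

    rng = np.random.default_rng(0)
    n, d = 200, 3
    X = rng.normal(size=(n, d))
    theta_star = np.array([1.0, -2.0, 0.5])
    y = X @ theta_star + 0.1 * rng.normal(size=n)
    y[:10] += 50.0                        # adversarially corrupt 5% of labels

    theta = np.zeros(d)
    for _ in range(500):
        theta = huber_grad_step(theta, X, y)
    ```

    On this toy data the Huber estimate stays close to `theta_star`, while ordinary least squares on the same corrupted labels is pulled far off by the outliers.
    
    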

    On Tilted Losses in Machine Learning: Theory and Applications

    Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. Despite its prevalence in related fields, tilting has not seen widespread use in machine learning. In this work, we aim to bridge this gap by exploring the use of tilting in risk minimization. We study a simple extension to ERM -- tilted empirical risk minimization (TERM) -- which uses exponential tilting to flexibly tune the impact of individual losses. The resulting framework has several useful properties: we show that TERM can increase or decrease the influence of outliers to enable fairness or robustness, respectively; has variance-reduction properties that can benefit generalization; and can be viewed as a smooth approximation to the tail probability of losses. Our work makes rigorous connections between TERM and related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and distributionally robust optimization (DRO). We develop batch and stochastic first-order optimization methods for solving TERM, provide convergence guarantees for the solvers, and show that the framework can be efficiently solved relative to common alternatives. Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches.
    Comment: arXiv admin note: substantial text overlap with arXiv:2007.0116
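    The tilted risk is a one-line modification of the empirical average: $\tilde R(t;\theta)=\frac{1}{t}\log\big(\frac{1}{N}\sum_i e^{t\,\ell_i(\theta)}\big)$, recovering ERM as $t\to 0$. A minimal sketch of the objective (our own illustration with a log-sum-exp shift for numerical stability, not the authors' code):

    ```python
    import numpy as np

    def tilted_loss(losses, t):
        """Tilted empirical risk: (1/t) * log(mean(exp(t * losses))).

        t > 0 amplifies the largest losses (toward max-loss / fairness),
        t < 0 damps them (toward min-loss / outlier robustness),
        and t -> 0 recovers the ordinary empirical mean (ERM).
        """
        losses = np.asarray(losses, dtype=float)
        if t == 0.0:
            return losses.mean()
        m = (t * losses).max()  # shift so the exponentials cannot overflow
        return (m + np.log(np.mean(np.exp(t * losses - m)))) / t

    losses = np.array([0.1, 0.2, 0.15, 5.0])  # one outlier loss

    erm = tilted_loss(losses, 0.0)      # plain average
    robust = tilted_loss(losses, -2.0)  # negative tilt suppresses the outlier
    fair = tilted_loss(losses, 2.0)     # positive tilt emphasizes worst losses
    ```

    The ordering `robust < erm < fair` makes the tunable-influence property concrete: a single scalar $t$ interpolates between outlier-robust and worst-case-sensitive aggregation of the same losses.
    
    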