Sparsifying generalized linear models

Abstract

We consider the sparsification of sums $F : \mathbb{R}^n \to \mathbb{R}$ where $F(x) = f_1(\langle a_1,x\rangle) + \cdots + f_m(\langle a_m,x\rangle)$ for vectors $a_1,\ldots,a_m \in \mathbb{R}^n$ and functions $f_1,\ldots,f_m : \mathbb{R} \to \mathbb{R}_+$. We show that $(1+\varepsilon)$-approximate sparsifiers of $F$ with support size $\frac{n}{\varepsilon^2} (\log \frac{n}{\varepsilon})^{O(1)}$ exist whenever the functions $f_1,\ldots,f_m$ are symmetric, monotone, and satisfy natural growth bounds. Additionally, we give efficient algorithms to compute such a sparsifier assuming each $f_i$ can be evaluated efficiently. Our results generalize the classic case of $\ell_p$ sparsification, where $f_i(z) = |z|^p$ for $p \in (0, 2]$, and give the first near-linear size sparsifiers in the well-studied setting of the Huber loss function and its generalizations, e.g., $f_i(z) = \min\{|z|^p, |z|^2\}$ for $0 < p \leq 2$. Our sparsification algorithm can be applied to give near-optimal reductions for optimizing a variety of generalized linear models, including $\ell_p$ regression for $p \in (1, 2]$ to high accuracy: it suffices to solve $(\log n)^{O(1)}$ sparse regression instances with $m \le n (\log n)^{O(1)}$ terms, plus runtime proportional to the number of nonzero entries in the vectors $a_1, \dots, a_m$.
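
To make the setup concrete, here is a minimal numerical sketch (ours, not the paper's algorithm). It evaluates $F$ for the Huber-type loss $f(z) = \min\{|z|^p, |z|^2\}$ and forms a naive unbiased subsample of the sum by importance sampling, which has the same shape as a sparsifier $\tilde F(x) = \sum_{i \in S} c_i f_i(\langle a_i, x\rangle)$ supported on a small set $S$. The helper names (`huber_like`, `sample_sparsifier`) and the squared-row-norm sampling weights are illustrative assumptions; the paper's actual sampling probabilities, which yield the $(1+\varepsilon)$ guarantee, are more delicate.

```python
import numpy as np

def huber_like(z, p=1.0):
    # One loss from the abstract: f(z) = min(|z|^p, |z|^2).
    a = np.abs(z)
    return np.minimum(a ** p, a ** 2)

def F(x, A, f):
    # F(x) = sum_i f(<a_i, x>), with the rows of A as the vectors a_i.
    return f(A @ x).sum()

def sample_sparsifier(weights, s, rng):
    # Draw s term indices with probability q_i proportional to `weights`,
    # and reweight each draw by 1 / (s * q_i) so the subsampled sum is an
    # unbiased estimator of F(x). (Illustrative only: these heuristic
    # weights do not give the paper's worst-case approximation guarantee.)
    q = weights / weights.sum()
    idx = rng.choice(len(q), size=s, replace=True, p=q)
    return idx, 1.0 / (s * q[idx])

def F_sparse(x, A, f, idx, coeff):
    # Sparsified sum: sum over kept terms i of coeff_i * f(<a_i, x>).
    return (coeff * f(A[idx] @ x)).sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 50))   # m = 10,000 terms, n = 50 dims
x = rng.standard_normal(50)
idx, coeff = sample_sparsifier(np.linalg.norm(A, axis=1) ** 2, 2_000, rng)
print(F(x, A, huber_like), F_sparse(x, A, huber_like, idx, coeff))
```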
