We propose a novel high-dimensional linear regression estimator: the Discrete
Dantzig Selector, which minimizes the number of nonzero regression coefficients
subject to a budget on the maximal absolute correlation between the features
and residuals. Motivated by the significant advances in integer optimization
over the past 10-15 years, we present a Mixed Integer Linear Optimization
(MILO) approach to obtain certifiably optimal global solutions to this
nonconvex optimization problem. The current state of algorithmics in integer
optimization makes our proposal substantially more computationally attractive
than the least squares subset selection framework based on integer quadratic
optimization, recently proposed in [8] and the continuous nonconvex quadratic
optimization framework of [33]. We propose new discrete first-order methods,
which when paired with state-of-the-art MILO solvers, lead to good solutions
for the Discrete Dantzig Selector problem for a given computational budget. We
illustrate that our integrated approach provides globally optimal solutions in
significantly shorter computation times, when compared to off-the-shelf MILO
solvers. We demonstrate both theoretically and empirically that in a wide range
of regimes the statistical properties of the Discrete Dantzig Selector are
superior to those of popular β1β-based approaches. We illustrate that
our approach can handle problem instances with p = 10,000 features with
certifiable optimality making it a highly scalable combinatorial variable
selection approach in sparse linear modeling