Least quantile regression via modern optimization
We address the Least Quantile of Squares (LQS) (and in particular the Least
Median of Squares) regression problem using modern optimization methods. We
propose a Mixed Integer Optimization (MIO) formulation of the LQS problem which
allows us to find a provably global optimal solution for the LQS problem. Our
MIO framework has the appealing characteristic that if we terminate the
algorithm early, we obtain a solution with a guarantee on its sub-optimality.
We also propose continuous optimization methods based on first-order
subdifferential methods, sequential linear optimization and hybrid combinations
of them to obtain near optimal solutions to the LQS problem. The MIO algorithm
is found to benefit significantly from high quality solutions delivered by our
continuous optimization based methods. We further show that the MIO approach
leads to (a) an optimal solution for any dataset, in which the data points are
not necessarily in general position, (b) a simple
proof of the breakdown point of the LQS objective value that holds for any
dataset and (c) an extension to situations where there are polyhedral
constraints on the regression coefficient vector. We report computational
results with both synthetic and real-world datasets showing that the MIO
algorithm with warm starts from the continuous optimization methods solves small
() and medium () size problems to provable optimality in under two hours, and
outperforms all publicly available methods for large-scale (10,000) LQS
problems.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1223 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
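The LQS objective the abstract optimizes can be stated in a few lines. Below is a minimal sketch of evaluating that objective for a candidate coefficient vector; the function name `lqs_objective` and the toy data are illustrative, not from the paper, and the MIO machinery that actually minimizes it is of course not shown.

```python
import numpy as np

def lqs_objective(beta, X, y, q):
    """q-th smallest squared residual for candidate coefficients beta.

    For q roughly n/2 this is the Least Median of Squares objective.
    (Name and signature are illustrative, not from the paper.)
    """
    residuals = y - X @ beta
    return np.sort(residuals ** 2)[q - 1]  # q-th order statistic, 1-indexed

# Tiny illustration: one gross outlier barely moves the LQS objective
# at the true coefficients, unlike the mean-squared-error objective.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.01 * rng.normal(size=20)
y[0] = 100.0  # gross outlier
q = 10
lqs_val = lqs_objective(beta_true, X, y, q)  # stays tiny
mse_val = np.mean((y - X @ beta_true) ** 2)  # blown up by the outlier
```

The sort makes the objective piecewise-smooth and non-convex, which is why the paper needs integer optimization (for global optimality) or subdifferential methods (for near-optimal solutions).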
The Sharp Lower Bound of Asymptotic Efficiency of Estimators in the Zone of Moderate Deviation Probabilities
For the zone of moderate deviation probabilities the local asymptotic minimax
lower bound of asymptotic efficiency of estimators is established. The
estimation parameter is multidimensional. The lower bound admits an
interpretation as the lower bound of asymptotic efficiency in confidence
estimation.
PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
Generalized linear models (GLMs) -- such as logistic regression, Poisson
regression, and robust regression -- provide interpretable models for diverse
data types. Probabilistic approaches, particularly Bayesian ones, allow
coherent estimates of uncertainty, incorporation of prior information, and
sharing of power across experiments via hierarchical models. In practice,
however, the approximate Bayesian methods necessary for inference have either
failed to scale to large data sets or failed to provide theoretical guarantees
on the quality of inference. We propose a new approach based on constructing
polynomial approximate sufficient statistics for GLMs (PASS-GLM). We
demonstrate that our method admits a simple algorithm as well as trivial
streaming and distributed extensions that do not compound error across
computations. We provide theoretical guarantees on the quality of point (MAP)
estimates, the approximate posterior, and posterior mean and uncertainty
estimates. We validate our approach empirically in the case of logistic
regression using a quadratic approximation and show competitive performance
with stochastic gradient descent, MCMC, and the Laplace approximation in terms
of speed and multiple measures of accuracy -- including on an advertising data
set with 40 million data points and 20,000 covariates.
Comment: In Proceedings of the 31st Annual Conference on Neural Information
Processing Systems (NIPS 2017). v3: corrected typos in Appendix.
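The key idea behind the streaming and distributed extensions is that a polynomial approximation of the GLM log-likelihood has sufficient statistics that are plain sums over data points, so partial sums can be accumulated per batch or per machine and added. The sketch below illustrates this for logistic regression with labels in {-1, +1}, using a degree-2 Taylor expansion of log(1 + exp(-s)) around 0 as a stand-in for the paper's Chebyshev polynomial approximation; all function names and the ridge term are hypothetical, for illustration only.

```python
import numpy as np

def accumulate_stats(X, y):
    """Sufficient statistics for a quadratic surrogate of the logistic
    log-likelihood, with labels y in {-1, +1}.

    Taylor stand-in for the paper's polynomial approximation:
        log(1 + exp(-s)) ~ log 2 - s/2 + s**2 / 8  around s = 0,
    so sum_i log(1 + exp(-y_i x_i^T b)) is determined by
        s1 = sum_i y_i x_i   and   S2 = sum_i x_i x_i^T.
    Both are additive over data, so batches can be summed across machines.
    """
    s1 = (y[:, None] * X).sum(axis=0)
    S2 = X.T @ X
    return s1, S2

def surrogate_neg_log_lik(beta, s1, S2, n):
    # Quadratic surrogate of sum_i log(1 + exp(-y_i x_i^T beta)).
    return n * np.log(2.0) - 0.5 * s1 @ beta + 0.125 * beta @ S2 @ beta

def surrogate_map(s1, S2, reg=1e-6):
    # Closed-form minimizer of the surrogate: (S2 / 4) beta = s1 / 2.
    # The small ridge term is a hypothetical numerical safeguard.
    d = len(s1)
    return np.linalg.solve(0.25 * S2 + reg * np.eye(d), 0.5 * s1)
```

Because `s1` and `S2` are a fixed-size summary regardless of the number of data points, a stream only ever needs O(d^2) memory, and distributed workers can each summarize their shard and ship the sums, with no error compounding across computations.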