342 research outputs found
Least quantile regression via modern optimization
We address the Least Quantile of Squares (LQS) (and in particular the Least
Median of Squares) regression problem using modern optimization methods. We
propose a Mixed Integer Optimization (MIO) formulation of the LQS problem which
allows us to find a provably global optimal solution for the LQS problem. Our
MIO framework has the appealing characteristic that if we terminate the
algorithm early, we obtain a solution with a guarantee on its sub-optimality.
We also propose continuous optimization methods based on first-order
subdifferential methods, sequential linear optimization and hybrid combinations
of them to obtain near optimal solutions to the LQS problem. The MIO algorithm
is found to benefit significantly from high quality solutions delivered by our
continuous optimization based methods. We further show that the MIO approach
leads to (a) an optimal solution for any dataset, where the data-points are
not necessarily in general position, (b) a simple
proof of the breakdown point of the LQS objective value that holds for any
dataset and (c) an extension to situations where there are polyhedral
constraints on the regression coefficient vector. We report computational
results with both synthetic and real-world datasets showing that the MIO
algorithm with warm starts from the continuous optimization methods solves small
() and medium () size problems to provable optimality in under
two hours, and outperforms all publicly available methods for large-scale
(n = 10,000) LQS problems.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1223 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
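To make the LQS objective concrete, the following sketch (with assumed data sizes and contamination levels) evaluates the q-th smallest absolute residual and compares an ordinary least-squares fit against a crude elemental-subset heuristic on outlier-contaminated data. It is only an illustration of the objective being optimized, not the MIO formulation or the first-order methods proposed in the paper.

```python
# Minimal sketch (not the paper's method): the LQS objective is the q-th
# smallest absolute residual; compare OLS with a crude multi-start heuristic
# on data containing outliers. All sizes and noise levels are assumptions.
import numpy as np

def lqs_objective(beta, X, y, q):
    """q-th order statistic of the absolute residuals |y - X beta|."""
    r = np.abs(y - X @ beta)
    return np.sort(r)[q - 1]          # 1-indexed q-th smallest residual

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)
y[:40] += 20.0                        # contaminate 20% of the responses

q = n // 2                            # q ~ n/2 gives least *median* of squares
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Crude heuristic: fit exact p-point subsets and keep the best LQS value.
best_beta, best_val = beta_ols, lqs_objective(beta_ols, X, y, q)
for _ in range(500):
    idx = rng.choice(n, size=p, replace=False)
    try:
        cand = np.linalg.solve(X[idx], y[idx])
    except np.linalg.LinAlgError:
        continue
    val = lqs_objective(cand, X, y, q)
    if val < best_val:
        best_beta, best_val = cand, val

print("OLS       LQS objective:", lqs_objective(beta_ols, X, y, q))
print("Heuristic LQS objective:", best_val)
```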
Detecting K-complexes for sleep stage identification using nonsmooth optimisation
The process of sleep stage identification is a labour-intensive task that involves the specialized interpretation of the polysomnographic signals captured from a patient’s overnight sleep session. Automating this task has proven to be challenging for data mining algorithms because of the noise, complexity and sheer size of the data. In this paper we apply nonsmooth optimization to extract key features that lead to better accuracy. We develop a specific procedure for identifying K-complexes, a special type of brain wave crucial for distinguishing sleep stages. The procedure consists of two steps. We first extract “easily classified” K-complexes, and then apply nonsmooth optimization methods to extract features from the remaining data and refine the results from the first step. Numerical experiments show that this procedure is efficient for detecting K-complexes. It is also found that most classification methods perform significantly better on the extracted features.
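As a rough illustration of the first step described above, the sketch below flags “easily classified” K-complex candidates with a simple amplitude-and-duration rule on a bandpass-filtered, synthetic EEG trace. The sampling rate, frequency band and thresholds are assumptions for illustration only; they are not the procedure or parameters used in the paper.

```python
# Hypothetical sketch of extracting "easy" K-complex candidates: bandpass to
# the slow-wave band, then flag ~1 s windows with a large peak-to-peak swing.
# The signal, sampling rate and thresholds below are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100.0                                   # sampling rate (Hz), assumed
t = np.arange(0, 30, 1 / fs)                 # one 30-second epoch
rng = np.random.default_rng(1)
eeg = 10 * rng.normal(size=t.size)           # background activity (uV)
# inject a K-complex-like transient: sharp negative wave then positive rebound
kc = -80 * np.exp(-((t - 12.0) / 0.25) ** 2) + 45 * np.exp(-((t - 12.6) / 0.35) ** 2)
eeg += kc

# K-complexes live roughly in the 0.5-4 Hz (delta) band; filter first.
b, a = butter(4, [0.5 / (fs / 2), 4 / (fs / 2)], btype="band")
slow = filtfilt(b, a, eeg)

# Rule: peak-to-peak excursion above 75 uV within a sliding ~1 s window.
win = int(1.0 * fs)
candidates = []
for start in range(0, slow.size - win, win // 2):
    seg = slow[start:start + win]
    if seg.max() - seg.min() > 75:
        candidates.append(start / fs)

print("candidate K-complex onsets (s):", candidates)
```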
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On the one hand, Big Data hold great promise for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottlenecks, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require new computational and statistical paradigms. This
article gives an overview of the salient features of Big Data and of how these
features drive paradigm shifts in statistical and computational methods as
well as in computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the viability of
the sparsest solution in high-confidence sets and point out that exogenous
assumptions in most statistical methods for Big Data cannot be validated due to
incidental endogeneity. These invalid assumptions can lead to wrong statistical
inferences and, consequently, wrong scientific conclusions.
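The spurious-correlation point above can be illustrated with a small simulation: even when the response is generated independently of every feature, the largest absolute sample correlation across the features grows with the dimension. The sample size and dimensions below are arbitrary choices for illustration.

```python
# Small simulation of spurious correlation in high dimensions: the response y
# is independent of all features, yet the maximum absolute sample correlation
# over p features increases with p. Sizes below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 60                                      # sample size
for p in (10, 100, 1000, 10000):            # number of irrelevant features
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)                  # independent of X by construction
    # Pearson correlation of y with each column of X
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    corr = Xc.T @ yc / n
    print(f"p = {p:6d}   max |corr| = {np.abs(corr).max():.3f}")
```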
A Fast Smoothing Newton Method for Bilevel Hyperparameter Optimization for SVC with Logistic Loss
Support Vector Classification with logistic loss has excellent theoretical
properties in classification problems where the label values are not
continuous. In this paper, we reformulate the hyperparameter selection for SVC
with logistic loss as a bilevel optimization problem in which the upper-level
problem and the lower-level problem are both based on logistic loss. The
resulting bilevel optimization model is converted to a single-level nonlinear
programming (NLP) problem based on the KKT conditions of the lower-level
problem. Such NLP contains a set of nonlinear equality constraints and a simple
lower bound constraint. The second-order sufficient condition is characterized,
which guarantees that the strict local optimizers are obtained. To solve such
NLP, we apply the smoothing Newton method proposed in \cite{Liang} to solve the
KKT conditions, which contain one pair of complementarity constraints. We show
that the smoothing Newton method has a superlinear convergence rate. Extensive
numerical results verify the efficiency of the proposed approach and show that
strict local minimizers can be achieved both numerically and theoretically. In
particular, compared with other methods, our algorithm achieves competitive
results while consuming less time.
Comment: 27 pages
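For intuition about the bilevel structure described above, the sketch below fits a regularized logistic-loss classifier at the lower level and selects the regularization weight (called lam below, an illustrative stand-in for the hyperparameter) at the upper level by minimizing validation logistic loss over a simple grid. This baseline only shows the nesting of the two problems; it is not the KKT-based single-level reformulation or the smoothing Newton method from the paper, and all data sizes are assumptions.

```python
# Sketch of the bilevel structure: lower level fits a logistic-loss classifier
# with an L2 penalty weight lam on training data; upper level picks lam to
# minimize validation logistic loss. Plain grid search, assumed data sizes;
# not the paper's KKT reformulation or smoothing Newton method.
import numpy as np
from scipy.optimize import minimize

def logistic_loss(w, X, y, lam=0.0):
    """Mean logistic loss with optional L2 penalty; labels y in {-1, +1}."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (w @ w)

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = np.where(X @ w_true + 0.5 * rng.normal(size=n) > 0, 1.0, -1.0)
Xtr, ytr, Xval, yval = X[:120], y[:120], X[120:], y[120:]

best_lam, best_val = None, np.inf
for lam in 10.0 ** np.arange(-4, 3):        # upper-level search over lam
    # lower-level problem: regularized logistic-loss fit on the training split
    res = minimize(logistic_loss, np.zeros(p), args=(Xtr, ytr, lam), method="BFGS")
    val = logistic_loss(res.x, Xval, yval)  # upper-level objective (no penalty)
    if val < best_val:
        best_lam, best_val = lam, val

print(f"selected lam = {best_lam:g}, validation logistic loss = {best_val:.4f}")
```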