42 research outputs found
An Algorithmic Framework for Computing Validation Performance Bounds by Using Suboptimal Models
Practical model building processes are often time-consuming because many
different models must be trained and validated. In this paper, we introduce a
novel algorithm that can be used for computing the lower and the upper bounds
of model validation errors without actually training the model itself. A key
idea behind our algorithm is using a side information available from a
suboptimal model. If a reasonably good suboptimal model is available, our
algorithm can compute lower and upper bounds of many useful quantities for
making inferences on the unknown target model. We demonstrate the advantage of
our algorithm in the context of model selection for regularized learning
problems
An Exponential Lower Bound on the Complexity of Regularization Paths
For a variety of regularized optimization problems in machine learning,
algorithms computing the entire solution path have been developed recently.
Most of these methods are quadratic programs that are parameterized by a single
parameter, as for example the Support Vector Machine (SVM). Solution path
algorithms do not only compute the solution for one particular value of the
regularization parameter but the entire path of solutions, making the selection
of an optimal parameter much easier.
It has been assumed that these piecewise linear solution paths have only
linear complexity, i.e. linearly many bends. We prove that for the support
vector machine this complexity can be exponential in the number of training
points in the worst case. More strongly, we construct a single instance of n
input points in d dimensions for an SVM such that at least \Theta(2^{n/2}) =
\Theta(2^d) many distinct subsets of support vectors occur as the
regularization parameter changes.Comment: Journal version, 28 Pages, 5 Figure
Optimal Two-Step Prediction in Regression
High-dimensional prediction typically comprises two steps: variable selection
and subsequent least-squares refitting on the selected variables. However, the
standard variable selection procedures, such as the lasso, hinge on tuning
parameters that need to be calibrated. Cross-validation, the most popular
calibration scheme, is computationally costly and lacks finite sample
guarantees. In this paper, we introduce an alternative scheme, easy to
implement and both computationally and theoretically efficient