Robust Classification for Imprecise Environments
In real-world environments it is usually difficult to specify target
operating conditions, such as target misclassification costs, precisely.
This uncertainty makes building robust classification systems problematic. We
show that it is possible to build a hybrid classifier that will perform at
least as well as the best available classifier for any target conditions. In
some cases, the performance of the hybrid can actually surpass that of the best
known classifier. This robust performance extends across a wide variety of
comparison frameworks, including the optimization of metrics such as accuracy,
expected cost, lift, precision, recall, and workforce utilization. The hybrid
is also efficient to build, to store, and to update. The hybrid is based on a
method for the comparison of classifier performance that is robust to imprecise
class distributions and misclassification costs. The ROC convex hull (ROCCH)
method combines techniques from ROC analysis, decision analysis, and
computational geometry, and adapts them to the particulars of analyzing learned
classifiers. The method is efficient and incremental, minimizes the management
of classifier performance data, and allows for clear visual comparisons and
sensitivity analyses. Finally, we point to empirical evidence that a robust
hybrid classifier is indeed needed for many real-world problems.
Comment: 24 pages, 12 figures. To be published in Machine Learning Journal.
For related papers, see http://www.hpl.hp.com/personal/Tom_Fawcett/ROCCH
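The core ROCCH computation can be sketched as follows, assuming each classifier is summarized by a single (false positive rate, true positive rate) point and that target conditions are given as a positive-class prior plus the costs of a false positive and a false negative. The function names `roc_convex_hull` and `best_on_hull` are hypothetical, and the hull is built with Andrew's monotone-chain algorithm as one convenient choice, not necessarily the paper's. Under expected cost, the best classifier for given conditions is the hull vertex maximizing TPR - m * FPR, where m = p(neg) * cost_fp / (p(pos) * cost_fn) is the slope of the iso-performance lines.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of the ROC points, including the trivial
    classifiers (0, 0) 'always negative' and (1, 1) 'always positive'."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last vertex while it does not make a strict clockwise turn;
        # this keeps only the upper (north-west) boundary of the point set.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def best_on_hull(hull: List[Point], p_pos: float,
                 cost_fp: float, cost_fn: float) -> Point:
    """Hull vertex with minimum expected cost for the given prior and costs."""
    if not 0.0 < p_pos < 1.0:
        raise ValueError("p_pos must lie strictly between 0 and 1")
    # Slope of the iso-performance lines for expected cost.
    m = ((1.0 - p_pos) * cost_fp) / (p_pos * cost_fn)
    # Minimizing expected cost is equivalent to maximizing TPR - m * FPR.
    return max(hull, key=lambda pt: pt[1] - m * pt[0])
```

For example, with hypothetical classifiers at (0.1, 0.5), (0.3, 0.8), (0.5, 0.7), and (0.6, 0.95), the point (0.5, 0.7) falls below the hull and is never optimal; equal priors and costs select (0.3, 0.8), while a positive prior of 0.2 shifts the choice to (0.1, 0.5). Because only the hull vertices need to be stored and a new classifier only requires recomputing the hull, this representation fits the abstract's claim that the hybrid is cheap to build, store, and update.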
An Exponential Lower Bound on the Complexity of Regularization Paths
For a variety of regularized optimization problems in machine learning,
algorithms computing the entire solution path have been developed recently.
Most of these problems are quadratic programs parameterized by a single
parameter, for example the Support Vector Machine (SVM). Solution path
algorithms compute not only the solution for one particular value of the
regularization parameter but also the entire path of solutions, making the selection
of an optimal parameter much easier.
It has been assumed that these piecewise-linear solution paths have only
linear complexity, i.e., linearly many bends. We prove that for the support
vector machine this complexity can be exponential in the number of training
points in the worst case. More strongly, we construct a single instance of $n$
input points in $d$ dimensions for an SVM such that at least
$\Theta(2^{n/2}) = \Theta(2^d)$ distinct subsets of support vectors occur as the
regularization parameter changes.
Comment: Journal version, 28 Pages, 5 Figures
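To make "piecewise-linear solution path" and "bends" concrete, here is a toy sketch that is deliberately not the SVM: it uses the separable problem of minimizing 0.5 * ||w - y||^2 + lam * ||w||_1, whose exact solution is soft-thresholding, w_i(lam) = sign(y_i) * max(|y_i| - lam, 0). This is a swapped-in example chosen only because its path is known in closed form; the names `soft_threshold`, `solution_path`, and `path_at` are hypothetical. The path bends only where lam equals some |y_i|, so storing those knots lets one evaluate the solution at any lam by linear interpolation.

```python
from bisect import bisect_left
from typing import List, Tuple

Path = List[Tuple[float, List[float]]]  # knots (lam, w(lam)), lam increasing


def soft_threshold(y: List[float], lam: float) -> List[float]:
    """Closed-form minimizer of 0.5*||w - y||^2 + lam*||w||_1."""
    return [(1.0 if yi > 0 else -1.0) * max(abs(yi) - lam, 0.0) for yi in y]


def solution_path(y: List[float]) -> Path:
    """Knots of the piecewise-linear path for lam >= 0.

    The path bends only where lam == |y_i|, so this toy problem has at
    most n bends for n coordinates.
    """
    knots = sorted({0.0} | {abs(yi) for yi in y})
    return [(lam, soft_threshold(y, lam)) for lam in knots]


def path_at(path: Path, lam: float) -> List[float]:
    """Evaluate the stored path at any lam >= 0 without re-solving."""
    lams = [k for k, _ in path]
    if lam >= lams[-1]:
        return list(path[-1][1])  # past the last knot every weight is zero
    j = bisect_left(lams, lam)
    if lams[j] == lam:
        return list(path[j][1])
    # Between consecutive knots every coordinate is linear in lam.
    (l0, w0), (l1, w1) = path[j - 1], path[j]
    t = (lam - l0) / (l1 - l0)
    return [a + t * (b - a) for a, b in zip(w0, w1)]
```

For y = [3.0, -1.0, 0.5] the knots are lam = 0, 0.5, 1, 3, i.e. three bends for three coordinates. In this separable toy the number of bends grows linearly with n; the abstract's result is that for the SVM no such linear bound holds, since the path can pass through exponentially many distinct support-vector subsets.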