Linear time dynamic programming for the exact path of optimal models selected from a finite set
Many learning algorithms are formulated in terms of finding model parameters
which minimize a data-fitting loss function plus a regularizer. When the
regularizer involves the ℓ0 pseudo-norm, the resulting regularization path
consists of a finite set of models. The fastest existing algorithm for
computing the breakpoints in the regularization path is quadratic in the number
of models, so it scales poorly to high dimensional problems. We provide new
formal proofs that a dynamic programming algorithm can be used to compute the
breakpoints in linear time. Empirical results on changepoint detection problems
demonstrate the improved accuracy and speed relative to grid search and the
previous quadratic time algorithm.
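
To make the abstract's claim concrete, below is a minimal Python sketch of a stack-based linear-time pass that computes the breakpoints of the penalized model selection path. The function name and the preconditions (models sorted by strictly increasing complexity with strictly decreasing loss, dominated models already pruned) are assumptions for illustration; this follows the general stack-based idea behind such linear-time path algorithms, not necessarily the paper's exact pseudocode.

    def model_selection_breakpoints(loss, complexity):
        # Hypothetical sketch: for each penalty lam >= 0, the selected model
        # minimizes loss[i] + lam * complexity[i]. Assumes models are sorted
        # by strictly increasing complexity and strictly decreasing loss.
        # Returns (model_index, lam_start) pairs: model_index is optimal
        # from lam_start up to the next entry's lam_start (the last entry
        # stays optimal for all larger penalties).
        path = []  # stack of (model_index, lam_start)
        for i in range(len(loss) - 1, -1, -1):  # most complex model first
            lam = 0.0
            while path:
                j, lam_j = path[-1]
                # Penalty at which model i (less complex, higher loss)
                # overtakes model j on top of the stack.
                lam = (loss[i] - loss[j]) / (complexity[j] - complexity[i])
                if lam <= lam_j:
                    path.pop()  # model j is never optimal for any penalty
                else:
                    break
            if not path:
                lam = 0.0  # stack emptied: model i is optimal from zero
            path.append((i, lam))
        return path

For example, with losses [10, 5, 4] and complexities [1, 2, 3], the sketch returns [(2, 0.0), (1, 1.0), (0, 5.0)]: the most complex model is selected for small penalties, and each breakpoint marks where the next simpler model takes over. Since every model is pushed and popped at most once, the pass is linear in the number of models.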
Optimizing ROC Curves with a Sort-Based Surrogate Loss Function for Binary Classification and Changepoint Detection
Receiver Operating Characteristic (ROC) curves are plots of true positive
rate versus false positive rate which are useful for evaluating binary
classification models, but difficult to use for learning since the Area Under
the Curve (AUC) is non-convex. ROC curves can also be used in other problems
that have false positive and true positive rates, such as changepoint detection.
We show that in this more general context, the ROC curve can have loops, points
with highly sub-optimal error rates, and AUC greater than one. This observation
motivates a new optimization objective: rather than maximizing the AUC, we
would like a monotonic ROC curve with AUC=1 that avoids points with large
values for Min(FP,FN). We propose a convex relaxation of this objective that
results in a new surrogate loss function called the AUM, short for Area Under
Min(FP, FN). Whereas previous loss functions are based on summing over all
labeled examples or pairs, the AUM requires a sort and a sum over the sequence
of points on the ROC curve. We show that AUM directional derivatives can be
efficiently computed and used in a gradient descent learning algorithm. In our
empirical study of supervised binary classification and changepoint detection
problems, we show that our new AUM minimization learning algorithm results in
improved AUC and comparable speed relative to previous baselines.
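
To make the AUM loss concrete, here is a minimal NumPy sketch for the binary classification special case with unit error weights. The function name is hypothetical, and the paper's general version handles weighted FP/FN curves arising from changepoint labels.

    import numpy as np

    def area_under_min_fp_fn(scores, labels):
        # Hypothetical sketch of AUM for binary labels in {0, 1} with unit
        # weights. Sweeping a threshold t over the sorted real-valued scores
        # (predict positive when score >= t), min(FP, FN) is piecewise
        # constant between consecutive scores; the AUM integrates it.
        order = np.argsort(scores)
        s = np.asarray(scores, dtype=float)[order]
        y = np.asarray(labels)[order]
        fp = float(np.sum(y == 0))  # threshold below all scores: all positive
        fn = 0.0
        aum = 0.0
        for i in range(len(s) - 1):
            # Raising t just past s[i] flips example i to predicted negative.
            if y[i] == 1:
                fn += 1.0
            else:
                fp -= 1.0
            aum += min(fp, fn) * (s[i + 1] - s[i])
        return aum

Because min(FP, FN) is zero at both extremes of the threshold sweep, the integral is finite, and sorting dominates the cost, matching the abstract's description of a sort followed by a sum over the points of the ROC curve.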