
    Fast Cross-Validation via Sequential Testing

    With the increasing size of today's data sets, finding the right parameter configuration in model selection via cross-validation can be an extremely time-consuming task. In this paper we propose an improved cross-validation procedure which uses nonparametric testing coupled with sequential analysis to determine the best parameter set on linearly increasing subsets of the data. By eliminating underperforming candidates quickly and keeping promising candidates as long as possible, the method speeds up the computation while preserving the capability of the full cross-validation. Theoretical considerations underline the statistical power of our procedure. The experimental evaluation shows that our method reduces the computation time by a factor of up to 120 compared to a full cross-validation, with a negligible impact on accuracy.
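
    A minimal sketch may make the elimination loop concrete. It is an illustrative stand-in, not the paper's procedure: the actual method uses specific nonparametric tests with sequential-analysis stopping rules, whereas the sketch below drops a candidate once a paired Wilcoxon test finds its accumulated fold scores significantly worse than the current best. The function `fast_cv_sequential` and all of its parameters are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def fast_cv_sequential(estimator, candidates, X, y, n_steps=10, alpha=0.05):
    """Keep a pool of candidate parameter configurations, score them on
    linearly growing subsets of the data, and drop candidates whose
    accumulated fold scores test as worse than the current best."""
    n = len(X)
    history = {i: [] for i in range(len(candidates))}
    active = set(history)

    for step in range(1, n_steps + 1):
        m = n * step // n_steps                 # linearly increasing subset
        for i in list(active):
            est = clone(estimator).set_params(**candidates[i])
            history[i].extend(cross_val_score(est, X[:m], y[:m], cv=5))
        best = max(active, key=lambda i: np.mean(history[i]))
        for i in list(active):
            if i == best or len(history[i]) < 10:
                continue
            try:                                # paired one-sided test:
                _, p = wilcoxon(history[i],     # is candidate i worse
                                history[best],  # than the current best?
                                alternative="less")
            except ValueError:                  # identical scores: keep it
                continue
            if p < alpha:
                active.discard(i)               # eliminate underperformer
    return candidates[max(active, key=lambda i: np.mean(history[i]))]
```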

    Fast cross-validation of kernel fisher discriminant classifiers

    Given n training examples, training a Kernel Fisher Discriminant (KFD) classifier corresponds to solving a linear system of dimension n. In cross-validating KFD, the training examples are split into two distinct subsets L times, where a subset of m examples is used for validation and the other subset of (n - m) examples is used for training the classifier. In this case, L linear systems of dimension (n - m) need to be solved. We propose a novel method for cross-validation of KFD in which, instead of solving L linear systems of dimension (n - m), we compute the inverse of an n × n matrix and solve L linear systems of dimension 2m, thereby reducing the complexity when L is large and/or m is small. For typical 10-fold and leave-one-out cross-validations, the proposed algorithm is approximately 4 and (4/9)n times, respectively, as efficient as the naive implementation. Simulations are provided to demonstrate the efficiency of the proposed algorithms.
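
    The reduction rests on a standard block-inverse (Schur complement) identity: the inverse of a principal submatrix of A can be recovered from the precomputed inverse of the full n × n matrix at the cost of a solve whose dimension is only the number of held-out examples. The sketch below shows that identity in isolation; the paper's actual scheme works with KFD's augmented system and solves 2m-dimensional systems per fold, and the function name here is hypothetical.

```python
import numpy as np

def training_block_inverse(B, val_idx):
    """Recover inv(A_tt), the inverse of the training-block submatrix of a
    full n x n system matrix A, from the precomputed B = inv(A) via the
    block-inverse identity
        inv(A_tt) = B_tt - B_tv @ inv(B_vv) @ B_vt,
    so each fold costs an m x m solve instead of an (n - m)-dim one."""
    n = B.shape[0]
    train_idx = np.setdiff1d(np.arange(n), val_idx)
    B_tt = B[np.ix_(train_idx, train_idx)]
    B_tv = B[np.ix_(train_idx, val_idx)]
    B_vt = B[np.ix_(val_idx, train_idx)]
    B_vv = B[np.ix_(val_idx, val_idx)]
    return B_tt - B_tv @ np.linalg.solve(B_vv, B_vt)
```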

    Fast cross-validation for multi-penalty ridge regression

    High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data-type-specific penalties. The largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional estimation loop via iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat matrix, as used in the IWLS algorithm. As a result, nearly all computations are in low-dimensional space, yielding a speed-up of several orders of magnitude. We developed a flexible framework that accommodates multiple response types, unpenalized covariates, several performance criteria, and repeated CV. Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems. Moreover, we present similar computational shortcuts for maximum marginal likelihood and Bayesian probit regression. The corresponding R package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners.
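
    The flavor of the hat-matrix shortcut can be illustrated with the classical Woodbury identity: for weighted ridge with block-diagonal penalty Λ, the hat matrix H = X (XᵀWX + Λ)⁻¹ XᵀW collapses to K (K + W⁻¹)⁻¹ with K = Σ_b X_b X_bᵀ / λ_b, so every quantity lives in n × n sample space rather than p-dimensional feature space. The sketch below is a generic illustration under that assumption, not the multiridge package's API; `ridge_hat_matrix` and its interface are hypothetical.

```python
import numpy as np

def ridge_hat_matrix(X_blocks, lambdas, w):
    """Sample-space hat matrix for multi-penalty, weighted ridge regression.

    With K = sum_b X_b @ X_b.T / lambda_b (n x n) and W = diag(w), the
    Woodbury identity collapses H = X (X'WX + Lambda)^{-1} X'W to
    H = K @ inv(K + inv(W)), so no p-dimensional matrix is ever formed.

    X_blocks : list of (n, p_b) arrays, one per data type
    lambdas  : one penalty per block
    w        : positive IWLS sample weights, length n
    """
    K = sum(Xb @ Xb.T / lam for Xb, lam in zip(X_blocks, lambdas))
    W_inv = np.diag(1.0 / np.asarray(w, dtype=float))
    return K @ np.linalg.inv(K + W_inv)
```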