37,309 research outputs found

    Efficient Optimization of Loops and Limits with Randomized Telescoping Sums

    Full text link
    We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. We identify conditions under which RT estimators achieve optimization convergence rates independent of the length of the loop or the required accuracy of the approximation. We also derive a method for tuning RT estimators online to maximize a lower bound on the expected decrease in loss per unit of computation. We evaluate our adaptive RT estimators on a range of applications including meta-optimization of learning rates, variational inference of ODE parameters, and training an LSTM to model long sequences

    A Feature Selection Method for Multivariate Performance Measures

    Full text link
    Feature selection with specific multivariate performance measures is the key to the success of many applications, such as image retrieval and text classification. The existing feature selection methods are usually designed for classification error. In this paper, we propose a generalized sparse regularizer. Based on the proposed regularizer, we present a unified feature selection framework for general loss functions. In particular, we study the novel feature selection paradigm by optimizing multivariate performance measures. The resultant formulation is a challenging problem for high-dimensional data. Hence, a two-layer cutting plane algorithm is proposed to solve this problem, and the convergence is presented. In addition, we adapt the proposed method to optimize multivariate measures for multiple instance learning problems. The analyses by comparing with the state-of-the-art feature selection methods show that the proposed method is superior to others. Extensive experiments on large-scale and high-dimensional real world datasets show that the proposed method outperforms l1l_1-SVM and SVM-RFE when choosing a small subset of features, and achieves significantly improved performances over SVMperf^{perf} in terms of F1F_1-score
    • …
    corecore