
    Convergence of the Exponentiated Gradient Method with Armijo Line Search

    Consider the problem of minimizing a convex differentiable function on the probability simplex, spectrahedron, or set of quantum density matrices. We prove that the exponentiated gradient method with Armijo line search always converges to the optimum, provided the sequence of iterates possesses a strictly positive limit point (element-wise for the vector case, and with respect to the Löwner partial ordering for the matrix case). To the best of our knowledge, this is the first convergence result for a mirror descent-type method that only requires differentiability. The proof exploits self-concordant likeness of the log-partition function, which is of independent interest. Comment: 18 pages
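
    The method itself is simple enough to sketch. Below is a minimal NumPy sketch of exponentiated gradient with Armijo backtracking on the probability simplex; the test problem and the constants (`eta0`, `beta`, `c`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def eg_armijo(f, grad, x0, max_iter=200, eta0=1.0, beta=0.5, c=1e-4):
    """Exponentiated gradient on the simplex with Armijo backtracking.

    Backtrack eta until the sufficient-decrease condition
    f(x(eta)) <= f(x) + c * <grad f(x), x(eta) - x> holds.
    """
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        eta = eta0
        while True:
            z = x * np.exp(-eta * (g - g.min()))  # shift exponent for stability
            x_new = z / z.sum()                   # multiplicative update, renormalize
            if f(x_new) <= f(x) + c * g @ (x_new - x) or eta < 1e-12:
                break
            eta *= beta                           # Armijo backtracking
        x = x_new
    return x

# Hypothetical test problem: least squares restricted to the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
b = A @ np.array([0.1, 0.2, 0.3, 0.25, 0.15])
x_hat = eg_armijo(lambda x: 0.5 * np.sum((A @ x - b) ** 2),
                  lambda x: A.T @ (A @ x - b),
                  np.full(5, 0.2))
```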

    A Geometric View on Constrained M-Estimators

    We study the estimation error of constrained M-estimators and derive explicit upper bounds on the expected estimation error determined by the Gaussian width of the constraint set. We consider both the case where the true parameter lies on the boundary of the constraint set (matched constraint) and the case where it lies strictly inside the constraint set (mismatched constraint). For both cases, we derive novel universal estimation error bounds for regression in a generalized linear model with the canonical link function. Our error bound for the mismatched-constraint case is minimax optimal in terms of its dependence on the sample size for Gaussian linear regression with the Lasso.
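
    As a small illustration of the quantity driving such bounds, the following sketch Monte Carlo estimates the Gaussian width w(K) = E[sup_{x in K} <g, x>] for the L1 ball, where the supremum has a closed form. The dimension, radius, and number of trials are arbitrary choices, not the paper's setting.

```python
import numpy as np

# Monte Carlo estimate of the Gaussian width w(K) = E[ sup_{x in K} <g, x> ]
# for the L1 ball K = { x : ||x||_1 <= r }; here sup_{x in K} <g, x> = r * ||g||_inf.
rng = np.random.default_rng(0)
d, r, trials = 500, 1.0, 10_000
g = rng.standard_normal((trials, d))
width_mc = (r * np.abs(g).max(axis=1)).mean()
width_bound = r * np.sqrt(2 * np.log(2 * d))   # classical upper bound on E[||g||_inf]
print(width_mc, width_bound)
```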

    An Interpretation of the Moore-Penrose Generalized Inverse of a Singular Fisher Information Matrix

    It is proved that in a non-Bayesian parametric estimation problem, if the Fisher information matrix (FIM) is singular, unbiased estimators for the unknown parameter do not exist. The Cramér-Rao bound (CRB), a popular tool for lower-bounding the variances of unbiased estimators, seems inapplicable in such situations. In this paper, we show that the Moore-Penrose generalized inverse of a singular FIM can be interpreted as the CRB corresponding to the minimum variance among all choices of minimum constraint functions. This result ensures the logical validity of applying the Moore-Penrose generalized inverse of an FIM as the covariance lower bound when the FIM is singular. Furthermore, the result can be applied as a performance bound on the joint design of constraint functions and unbiased estimators. Comment: 10 pages, accepted for publication in IEEE Transactions on Signal Processing
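
    For a concrete instance of the singular-FIM setting, consider the toy model below, in which only the sum of the two parameters is identifiable; the model and sample size are hypothetical and serve only to show the Moore-Penrose inverse playing the role of the covariance lower bound.

```python
import numpy as np

# Toy non-identifiable model: y_1, ..., y_n i.i.d. N(theta_1 + theta_2, 1).
# The 2x2 Fisher information matrix is rank one, hence singular.
n = 100
fim = n * np.array([[1.0, 1.0],
                    [1.0, 1.0]])
crb = np.linalg.pinv(fim)      # Moore-Penrose inverse in place of FIM^{-1}
print(crb)                     # equals [[1, 1], [1, 1]] / (4 * n)
```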

    Two Polyak-Type Step Sizes for Mirror Descent

    We propose two Polyak-type step sizes for mirror descent and prove their convergence for minimizing convex locally Lipschitz functions. Unlike the original Polyak step size, neither step size requires the optimal value of the objective function. Comment: 13 pages
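
    For reference, here is a sketch of entropic mirror descent on the simplex with the classical Polyak step size, which needs the optimal value f* that the paper's two step sizes dispense with; the exact form of the proposed step sizes is not reproduced here, and the test problem is a hypothetical example.

```python
import numpy as np

def mirror_descent_polyak(f, subgrad, x0, f_star, iters=500):
    """Entropic mirror descent on the simplex with the classical Polyak step
    eta_k = (f(x_k) - f*) / ||g_k||_inf^2 (dual norm of the L1 geometry)."""
    x = x0.copy()
    for _ in range(iters):
        g = subgrad(x)
        eta = (f(x) - f_star) / max(np.linalg.norm(g, np.inf) ** 2, 1e-12)
        z = x * np.exp(-eta * (g - g.min()))   # multiplicative (entropic) update
        x = z / z.sum()
    return x

# Hypothetical test: f(x) = max_i x_i on the simplex, minimized at the uniform
# point with optimal value f* = 1/d; a subgradient is the argmax indicator.
d = 10
x0 = 0.8 * np.eye(d)[0] + 0.2 * np.full(d, 1.0 / d)
x_hat = mirror_descent_polyak(lambda x: x.max(),
                              lambda x: np.eye(d)[np.argmax(x)],
                              x0, f_star=1.0 / d)
```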

    Learning without Smoothness and Strong Convexity

    Recent advances in statistical learning and convex optimization have inspired many successful practices. Standard theories assume smoothness---bounded gradient, Hessian, etc.---and strong convexity of the loss function. Unfortunately, such conditions may not hold in important real-world applications, and sometimes fulfilling them incurs unnecessary performance degradation. Below are three examples. 1. The standard theory for variable selection via L_1-penalization considers only the linear regression model, as the corresponding quadratic loss function has a constant Hessian and admits an exact second-order Taylor series expansion. In practice, however, non-linear regression models are often chosen to match data characteristics. 2. The standard theory for convex optimization considers almost exclusively smooth functions. Important applications such as portfolio selection and quantum state estimation, however, correspond to loss functions that violate the smoothness assumption; existing convergence guarantees for optimization algorithms hence do not apply. 3. The standard theory for compressive magnetic resonance imaging (MRI) guarantees the restricted isometry property (RIP)---a smoothness and strong convexity condition on the quadratic loss restricted to the set of sparse vectors---via random uniform sampling. Empirically, however, the random uniform sampling strategy yields unsatisfactory signal reconstruction performance compared with heuristic sampling approaches. In this thesis, we provide rigorous solutions to the three examples above and other related problems. For the first two problems, our key idea is to consider weaker, localized versions of the smoothness condition instead. For the third, we propose a new theoretical framework for compressive MRI: we pose compressive MRI as a statistical learning problem and solve it by empirical risk minimization. Interestingly, the RIP is not required in this framework.
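
    To make the first example concrete, here is a generic sketch of variable selection in a non-linear (logistic) regression model via L_1-penalization, solved by proximal gradient descent. The step size and penalty level are illustrative assumptions, and this is not the analysis developed in the thesis.

```python
import numpy as np

def lasso_logistic(X, y, lam=0.1, iters=2000):
    """L1-penalized logistic regression via proximal gradient (ISTA)."""
    n, d = X.shape
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)     # 1/L for the average logistic loss
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))             # predicted probabilities
        g = X.T @ (p - y) / n                        # gradient of the logistic loss
        z = w - step * g
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
    return w
```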

    Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States

    Consider an online convex optimization problem where the loss functions are self-concordant barriers, smooth relative to a convex function $h$, and possibly non-Lipschitz. We analyze the regret of online mirror descent with $h$. Then, based on the result, we prove the following in a unified manner. Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of exponentiated gradient due to Helmbold et al., is $\tilde{O}(T^{2/3} d^{1/3})$ when $T > 4d/\log d$. This improves on the original $\tilde{O}(T^{3/4} d^{1/2})$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio selection, the regret of online mirror descent with the logarithmic barrier is $\tilde{O}(\sqrt{Td})$. The regret bound is the same as that of Soft-Bayes due to Orseau et al. up to logarithmic terms. 3. For online learning quantum states with the logarithmic loss, the regret of online mirror descent with the log-determinant function is also $\tilde{O}(\sqrt{Td})$. Its per-iteration time is shorter than that of all existing algorithms we know of. Comment: 19 pages, 1 figure
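
    To illustrate the second item, here is a schematic NumPy sketch of one step of online mirror descent with the log-barrier mirror map h(x) = -sum_i log(x_i) on the simplex, applied to online portfolio selection with the log loss. The fixed step size, the price-relative model, and the bisection-based Bregman projection are assumptions for illustration; the paper's exact parameter choices are not reproduced.

```python
import numpy as np

def omd_log_barrier_step(x, g, eta):
    """One step of OMD with mirror map h(x) = -sum_i log(x_i) on the simplex.

    The update solves z_i = 1 / (1/x_i + eta*g_i + lam) with sum_i z_i = 1;
    the multiplier lam is found by bisection.
    """
    a = 1.0 / x + eta * g
    lo = -a.min() + 1e-12                      # keep every denominator positive
    hi = lo + 1.0
    while np.sum(1.0 / (a + hi)) > 1.0:        # grow the bracket until the sum drops below 1
        hi += hi - lo
    for _ in range(100):                       # bisection on the normalization constraint
        mid = 0.5 * (lo + hi)
        if np.sum(1.0 / (a + mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    z = 1.0 / (a + hi)
    return z / z.sum()

# Hypothetical online portfolio selection loop with log loss l_t(x) = -log(<r_t, x>).
rng = np.random.default_rng(0)
d, T, eta = 5, 100, 0.1
x = np.full(d, 1.0 / d)
for _ in range(T):
    r = rng.uniform(0.9, 1.1, size=d)          # hypothetical price relatives
    x = omd_log_barrier_step(x, -r / (r @ x), eta)
```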

    Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging

    Consider the problem of minimizing an expected logarithmic loss over either the probability simplex or the set of quantum density matrices. This problem includes tasks such as solving the Poisson inverse problem, computing the maximum-likelihood estimate for quantum state tomography, and approximating positive semi-definite matrix permanents with the currently tightest approximation ratio. Although the optimization problem is convex, standard iteration complexity guarantees for first-order methods do not directly apply due to the absence of Lipschitz continuity and smoothness in the loss function. In this work, we propose a stochastic first-order algorithm named $B$-sample stochastic dual averaging with the logarithmic barrier. For the Poisson inverse problem, our algorithm attains an $\varepsilon$-optimal solution in $\tilde{O}(d^2/\varepsilon^2)$ time, matching the state of the art, where $d$ denotes the dimension. When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $\varepsilon$-optimal solution in $\tilde{O}(d^3/\varepsilon^2)$ time. This improves on the time complexities of existing stochastic first-order methods by a factor of $d^{\omega-2}$ and those of batch methods by a factor of $d^2$, where $\omega$ denotes the matrix multiplication exponent. Numerical experiments demonstrate that our algorithm empirically outperforms existing methods with explicit complexity guarantees. Comment: 26 pages, AISTATS 202
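
    The general shape of such a method can be sketched as follows: stochastic dual averaging on the probability simplex with the log-barrier regularizer, applied to an expected logarithmic loss. The mini-batch size, the sqrt(t) scaling of the barrier weight, and the sampling model are assumptions for illustration; this is not the paper's exact algorithm or parameterization.

```python
import numpy as np

def sda_log_barrier(sample_grad, d, T, batch=4, beta0=1.0, seed=0):
    """Stochastic dual averaging with the log-barrier regularizer on the simplex.

    Round t solves x = argmin_{simplex} <G, x> - beta_t * sum_i log(x_i),
    where G is the running sum of mini-batch stochastic gradients; the
    minimizer is x_i = beta_t / (G_i + lam), with lam fixed by sum_i x_i = 1.
    """
    rng = np.random.default_rng(seed)
    G = np.zeros(d)
    x = np.full(d, 1.0 / d)
    for t in range(1, T + 1):
        G += np.mean([sample_grad(x, rng) for _ in range(batch)], axis=0)
        beta_t = beta0 * np.sqrt(t)                          # assumed schedule
        lo, hi = -G.min() + 1e-12, -G.min() + beta_t * d     # bracket for lam
        for _ in range(100):                                 # bisection for lam
            lam = 0.5 * (lo + hi)
            lo, hi = (lam, hi) if np.sum(beta_t / (G + lam)) > 1.0 else (lo, lam)
        x = beta_t / (G + hi)
        x /= x.sum()
    return x

# Hypothetical expected-log-loss instance f(x) = E[-log <a, x>] over the simplex,
# with unbiased stochastic gradient -a / <a, x> for a randomly drawn row a.
A = np.random.default_rng(1).uniform(0.1, 1.0, size=(1000, 10))

def sample_grad(x, rng):
    a = A[rng.integers(len(A))]
    return -a / (a @ x)

x_hat = sda_log_barrier(sample_grad, d=10, T=500)
```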