Convergence of the Exponentiated Gradient Method with Armijo Line Search
Consider the problem of minimizing a convex differentiable function on the
probability simplex, spectrahedron, or set of quantum density matrices. We
prove that the exponentiated gradient method with Armijo line search always
converges to the optimum if the sequence of iterates possesses a strictly
positive limit point (element-wise for the vector case, and with respect to the
Löwner partial ordering for the matrix case). To the best of our knowledge, this
is the first convergence result for a mirror descent-type method that only
requires differentiability. The proof exploits self-concordant likeness of the
log-partition function, which is of independent interest.
Comment: 18 pages
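As a concrete illustration, here is a minimal Python sketch of the exponentiated gradient update with Armijo backtracking on the probability simplex. The function names and the Armijo parameters (initial step, shrink factor, sufficient-decrease constant) are our illustrative choices, not values from the paper.

```python
import numpy as np

def eg_armijo(f, grad_f, x0, eta0=1.0, shrink=0.5, c=1e-4,
              max_iter=1000, max_backtracks=50):
    """Exponentiated gradient with Armijo backtracking on the simplex.

    A minimal sketch: f and grad_f evaluate the objective and its
    gradient; eta0, shrink, and c are illustrative Armijo parameters,
    not values taken from the paper.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        eta = eta0
        for _ in range(max_backtracks):
            y = x * np.exp(-eta * g)   # multiplicative (mirror) step
            y /= y.sum()               # renormalize onto the simplex
            # Armijo sufficient-decrease test along the step y - x
            if f(y) <= f(x) + c * g.dot(y - x):
                break
            eta *= shrink              # backtrack
        x = y
    return x
```

The multiplicative form keeps every iterate strictly positive, which matches the positive-limit-point assumption in the convergence theorem.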
A Geometric View on Constrained M-Estimators
We study the estimation error of constrained M-estimators, and derive
explicit upper bounds on the expected estimation error determined by the
Gaussian width of the constraint set. We consider both the case where the
true parameter lies on the boundary of the constraint set (matched
constraint) and the case where it lies strictly inside the constraint set
(mismatched constraint). For both cases, we derive novel universal
estimation error bounds for regression in a generalized linear model with the
canonical link function. Our error bound for the mismatched constraint case is
minimax optimal in terms of its dependence on the sample size, for Gaussian
linear regression by the Lasso.
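To make the central quantity concrete: the Gaussian width of a set $K$ is $w(K) = \mathbb{E} \sup_{x \in K} \langle g, x \rangle$ with $g \sim N(0, I_d)$. Below is a short Monte Carlo sketch for the unit $\ell_1$ ball (the constraint set behind the Lasso), where the supremum reduces to $\|g\|_\infty$; the estimator is our illustration, not part of the paper.

```python
import numpy as np

def gaussian_width_l1_ball(d, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the Gaussian width of the unit l1 ball.

    w(K) = E sup_{x in K} <g, x> with g ~ N(0, I_d); for the l1 ball the
    supremum equals ||g||_inf, so the width grows like sqrt(2 log d).
    """
    g = np.random.default_rng(seed).standard_normal((n_samples, d))
    return np.abs(g).max(axis=1).mean()

print(gaussian_width_l1_ball(1000))  # of order sqrt(2 log 1000), about 3.7
```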
An Interpretation of the Moore-Penrose Generalized Inverse of a Singular Fisher Information Matrix
It is proved that in a non-Bayesian parametric estimation problem, if the
Fisher information matrix (FIM) is singular, unbiased estimators for the
unknown parameter will not exist. The Cramér-Rao bound (CRB), a popular tool to
lower bound the variances of unbiased estimators, seems inapplicable in such
situations. In this paper, we show that the Moore-Penrose generalized inverse
of a singular FIM can be interpreted as the CRB corresponding to the minimum
variance among all choices of minimum constraint functions. This result ensures
the logical validity of applying the Moore-Penrose generalized inverse of an
FIM as the covariance lower bound when the FIM is singular. Furthermore, the
result can be applied as a performance bound on the joint design of constraint
functions and unbiased estimators.
Comment: 10 pages, accepted for publication in IEEE Transactions on Signal Processing
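A small numerical illustration of the object involved (the example matrix is made up for this sketch, not taken from the paper):

```python
import numpy as np

# Two parameters that enter the model only through their sum yield a
# rank-one, hence singular, FIM.  np.linalg.pinv computes its
# Moore-Penrose generalized inverse, interpreted (per the paper) as a
# CRB under a suitable choice of constraint functions.
J = np.array([[1.0, 1.0],
              [1.0, 1.0]])             # singular FIM, rank 1
assert np.linalg.matrix_rank(J) < J.shape[0]
crb = np.linalg.pinv(J)                # candidate covariance lower bound
print(crb)                             # [[0.25, 0.25], [0.25, 0.25]]
```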
Two Polyak-Type Step Sizes for Mirror Descent
We propose two Polyak-type step sizes for mirror descent and prove their
convergence for minimizing convex, locally Lipschitz functions. Both step
sizes, unlike the original Polyak step size, do not need the optimal value of
the objective function.
Comment: 13 pages
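For context, the classical Polyak step size is $\eta_k = (f(x_k) - f^\star)/\|g_k\|^2$, which requires knowing the optimal value $f^\star$. The sketch below shows that classical step inside entropic mirror descent on the simplex, to make the dependence on $f^\star$ explicit; the paper's two proposed step sizes, which we do not reproduce here, remove exactly that requirement.

```python
import numpy as np

def md_polyak_classic(f, grad_f, f_star, x0, max_iter=1000):
    """Entropic mirror descent with the classical Polyak step size.

    A reference sketch only: the step (f(x) - f_star) / ||g||^2 needs
    the optimal value f_star, which the paper's two step sizes avoid.
    The Euclidean gradient norm is used here for simplicity.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        eta = (f(x) - f_star) / max(g.dot(g), 1e-12)  # classical Polyak step
        x = x * np.exp(-eta * g)                      # entropic mirror step
        x /= x.sum()
    return x
```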
Learning without Smoothness and Strong Convexity
Recent advances in statistical learning and convex optimization have inspired many successful practices. Standard theories assume smoothness---bounded gradient, Hessian, etc.---and strong convexity of the loss function. Unfortunately, such conditions may not hold in important real-world applications, and sometimes fulfilling them incurs unnecessary performance degradation. Below are three examples.
1. The standard theory for variable selection via $\ell_1$-penalization only considers the linear regression model, as the corresponding quadratic loss function has a constant Hessian and allows for exact second-order Taylor series expansion. In practice, however, non-linear regression models are often chosen to match data characteristics.
2. The standard theory for convex optimization considers almost exclusively smooth functions. Important applications such as portfolio selection and quantum state estimation, however, correspond to loss functions that violate the smoothness assumption; existing convergence guarantees for optimization algorithms hence do not apply.
3. The standard theory for compressive magnetic resonance imaging (MRI) guarantees the restricted isometry property (RIP)---a smoothness and strong convexity condition on the quadratic loss restricted on the set of sparse vectors---via random uniform sampling. The random uniform sampling strategy, however, yields unsatisfactory signal reconstruction performance empirically, in comparison to heuristic sampling approaches.
In this thesis, we provide rigorous solutions to the three examples above and other related problems. For the first two problems, our key idea is instead to consider weaker, localized versions of the smoothness condition. For the third, our solution is to propose a new theoretical framework for compressive MRI: we pose compressive MRI as a statistical learning problem and solve it by empirical risk minimization. Interestingly, the RIP is not required in this framework.
Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States
Consider an online convex optimization problem where the loss functions are
self-concordant barriers, smooth relative to a convex function $h$, and
possibly non-Lipschitz. We analyze the regret of online mirror descent with
$h$. Then, based on the result, we prove the following in a unified manner.
Denote by $T$ the time horizon and by $d$ the parameter dimension. 1. For
online portfolio selection, the regret of $\widetilde{\mathrm{EG}}$, a variant
of exponentiated gradient due to Helmbold et al., is
$\tilde{O}(T^{2/3} d^{1/3})$ when $T > 4 d / \log d$. This improves on the
original $\tilde{O}(T^{3/4} d^{1/2})$ regret bound for
$\widetilde{\mathrm{EG}}$. 2. For online portfolio selection, the regret of
online mirror descent with the logarithmic barrier is $\tilde{O}(\sqrt{Td})$.
The regret bound is the same as that of Soft-Bayes due to Orseau et al. up to
logarithmic terms. 3. For online learning quantum states with the logarithmic
loss, the regret of online mirror descent with the log-determinant function is
also $\tilde{O}(\sqrt{Td})$. Its per-iteration time is shorter than that of
all existing algorithms we know.
Comment: 19 pages, 1 figure
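For concreteness, here is a Python sketch of the multiplicative update of Helmbold et al. that the first result analyzes. The learning rate is an illustrative constant, and the $\widetilde{\mathrm{EG}}$ variant studied in the paper differs in details not reproduced here.

```python
import numpy as np

def eg_portfolio(price_relatives, eta=0.05):
    """Exponentiated gradient for online portfolio selection.

    A sketch of the Helmbold et al. multiplicative update; eta is an
    illustrative constant.  Each round incurs the logarithmic loss
    -log <w, r> and updates multiplicatively with its gradient.
    """
    T, d = price_relatives.shape
    w = np.full(d, 1.0 / d)            # start from the uniform portfolio
    total_loss = 0.0
    for r in price_relatives:
        total_loss += -np.log(w.dot(r))
        w = w * np.exp(eta * r / w.dot(r))  # EG step on -log <w, r>
        w /= w.sum()
    return total_loss, w
```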
Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging
Consider the problem of minimizing an expected logarithmic loss over either
the probability simplex or the set of quantum density matrices. This problem
includes tasks such as solving the Poisson inverse problem, computing the
maximum-likelihood estimate for quantum state tomography, and approximating
positive semi-definite matrix permanents with the currently tightest
approximation ratio. Although the optimization problem is convex, standard
iteration complexity guarantees for first-order methods do not directly apply
due to the absence of Lipschitz continuity and smoothness in the loss function.
In this work, we propose a stochastic first-order algorithm named $B$-sample
stochastic dual averaging with the logarithmic barrier. For the Poisson
inverse problem, our algorithm attains an $\varepsilon$-optimal solution in
$\tilde{O}(d^2/\varepsilon^2)$ time, matching the state of the art, where $d$
denotes the dimension. When computing the maximum-likelihood estimate for
quantum state tomography, our algorithm yields an $\varepsilon$-optimal
solution in $\tilde{O}(d^3/\varepsilon^2)$ time. This improves on the time
complexities of existing stochastic first-order methods by a factor of
$d^{\omega - 2}$ and on those of batch methods by a factor of $d^2$, where
$\omega$ denotes the matrix multiplication exponent. Numerical experiments
demonstrate that, empirically, our algorithm outperforms existing methods
with explicit complexity guarantees.
Comment: 26 pages, AISTATS 2024
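To illustrate the main computational primitive, the sketch below implements a generic stochastic dual averaging loop with the logarithmic barrier $h(x) = -\sum_i \log x_i$ on the simplex, where each update reduces to a one-dimensional bisection. This is a generic sketch under those assumptions, not the paper's exact $B$-sample algorithm; in particular, the schedule for the barrier coefficient is a placeholder.

```python
import numpy as np

def logbar_step(z, beta, tol=1e-10):
    """Solve min_x <z, x> + beta * h(x) over the simplex, h = -sum(log x).

    Stationarity gives x_i = beta / (z_i + lam); bisect on lam so that
    the coordinates sum to one.
    """
    lo = -z.min()                  # sum of coordinates -> infinity here
    hi = -z.min() + beta * len(z)  # sum of coordinates <= 1 here
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if (beta / (z + lam)).sum() > 1.0:
            lo = lam
        else:
            hi = lam
    return beta / (z + 0.5 * (lo + hi))

def sda_logbar(sample_grad, d, T, beta=1.0):
    """A generic stochastic dual averaging loop with the log barrier.

    Not the paper's exact B-sample algorithm: the sqrt(t) schedule for
    the barrier coefficient is a placeholder for illustration.
    """
    z = np.zeros(d)
    x = np.full(d, 1.0 / d)
    for t in range(1, T + 1):
        z += sample_grad(x)                      # accumulate stochastic grads
        x = logbar_step(z, beta * np.sqrt(t))    # illustrative schedule
    return x
```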