Stochastic Spectral Descent for Discrete Graphical Models
Interest in deep probabilistic graphical models has increased in recent years, due to their state-of-the-art performance on many machine learning applications. Such models are typically trained with the stochastic gradient method, which can take a significant number of iterations to converge. Since the computational cost of gradient estimation is prohibitive even for modestly-sized models, training becomes slow and practically usable models are kept small. In this paper we propose a new, largely tuning-free algorithm to address this problem. Our approach derives novel majorization bounds based on the Schatten-∞ norm. Intriguingly, the minimizers of these bounds can be interpreted as gradient methods in a non-Euclidean space. We thus propose using a stochastic gradient method in non-Euclidean space. We both provide simple conditions under which our algorithm is guaranteed to converge, and demonstrate empirically that our algorithm leads to dramatically faster training and improved predictive ability compared to stochastic gradient descent for both directed and undirected graphical models.
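As a rough illustration of what a gradient step in a non-Euclidean space can look like, the sketch below takes one steepest-descent step measured in the Schatten-∞ (spectral) norm. It is not the paper's algorithm: the function name `spectral_steepest_step`, the fixed `step` size, and the use of a plain (non-stochastic) gradient are assumptions made for illustration. It relies only on the standard fact that the minimizer of ⟨G, D⟩ + ‖D‖²/(2·step) under the spectral norm is −step·‖G‖_nuclear·UVᵀ, where G = U diag(s) Vᵀ.

```python
import numpy as np


def spectral_steepest_step(W, grad, step):
    """One steepest-descent step under the Schatten-infinity (spectral) norm.

    Hypothetical helper for illustration; W and grad are 2-D arrays of the
    same shape, and step is a positive scalar.
    """
    # Thin SVD of the gradient: grad = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    # U @ Vt has all singular values equal to 1, so it is the unit-spectral-norm
    # direction best aligned with grad; the nuclear norm sum(s) sets its scale.
    direction = s.sum() * (U @ Vt)
    return W - step * direction


# Usage: every singular value of the update has the same size, unlike a
# Euclidean gradient step, whose update has the singular values of grad.
W0 = np.zeros((2, 2))
G = np.diag([3.0, 1.0])
W1 = spectral_steepest_step(W0, G, step=0.5)
```

In this example the update flattens the gradient's spectrum: both singular directions move by the same amount, which is the qualitative difference from an ordinary gradient step.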
Maximum Likelihood Learning With Arbitrary Treewidth via Fast-Mixing Parameter Sets
Inference is typically intractable in high-treewidth undirected graphical
models, making maximum likelihood learning a challenge. One way to overcome
this is to restrict parameters to a tractable set, most typically the set of
tree-structured parameters. This paper explores an alternative notion of a
tractable set, namely a set of "fast-mixing parameters" where Markov chain
Monte Carlo (MCMC) inference can be guaranteed to quickly converge to the
stationary distribution. While it is common in practice to approximate the
likelihood gradient using samples obtained from MCMC, such procedures lack
theoretical guarantees. This paper proves that for any exponential family with
bounded sufficient statistics (not just graphical models), when parameters are
constrained to a fast-mixing set, gradient descent with gradients approximated
by sampling will approximate the maximum likelihood solution inside the set
with high probability. When unregularized, finding a solution epsilon-accurate
in log-likelihood requires a total amount of effort cubic in 1/epsilon,
disregarding logarithmic factors. When ridge-regularized, strong convexity
allows a solution epsilon-accurate in parameter distance with effort quadratic
in 1/epsilon. Both of these provide a fully polynomial-time randomized
approximation scheme.
Comment: Advances in Neural Information Processing Systems 201
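The procedure described in this abstract (a likelihood gradient approximated by MCMC samples, with parameters kept inside a constrained set) can be sketched for a small Ising-style model as follows. Everything here is an assumption for illustration: the names `gibbs_sample`, `suff_stats`, and `fit_projected`, the step size, and in particular the coordinate-wise clipping to `bound`, which is only a stand-in for projection onto a fast-mixing set and is not the set defined in the paper.

```python
import numpy as np


def suff_stats(x, edges):
    """Bounded sufficient statistics: one product x_a * x_b per edge."""
    return np.array([x[a] * x[b] for a, b in edges], dtype=float)


def gibbs_sample(theta, edges, n, sweeps, rng):
    """Draw one approximate sample from p(x) proportional to exp(theta . f(x))."""
    x = rng.choice([-1, 1], size=n)
    for _ in range(sweeps):
        for i in range(n):
            # Local field on node i from its neighbours.
            field = 0.0
            for w, (a, b) in zip(theta, edges):
                if a == i:
                    field += w * x[b]
                elif b == i:
                    field += w * x[a]
            # Conditional probability that x_i = +1 given the other nodes.
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            x[i] = 1 if rng.random() < p_plus else -1
    return x


def fit_projected(data_mean, edges, n, iters=50, step=0.5,
                  n_samples=20, sweeps=5, bound=0.5, seed=0):
    """Sampled-gradient ascent on the log-likelihood with a projection step."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(edges))
    for _ in range(iters):
        samples = [suff_stats(gibbs_sample(theta, edges, n, sweeps, rng), edges)
                   for _ in range(n_samples)]
        model_mean = np.mean(samples, axis=0)
        # Log-likelihood gradient: data statistics minus model statistics.
        theta = theta + step * (data_mean - model_mean)
        # Stand-in projection (assumption): clip each coupling to [-bound, bound].
        theta = np.clip(theta, -bound, bound)
    return theta


# Usage on a 3-node chain with hypothetical empirical edge statistics.
chain_edges = [(0, 1), (1, 2)]
theta_hat = fit_projected(np.array([0.9, 0.9]), chain_edges, n=3)
```

Ascent on the log-likelihood is equivalent to the gradient descent on the negative log-likelihood that the abstract refers to; the projection after each step is what keeps the sampler inside the constrained parameter set.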
Connections Between Adaptive Control and Optimization in Machine Learning
This paper demonstrates many immediate connections between adaptive control
and optimization methods commonly employed in machine learning. Starting from
common output error formulations, similarities in update law modifications are
examined. Concepts in stability, performance, and learning common to both
fields are then discussed. Building on the similarities in update laws and
common concepts, new intersections and opportunities for improved algorithm
analysis are provided. In particular, a specific problem related to higher
order learning is solved through insights obtained from these intersections.
Comment: 18 page