Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Deep CCA is a recently proposed deep neural network extension to the
traditional canonical correlation analysis (CCA), and has been successful for
multi-view representation learning in several domains. However, stochastic
optimization of the deep CCA objective is not straightforward, because it does
not decouple over training examples. Previous optimizers for deep CCA are
either batch-based algorithms or stochastic optimization using large
minibatches, which can have high memory consumption. In this paper, we tackle
the problem of stochastic optimization for deep CCA with small minibatches,
based on an iterative solution to the CCA objective, and show that we can
achieve as good performance as previous optimizers and thus alleviate the
memory requirement.
Comment: in 2015 Annual Allerton Conference on Communication, Control and
Computin
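The point that the objective "does not decouple over training examples" can be illustrated with a minimal sketch. It assumes one-dimensional views with no learned mappings, in which case the correlation objective reduces to the (absolute) Pearson correlation; the data and the helper name `correlation` are hypothetical and are not taken from the paper. The correlation computed on the full dataset differs from the average of per-minibatch correlations, which is why naive minibatch averaging is not a faithful estimate of the objective.

```python
import math

def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy two-view dataset (hypothetical values chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 3.0, 2.0]

# Objective evaluated on all examples at once.
full = correlation(xs, ys)

# Same objective evaluated per minibatch of size 2, then averaged.
batches = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
mean_batch = sum(correlation(bx, by) for bx, by in batches) / len(batches)

print(full, mean_batch)  # the two values disagree
```

The means and variances inside the correlation couple all examples together, so the full-data value is not a sum of per-example (or per-minibatch) terms.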
Scalable Peaceman-Rachford Splitting Method with Proximal Terms
As the Peaceman-Rachford Splitting Method (PRSM) has developed, many batch
algorithms based on it have been studied in depth, but almost no work has
examined the performance of a stochastic version of PRSM. In this paper, we
propose a new stochastic algorithm based on PRSM, prove its convergence rate
in the ergodic sense, and test its performance on both artificial and real
data. We show that the proposed algorithm, Stochastic Scalable PRSM
(SS-PRSM), attains a convergence rate that matches the newest ADMM-based
stochastic algorithms and is faster than that of general Stochastic ADMM.
Our algorithm is also highly flexible, outperforms many state-of-the-art
stochastic algorithms derived from ADMM, and has a low memory cost in
large-scale splitting optimization problems.
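For orientation, the following is a minimal sketch of the deterministic Peaceman-Rachford splitting iteration that the abstract builds on. It is not the proposed SS-PRSM. It assumes the toy problem of minimizing 0.5(x - a)^2 + 0.5(z - c)^2 subject to x = z, for which both subproblems have closed forms; the function name `prsm`, the penalty `beta`, and the relaxation factor `alpha` are illustrative assumptions. The distinguishing feature relative to ADMM is that the multiplier is updated twice per iteration, once after each primal step.

```python
def prsm(a, c, beta=1.0, alpha=0.9, iters=100):
    """Peaceman-Rachford splitting on a toy consensus problem.

    Minimizes 0.5*(x - a)**2 + 0.5*(z - c)**2 subject to x - z = 0,
    using the augmented Lagrangian
        f(x) + g(z) - lam*(x - z) + 0.5*beta*(x - z)**2.
    alpha in (0, 1) is an assumed relaxation factor on the dual steps.
    """
    z, lam = 0.0, 0.0
    for _ in range(iters):
        # x-step: closed-form minimizer of the augmented Lagrangian in x.
        x = (a + lam + beta * z) / (1.0 + beta)
        # First multiplier update, between the two primal steps.
        lam -= alpha * beta * (x - z)
        # z-step: closed-form minimizer in z with the intermediate multiplier.
        z = (c - lam + beta * x) / (1.0 + beta)
        # Second multiplier update.
        lam -= alpha * beta * (x - z)
    return x, z, lam

x, z, lam = prsm(1.0, 3.0)
print(x, z, lam)  # x and z approach the midpoint (a + c) / 2
```

A stochastic variant would replace the exact x-step with an update driven by sampled gradients; that part is specific to the paper and is not reproduced here.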
Strong and Weak Optimizations in Classical and Quantum Models of Stochastic Processes
Among the predictive hidden Markov models that describe a given stochastic
process, the ε-machine is strongly minimal in that it minimizes every
Rényi-based memory measure. Quantum models can be smaller still. In contrast
with the ε-machine's unique role in the classical setting, however,
among the class of processes described by pure-state hidden quantum Markov
models, there are those for which there does not exist any strongly minimal
model. Quantum memory optimization then depends on which memory measure best
matches a given problem circumstance.
Comment: 14 pages, 14 figures;
http://csc.ucdavis.edu/~cmg/compmech/pubs/uemum.ht
Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees
Asynchronous distributed algorithms are a popular way to reduce
synchronization costs in large-scale optimization, and in particular for neural
network training. However, for nonsmooth and nonconvex objectives, few
convergence guarantees exist beyond cases where closed-form proximal operator
solutions are available. As most popular contemporary deep neural networks lead
to nonsmooth and nonconvex objectives, there is now a pressing need for such
convergence guarantees. In this paper, we analyze for the first time the
convergence of stochastic asynchronous optimization for this general class of
objectives. In particular, we focus on stochastic subgradient methods allowing
for block variable partitioning, where the shared-memory-based model is
asynchronously updated by concurrent processes. To this end, we first introduce
a probabilistic model which captures key features of real asynchronous
scheduling between concurrent processes; under this model, we establish
convergence with probability one to an invariant set for stochastic subgradient
methods with momentum.
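The kind of update described above can be sketched as a sequential simulation. This is a toy under stated assumptions, not the paper's algorithm or scheduling model: the objective is the nonsmooth but convex sum of |w_i - t_i|, each block is a single coordinate, a seeded random choice stands in for the asynchronous scheduler, the subgradient is exact rather than sampled, and reads are never stale. The function name `run_block_subgradient` and all parameter values are hypothetical.

```python
import random

def run_block_subgradient(targets, steps=20000, momentum=0.5, seed=0):
    """Block-wise subgradient descent with momentum on sum_i |w_i - t_i|.

    At every step a randomly chosen "worker" updates only its own block
    of the shared parameter vector, mimicking block partitioning.
    """
    rng = random.Random(seed)
    w = [0.0] * len(targets)   # shared model, one coordinate per block
    v = [0.0] * len(targets)   # per-block momentum buffers
    for k in range(steps):
        i = rng.randrange(len(targets))   # which worker fires now
        d = w[i] - targets[i]
        g = (d > 0) - (d < 0)             # a subgradient of |w_i - t_i|
        v[i] = momentum * v[i] + g        # heavy-ball style momentum
        lr = 0.5 / (k + 1) ** 0.5         # diminishing step size
        w[i] -= lr * v[i]
    return w

w = run_block_subgradient([1.0, -2.0])
print(w)  # each coordinate ends close to its target
```

A real shared-memory implementation would run the workers concurrently and let them read partially updated parameters, which is the situation the probabilistic scheduling model in the abstract is meant to capture.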
From the practical perspective, one issue with the family of methods we
consider is that it is not efficiently supported by machine learning
frameworks, as they mostly focus on distributed data-parallel strategies. To
address this, we propose a new implementation strategy for shared-memory based
training of deep neural networks, whereby concurrent parameter servers are
utilized to train a partitioned but shared model in single- and multi-GPU
settings. Based on this implementation, we achieve on average 1.2x speed-up in
comparison to state-of-the-art training methods for popular image
classification tasks without compromising accuracy.
Risk-Averse Planning Under Uncertainty
We consider the problem of designing policies for partially observable Markov decision processes (POMDPs) with dynamic coherent risk objectives. Synthesizing risk-averse optimal policies for POMDPs requires infinite memory and is thus undecidable. To overcome this difficulty, we propose a method based on bounded policy iteration for designing stochastic but finite-state (memory) controllers; the method takes advantage of standard convex optimization methods. Given a memory budget and an optimality criterion, the proposed method modifies the stochastic finite-state controller, leading to sub-optimal solutions with lower coherent risk.