124,843 research outputs found

Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations

Author: Arora Raman
Livescu Karen
Srebro Nathan
Wang Weiran
Publication date: 07/10/2015

Deep CCA is a recently proposed deep neural network extension to the traditional canonical correlation analysis (CCA), and has been successful for multi-view representation learning in several domains. However, stochastic optimization of the deep CCA objective is not straightforward, because it does not decouple over training examples. Previous optimizers for deep CCA are either batch-based algorithms or stochastic optimization using large minibatches, which can have high memory consumption. In this paper, we tackle the problem of stochastic optimization for deep CCA with small minibatches, based on an iterative solution to the CCA objective, and show that we can achieve as good performance as previous optimizers and thus alleviate the memory requirement. Comment: in 2015 Annual Allerton Conference on Communication, Control and Computin
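To make the "does not decouple over training examples" remark concrete, here is a minimal sketch, not the paper's algorithm. It assumes two one-dimensional views, where the CCA objective reduces to the ordinary correlation coefficient. The function names and the toy data are hypothetical. The correlation is a ratio of sums over all examples, so averaging per-minibatch correlations does not recover the full-data value.

```python
import math

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences (1-D CCA objective)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def mean_minibatch_correlation(xs, ys, batch_size):
    """Naive estimate: average the correlation computed inside each minibatch."""
    scores = [
        correlation(xs[i:i + batch_size], ys[i:i + batch_size])
        for i in range(0, len(xs), batch_size)
    ]
    return sum(scores) / len(scores)

# Hypothetical toy data: two views of six training examples.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 3.0, 2.0, 6.0, 4.0, 5.0]

full = correlation(xs, ys)                                 # about 0.771
small = mean_minibatch_correlation(xs, ys, batch_size=3)   # 0.0 on this data
print(full, small)
```

On this toy data the two minibatch correlations are 0.5 and -0.5, so their average is 0 even though the full-data correlation is about 0.77, which is one way to see why small minibatches need special treatment.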

arXiv.org e-Print Archive

Crossref

Scalable Peaceman-Rachford Splitting Method with Proximal Terms

Author: Na Sen
Ma Mingyuan
Kolar Mladen
Publication date: 09/02/2018

Along with the development of the Peaceman-Rachford Splitting Method (PRSM), many batch algorithms based on it have been studied in depth. However, almost no algorithm has focused on the performance of a stochastic version of PRSM. In this paper, we propose a new stochastic algorithm based on PRSM, prove its convergence rate in the ergodic sense, and test its performance on both artificial and real data. We show that our proposed algorithm, Stochastic Scalable PRSM (SS-PRSM), enjoys an $O(1/K)$ convergence rate, which matches that of the newest ADMM-based stochastic algorithms and is faster than general Stochastic ADMM (which is $O(1/\sqrt{K})$). Our algorithm is also highly flexible, outperforms many state-of-the-art ADMM-based stochastic algorithms, and has low memory cost in large-scale splitting optimization problems.
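As a rough reading of the two rates, assume (hypothetically, since the abstract gives no constants) that both bounds share the same constant $C$ and that the target accuracy is $\epsilon$. The number of iterations $K$ each rate requires is then

\[
\frac{C}{K} \le \epsilon \iff K \ge \frac{C}{\epsilon},
\qquad
\frac{C}{\sqrt{K}} \le \epsilon \iff K \ge \frac{C^{2}}{\epsilon^{2}} .
\]

For example, with $C = 1$ and $\epsilon = 10^{-2}$, the $O(1/K)$ bound is met after $K = 10^{2}$ iterations, while the $O(1/\sqrt{K})$ bound needs $K = 10^{4}$.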

arXiv.org e-Print Archive

FigShare

Strong and Weak Optimizations in Classical and Quantum Models of Stochastic Processes

Author: Crutchfield James P.
Loomis Samuel
Publication venue: Springer Science and Business Media LLC
Publication date: 26/08/2018

Among the predictive hidden Markov models that describe a given stochastic process, the ε-machine is strongly minimal in that it minimizes every Rényi-based memory measure. Quantum models can be smaller still. In contrast with the ε-machine's unique role in the classical setting, however, among the class of processes described by pure-state hidden quantum Markov models, there are those for which there does not exist any strongly minimal model. Quantum memory optimization then depends on which memory measure best matches a given problem circumstance. Comment: 14 pages, 14 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/uemum.ht

arXiv.org e-Print Archive

eScholarship - University of California

Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees

Author: Alistarh Dan
Chatterjee Bapi
Egan Malcolm
Kungurtsev Vyacheslav
Publication date: 11/07/2020

Asynchronous distributed algorithms are a popular way to reduce synchronization costs in large-scale optimization, and in particular for neural network training. However, for nonsmooth and nonconvex objectives, few convergence guarantees exist beyond cases where closed-form proximal operator solutions are available. As most popular contemporary deep neural networks lead to nonsmooth and nonconvex objectives, there is now a pressing need for such convergence guarantees. In this paper, we analyze for the first time the convergence of stochastic asynchronous optimization for this general class of objectives. In particular, we focus on stochastic subgradient methods allowing for block variable partitioning, where the shared-memory-based model is asynchronously updated by concurrent processes. To this end, we first introduce a probabilistic model which captures key features of real asynchronous scheduling between concurrent processes; under this model, we establish convergence with probability one to an invariant set for stochastic subgradient methods with momentum. From the practical perspective, one issue with the family of methods we consider is that it is not efficiently supported by machine learning frameworks, as they mostly focus on distributed data-parallel strategies. To address this, we propose a new implementation strategy for shared-memory-based training of deep neural networks, whereby concurrent parameter servers are utilized to train a partitioned but shared model in single- and multi-GPU settings. Based on this implementation, we achieve on average a 1.2x speed-up in comparison to state-of-the-art training methods for popular image classification tasks without compromising accuracy.
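The block-partitioned subgradient update with momentum mentioned in this abstract can be illustrated with a small sequential toy, under strong assumptions. It does not model asynchrony, concurrent processes, or the paper's probabilistic scheduling model. The objective (a sum of absolute values, which is nonsmooth but convex), the step-size schedule, and all names are hypothetical choices made for illustration only.

```python
import random

def sign(value):
    """A subgradient of |value|; returning 0 at the kink is a valid choice."""
    return (value > 0) - (value < 0)

def block_subgradient_momentum(targets, blocks, steps=2000, beta=0.5, seed=0):
    """Minimize sum_i |w_i - targets[i]| by updating one random block per step."""
    rng = random.Random(seed)
    weights = [0.0] * len(targets)
    velocity = [0.0] * len(targets)
    for k in range(steps):
        step_size = 0.5 / (k + 1) ** 0.5   # diminishing step size (assumed schedule)
        block = rng.choice(blocks)         # only this block of coordinates is touched
        for i in block:
            subgrad = sign(weights[i] - targets[i])
            velocity[i] = beta * velocity[i] + subgrad   # heavy-ball style momentum
            weights[i] -= step_size * velocity[i]
    return weights

# Hypothetical problem: four parameters split into two blocks.
targets = [1.0, -2.0, 0.5, 3.0]
blocks = [[0, 1], [2, 3]]
weights = block_subgradient_momentum(targets, blocks)
print(weights)
```

In a shared-memory setting, each block would instead be updated by its own process reading possibly stale values of the other blocks; the toy above replaces that with a random block choice in a single thread.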

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

IST Austria: PubRep (Institute of Science and Technology)

Hal-Diderot

Association for the Advancement of Artificial Intelligence: AAAI Publications

Risk-Averse Planning Under Uncertainty

Author: Ahmadi Mohamadreza
Ames Aaron D.
Ingham Michel D.
Murray Richard M.
Ono Masahiro
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2020

We consider the problem of designing policies for partially observable Markov decision processes (POMDPs) with dynamic coherent risk objectives. Synthesizing risk-averse optimal policies for POMDPs requires infinite memory and is thus undecidable. To overcome this difficulty, we propose a method based on bounded policy iteration for designing stochastic but finite-state (memory) controllers, which takes advantage of standard convex optimization methods. Given a memory budget and optimality criterion, the proposed method modifies the stochastic finite-state controller, leading to sub-optimal solutions with lower coherent risk.
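For readers unfamiliar with stochastic finite-state controllers, the following sketch shows only how such a controller is executed. It assumes the common formulation in which each memory node has a distribution over actions and a distribution over successor nodes given the action and observation. It does not implement bounded policy iteration or any risk measure, and the two-node controller, action names, and observation names are hypothetical.

```python
import random

def choose_action(node, action_probs, rng):
    """Sample an action from the distribution attached to the current memory node."""
    actions = list(action_probs[node])
    weights = [action_probs[node][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

def next_node(node, action, observation, node_probs, rng):
    """Sample the next memory node given the action taken and observation received."""
    dist = node_probs[(node, action, observation)]
    nodes = list(dist)
    weights = [dist[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

# Hypothetical controller with a memory budget of two nodes (0 and 1).
action_probs = {
    0: {"listen": 1.0},
    1: {"open_left": 0.3, "open_right": 0.7},
}
node_probs = {
    (0, "listen", "hear_left"): {1: 1.0},
    (0, "listen", "hear_right"): {0: 0.5, 1: 0.5},
    (1, "open_left", "reset"): {0: 1.0},
    (1, "open_right", "reset"): {0: 1.0},
}

rng = random.Random(0)
action = choose_action(0, action_probs, rng)                # always "listen" in node 0
node = next_node(0, action, "hear_left", node_probs, rng)   # always moves to node 1
print(action, node)
```

The memory budget in the abstract corresponds to the number of nodes here; a policy-improvement method would adjust the probabilities in `action_probs` and `node_probs` while keeping that number fixed.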