Convolutional Dictionary Learning: Acceleration and Convergence
Convolutional dictionary learning (CDL or sparsifying CDL) has many
applications in image processing and computer vision. There has been growing
interest in developing efficient algorithms for CDL, mostly relying on the
augmented Lagrangian (AL) method or the variant alternating direction method of
multipliers (ADMM). When their parameters are properly tuned, AL methods have
shown fast convergence in CDL. However, the parameter tuning process is not
trivial due to its data dependence and, in practice, the convergence of AL
methods depends on the AL parameters for nonconvex CDL problems. To mitigate
these problems, this paper proposes a new practically feasible and convergent
Block Proximal Gradient method using a Majorizer (BPG-M) for CDL. The
BPG-M-based CDL is investigated with different block updating schemes and
majorization matrix designs, and further accelerated by incorporating some
momentum coefficient formulas and restarting techniques. All of the methods
investigated incorporate a boundary artifacts removal (or, more generally,
sampling) operator in the learning model. Numerical experiments show that,
without needing any parameter tuning process, the proposed BPG-M approach
converges more stably to desirable solutions of lower objective values than the
existing state-of-the-art ADMM algorithm and its memory-efficient variant do.
Compared to the ADMM approaches, the BPG-M method using a multi-block updating
scheme is particularly useful for single-threaded CDL algorithms handling large
datasets, due to its lower memory requirement and its avoidance of polynomial
computational complexity. Image denoising experiments show that, for relatively strong
additive white Gaussian noise, the filters learned by BPG-M-based CDL
outperform those trained by the ADMM approach. Comment: 21 pages, 7 figures, submitted to IEEE Transactions on Image Processing
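To make the majorized update concrete, the following is a minimal sketch (our own illustration, with hypothetical names and a diagonal majorizer assumed; not the paper's implementation) of one BPG-M-style proximal step for an l1-regularized sparse-code block:

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def majorized_prox_step(x, grad_f, M_diag, reg):
    """One majorized proximal gradient step for a sparse-code block.

    x      : current sparse codes for this block
    grad_f : gradient of the smooth data-fit term at x
    M_diag : diagonal majorizer bounding the block curvature (elementwise > 0)
    reg    : l1 regularization weight
    """
    # Scaled gradient step, then prox in the majorizer-weighted norm;
    # with a diagonal majorizer both operations stay elementwise.
    z = x - grad_f / M_diag
    return soft_threshold(z, reg / M_diag)
```

The momentum coefficients and restarting mentioned in the abstract would wrap such steps in an extrapolation loop.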
Coherence retrieval using trace regularization
The mutual intensity and its equivalent phase-space representations quantify
an optical field's state of coherence and are important tools in the study of
light propagation and dynamics, but they can only be estimated indirectly from
measurements through a process called coherence retrieval, otherwise known as
phase-space tomography. As practical considerations often rule out the
availability of a complete set of measurements, coherence retrieval is usually
a challenging high-dimensional ill-posed inverse problem. In this paper, we
propose a trace-regularized optimization model for coherence retrieval and a
provably-convergent adaptive accelerated proximal gradient algorithm for
solving the resulting problem. Applying our model and algorithm to both
simulated and experimental data, we demonstrate an improvement in
reconstruction quality over previous models as well as an increase in
convergence speed compared to existing first-order methods. Comment: 28 pages, 10 figures, accepted for publication in SIAM Journal on Imaging Sciences
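As a rough sketch of the optimization ingredients described above (a fixed-step FISTA-style loop rather than the paper's adaptive accelerated scheme; the measurement gradient and the Lipschitz bound L are assumed given):

```python
import numpy as np

def prox_trace_psd(J, lam):
    """Prox of lam*trace(J) plus the PSD-cone constraint: since trace equals
    the sum of eigenvalues for Hermitian PSD matrices, shrink the eigenvalues
    by lam and clip them at zero."""
    w, V = np.linalg.eigh((J + J.conj().T) / 2)  # symmetrize for stability
    w = np.maximum(w - lam, 0.0)
    return (V * w) @ V.conj().T

def accelerated_prox_gradient(grad, lam, L, J0, n_iter=300):
    """Minimal accelerated proximal gradient loop for
    min_{J >= 0} f(J) + lam*trace(J), where grad is the gradient of the
    smooth data-fit term f and L is a Lipschitz constant of grad
    (fixed here, not adaptive)."""
    J, Z, t = J0.copy(), J0.copy(), 1.0
    for _ in range(n_iter):
        J_next = prox_trace_psd(Z - grad(Z) / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = J_next + ((t - 1.0) / t_next) * (J_next - J)
        J, t = J_next, t_next
    return J
```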
Convolutional Analysis Operator Learning: Acceleration and Convergence
Convolutional operator learning is gaining attention in many signal
processing and computer vision applications. Learning kernels has mostly relied
on so-called patch-domain approaches that extract and store many overlapping
patches across training signals. Due to memory demands, patch-domain methods
have limitations when learning kernels from large datasets -- particularly with
multi-layered structures, e.g., convolutional neural networks -- or when
applying the learned kernels to high-dimensional signal recovery problems. The
so-called convolution approach does not store many overlapping patches, and
thus overcomes the memory problems particularly with careful algorithmic
designs; it has been studied within the "synthesis" signal model, e.g.,
convolutional dictionary learning. This paper proposes a new convolutional
analysis operator learning (CAOL) framework that learns an analysis sparsifying
regularizer with the convolution perspective, and develops a new convergent
Block Proximal Extrapolated Gradient method using a Majorizer (BPEG-M) to solve
the corresponding block multi-nonconvex problems. To learn diverse filters
within the CAOL framework, this paper introduces an orthogonality constraint
that enforces a tight-frame filter condition, and a regularizer that promotes
diversity between filters. Numerical experiments show that, with sharp
majorizers, BPEG-M significantly accelerates the CAOL convergence rate compared
to the state-of-the-art block proximal gradient (BPG) method. Numerical
experiments for sparse-view computational tomography show that a convolutional
sparsifying regularizer learned via CAOL significantly improves reconstruction
quality compared to a conventional edge-preserving regularizer. Using more and
wider kernels in a learned regularizer better preserves edges in reconstructed
images. Comment: 22 pages, 11 figures, fixed incorrect math theorem numbers in fig.
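One way to picture the tight-frame filter condition mentioned above: stacking the vectorized kernels as rows of a matrix D, the constraint requires D^T D to be a scaled identity, and the nearest such matrix is obtained by replacing the singular values of D. A minimal sketch (the matrix layout and the scaling alpha are our assumptions, not the paper's exact formulation):

```python
import numpy as np

def project_tight_frame(D, alpha=1.0):
    """Project D (num_filters x filter_size, with num_filters >= filter_size)
    onto { D : D.T @ D = alpha * I }, a tight-frame filter set, by replacing
    every singular value of D with sqrt(alpha)."""
    U, _, Vt = np.linalg.svd(D, full_matrices=False)
    return np.sqrt(alpha) * (U @ Vt)
```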
Catalyst Acceleration for Gradient-Based Non-Convex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using
gradient-based algorithms originally designed for minimizing convex functions.
Even though these methods may originally require convexity to operate, the
proposed approach allows one to use them on weakly convex objectives, which
covers a large class of non-convex functions typically appearing in machine
learning and signal processing. In general, the scheme is guaranteed to produce
a stationary point with a worst-case efficiency typical of first-order methods,
and when the objective turns out to be convex, it automatically accelerates in
the sense of Nesterov and achieves near-optimal convergence rate in function
values. These properties are achieved without assuming any knowledge about the
convexity of the objective, by automatically adapting to the unknown weak
convexity constant. We conclude the paper by showing promising experimental
results obtained by applying our approach to incremental algorithms such as
SVRG and SAGA for sparse matrix factorization and for learning neural networks.
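A minimal sketch of the generic wrapping idea (the interface and the fixed extrapolation coefficient are our assumptions; the actual scheme sets the coefficient from the proximal parameter and the possibly unknown convexity constants):

```python
def catalyst_outer_loop(inner_solver, x0, kappa, n_outer=50, beta=0.9):
    """Catalyst-style outer loop: repeatedly minimize the original objective
    plus a quadratic proximal term centered at an extrapolated point, using
    any convex first-order method (e.g. SVRG or SAGA) as inner_solver.

    inner_solver(center, kappa, warm_start) is assumed to approximately solve
        min_x f(x) + (kappa / 2) * ||x - center||^2.
    """
    x_prev, y = x0, x0
    for _ in range(n_outer):
        x = inner_solver(center=y, kappa=kappa, warm_start=x_prev)
        y = x + beta * (x - x_prev)  # Nesterov-style extrapolation
        x_prev = x
    return x
```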
Distributed Algorithms in Large-scaled Empirical Risk Minimization: Non-convexity, Adaptive-sampling, and Matrix-free Second-order Methods
The rising amount of data has changed classical approaches to statistical modeling significantly. Special methods are designed to infer meaningful relationships and hidden patterns from these large datasets, forming the foundation of a field called Machine Learning (ML). Such ML techniques have already been applied widely in various areas and have achieved compelling success. At the same time, the huge amount of data also demands a deep revision of current techniques, such as advanced data storage, new efficient large-scale algorithms, and their distributed/parallelized implementation.

A broad class of ML methods can be interpreted as Empirical Risk Minimization (ERM) problems. By choosing appropriate loss functions and, where necessary, regularization terms, one can pursue specific ML goals by solving ERMs as separable finite-sum optimization problems. In some settings, nonconvex components are introduced into the ERMs, which usually makes the problems hard to optimize. In particular, in recent years neural networks, a popular branch of ML, have drawn considerable attention from the community. Neural networks, inspired by the structured functionality of the brain, are powerful and highly flexible; typically, they can be treated as large-scale, highly nonconvex ERMs.

As nonconvex ERMs become more complex and larger in scale, optimization with stochastic gradient descent (SGD) type methods proceeds slowly in terms of convergence rate and cannot be distributed efficiently. This motivates researchers to explore more advanced local optimization methods such as approximate-Newton/second-order methods.

In this dissertation, first-order stochastic optimization for regularized ERMs is studied in Chapter 1. Building on the stochastic dual coordinate ascent (SDCA) method, a dual-free SDCA with a non-uniform mini-batch sampling strategy is investigated [30, 29]. We also introduce several efficient algorithms for training ERMs, including neural networks, using second-order optimization methods in a distributed environment. In Chapter 2, we propose a practical distributed implementation of Newton-CG methods, which makes training neural networks with second-order methods feasible in a distributed environment [28]. In Chapter 3, we take further steps toward using second-order methods to train feed-forward neural networks, exploiting negative curvature directions and momentum acceleration; this chapter also reports numerical experiments comparing second-order and first-order methods for training neural networks. Chapter 4 proposes a distributed accumulative sample-size second-order method for solving large-scale convex ERMs and nonconvex neural networks [35]. In Chapter 5, a Python library named UCLibrary for solving unconstrained optimization problems is briefly introduced. The dissertation is concluded in Chapter 6.
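As an illustration of the matrix-free second-order idea (the interface and the plain CG loop with a simple negative-curvature fallback are illustrative assumptions, not the dissertation's implementation):

```python
import numpy as np

def newton_cg_direction(hvp, grad, max_cg=20, tol=1e-6):
    """Approximately solve H d = -grad with conjugate gradients, using only
    Hessian-vector products hvp(v) = H @ v (no explicit Hessian)."""
    d = np.zeros_like(grad)
    r = -grad.copy()      # residual of H d = -grad at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_cg):
        Hp = hvp(p)
        curvature = p @ Hp
        if curvature <= 0:
            # Negative curvature: fall back to steepest descent on the first
            # pass, otherwise return the progress made so far.
            return -grad if not np.any(d) else d
        alpha = rs / curvature
        d = d + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d
```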