Orthogonally Decoupled Variational Gaussian Processes
Gaussian processes (GPs) provide a powerful non-parametric framework for
reasoning over functions. Despite their appealing theory, their superlinear
computational and memory complexities have presented a long-standing
challenge. State-of-the-art sparse variational inference methods trade
modeling accuracy against complexity. However, the complexities of these
methods still scale superlinearly in the number of basis functions, implying
that sparse GP methods can learn from large datasets only when a small model
is used.
Recently, a decoupled approach was proposed that removes the unnecessary
coupling between the complexities of modeling the mean and the covariance
functions of a GP. It achieves a linear complexity in the number of mean
parameters, so an expressive posterior mean function can be modeled. While
promising, this approach suffers from optimization difficulties due to
ill-conditioning and non-convexity. In this work, we propose an alternative
decoupled parametrization. It adopts an orthogonal basis in the mean function
to model the residues that cannot be learned by the standard coupled approach.
Therefore, our method extends, rather than replaces, the coupled approach to
achieve strictly better performance. This construction admits a straightforward
natural gradient update rule, so the structure of the information manifold that
is lost during decoupling can be leveraged to speed up learning. Empirically,
our algorithm demonstrates significantly faster convergence in multiple
experiments.
Comment: Appearing in NIPS 2018
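To make the decoupled parametrization concrete, the following is a minimal
sketch of the orthogonally decoupled posterior mean, assuming an RBF kernel
and hypothetical inducing sets Z_cov (the standard coupled basis, gamma) and
Z_mean (the extra mean-only basis, beta); the names and shapes are
illustrative assumptions, not the authors' reference implementation.

import numpy as np

def rbf(X, Y, lengthscale=1.0):
    # Squared-exponential kernel matrix k(X, Y).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def decoupled_mean(x, Z_cov, Z_mean, m_gamma, a_beta, jitter=1e-6):
    # Posterior mean with an extra basis orthogonalized against the
    # span of the coupled basis Z_cov, so it models only the residues
    # that the coupled approach cannot express.
    K_xg = rbf(x, Z_cov)
    K_gg = rbf(Z_cov, Z_cov) + jitter * np.eye(len(Z_cov))
    K_xb = rbf(x, Z_mean)
    K_gb = rbf(Z_cov, Z_mean)
    K_perp = K_xb - K_xg @ np.linalg.solve(K_gg, K_gb)
    return K_xg @ m_gamma + K_perp @ a_beta

With a_beta set to zero this reduces to the standard coupled mean, which is
why the construction extends, rather than replaces, the coupled approach;
the cost of the extra term is linear in the number of mean parameters.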
Multi-component optical solitary waves
We discuss several novel types of multi-component (temporal and spatial)
envelope solitary waves that appear in fiber and waveguide nonlinear optics. In
particular, we describe multi-channel solitary waves in bit-parallel-wavelength
fiber transmission systems for high performance computer networks, multi-colour
parametric spatial solitary waves due to cascaded nonlinearities of quadratic
materials, and quasiperiodic envelope solitons due to quasi-phase-matching in
Fibonacci optical superlattices.
Comment: 12 pages, 11 figures; To be published in: Proceedings of the
Dynamics Days Asia-Pacific: First International Conference on Nonlinear
Science (Hong Kong, 13-16 July, 1999), Editor: Bambi Hu (Elsevier
Publishers, 2000)
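For context, multi-channel solitary waves of this kind (e.g. in
bit-parallel-wavelength systems) are commonly described by incoherently
coupled nonlinear Schrödinger equations; the normalization below is a
standard illustrative form, not necessarily the exact model used here:

\[
  i\,\frac{\partial u_j}{\partial z}
  + \frac{1}{2}\,\frac{\partial^2 u_j}{\partial t^2}
  + \Big( |u_j|^2 + \sigma \sum_{k \neq j} |u_k|^2 \Big) u_j = 0,
  \qquad j = 1, \dots, N,
\]

where $u_j$ is the envelope in channel $j$ and $\sigma$ is the
cross-phase-modulation coefficient.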
Large-Scale Gaussian Processes via Alternating Projection
Gaussian process (GP) hyperparameter optimization requires repeatedly solving
linear systems with $n \times n$ kernel matrices. To address the prohibitive
$\mathcal{O}(n^3)$
time complexity, recent work has employed fast iterative
numerical methods, like conjugate gradients (CG). However, as datasets increase
in magnitude, the corresponding kernel matrices become increasingly
ill-conditioned and still require $\mathcal{O}(n^2)$ space without
partitioning. Thus, while CG increases the size of datasets GPs can be trained
on, modern datasets reach scales beyond its applicability. In this work, we
propose an iterative method which only accesses subblocks of the kernel matrix,
effectively enabling \emph{mini-batching}. Our algorithm, based on alternating
projection, has $\mathcal{O}(n)$ per-iteration time and space complexity,
solving many of the practical challenges of scaling GPs to very large datasets.
Theoretically, we prove our method enjoys linear convergence and empirically we
demonstrate its robustness to ill-conditioning. On large-scale benchmark
datasets with up to four million datapoints, our approach accelerates
training by a factor of 2 to 27 compared to CG.
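To illustrate the block-access idea, here is a minimal sketch of the solver
viewed as block Gauss-Seidel, which is equivalent to alternately projecting
onto the subspaces defined by each block of training points; the kernel,
block size, and function names are illustrative assumptions, not the paper's
exact implementation.

import numpy as np

def solve_blockwise(K_fn, X, y, block_size=512, num_epochs=20, jitter=1e-6):
    # Approximately solve (K + jitter * I) w = y while materializing
    # only one block_size-by-n slab of the kernel matrix at a time.
    n = len(X)
    w = np.zeros(n)
    blocks = [np.arange(i, min(i + block_size, n))
              for i in range(0, n, block_size)]
    for _ in range(num_epochs):
        for b in blocks:
            K_bn = K_fn(X[b], X)                   # one block row of K
            r_b = y[b] - K_bn @ w - jitter * w[b]  # residual on block b
            K_bb = K_bn[:, b] + jitter * np.eye(len(b))
            w[b] += np.linalg.solve(K_bb, r_b)     # project onto block
    return w

# Example usage with a squared-exponential kernel on X of shape (n, d):
# k = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
# w = solve_blockwise(k, X_train, y_train)

Because each step touches only a block_size-by-n slab of the kernel matrix,
per-iteration time and memory stay linear in n for a fixed block size, which
is what makes the mini-batch behaviour described above possible.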