115,562 research outputs found
Online Learning of a Memory for Learning Rates
The promise of learning to learn for robotics rests on the hope that by
extracting some information about the learning process itself we can speed up
subsequent similar learning tasks. Here, we introduce a computationally
efficient online meta-learning algorithm that builds and optimizes a memory
model of the optimal learning rate landscape from previously observed gradient
behaviors. While performing task specific optimization, this memory of learning
rates predicts how to scale currently observed gradients. After applying the
gradient scaling our meta-learner updates its internal memory based on the
observed effect its prediction had. Our meta-learner can be combined with any
gradient-based optimizer, learns on the fly and can be transferred to new
optimization tasks. In our evaluations we show that our meta-learning algorithm
speeds up learning of MNIST classification and a variety of learning control
tasks, either in batch or online learning settings.Comment: accepted to ICRA 2018, code available:
https://github.com/fmeier/online-meta-learning ; video pitch available:
https://youtu.be/9PzQ25FPPO
GPU-based Streaming for Parallel Level of Detail on Massive Model Rendering
Rendering massive 3D models in real-time has long been recognized as a very challenging problem because of the limited computational power and memory space available in a workstation. Most existing rendering techniques, especially level of detail (LOD) processing, have suffered from their sequential execution natures, and does not scale well with the size of the models. We present a GPU-based progressive mesh simplification approach which enables the interactive rendering of large 3D models with hundreds of millions of triangles. Our work contributes to the massive rendering research in two ways. First, we develop a novel data structure to represent the progressive LOD mesh, and design a parallel mesh simplification algorithm towards GPU architecture. Second, we propose a GPU-based streaming approach which adopt a frame-to-frame coherence scheme in order to minimize the high communication cost between CPU and GPU. Our results show that the parallel mesh simplification algorithm and GPU-based streaming approach significantly improve the overall rendering performance
Hamiltonian Monte Carlo Acceleration Using Surrogate Functions with Random Bases
For big data analysis, high computational cost for Bayesian methods often
limits their applications in practice. In recent years, there have been many
attempts to improve computational efficiency of Bayesian inference. Here we
propose an efficient and scalable computational technique for a
state-of-the-art Markov Chain Monte Carlo (MCMC) methods, namely, Hamiltonian
Monte Carlo (HMC). The key idea is to explore and exploit the structure and
regularity in parameter space for the underlying probabilistic model to
construct an effective approximation of its geometric properties. To this end,
we build a surrogate function to approximate the target distribution using
properly chosen random bases and an efficient optimization process. The
resulting method provides a flexible, scalable, and efficient sampling
algorithm, which converges to the correct target distribution. We show that by
choosing the basis functions and optimization process differently, our method
can be related to other approaches for the construction of surrogate functions
such as generalized additive models or Gaussian process models. Experiments
based on simulated and real data show that our approach leads to substantially
more efficient sampling algorithms compared to existing state-of-the art
methods
Algorithmic patterns for -matrices on many-core processors
In this work, we consider the reformulation of hierarchical ()
matrix algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing matrix CPU implementations by many-core
processors, we here aim at totally relying on that processor type. As main
contribution, we introduce the necessary parallel algorithmic patterns allowing
to map the full matrix construction and the fast matrix-vector
product to many-core hardware. Here, crucial ingredients are space filling
curves, parallel tree traversal and batching of linear algebra operations. The
resulting model GPU implementation hmglib is the, to the best of the authors
knowledge, first entirely GPU-based Open Source matrix library of
this kind. We conclude this work by an in-depth performance analysis and a
comparative performance study against a standard matrix library,
highlighting profound speedups of our many-core parallel approach
- …