PoseAgent: Budget-Constrained 6D Object Pose Estimation via Reinforcement Learning
State-of-the-art computer vision algorithms often achieve efficiency by
making discrete choices about which hypotheses to explore next. This allows
allocation of computational resources to promising candidates; however, such
decisions are non-differentiable. As a result, these algorithms are hard to
train in an end-to-end fashion. In this work we propose to learn an efficient
algorithm for the task of 6D object pose estimation. Our system optimizes the
parameters of an existing state-of-the-art pose estimation system using
reinforcement learning, where the pose estimation system now becomes the
stochastic policy, parametrized by a CNN. Additionally, we present an efficient
training algorithm that dramatically reduces computation time. We show
empirically that our learned pose estimation procedure makes better use of
limited resources and improves upon the state-of-the-art on a challenging
dataset. Our approach enables differentiable end-to-end training of complex
algorithmic pipelines and learns to make optimal use of a given computational
budget.
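The idea of training through non-differentiable, discrete choices can be illustrated with a generic score-function (REINFORCE) estimator. The sketch below is not the PoseAgent system: the softmax policy over a handful of candidate hypotheses, the fixed per-hypothesis `rewards`, and the names `reinforce_step` and `train` are all assumptions made for illustration. It only shows how a policy that samples a limited number of choices per step (the `budget`) can still be improved by gradient ascent.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(theta, rewards, budget, lr, rng):
    """One REINFORCE update using `budget` sampled discrete choices.

    theta   : policy parameters, one score per candidate hypothesis
    rewards : hypothetical reward obtained when a candidate is chosen
    """
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for _ in range(budget):
        # Discrete, non-differentiable decision: which candidate to explore.
        k = rng.choices(range(len(theta)), weights=probs)[0]
        r = rewards[k]
        # d/d theta_j log softmax(theta)[k] = 1[j == k] - probs[j]
        for j in range(len(theta)):
            grad[j] += r * ((1.0 if j == k else 0.0) - probs[j])
    # Gradient ascent on the sample average of reward * grad log-probability.
    return [t + lr * g / budget for t, g in zip(theta, grad)]


def train(rewards, budget=4, steps=2000, lr=0.1, seed=0):
    """Train the toy policy and return its final choice probabilities."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        theta = reinforce_step(theta, rewards, budget, lr, rng)
    return softmax(theta)


if __name__ == "__main__":
    # Candidate 1 has the highest (made-up) reward.
    print(train([0.1, 0.9, 0.2]))
```

In this toy setting the policy shifts probability toward the candidate with the highest reward, even though the sampling step itself has no gradient.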
Cache-aware Performance Modeling and Prediction for Dense Linear Algebra
Countless applications cast their computational core in terms of dense linear
algebra operations. These operations can usually be implemented by combining
the routines offered by standard linear algebra libraries such as BLAS and
LAPACK, and typically each operation can be obtained in many alternative ways.
Interestingly, identifying the fastest implementation -- without executing it
-- is a challenging task even for experts. An equally challenging task is that
of tuning each routine to performance-optimal configurations. Indeed, the
problem is so difficult that even the default values provided by the libraries
are often considerably suboptimal; as a solution, normally one has to resort to
executing and timing the routines, driven by some form of parameter search. In
this paper, we discuss a methodology to solve both problems: identifying the
best performing algorithm within a family of alternatives, and tuning
algorithmic parameters for maximum performance; in both cases, we do not
execute the algorithms themselves. Instead, our methodology relies on timing
and modeling the computational kernels underlying the algorithms, and on a
technique for tracking the contents of the CPU cache. In general, our
performance predictions allow us to tune dense linear algebra algorithms within
a few percent of the best attainable results, thus allowing computational
scientists and code developers alike to efficiently optimize their linear
algebra routines and codes.
Comment: Submitted to PMBS1
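Predicting an algorithm's runtime from models of its kernels, without running the algorithm, can be sketched as follows. This is a simplified illustration, not the paper's methodology: the kernel models in `MODELS` are invented closed-form formulas (in practice they would be fitted to measured kernel timings), the cache-tracking component is left out entirely, and the function names are hypothetical. The example enumerates the kernel calls of a right-looking blocked Cholesky factorization and picks the block size with the lowest predicted time.

```python
from typing import Callable, Dict, List, Tuple

KernelCall = Tuple[str, Tuple[int, ...]]  # (kernel name, size arguments)

# Hypothetical per-kernel models: predicted seconds as a function of sizes.
# Each has a fixed call overhead plus flops divided by an assumed rate.
MODELS: Dict[str, Callable[..., float]] = {
    "potf2": lambda nb: 1e-6 + (nb ** 3 / 3.0) / 1e9,             # unblocked Cholesky
    "trsm": lambda nb, rest: 2e-6 + (nb * nb * rest) / 5e9,       # triangular solve
    "syrk": lambda rest, nb: 2e-6 + (rest * rest * nb) / 1e10,    # symmetric update
}


def blocked_cholesky_calls(n: int, b: int) -> List[KernelCall]:
    """List the kernel calls of a right-looking blocked Cholesky of size n."""
    calls: List[KernelCall] = []
    for j in range(0, n, b):
        nb = min(b, n - j)      # size of the current diagonal block
        rest = n - j - nb       # rows remaining below the diagonal block
        calls.append(("potf2", (nb,)))
        if rest > 0:
            calls.append(("trsm", (nb, rest)))
            calls.append(("syrk", (rest, nb)))
    return calls


def predict(calls: List[KernelCall], models: Dict[str, Callable[..., float]]) -> float:
    """Predicted runtime: the sum of the model estimates for each kernel call."""
    return sum(models[name](*args) for name, args in calls)


def best_block_size(n: int, candidates: List[int],
                    models: Dict[str, Callable[..., float]]) -> int:
    """Pick the block size with the smallest predicted runtime."""
    return min(candidates, key=lambda b: predict(blocked_cholesky_calls(n, b), models))


if __name__ == "__main__":
    print(best_block_size(2000, [16, 32, 64, 128, 256], MODELS))
```

The algorithm itself is never executed here; only the call sequence is generated and priced, which is what makes scanning many parameter values cheap.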
MILO: a microarchitecture and logic optimizer
In this report we discuss strengths and weaknesses of logic synthesis systems and describe a system for microarchitectural and logic optimization. Our system uses a set of algorithms for synthesizing SSI/MSI macros from parameterized microarchitecture components. In addition, it uses rules for optimizing at both the microarchitecture and logic levels. The system increases designer productivity and requires less design knowledge and experience from circuit engineers.
A Generic Framework for Engineering Graph Canonization Algorithms
The state-of-the-art tools for practical graph canonization are all based on
the individualization-refinement paradigm, and their difference is primarily in
the choice of heuristics they include and in the actual tool implementation. It
is thus not possible to make a direct comparison of how individual algorithmic
ideas affect the performance on different graph classes.
We present an algorithmic software framework that facilitates implementation
of heuristics as independent extensions to a common core algorithm. It
therefore becomes easy to perform a detailed comparison of the performance and
behaviour of different algorithmic ideas. Implementations are provided of a
range of algorithms for tree traversal, target cell selection, and node
invariants, including choices from the literature and new variations. The
framework readily supports extraction and visualization of detailed data from
separate algorithm executions for subsequent analysis and development of new
heuristics.
Using collections of different graph classes we investigate the effect of
varying the selections of heuristics, often revealing exactly which individual
algorithmic choice is responsible for particularly good or bad performance. On
several benchmark collections, including a newly proposed class of difficult
instances, we additionally find that our implementation performs better than
the current state-of-the-art tools.
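For readers unfamiliar with the individualization-refinement paradigm, its two basic steps can be sketched in a few lines. This is a textbook-style illustration and not the framework's API: the adjacency-list representation and the names `refine` and `individualize` are assumptions. Refinement repeatedly splits color classes by the multiset of neighbor colors; individualization gives one vertex a fresh color so that refinement can make further progress on symmetric graphs.

```python
def refine(adj, colors):
    """Color refinement: split classes by neighbor-color multisets until stable.

    adj    : adjacency lists, adj[v] is the list of neighbors of vertex v
    colors : initial integer color per vertex
    """
    colors = list(colors)
    while True:
        # A vertex's signature is its own color plus its sorted neighbor colors.
        signatures = [
            (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in range(len(adj))
        ]
        # Rank the distinct signatures to obtain small, order-consistent colors.
        ranks = {sig: i for i, sig in enumerate(sorted(set(signatures)))}
        new_colors = [ranks[sig] for sig in signatures]
        # Classes can only split, so an unchanged class count means stability.
        if len(set(new_colors)) == len(set(colors)):
            return new_colors
        colors = new_colors


def individualize(colors, v):
    """Assign vertex v a fresh color, distinct from all existing colors."""
    new_colors = list(colors)
    new_colors[v] = max(colors) + 1
    return new_colors


if __name__ == "__main__":
    cycle = [[1, 3], [0, 2], [1, 3], [0, 2]]   # 4-cycle: refinement alone is stuck
    print(refine(cycle, [0, 0, 0, 0]))
    print(refine(cycle, individualize([0, 0, 0, 0], 0)))
```

On the 4-cycle, refinement alone cannot distinguish any vertices; after individualizing one vertex, refinement separates it, its two neighbors, and the opposite vertex. Choosing which class to individualize from and how to traverse the resulting search tree are the kinds of heuristics the abstract refers to.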