3,988 research outputs found
Cache-aware Performance Modeling and Prediction for Dense Linear Algebra
Countless applications cast their computational core in terms of dense linear
algebra operations. These operations can usually be implemented by combining
the routines offered by standard linear algebra libraries such as BLAS and
LAPACK, and typically each operation can be obtained in many alternative ways.
Interestingly, identifying the fastest implementation -- without executing it
-- is a challenging task even for experts. An equally challenging task is that
of tuning each routine to performance-optimal configurations. Indeed, the
problem is so difficult that even the default values provided by the libraries
are often considerably suboptimal; as a solution, normally one has to resort to
executing and timing the routines, driven by some form of parameter search. In
this paper, we discuss a methodology to solve both problems: identifying the
best performing algorithm within a family of alternatives, and tuning
algorithmic parameters for maximum performance; in both cases, we do not
execute the algorithms themselves. Instead, our methodology relies on timing
and modeling the computational kernels underlying the algorithms, and on a
technique for tracking the contents of the CPU cache. In general, our
performance predictions allow us to tune dense linear algebra algorithms within
few percents from the best attainable results, thus allowing computational
scientists and code developers alike to efficiently optimize their linear
algebra routines and codes.Comment: Submitted to PMBS1
Reachability Analysis of Innermost Rewriting
We consider the problem of inferring a grammar describing the output of a functional program given a grammar describing its input. Solutions to this problem are helpful for detecting bugs or proving safety properties of functional programs and, several rewriting tools exist for solving this problem. However, known grammar inference techniques are not able to take evaluation strategies of the program into account. This yields very imprecise results when the evaluation strategy matters. In this work, we adapt the Tree Automata Completion algorithm to approximate accurately the set of
terms reachable by rewriting under the innermost strategy. We prove that the proposed technique is sound and precise w.r.t. innermost rewriting. The proposed algorithm has been implemented in the Timbuk reachability tool. Experiments show that it noticeably improves the accuracy of static analysis for functional programs using the call-by-value evaluation strategy
Concrete resource analysis of the quantum linear system algorithm used to compute the electromagnetic scattering cross section of a 2D target
We provide a detailed estimate for the logical resource requirements of the
quantum linear system algorithm (QLSA) [Phys. Rev. Lett. 103, 150502 (2009)]
including the recently described elaborations [Phys. Rev. Lett. 110, 250504
(2013)]. Our resource estimates are based on the standard quantum-circuit model
of quantum computation; they comprise circuit width, circuit depth, the number
of qubits and ancilla qubits employed, and the overall number of elementary
quantum gate operations as well as more specific gate counts for each
elementary fault-tolerant gate from the standard set {X, Y, Z, H, S, T, CNOT}.
To perform these estimates, we used an approach that combines manual analysis
with automated estimates generated via the Quipper quantum programming language
and compiler. Our estimates pertain to the example problem size N=332,020,680
beyond which, according to a crude big-O complexity comparison, QLSA is
expected to run faster than the best known classical linear-system solving
algorithm. For this problem size, a desired calculation accuracy 0.01 requires
an approximate circuit width 340 and circuit depth of order if oracle
costs are excluded, and a circuit width and depth of order and
, respectively, if oracle costs are included, indicating that the
commonly ignored oracle resources are considerable. In addition to providing
detailed logical resource estimates, it is also the purpose of this paper to
demonstrate explicitly how these impressively large numbers arise with an
actual circuit implementation of a quantum algorithm. While our estimates may
prove to be conservative as more efficient advanced quantum-computation
techniques are developed, they nevertheless provide a valid baseline for
research targeting a reduction of the resource requirements, implying that a
reduction by many orders of magnitude is necessary for the algorithm to become
practical.Comment: 37 pages, 40 figure
Automatic generation of robot and manual assembly plans using octrees
This paper aims to investigate automatic assembly planning for robot and manual assembly. The octree decomposition technique is applied to approximate CAD models with an octree representation which are then used to generate robot and manual assembly plans. An assembly planning system able to generate assembly plans was developed to build these prototype models. Octree decomposition is an effective assembly planning tool. Assembly plans can automatically be generated for robot and manual assembly using octree models. Research limitations/implications - One disadvantage of the octree decomposition technique is that it approximates a part model with cubes instead of using the actual model. This limits its use and applications when complex assemblies must be planned, but in the context of prototyping can allow a rough component to be formed which can later be finished by hand. Assembly plans can be generated using octree decomposition, however, new algorithms must be developed to overcome its limitations
- …