3,988 research outputs found

    Cache-aware Performance Modeling and Prediction for Dense Linear Algebra

    Full text link
    Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK, and typically each operation can be obtained in many alternative ways. Interestingly, identifying the fastest implementation -- without executing it -- is a challenging task even for experts. An equally challenging task is that of tuning each routine to performance-optimal configurations. Indeed, the problem is so difficult that even the default values provided by the libraries are often considerably suboptimal; as a solution, normally one has to resort to executing and timing the routines, driven by some form of parameter search. In this paper, we discuss a methodology to solve both problems: identifying the best performing algorithm within a family of alternatives, and tuning algorithmic parameters for maximum performance; in both cases, we do not execute the algorithms themselves. Instead, our methodology relies on timing and modeling the computational kernels underlying the algorithms, and on a technique for tracking the contents of the CPU cache. In general, our performance predictions allow us to tune dense linear algebra algorithms within few percents from the best attainable results, thus allowing computational scientists and code developers alike to efficiently optimize their linear algebra routines and codes.Comment: Submitted to PMBS1

    Reachability Analysis of Innermost Rewriting

    Get PDF
    We consider the problem of inferring a grammar describing the output of a functional program given a grammar describing its input. Solutions to this problem are helpful for detecting bugs or proving safety properties of functional programs and, several rewriting tools exist for solving this problem. However, known grammar inference techniques are not able to take evaluation strategies of the program into account. This yields very imprecise results when the evaluation strategy matters. In this work, we adapt the Tree Automata Completion algorithm to approximate accurately the set of terms reachable by rewriting under the innermost strategy. We prove that the proposed technique is sound and precise w.r.t. innermost rewriting. The proposed algorithm has been implemented in the Timbuk reachability tool. Experiments show that it noticeably improves the accuracy of static analysis for functional programs using the call-by-value evaluation strategy

    Concrete resource analysis of the quantum linear system algorithm used to compute the electromagnetic scattering cross section of a 2D target

    Get PDF
    We provide a detailed estimate for the logical resource requirements of the quantum linear system algorithm (QLSA) [Phys. Rev. Lett. 103, 150502 (2009)] including the recently described elaborations [Phys. Rev. Lett. 110, 250504 (2013)]. Our resource estimates are based on the standard quantum-circuit model of quantum computation; they comprise circuit width, circuit depth, the number of qubits and ancilla qubits employed, and the overall number of elementary quantum gate operations as well as more specific gate counts for each elementary fault-tolerant gate from the standard set {X, Y, Z, H, S, T, CNOT}. To perform these estimates, we used an approach that combines manual analysis with automated estimates generated via the Quipper quantum programming language and compiler. Our estimates pertain to the example problem size N=332,020,680 beyond which, according to a crude big-O complexity comparison, QLSA is expected to run faster than the best known classical linear-system solving algorithm. For this problem size, a desired calculation accuracy 0.01 requires an approximate circuit width 340 and circuit depth of order 102510^{25} if oracle costs are excluded, and a circuit width and depth of order 10810^8 and 102910^{29}, respectively, if oracle costs are included, indicating that the commonly ignored oracle resources are considerable. In addition to providing detailed logical resource estimates, it is also the purpose of this paper to demonstrate explicitly how these impressively large numbers arise with an actual circuit implementation of a quantum algorithm. While our estimates may prove to be conservative as more efficient advanced quantum-computation techniques are developed, they nevertheless provide a valid baseline for research targeting a reduction of the resource requirements, implying that a reduction by many orders of magnitude is necessary for the algorithm to become practical.Comment: 37 pages, 40 figure

    Automatic generation of robot and manual assembly plans using octrees

    Get PDF
    This paper aims to investigate automatic assembly planning for robot and manual assembly. The octree decomposition technique is applied to approximate CAD models with an octree representation which are then used to generate robot and manual assembly plans. An assembly planning system able to generate assembly plans was developed to build these prototype models. Octree decomposition is an effective assembly planning tool. Assembly plans can automatically be generated for robot and manual assembly using octree models. Research limitations/implications - One disadvantage of the octree decomposition technique is that it approximates a part model with cubes instead of using the actual model. This limits its use and applications when complex assemblies must be planned, but in the context of prototyping can allow a rough component to be formed which can later be finished by hand. Assembly plans can be generated using octree decomposition, however, new algorithms must be developed to overcome its limitations
    • …
    corecore