6,199 research outputs found

    Performance of SSE and AVX Instruction Sets

    Full text link
    SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD (single instruction multiple data streams) instruction sets supported by recent CPUs manufactured in Intel and AMD. This SIMD programming allows parallel processing by multiple cores in a single CPU. Basic arithmetic and data transfer operations such as sum, multiplication and square root can be processed simultaneously. Although popular compilers such as GNU compilers and Intel compilers provide automatic SIMD optimization options, one can obtain better performance by a manual SIMD programming with proper optimization: data packing, data reuse and asynchronous data transfer. In particular, linear algebraic operations of vectors and matrices can be easily optimized by the SIMD programming. Typical calculations in lattice gauge theory are composed of linear algebraic operations of gauge link matrices and fermion vectors, and so can adopt the manual SIMD programming to improve the performance.Comment: 7 pages, 5 figures, 4 tables, Contribution to proceedings of the 30th International Symposium on Lattice Field Theory (Lattice 2012), June 24-29, 201

    Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

    Get PDF
    The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research

    A Language for Description of Instruction Sets

    Get PDF
    V této práci je představen návrh jednoduchého jazyka pro popis architektury mikroprocesoru zaměřeného na popis instrukční sady. Dále je popsána implementace interpretu tohoto jazyka, který je schopen simulovat popsanou architekturu. Tato práce může zároveň sloužit jako návod k používání tohoto interpretu.This bachelor's thesis introduces a simple concept of a language for description of microprocessor architecture, namely the instruction set. An interpreter of the language capable of simulating the behavior of the architecture is briefly described. This text may also serve as a manual for using the interpreter.

    Analytical Query Processing Using Heterogeneous SIMD Instruction Sets

    Get PDF
    Numerous applications gather increasing amounts of data, which have to be managed and queried. Different hardware developments help to meet this challenge. The grow-ing capacity of main memory enables database systems to keep all their data in memory. Additionally, the hardware landscape is becoming more diverse. A plethora of homo-geneous and heterogeneous co-processors is available, where heterogeneity refers not only to a different computing power, but also to different instruction set architectures. For instance, modern Intel® CPUs offer different instruction sets supporting the Single Instruction Multiple Data (SIMD) paradigm, e.g. SSE, AVX, and AVX512. Database systems have started to exploit SIMD to increase performance. However, this is still a challenging task, because existing algorithms were mainly developed for scalar processing and because there is a huge variety of different instruction sets, which were never standardized and have no unified interface. This requires to completely rewrite the source code for porting a system to another hardware architecture, even if those archi-tectures are not fundamentally different and designed by the same company. Moreover, operations on large registers, which are the core principle of SIMD processing, behave counter-intuitively in several cases. This is especially true for analytical query process-ing, where different memory access patterns and data dependencies caused by the com-pression of data, challenge the limits of the SIMD principle. Finally, there are physical constraints to the use of such instructions affecting the CPU frequency scaling, which is further influenced by the use of multiple cores. This is because the supply power of a CPU is limited, such that not all transistors can be powered at the same time. Hence, there is a complex relationship between performance and power, and therefore also between performance and energy consumption. This thesis addresses the specific challenges, which are introduced by the application of SIMD in general, and the heterogeneity of SIMD ISAs in particular. Hence, the goal of this thesis is to exploit the potential of heterogeneous SIMD ISAs for increasing the performance as well as the energy-efficiency

    On instruction sets for Boolean registers in program algebra

    Get PDF
    In previous work carried out in the setting of program algebra, including work in the area of instruction sequence size complexity, we chose instruction sets for Boolean registers that contain only instructions of a few of the possible kinds. In the current paper, we study instruction sequence size bounded functional completeness of all possible instruction sets for Boolean registers. We expect that the results of this study will turn out to be useful to adequately assess results of work that is concerned with lower bounds of instruction sequence size complexity.Comment: 18 pages, the preliminaries are largely the same as the preliminaries in arXiv:1402.4950 [cs.PL] and some earlier papers; 21 pages, presentation improve

    Instruction Set Architectures for Quantum Processing Units

    Full text link
    Progress in quantum computing hardware raises questions about how these devices can be controlled, programmed, and integrated with existing computational workflows. We briefly describe several prominent quantum computational models, their associated quantum processing units (QPUs), and the adoption of these devices as accelerators within high-performance computing systems. Emphasizing the interface to the QPU, we analyze instruction set architectures based on reduced and complex instruction sets, i.e., RISC and CISC architectures. We clarify the role of conventional constraints on memory addressing and instruction widths within the quantum computing context. Finally, we examine existing quantum computing platforms, including the D-Wave 2000Q and IBM Quantum Experience, within the context of future ISA development and HPC needs.Comment: To be published in the proceedings in the International Super Computing Conference 2017 publicatio
    corecore