Search CORE

6,199 research outputs found

Performance of SSE and AVX Instruction Sets

Author: Jeong Hwancheol
Kim Sunghoon
Lee Weonjong
Myung Seok-Ho
Publication venue
Publication date: 05/11/2012
Field of study

SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD (single instruction multiple data streams) instruction sets supported by recent CPUs manufactured in Intel and AMD. This SIMD programming allows parallel processing by multiple cores in a single CPU. Basic arithmetic and data transfer operations such as sum, multiplication and square root can be processed simultaneously. Although popular compilers such as GNU compilers and Intel compilers provide automatic SIMD optimization options, one can obtain better performance by a manual SIMD programming with proper optimization: data packing, data reuse and asynchronous data transfer. In particular, linear algebraic operations of vectors and matrices can be easily optimized by the SIMD programming. Typical calculations in lattice gauge theory are composed of linear algebraic operations of gauge link matrices and fermion vectors, and so can adopt the manual SIMD programming to improve the performance.Comment: 7 pages, 5 figures, 4 tables, Contribution to proceedings of the 30th International Symposium on Lattice Field Theory (Lattice 2012), June 24-29, 201

arXiv.org e-Print Archive

CiteSeerX

Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models

Author: A. Peymandoust
Alastair R. Beresford
Andreas Gal Albert Noll
Bram Adams
Bratin Saha
Carl Hewitt
Charles Antony Richard Hoare
Charles R. Johns
Chen-Yong Cher
Colin Blundell
David Ungar
David Wentzlaff
Doug Lea
ECMA International
Edward A. Lee
freescale semiconductor
Georg Sorst
Gul Agha
Hans Schippers
Haris Volos
Intel Corporation
James Gosling
Jim Gray
John A. Trono
John S. Danaher
John Zigman
Jos'e M. Piquer
Kevin Casey
Kevin Williams
Larry Seiler
Lukasz Ziarek
M. Anton Ertl
Mark S. Miller
Maurice Herlihy
Michael Haupt
Michael R. Marty
Nir Shavit
Pascal Costanza
Philipp Haller
Rajesh K. Karmani
Robert D. Blumofe
Robert Virding
Simon Gay
Sriram Srinivasan
Stefan Marr
Stefan Marr
Stijn Timbermont
Theo D'Hondt
Thomas Kistler
Tom Van Cutsem
Uwe Kastens
Vijay A. Saraswat
Virendra J. Marathe
Wenzhang Zhu
Wolfgang De Meuter
Xu Wang
Yaoqing Gao
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2010
Field of study

The upcoming many-core architectures require software developers to exploit concurrency to utilize available computational power. Today's high-level language virtual machines (VMs), which are a cornerstone of software development, do not provide sufficient abstraction for concurrency concepts. We analyze concrete and abstract concurrency models and identify the challenges they impose for VMs. To provide sufficient concurrency support in VMs, we propose to integrate concurrency operations into VM instruction sets. Since there will always be VMs optimized for special purposes, our goal is to develop a methodology to design instruction sets with concurrency support. Therefore, we also propose a list of trade-offs that have to be investigated to advise the design of such instruction sets. As a first experiment, we implemented one instruction set extension for shared memory and one for non-shared memory concurrency. From our experimental results, we derived a list of requirements for a full-grown experimental environment for further research

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Kent Academic Repository

A Language for Description of Instruction Sets

Author: Forejtník Jan
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2014
Field of study

V této práci je představen návrh jednoduchého jazyka pro popis architektury mikroprocesoru zaměřeného na popis instrukční sady. Dále je popsána implementace interpretu tohoto jazyka, který je schopen simulovat popsanou architekturu. Tato práce může zároveň sloužit jako návod k používání tohoto interpretu.This bachelor's thesis introduces a simple concept of a language for description of microprocessor architecture, namely the instruction set. An interpreter of the language capable of simulating the behavior of the architecture is briefly described. This text may also serve as a manual for using the interpreter.

Digital library of Brno University of Technology

National Repository of Grey Literature

Analytical Query Processing Using Heterogeneous SIMD Instruction Sets

Author: Ungethüm Annett
Publication venue
Publication date: 30/10/2020
Field of study

Numerous applications gather increasing amounts of data, which have to be managed and queried. Different hardware developments help to meet this challenge. The grow-ing capacity of main memory enables database systems to keep all their data in memory. Additionally, the hardware landscape is becoming more diverse. A plethora of homo-geneous and heterogeneous co-processors is available, where heterogeneity refers not only to a different computing power, but also to different instruction set architectures. For instance, modern Intel® CPUs offer different instruction sets supporting the Single Instruction Multiple Data (SIMD) paradigm, e.g. SSE, AVX, and AVX512. Database systems have started to exploit SIMD to increase performance. However, this is still a challenging task, because existing algorithms were mainly developed for scalar processing and because there is a huge variety of different instruction sets, which were never standardized and have no unified interface. This requires to completely rewrite the source code for porting a system to another hardware architecture, even if those archi-tectures are not fundamentally different and designed by the same company. Moreover, operations on large registers, which are the core principle of SIMD processing, behave counter-intuitively in several cases. This is especially true for analytical query process-ing, where different memory access patterns and data dependencies caused by the com-pression of data, challenge the limits of the SIMD principle. Finally, there are physical constraints to the use of such instructions affecting the CPU frequency scaling, which is further influenced by the use of multiple cores. This is because the supply power of a CPU is limited, such that not all transistors can be powered at the same time. Hence, there is a complex relationship between performance and power, and therefore also between performance and energy consumption. This thesis addresses the specific challenges, which are introduced by the application of SIMD in general, and the heterogeneity of SIMD ISAs in particular. Hence, the goal of this thesis is to exploit the potential of heterogeneous SIMD ISAs for increasing the performance as well as the energy-efficiency

Technische Universität Dresden: Qucosa

On instruction sets for Boolean registers in program algebra

Author: Bergstra J.A.
Middelburg C.A.
Publication venue
Publication date: 01/02/2015
Field of study

In previous work carried out in the setting of program algebra, including work in the area of instruction sequence size complexity, we chose instruction sets for Boolean registers that contain only instructions of a few of the possible kinds. In the current paper, we study instruction sequence size bounded functional completeness of all possible instruction sets for Boolean registers. We expect that the results of this study will turn out to be useful to adequately assess results of work that is concerned with lower bounds of instruction sequence size complexity.Comment: 18 pages, the preliminaries are largely the same as the preliminaries in arXiv:1402.4950 [cs.PL] and some earlier papers; 21 pages, presentation improve

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Instruction Set Architectures for Quantum Processing Units

Author: AR Calderbank
AW Harrow
David A Patterson
Michael A Nielsen
Publication venue
Publication date: 19/07/2017
Field of study

Progress in quantum computing hardware raises questions about how these devices can be controlled, programmed, and integrated with existing computational workflows. We briefly describe several prominent quantum computational models, their associated quantum processing units (QPUs), and the adoption of these devices as accelerators within high-performance computing systems. Emphasizing the interface to the QPU, we analyze instruction set architectures based on reduced and complex instruction sets, i.e., RISC and CISC architectures. We clarify the role of conventional constraints on memory addressing and instruction widths within the quantum computing context. Finally, we examine existing quantum computing platforms, including the D-Wave 2000Q and IBM Quantum Experience, within the context of future ISA development and HPC needs.Comment: To be published in the proceedings in the International Super Computing Conference 2017 publicatio

arXiv.org e-Print Archive

Crossref