Performance of SSE and AVX Instruction Sets
SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD
(single instruction multiple data streams) instruction sets supported by recent
CPUs manufactured by Intel and AMD. SIMD programming allows a single
instruction to process multiple data elements in parallel within a CPU core. Basic arithmetic and data
transfer operations such as sum, multiplication and square root can be
processed simultaneously. Although popular compilers such as GNU compilers and
Intel compilers provide automatic SIMD optimization options, one can obtain
better performance through manual SIMD programming with proper optimization: data
packing, data reuse and asynchronous data transfer. In particular, linear
algebraic operations on vectors and matrices can be easily optimized with
SIMD programming. Typical calculations in lattice gauge theory are composed of
linear algebraic operations on gauge link matrices and fermion vectors, and so
can adopt manual SIMD programming to improve performance.
Comment: 7 pages, 5 figures, 4 tables, Contribution to proceedings of the 30th
International Symposium on Lattice Field Theory (Lattice 2012), June 24-29,
201
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
advise the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared memory and one for non-shared memory concurrency. From our experimental
results, we derived a list of requirements for a full-grown experimental
environment for further research
A Language for Description of Instruction Sets
This bachelor's thesis introduces the design of a simple language for describing microprocessor architectures, focusing on the instruction set. An interpreter of the language, capable of simulating the behavior of the described architecture, is briefly described. The text may also serve as a manual for using the interpreter.
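As a rough illustration of the idea of describing an instruction set and then simulating it, the sketch below maps each mnemonic to a small semantic function and runs a program against that table. Everything in it (the `Machine` class, the `ISA` table, the mnemonics `li`, `add`, `jnz`, `halt`, and the sample `PROGRAM`) is hypothetical and is not the language or interpreter from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Architectural state of a hypothetical three-register machine."""
    regs: dict = field(default_factory=lambda: {"r0": 0, "r1": 0, "r2": 0})
    pc: int = 0
    halted: bool = False


# Semantics of each (hypothetical) instruction, one function per mnemonic.
def op_li(m, rd, imm):
    m.regs[rd] = int(imm)                  # load immediate


def op_add(m, rd, ra, rb):
    m.regs[rd] = m.regs[ra] + m.regs[rb]   # register add


def op_jnz(m, ra, target):
    if m.regs[ra] != 0:
        m.pc = int(target) - 1             # run() adds 1 after every step


def op_halt(m):
    m.halted = True


# The "description" of the instruction set: mnemonic -> semantics.
ISA = {"li": op_li, "add": op_add, "jnz": op_jnz, "halt": op_halt}


def run(program, max_steps=1000):
    """Interpret a list of 'mnemonic operand ...' strings against ISA."""
    m = Machine()
    steps = 0
    while not m.halted and m.pc < len(program) and steps < max_steps:
        mnemonic, *operands = program[m.pc].split()
        ISA[mnemonic](m, *operands)
        m.pc += 1
        steps += 1
    return m


# Sums 3 + 2 + 1 into r2 by counting r0 down to zero.
PROGRAM = [
    "li r0 3",       # counter
    "li r1 -1",      # decrement
    "li r2 0",       # accumulator
    "add r2 r2 r0",  # index 3: acc += counter
    "add r0 r0 r1",  # counter -= 1
    "jnz r0 3",      # loop while counter != 0
    "halt",
]

if __name__ == "__main__":
    print(run(PROGRAM).regs)
```

Keeping the semantics in a table separate from the `run` loop means a different instruction set can be simulated by swapping the table, which is the general appeal of a description-driven simulator.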
Analytical Query Processing Using Heterogeneous SIMD Instruction Sets
Numerous applications gather increasing amounts of data, which have to be managed and queried. Different hardware developments help to meet this challenge. The growing capacity of main memory enables database systems to keep all their data in memory. Additionally, the hardware landscape is becoming more diverse. A plethora of homogeneous and heterogeneous co-processors is available, where heterogeneity refers not only to different computing power, but also to different instruction set architectures. For instance, modern Intel® CPUs offer different instruction sets supporting the Single Instruction Multiple Data (SIMD) paradigm, e.g. SSE, AVX, and AVX512.
Database systems have started to exploit SIMD to increase performance. However, this is still a challenging task, because existing algorithms were mainly developed for scalar processing and because there is a huge variety of instruction sets, which were never standardized and have no unified interface. Porting a system to another hardware architecture therefore requires completely rewriting the source code, even if the architectures are not fundamentally different and are designed by the same company. Moreover, operations on large registers, which are the core principle of SIMD processing, behave counter-intuitively in several cases. This is especially true for analytical query processing, where different memory access patterns and data dependencies caused by data compression challenge the limits of the SIMD principle. Finally, there are physical constraints on the use of such instructions that affect CPU frequency scaling, which is further influenced by the use of multiple cores. This is because the supply power of a CPU is limited, such that not all transistors can be powered at the same time. Hence, there is a complex relationship between performance and power, and therefore also between performance and energy consumption.
This thesis addresses the specific challenges introduced by the application of SIMD in general and by the heterogeneity of SIMD ISAs in particular. Hence, the goal of this thesis is to exploit the potential of heterogeneous SIMD ISAs for increasing the performance as well as the energy-efficiency
On instruction sets for Boolean registers in program algebra
In previous work carried out in the setting of program algebra, including
work in the area of instruction sequence size complexity, we chose instruction
sets for Boolean registers that contain only instructions of a few of the
possible kinds. In the current paper, we study instruction sequence size
bounded functional completeness of all possible instruction sets for Boolean
registers. We expect that the results of this study will turn out to be useful
to adequately assess results of work that is concerned with lower bounds of
instruction sequence size complexity.
Comment: 18 pages, the preliminaries are largely the same as the preliminaries
in arXiv:1402.4950 [cs.PL] and some earlier papers; 21 pages, presentation
improve
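To make "all possible kinds" of instructions for a Boolean register more concrete, here is a minimal sketch under one assumed reading that the abstract itself does not spell out: an instruction is a pair of unary Boolean functions, one giving the reply and one giving the register's new content. The names `UNARY`, `methods`, and `apply`, and the `reply/update` labels, are illustrative only.

```python
from itertools import product

# The four unary Boolean functions, labelled for this sketch only.
UNARY = {
    "0": lambda b: False,   # constant false
    "1": lambda b: True,    # constant true
    "i": lambda b: b,       # identity
    "c": lambda b: not b,   # complement
}


def methods():
    """Enumerate every (reply, update) pair, keyed as 'reply/update'."""
    return {
        f"{r}/{u}": (UNARY[r], UNARY[u])
        for r, u in product(UNARY, repeat=2)
    }


def apply(method, content):
    """Return (reply, new register content) for one method application."""
    reply_fn, update_fn = method
    return reply_fn(content), update_fn(content)


if __name__ == "__main__":
    for name, method in methods().items():
        print(name, apply(method, False), apply(method, True))
```

Under this assumption there are 4 × 4 = 16 kinds of instruction, and an instruction set "containing only a few of the possible kinds" corresponds to a small subset of the dictionary returned by `methods()`.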
Instruction Set Architectures for Quantum Processing Units
Progress in quantum computing hardware raises questions about how these
devices can be controlled, programmed, and integrated with existing
computational workflows. We briefly describe several prominent quantum
computational models, their associated quantum processing units (QPUs), and the
adoption of these devices as accelerators within high-performance computing
systems. Emphasizing the interface to the QPU, we analyze instruction set
architectures based on reduced and complex instruction sets, i.e., RISC and
CISC architectures. We clarify the role of conventional constraints on memory
addressing and instruction widths within the quantum computing context.
Finally, we examine existing quantum computing platforms, including the D-Wave
2000Q and IBM Quantum Experience, within the context of future ISA development
and HPC needs.
Comment: To be published in the proceedings in the International Super
Computing Conference 2017 publicatio