7,795 research outputs found
MGSim - Simulation tools for multi-core processor architectures
MGSim is an open source discrete event simulator for on-chip hardware
components, developed at the University of Amsterdam. It is intended to be a
research and teaching vehicle to study the fine-grained hardware/software
interactions on many-core and hardware multithreaded processors. It includes
support for core models with different instruction sets, a configurable
multi-core interconnect, multiple configurable cache and memory models, a
dedicated I/O subsystem, and comprehensive monitoring and interaction
facilities. The default model configuration shipped with MGSim implements
Microgrids, a many-core architecture with hardware concurrency management.
MGSim is furthermore written mostly in C++ and uses object classes to represent
chip components. It is optimized for architecture models that can be described
as process networks.Comment: 33 pages, 22 figures, 4 listings, 2 table
64-bit architechtures and compute clusters for high performance simulations
Simulation of large complex systems remains one of the most demanding
of high performance computer systems both in terms of raw compute performance
and efficient memory management. Recent availability of 64-bit
architectures has opened up the possibilities of commodity computers accessing
more than the 4 Gigabyte memory limit previously enforced by 32-bit
addressing. We report on some performance measurements we have made on
two 64-bit architectures and their consequences for some high performance
simulations. We discuss performance of our codes for simulations of artificial
life models; computational physics models of point particles on lattices; and
with interacting clusters of particles. We have summarised pertinent features
of these codes into benchmark kernels which we discuss in the context of wellknown
benchmark kernels of the 32-bit era. We report on how these these
findings were useful in the context of designing 64-bit compute clusters for
high-performance simulations
Energy-efficiency evaluation of Intel KNL for HPC workloads
Energy consumption is increasingly becoming a limiting factor to the design
of faster large-scale parallel systems, and development of energy-efficient and
energy-aware applications is today a relevant issue for HPC code-developer
communities. In this work we focus on energy performance of the Knights Landing
(KNL) Xeon Phi, the latest many-core architecture processor introduced by Intel
into the HPC market. We take into account the 64-core Xeon Phi 7230, and
analyze its energy performance using both the on-chip MCDRAM and the regular
DDR4 system memory as main storage for the application data-domain. As a
benchmark application we use a Lattice Boltzmann code heavily optimized for
this architecture and implemented using different memory data layouts to store
its lattice. We assessthen the energy consumption using different memory
data-layouts, kind of memory (DDR4 or MCDRAM) and number of threads per core
Influence of Input/output Operations on Processor Performance
Nowadays, computers are frequently equipped with peripherals that transfer great
amounts of data between them and the system memory using direct memory access
techniques (i.e., digital cameras, high speed networks, . . . ). Those peripherals prevent the
processor from accessing system memory for significant periods of time (i.e., while they
are communicating with system memory in order to send or receive data blocks). In this
paper we study the negative effects that I/O operations from computer peripherals have on
processor performance. With the help of a set of routines (SMPL) used to make discrete
event simulators, we have developed a configurable software that simulates a computer
processor and main memory as well as the I/O scenarios where the periph-erals operate.
This software has been used to analyze the performance of four different processors in four
I/O scenarios: video capture, video capture and playback, high speed network, and serial
transmission
Exploring performance and power properties of modern multicore chips via simple machine models
Modern multicore chips show complex behavior with respect to performance and
power. Starting with the Intel Sandy Bridge processor, it has become possible
to directly measure the power dissipation of a CPU chip and correlate this data
with the performance properties of the running code. Going beyond a simple
bottleneck analysis, we employ the recently published Execution-Cache-Memory
(ECM) model to describe the single- and multi-core performance of streaming
kernels. The model refines the well-known roofline model, since it can predict
the scaling and the saturation behavior of bandwidth-limited loop kernels on a
multicore chip. The saturation point is especially relevant for considerations
of energy consumption. From power dissipation measurements of benchmark
programs with vastly different requirements to the hardware, we derive a
simple, phenomenological power model for the Sandy Bridge processor. Together
with the ECM model, we are able to explain many peculiarities in the
performance and power behavior of multicore processors, and derive guidelines
for energy-efficient execution of parallel programs. Finally, we show that the
ECM and power models can be successfully used to describe the scaling and power
behavior of a lattice-Boltzmann flow solver code.Comment: 23 pages, 10 figures. Typos corrected, DOI adde
- …