Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing. Comment: 11 figure
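As a point of reference for the Barnes-Hut algorithm named in this abstract, the following is a minimal sequential sketch in 2D. It is not the paper's ParalleX implementation: the names `Node`, `insert`, `build_tree` and `accel`, the softening term `eps`, and the choice G = 1 are all assumptions made for illustration, and bodies are assumed to have distinct positions and positive masses.

```python
import numpy as np

class Node:
    """Square quadtree cell holding total mass and center of mass."""
    def __init__(self, center, half):
        self.center = np.asarray(center, dtype=float)
        self.half = float(half)   # half of the cell's side length
        self.mass = 0.0
        self.com = np.zeros(2)
        self.body = None          # body index if this is an occupied leaf
        self.children = None      # list of 4 child cells once split

def _push(node, i, pos, mass):
    """Route body i into the child quadrant that contains it."""
    east = int(pos[i, 0] >= node.center[0])
    north = int(pos[i, 1] >= node.center[1])
    q = east + 2 * north
    if node.children[q] is None:
        offset = np.array([2 * east - 1, 2 * north - 1]) * node.half / 2
        node.children[q] = Node(node.center + offset, node.half / 2)
    insert(node.children[q], i, pos, mass)

def insert(node, i, pos, mass):
    """Insert body i, splitting occupied leaves and updating aggregates."""
    if node.children is None and node.body is None:
        node.body = i                      # empty leaf: store the body
    else:
        if node.children is None:          # occupied leaf: split it
            node.children = [None] * 4
            old, node.body = node.body, None
            _push(node, old, pos, mass)
        _push(node, i, pos, mass)
    total = node.mass + mass[i]
    node.com = (node.com * node.mass + pos[i] * mass[i]) / total
    node.mass = total

def build_tree(pos, mass):
    """Build a quadtree over all bodies (rebuilt every time step)."""
    lo, hi = pos.min(axis=0), pos.max(axis=0)
    root = Node((lo + hi) / 2, 0.5 * (hi - lo).max() + 1e-12)
    for i in range(len(pos)):
        insert(root, i, pos, mass)
    return root

def accel(node, i, pos, theta, eps=1e-9):
    """Acceleration on body i; a cell is used whole if size / distance < theta."""
    if node is None or node.body == i:
        return np.zeros(2)
    d = node.com - pos[i]
    r2 = d @ d
    if node.children is None or 2 * node.half < theta * np.sqrt(r2):
        return node.mass * d / (r2 + eps) ** 1.5
    return sum((accel(c, i, pos, theta, eps) for c in node.children), np.zeros(2))
```

With `theta = 0` every cell is opened down to its leaves, so the result reduces to the direct pairwise sum; larger values trade accuracy for fewer interactions. Because the tree is rebuilt as bodies move, its shape changes from step to step, which is the kind of dynamic workload the abstract refers to.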
Programming MPSoC platforms: Road works ahead
This paper summarizes a special session on multicore/multi-processor system-on-chip (MPSoC) programming challenges. The current trend towards MPSoC platforms in most computing domains means more than a radical change in computer architecture. Even more important from a SW developer's viewpoint, the classical sequential von Neumann programming model must be overcome at the same time. Efficient utilization of MPSoC HW resources demands radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel SW. While several standards are established in the high-performance computing domain (e.g. OpenMP), it is clear that more innovations are required for successful deployment of heterogeneous embedded MPSoC. On the other hand, at least for the coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code, which demands a more pragmatic, gradual tool and code replacement strategy
The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition
We present the Glasgow Parallel Reduction Machine (GPRM), a novel, flexible
framework for parallel task-composition based many-core programming. We allow
the programmer to structure programs into task code, written as C++ classes,
and communication code, written in a restricted subset of C++ with functional
semantics and parallel evaluation. In this paper we discuss the GPRM, the
virtual machine framework that enables the parallel task composition approach.
We focus the discussion on GPIR, the functional language used as the
intermediate representation of the bytecode running on the GPRM. Using examples
in this language we show the flexibility and power of our task composition
framework. We demonstrate the potential using an implementation of a merge sort
algorithm on a 64-core Tilera processor, as well as on a conventional Intel
quad-core processor and an AMD 48-core processor system. We also compare our
framework with OpenMP tasks in a parallel pointer chasing algorithm running on
the Tilera processor. Our results show that the GPRM programs outperform the
corresponding OpenMP codes on all test platforms, and can greatly facilitate
writing of parallel programs, in particular non-data parallel algorithms such
as reductions. Comment: In Proceedings PLACES 2013, arXiv:1312.221
RPPM : Rapid Performance Prediction of Multithreaded workloads on multicore processors
Analytical performance modeling is a useful complement to detailed cycle-level simulation for quickly exploring the design space at an early design stage. Mechanistic analytical modeling is particularly interesting because it provides deep insight and, unlike empirical modeling, does not require expensive offline profiling. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors.
This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance
Massively Parallel Computing at the Large Hadron Collider up to the HL-LHC
As the Large Hadron Collider (LHC) continues its upward progression in energy
and luminosity towards the planned High-Luminosity LHC (HL-LHC) in 2025, the
challenges of the experiments in processing increasingly complex events will
also continue to increase. Improvements in computing technologies and
algorithms will be a key part of the advances necessary to meet this challenge.
Parallel computing techniques, especially those using massively parallel
computing (MPC), promise to be a significant part of this effort. In these
proceedings, we discuss these algorithms in the specific context of a
particularly important problem: the reconstruction of charged particle tracks
in the trigger algorithms in an experiment, in which high computing performance
is critical for executing the track reconstruction in the available time. We
discuss some areas where parallel computing has already shown benefits to the
LHC experiments, and also demonstrate how an MPC-based trigger at the CMS
experiment could not only improve performance, but also extend the reach of the
CMS trigger system to capture events which are currently not practical to
reconstruct at the trigger level. Comment: 14 pages, 6 figures. Proceedings of 2nd International Summer School
on Intelligent Signal Processing for Frontier Research and Industry
(INFIERI2014), to appear in JINST. Revised version in response to referee
comment
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June, 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of
the demand at the 2025 timescale is at least two orders of magnitude -- and in
some cases greater -- than that available currently. 2) The growth rate of data
produced by simulations is overwhelming the current ability, of both facilities
and researchers, to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources the
experimental HEP program needs a) an established long-term plan for access to
ASCR computational and data resources, b) an ability to map workflows onto HPC
resources, c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members, d) to transition
codes to the next-generation HPC platforms that will be available at ASCR
facilities, e) to build up and train a workforce capable of developing and
using simulations and analysis to support HEP scientific research on
next-generation systems. Comment: 77 pages, 13 Figures; draft report, subject to further revisio
Computation of the asymptotic states of modulated open quantum systems with a numerically exact realization of the quantum trajectory method
Quantum systems out of equilibrium are presently a subject of active
research, both in theoretical and experimental domains. In this work we
consider time-periodically modulated quantum systems which are in contact with
a stationary environment. Within the framework of a quantum master equation,
the asymptotic states of such systems are described by time-periodic density
operators. Resolution of these operators constitutes a non-trivial
computational task. To go beyond the current size limits, we use the quantum
trajectory method, which unravels the master equation for the density operator into
a set of stochastic processes for wave functions. The asymptotic density matrix
is calculated by performing a statistical sampling over the ensemble of quantum
trajectories, preceded by a long transient propagation. We follow the ideology
of event-driven programming and construct a new algorithmic realization of the
method. The algorithm is computationally efficient, allowing for long 'leaps'
forward in time, and is numerically exact in the sense that, being given the
list of uniformly distributed (on the unit interval) random numbers, one could
propagate a quantum trajectory (with these numbers as norm thresholds) in a
numerically exact way. By using a scalable N-particle quantum model, we demonstrate that
the algorithm allows us to resolve the asymptotic density operator of the model
system with states on a regular-size computer cluster, thus reaching
the scale on which numerical studies of modulated Hamiltonian systems are
currently performed
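The event-driven idea described in this abstract (draw a uniform random number, propagate the unnormalized wave function until its squared norm decays to that threshold, then apply a quantum jump) can be sketched as follows. This is an illustrative sketch for a time-independent Hamiltonian with a single jump operator, not the authors' algorithm for modulated systems: the function name `jump_times`, the step `dt`, and the bisection tolerance `tol` are assumptions, and the effective Hamiltonian is assumed to be diagonalizable.

```python
import numpy as np

def jump_times(h, jump_op, psi0, t_end, dt, rng, tol=1e-10):
    """Return the jump times of one quantum trajectory on [0, t_end].

    Between jumps the state evolves under H_eff = H - (i/2) L^dagger L,
    so its squared norm decays; a jump fires when it reaches eta.
    """
    h_eff = h - 0.5j * (jump_op.conj().T @ jump_op)
    w, v = np.linalg.eig(h_eff)        # assumes H_eff is diagonalizable
    v_inv = np.linalg.inv(v)

    def evolve(psi, tau):
        # exp(-i H_eff tau) applied to psi via the stored eigendecomposition
        return (v * np.exp(-1j * w * tau)) @ (v_inv @ psi)

    psi = np.asarray(psi0, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    t, times = 0.0, []
    eta = rng.random()                 # norm threshold for the next jump
    while t < t_end:
        trial = evolve(psi, dt)
        if np.vdot(trial, trial).real > eta:
            psi, t = trial, t + dt     # no event in this step: leap forward
            continue
        # The threshold is crossed inside (t, t + dt]: locate it by bisection.
        lo, hi = 0.0, dt
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            phi = evolve(psi, mid)
            if np.vdot(phi, phi).real > eta:
                lo = mid
            else:
                hi = mid
        psi = jump_op @ evolve(psi, hi)    # apply the jump and renormalize
        psi = psi / np.linalg.norm(psi)
        t += hi
        if t <= t_end:
            times.append(t)
        eta = rng.random()             # fresh threshold after each jump
    return times
```

For a two-level system with no drive and decay rate gamma, starting in the excited state, the squared norm is exp(-gamma t), so the single jump should occur at t = -ln(eta)/gamma; this gives a simple check of the sketch. Averaging observables over many such trajectories, after a transient, is what the abstract describes as statistical sampling of the asymptotic density matrix.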