12,677 research outputs found
Simulating Large Scale Parallel Applications Using Statistical Models for Sequential Execution Blocks
Predicting sequential execution blocks of a large scale parallel application is an essential part of accurate prediction of the overall performance of the application. When simulating a future machine that is not yet fabricated, or a prototype system only available at a small scale, it becomes a significant challenge. Using hardware simulators may not be feasible due to excessively slowed down execution times and insufficient resources. These challenging issues become increasingly difficult in proportion to scale of the simulation. In this paper, we propose an approach based on statistical models to accurately predict the performance of the sequential execution blocks that comprise a parallel application. We de-ployed these techniques in a trace-driven simulation framework to capture both the detailed behavior of the application as well as the overall predicted performance. The technique is validated using both synthetic benchmarks and the NAMD application. Index Termsâparallel simulator, performance prediction, trace-driven, machine learning, statistical model I
Simulation of 1+1 dimensional surface growth and lattices gases using GPUs
Restricted solid on solid surface growth models can be mapped onto binary
lattice gases. We show that efficient simulation algorithms can be realized on
GPUs either by CUDA or by OpenCL programming. We consider a
deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1
dimensions related to the Asymmetric Simple Exclusion Process and show that for
sizes, that fit into the shared memory of GPUs one can achieve the maximum
parallelization speedup ~ x100 for a Quadro FX 5800 graphics card with respect
to a single CPU of 2.67 GHz). This permits us to study the effect of quenched
columnar disorder, requiring extremely long simulation times. We compare the
CUDA realization with an OpenCL implementation designed for processor clusters
via MPI. A two-lane traffic model with randomized turning points is also
realized and the dynamical behavior has been investigated.Comment: 20 pages 12 figures, 1 table, to appear in Comp. Phys. Com
Distributed Hybrid Simulation of the Internet of Things and Smart Territories
This paper deals with the use of hybrid simulation to build and compose
heterogeneous simulation scenarios that can be proficiently exploited to model
and represent the Internet of Things (IoT). Hybrid simulation is a methodology
that combines multiple modalities of modeling/simulation. Complex scenarios are
decomposed into simpler ones, each one being simulated through a specific
simulation strategy. All these simulation building blocks are then synchronized
and coordinated. This simulation methodology is an ideal one to represent IoT
setups, which are usually very demanding, due to the heterogeneity of possible
scenarios arising from the massive deployment of an enormous amount of sensors
and devices. We present a use case concerned with the distributed simulation of
smart territories, a novel view of decentralized geographical spaces that,
thanks to the use of IoT, builds ICT services to manage resources in a way that
is sustainable and not harmful to the environment. Three different simulation
models are combined together, namely, an adaptive agent-based parallel and
distributed simulator, an OMNeT++ based discrete event simulator and a
script-language simulator based on MATLAB. Results from a performance analysis
confirm the viability of using hybrid simulation to model complex IoT
scenarios.Comment: arXiv admin note: substantial text overlap with arXiv:1605.0487
Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis
Traditional data centers are designed with a rigid architecture of
fit-for-purpose servers that provision resources beyond the average workload in
order to deal with occasional peaks of data. Heterogeneous data centers are
pushing towards more cost-efficient architectures with better resource
provisioning. In this paper we study the feasibility of using disaggregated
architectures for intensive data applications, in contrast to the monolithic
approach of server-oriented architectures. Particularly, we have tested a
proactive network analysis system in which the workload demands are highly
variable. In the context of the dReDBox disaggregated architecture, the results
show that the overhead caused by using remote memory resources is significant,
between 66\% and 80\%, but we have also observed that the memory usage is one
order of magnitude higher for the stress case with respect to average
workloads. Therefore, dimensioning memory for the worst case in conventional
systems will result in a notable waste of resources. Finally, we found that,
for the selected use case, parallelism is limited by memory. Therefore, using a
disaggregated architecture will allow for increased parallelism, which, at the
same time, will mitigate the overhead caused by remote memory.Comment: 8 pages, 6 figures, 2 tables, 32 references. Pre-print. The paper
will be presented during the IEEE International Conference on High
Performance Computing and Communications in Bangkok, Thailand. 18 - 20
December, 2017. To be published in the conference proceeding
On Designing Multicore-aware Simulators for Biological Systems
The stochastic simulation of biological systems is an increasingly popular
technique in bioinformatics. It often is an enlightening technique, which may
however result in being computational expensive. We discuss the main
opportunities to speed it up on multi-core platforms, which pose new challenges
for parallelisation techniques. These opportunities are developed in two
general families of solutions involving both the single simulation and a bulk
of independent simulations (either replicas of derived from parameter sweep).
Proposed solutions are tested on the parallelisation of the CWC simulator
(Calculus of Wrapped Compartments) that is carried out according to proposed
solutions by way of the FastFlow programming framework making possible fast
development and efficient execution on multi-cores.Comment: 19 pages + cover pag
Air pollution modelling using a graphics processing unit with CUDA
The Graphics Processing Unit (GPU) is a powerful tool for parallel computing.
In the past years the performance and capabilities of GPUs have increased, and
the Compute Unified Device Architecture (CUDA) - a parallel computing
architecture - has been developed by NVIDIA to utilize this performance in
general purpose computations. Here we show for the first time a possible
application of GPU for environmental studies serving as a basement for decision
making strategies. A stochastic Lagrangian particle model has been developed on
CUDA to estimate the transport and the transformation of the radionuclides from
a single point source during an accidental release. Our results show that
parallel implementation achieves typical acceleration values in the order of
80-120 times compared to CPU using a single-threaded implementation on a 2.33
GHz desktop computer. Only very small differences have been found between the
results obtained from GPU and CPU simulations, which are comparable with the
effect of stochastic transport phenomena in atmosphere. The relatively high
speedup with no additional costs to maintain this parallel architecture could
result in a wide usage of GPU for diversified environmental applications in the
near future.Comment: 5 figure
- âŠ