Hydra: A Parallel Adaptive Grid Code
We describe the first parallel implementation of an adaptive
particle-particle, particle-mesh code with smoothed particle hydrodynamics.
Parallelisation of the serial code, ``Hydra'', is achieved by using CRAFT, a
Cray proprietary language which allows rapid implementation of a serial code on
a parallel machine by allowing global addressing of distributed memory.
The collisionless variant of the code has already completed several 16.8
million particle cosmological simulations on a 128 processor Cray T3D whilst
the full hydrodynamic code has completed several 4.2 million particle combined
gas and dark matter runs. The efficiency of the code now allows parameter-space
explorations to be performed routinely using particles of each species.
A complete run including gas cooling, from high redshift to the present epoch
requires approximately 10 hours on 64 processors.
In this paper we present implementation details and results of the
performance and scalability of the CRAFT version of Hydra under varying degrees
of particle clustering.
Comment: 23 pages, LaTeX plus encapsulated figure
A Parallel Adaptive P3M code with Hierarchical Particle Reordering
We discuss the design and implementation of HYDRA_OMP, a parallel
implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M)
code HYDRA. The code is designed primarily for conducting cosmological
hydrodynamic simulations and is written in Fortran77+OpenMP. A number of
optimizations for RISC processors and SMP-NUMA architectures have been
implemented, the most important optimization being hierarchical reordering of
particles within chaining cells, which greatly improves data locality thereby
removing the cache misses typically associated with linked lists. Parallel
scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes
for a variety of modern SMP architectures. We give performance data in terms of
the number of particle updates per second, which is a more useful performance
metric than raw MFlops. A basic version of the code will be made available to
the community in the near future.
Comment: 34 pages, 12 figures, accepted for publication in Computer Physics
Communications
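The data-locality idea in the abstract above can be illustrated with a small sketch. This is not the HYDRA_OMP implementation (which is Fortran77+OpenMP and uses a hierarchical scheme); it is a simplified, single-level reordering in Python, and the names `cell_index` and `reorder_by_cell` are hypothetical. It sorts particles by chaining-cell index so that all particles in a cell occupy one contiguous slice, replacing a linked-list walk with a sequential scan.

```python
# Illustrative sketch only: a single-level reordering, not the hierarchical
# scheme used by HYDRA_OMP. All names here are hypothetical.

def cell_index(pos, box_size, cells_per_side):
    """Map a 3D position in [0, box_size) to a flat chaining-cell index."""
    ix, iy, iz = (
        min(int(c / box_size * cells_per_side), cells_per_side - 1) for c in pos
    )
    return (ix * cells_per_side + iy) * cells_per_side + iz


def reorder_by_cell(positions, box_size, cells_per_side):
    """Return particles sorted by cell, plus the start offset of each cell."""
    def key(p):
        return cell_index(p, box_size, cells_per_side)

    # Stable sort: particles that share a cell keep their relative order.
    reordered = sorted(positions, key=key)

    # Count particles per cell, then prefix-sum into start offsets.
    n_cells = cells_per_side ** 3
    cell_start = [0] * (n_cells + 1)
    for p in reordered:
        cell_start[key(p) + 1] += 1
    for c in range(n_cells):
        cell_start[c + 1] += cell_start[c]

    # Particles of cell c are reordered[cell_start[c]:cell_start[c + 1]].
    return reordered, cell_start


positions = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.8, 0.9, 0.9), (0.2, 0.1, 0.1)]
reordered, cell_start = reorder_by_cell(positions, box_size=1.0, cells_per_side=2)
```

With the offsets in hand, a neighbour search over cell `c` reads one contiguous slice instead of chasing pointers, which is the cache behaviour the abstract attributes to reordering.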
LUNES: Agent-based Simulation of P2P Systems (Extended Version)
We present LUNES, an agent-based Large Unstructured NEtwork Simulator, which
supports the simulation of complex networks composed of a large number of
nodes. LUNES is modular, since it separates the three phases of network
topology creation, protocol simulation and performance evaluation. This makes
it easy to integrate external software tools into the main software
architecture. The
simulation of the interaction protocols among network nodes is performed via a
simulation middleware that supports both the sequential and the
parallel/distributed simulation approaches. In the latter case, a specific
mechanism for reducing communication overhead is used; this guarantees
high levels of performance and scalability. To demonstrate the efficiency of
LUNES, we test the simulator with gossip protocols executed on top of networks
(representing peer-to-peer overlays), generated with different topologies.
Results demonstrate the effectiveness of the proposed approach.
Comment: Proceedings of the International Workshop on Modeling and Simulation
of Peer-to-Peer Architectures and Systems (MOSPAS 2011). As part of the 2011
International Conference on High Performance Computing and Simulation (HPCS
2011)
Performance comparison of clustered and replicated information retrieval systems
The amount of information available over the Internet is increasing daily, as are the importance and scale of Web search engines. Systems based on a single centralised index present several problems (such as lack of scalability), which has led to the use of distributed information retrieval systems to search for and locate the required information effectively. A distributed retrieval system can be clustered and/or replicated. In this paper, using simulations, we present a detailed performance analysis, in terms of both throughput and response time, of a clustered system compared to a replicated system. In addition, we consider the effect of changes in the query topics over time. We show that the performance obtained for a clustered system does not improve on the performance obtained by the best replicated system. Indeed, the main advantage of a clustered system is the reduction of network traffic. However, the use of a switched network eliminates the bottleneck in the network, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative performance effect of changes over time in the query topics when a distributed clustered system is used. By contrast, the performance of a distributed replicated system is query independent.
Energy-aware Load Balancing Policies for the Cloud Ecosystem
The energy consumption of computer and communication systems does not scale
linearly with the workload. A system uses a significant amount of energy even
when idle or lightly loaded. A widely reported solution to resource management
in large data centers is to concentrate the load on a subset of servers and,
whenever possible, switch the rest of the servers to one of the possible sleep
states. We propose a reformulation of the traditional concept of load balancing
aiming to optimize the energy consumption of a large-scale system: distribute
the workload evenly to the smallest set of servers operating at an optimal
energy level, while observing QoS constraints such as the response time. Our
model applies to clustered systems; it also requires that the demand for
system resources increase at a bounded rate in each reallocation
interval. In this paper we report the VM migration costs for application
scaling.
Comment: 10 Page
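The reformulated load-balancing goal stated in the abstract above can be made concrete with a minimal sketch. This is not the authors' model: `plan_allocation` and its parameters are hypothetical, and it assumes identical servers, a single scalar load, and that running at or below `optimal_load` already satisfies the QoS constraint.

```python
import math

# Illustrative sketch only; assumes identical servers and that any per-server
# load at or below optimal_load meets the QoS (response time) constraint.

def plan_allocation(total_load, optimal_load):
    """Spread total_load evenly over the fewest servers at <= optimal_load."""
    if optimal_load <= 0:
        raise ValueError("optimal_load must be positive")
    if total_load <= 0:
        return []  # nothing to run: every server can sleep

    # Smallest server count that keeps each server at or below its optimum.
    n_servers = math.ceil(total_load / optimal_load)

    # Even distribution across the active set; the rest can sleep.
    return [total_load / n_servers] * n_servers


active = plan_allocation(total_load=250, optimal_load=100)
```

Here a load of 250 units with an optimum of 100 per server yields three active servers at roughly 83 units each; any remaining servers are candidates for a sleep state.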
The Simulation Model Partitioning Problem: an Adaptive Solution Based on Self-Clustering (Extended Version)
This paper is about partitioning in parallel and distributed simulation, that
is, decomposing the simulation model into a number of components and
properly allocating them to the execution units. An adaptive solution based on
self-clustering, that considers both communication reduction and computational
load-balancing, is proposed. The implementation of the proposed mechanism is
tested using a simulation model that is challenging both in terms of structure
and dynamicity. Various configurations of the simulation model and the
execution environment have been considered. The obtained performance results
are analyzed using a reference cost model. The results demonstrate that the
proposed approach is promising and that it can reduce the simulation execution
time in both parallel and distributed architectures.