12,011 research outputs found

    Hydra: A Parallel Adaptive Grid Code

    Full text link
    We describe the first parallel implementation of an adaptive particle-particle, particle-mesh code with smoothed particle hydrodynamics. Parallelisation of the serial code, ``Hydra'', is achieved by using CRAFT, a Cray proprietary language which allows rapid implementation of a serial code on a parallel machine by allowing global addressing of distributed memory. The collisionless variant of the code has already completed several 16.8 million particle cosmological simulations on a 128 processor Cray T3D whilst the full hydrodynamic code has completed several 4.2 million particle combined gas and dark matter runs. The efficiency of the code now allows parameter-space explorations to be performed routinely using 64364^3 particles of each species. A complete run including gas cooling, from high redshift to the present epoch requires approximately 10 hours on 64 processors. In this paper we present implementation details and results of the performance and scalability of the CRAFT version of Hydra under varying degrees of particle clustering.Comment: 23 pages, LaTex plus encapsulated figure

    A Parallel Adaptive P3M code with Hierarchical Particle Reordering

    Full text link
    We discuss the design and implementation of HYDRA_OMP a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important optimization being hierarchical reordering of particles within chaining cells, which greatly improves data locality thereby removing the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future.Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communication

    LUNES: Agent-based Simulation of P2P Systems (Extended Version)

    Full text link
    We present LUNES, an agent-based Large Unstructured NEtwork Simulator, which allows to simulate complex networks composed of a high number of nodes. LUNES is modular, since it splits the three phases of network topology creation, protocol simulation and performance evaluation. This permits to easily integrate external software tools into the main software architecture. The simulation of the interaction protocols among network nodes is performed via a simulation middleware that supports both the sequential and the parallel/distributed simulation approaches. In the latter case, a specific mechanism for the communication overhead-reduction is used; this guarantees high levels of performance and scalability. To demonstrate the efficiency of LUNES, we test the simulator with gossip protocols executed on top of networks (representing peer-to-peer overlays), generated with different topologies. Results demonstrate the effectiveness of the proposed approach.Comment: Proceedings of the International Workshop on Modeling and Simulation of Peer-to-Peer Architectures and Systems (MOSPAS 2011). As part of the 2011 International Conference on High Performance Computing and Simulation (HPCS 2011

    Performance comparison of clustered and replicated information retrieval systems

    Get PDF
    The amount of information available over the Internet is increasing daily as well as the importance and magnitude of Web search engines. Systems based on a single centralised index present several problems (such as lack of scalability), which lead to the use of distributed information retrieval systems to effectively search for and locate the required information. A distributed retrieval system can be clustered and/or replicated. In this paper, using simulations, we present a detailed performance analysis, both in terms of throughput and response time, of a clustered system compared to a replicated system. In addition, we consider the effect of changes in the query topics over time. We show that the performance obtained for a clustered system does not improve the performance obtained by the best replicated system. Indeed, the main advantage of a clustered system is the reduction of network traffic. However, the use of a switched network eliminates the bottleneck in the network, markedly improving the performance of the replicated systems. Moreover, we illustrate the negative performance effect of the changes over time in the query topics when a distributed clustered system is used. On the contrary, the performance of a distributed replicated system is query independent

    Energy-aware Load Balancing Policies for the Cloud Ecosystem

    Full text link
    The energy consumption of computer and communication systems does not scale linearly with the workload. A system uses a significant amount of energy even when idle or lightly loaded. A widely reported solution to resource management in large data centers is to concentrate the load on a subset of servers and, whenever possible, switch the rest of the servers to one of the possible sleep states. We propose a reformulation of the traditional concept of load balancing aiming to optimize the energy consumption of a large-scale system: {\it distribute the workload evenly to the smallest set of servers operating at an optimal energy level, while observing QoS constraints, such as the response time.} Our model applies to clustered systems; the model also requires that the demand for system resources to increase at a bounded rate in each reallocation interval. In this paper we report the VM migration costs for application scaling.Comment: 10 Page

    The Simulation Model Partitioning Problem: an Adaptive Solution Based on Self-Clustering (Extended Version)

    Full text link
    This paper is about partitioning in parallel and distributed simulation. That means decomposing the simulation model into a numberof components and to properly allocate them on the execution units. An adaptive solution based on self-clustering, that considers both communication reduction and computational load-balancing, is proposed. The implementation of the proposed mechanism is tested using a simulation model that is challenging both in terms of structure and dynamicity. Various configurations of the simulation model and the execution environment have been considered. The obtained performance results are analyzed using a reference cost model. The results demonstrate that the proposed approach is promising and that it can reduce the simulation execution time in both parallel and distributed architectures
    • …
    corecore