41,502 research outputs found

    Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes

    Full text link
    Multiscale and inhomogeneous molecular systems are challenging topics in the field of molecular simulation. In particular, modeling biological systems in the context of multiscale simulations and exploring material properties are driving a permanent development of new simulation methods and optimization algorithms. In computational terms, those methods require parallelization schemes that make a productive use of computational resources for each simulation and from its genesis. Here, we introduce the heterogeneous domain decomposition approach which is a combination of an heterogeneity sensitive spatial domain decomposition with an \textit{a priori} rearrangement of subdomain-walls. Within this approach, the theoretical modeling and scaling-laws for the force computation time are proposed and studied as a function of the number of particles and the spatial resolution ratio. We also show the new approach capabilities, by comparing it to both static domain decomposition algorithms and dynamic load balancing schemes. Specifically, two representative molecular systems have been simulated and compared to the heterogeneous domain decomposition proposed in this work. These two systems comprise an adaptive resolution simulation of a biomolecule solvated in water and a phase separated binary Lennard-Jones fluid.Comment: 14 pages, 12 figure

    SKIRT: hybrid parallelization of radiative transfer simulations

    Full text link
    We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modeling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behavior of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.Comment: 21 pages, 20 figure

    WavePacket: A Matlab package for numerical quantum dynamics. II: Open quantum systems, optimal control, and model reduction

    Full text link
    WavePacket is an open-source program package for numeric simulations in quantum dynamics. It can solve time-independent or time-dependent linear Schr\"odinger and Liouville-von Neumann-equations in one or more dimensions. Also coupled equations can be treated, which allows, e.g., to simulate molecular quantum dynamics beyond the Born-Oppenheimer approximation. Optionally accounting for the interaction with external electric fields within the semi-classical dipole approximation, WavePacket can be used to simulate experiments involving tailored light pulses in photo-induced physics or chemistry. Being highly versatile and offering visualization of quantum dynamics 'on the fly', WavePacket is well suited for teaching or research projects in atomic, molecular and optical physics as well as in physical or theoretical chemistry. Building on the previous Part I which dealt with closed quantum systems and discrete variable representations, the present Part II focuses on the dynamics of open quantum systems, with Lindblad operators modeling dissipation and dephasing. This part also describes the WavePacket function for optimal control of quantum dynamics, building on rapid monotonically convergent iteration methods. Furthermore, two different approaches to dimension reduction implemented in WavePacket are documented here. In the first one, a balancing transformation based on the concepts of controllability and observability Gramians is used to identify states that are neither well controllable nor well observable. Those states are either truncated or averaged out. In the other approach, the H2-error for a given reduced dimensionality is minimized by H2 optimal model reduction techniques, utilizing a bilinear iterative rational Krylov algorithm

    Coalition Formation and Combinatorial Auctions; Applications to Self-organization and Self-management in Utility Computing

    Full text link
    In this paper we propose a two-stage protocol for resource management in a hierarchically organized cloud. The first stage exploits spatial locality for the formation of coalitions of supply agents; the second stage, a combinatorial auction, is based on a modified proxy-based clock algorithm and has two phases, a clock phase and a proxy phase. The clock phase supports price discovery; in the second phase a proxy conducts multiple rounds of a combinatorial auction for the package of services requested by each client. The protocol strikes a balance between low-cost services for cloud clients and a decent profit for the service providers. We also report the results of an empirical investigation of the combinatorial auction stage of the protocol.Comment: 14 page

    CONCISE: Compressed 'n' Composable Integer Set

    Full text link
    Bit arrays, or bitmaps, are used to significantly speed up set operations in several areas, such as data warehousing, information retrieval, and data mining, to cite a few. However, bitmaps usually use a large storage space, thus requiring compression. Nevertheless, there is a space-time tradeoff among compression schemes. The Word Aligned Hybrid (WAH) bitmap compression trades some space to allow for bitwise operations without first decompressing bitmaps. WAH has been recognized as the most efficient scheme in terms of computation time. In this paper we present CONCISE (Compressed 'n' Composable Integer Set), a new scheme that enjoys significatively better performances than those of WAH. In particular, when compared to WAH, our algorithm is able to reduce the required memory up to 50%, by having similar or better performance in terms of computation time. Further, we show that CONCISE can be efficiently used to manipulate bitmaps representing sets of integral numbers in lieu of well-known data structures such as arrays, lists, hashtables, and self-balancing binary search trees. Extensive experiments over synthetic data show the effectiveness of our approach.Comment: Preprint submitted to Information Processing Letters, 7 page

    Towards Optimal Distributed Node Scheduling in a Multihop Wireless Network through Local Voting

    Full text link
    In a multihop wireless network, it is crucial but challenging to schedule transmissions in an efficient and fair manner. In this paper, a novel distributed node scheduling algorithm, called Local Voting, is proposed. This algorithm tries to semi-equalize the load (defined as the ratio of the queue length over the number of allocated slots) through slot reallocation based on local information exchange. The algorithm stems from the finding that the shortest delivery time or delay is obtained when the load is semi-equalized throughout the network. In addition, we prove that, with Local Voting, the network system converges asymptotically towards the optimal scheduling. Moreover, through extensive simulations, the performance of Local Voting is further investigated in comparison with several representative scheduling algorithms from the literature. Simulation results show that the proposed algorithm achieves better performance than the other distributed algorithms in terms of average delay, maximum delay, and fairness. Despite being distributed, the performance of Local Voting is also found to be very close to a centralized algorithm that is deemed to have the optimal performance

    A Multi-Code Analysis Toolkit for Astrophysical Simulation Data

    Full text link
    The analysis of complex multiphysics astrophysical simulations presents a unique and rapidly growing set of challenges: reproducibility, parallelization, and vast increases in data size and complexity chief among them. In order to meet these challenges, and in order to open up new avenues for collaboration between users of multiple simulation platforms, we present yt (available at http://yt.enzotools.org/), an open source, community-developed astrophysical analysis and visualization toolkit. Analysis and visualization with yt are oriented around physically relevant quantities rather than quantities native to astrophysical simulation codes. While originally designed for handling Enzo's structure adaptive mesh refinement (AMR) data, yt has been extended to work with several different simulation methods and simulation codes including Orion, RAMSES, and FLASH. We report on its methods for reading, handling, and visualizing data, including projections, multivariate volume rendering, multi-dimensional histograms, halo finding, light cone generation and topologically-connected isocontour identification. Furthermore, we discuss the underlying algorithms yt uses for processing and visualizing data, and its mechanisms for parallelization of analysis tasks.Comment: 18 pages, 6 figures, emulateapj format. Resubmitted to Astrophysical Journal Supplement Series with revisions from referee. yt can be found at http://yt.enzotools.org

    An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor

    Full text link
    Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are considered, density and Fock matrices. All implementations are benchmarked on a super-computer of 3,000 Intel Xeon Phi processors. With 64 cores per processor, scaling numbers are reported on up to 192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory footprint by approximately 200 times compared to the legacy code. The MPI/OpenMP code was shown to run up to six times faster than the original for a range of molecular system sizes.Comment: SC17 conference paper, 12 pages, 7 figure
    • …
    corecore