Scalable and fast heterogeneous molecular simulation with predictive parallelization schemes
Multiscale and inhomogeneous molecular systems are challenging topics in the
field of molecular simulation. In particular, modeling biological systems in
the context of multiscale simulations and exploring material properties are
driving a permanent development of new simulation methods and optimization
algorithms. In computational terms, those methods require parallelization
schemes that make productive use of computational resources for each
simulation from its outset. Here, we introduce the heterogeneous domain
decomposition approach, which combines a heterogeneity-sensitive
spatial domain decomposition with an a priori rearrangement of
subdomain-walls. Within this approach, the theoretical modeling and
scaling-laws for the force computation time are proposed and studied as a
function of the number of particles and the spatial resolution ratio. We also
show the capabilities of the new approach by comparing it to both static domain
decomposition algorithms and dynamic load balancing schemes. Specifically, two
representative molecular systems have been simulated and compared to the
heterogeneous domain decomposition proposed in this work. These two systems
comprise an adaptive resolution simulation of a biomolecule solvated in water
and a phase-separated binary Lennard-Jones fluid. Comment: 14 pages, 12 figures
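The idea of rearranging subdomain walls a priori can be illustrated with a minimal one-dimensional sketch. It is not the method of the paper: it assumes that each particle carries a known relative cost (for example, higher in the high-resolution region), and it places walls so that every subdomain receives roughly the same total cost. The function name `place_walls` and the cost values are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def place_walls(positions, costs, n_domains):
    """Return n_domains - 1 wall coordinates along one axis.

    Assumption for this sketch: costs[i] is the relative force-computation
    cost of particle i. Each wall is placed at the position of the first
    particle whose cumulative cost reaches k / n_domains of the total.
    """
    order = sorted(range(len(positions)), key=positions.__getitem__)
    xs = [positions[i] for i in order]
    cumulative = list(accumulate(costs[i] for i in order))
    total = cumulative[-1]
    walls = []
    for k in range(1, n_domains):
        target = total * k / n_domains
        idx = bisect_left(cumulative, target)
        walls.append(xs[idx])
    return walls


# Ten particles on a line; the left half is "high resolution" (cost 3),
# the right half is "low resolution" (cost 1).
positions = list(range(10))
costs = [3] * 5 + [1] * 5
walls = place_walls(positions, costs, n_domains=2)
```

With these inputs the single wall moves into the expensive left region instead of sitting at the geometric midpoint, which is the qualitative behavior a heterogeneity-sensitive decomposition aims for.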
SKIRT: hybrid parallelization of radiative transfer simulations
We describe the design, implementation and performance of the new hybrid
parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which
has been used extensively for modeling the continuum radiation of dusty
astrophysical systems including late-type galaxies and dusty tori. The hybrid
scheme combines distributed memory parallelization, using the standard Message
Passing Interface (MPI) to communicate between processes, and shared memory
parallelization, providing multiple execution threads within each process to
avoid duplication of data structures. The synchronization between multiple
threads is accomplished through atomic operations without high-level locking
(also called lock-free programming). This improves the scaling behavior of the
code and substantially simplifies the implementation of the hybrid scheme. The
result is an extremely flexible solution that adjusts to the number of
available nodes, processors and memory, and consequently performs well on a
wide variety of computing architectures. Comment: 21 pages, 20 figures
WavePacket: A Matlab package for numerical quantum dynamics. II: Open quantum systems, optimal control, and model reduction
WavePacket is an open-source program package for numeric simulations in
quantum dynamics. It can solve time-independent or time-dependent linear
Schrödinger and Liouville-von Neumann equations in one or more dimensions.
Coupled equations can also be treated, which allows, e.g., the simulation of
molecular quantum dynamics beyond the Born-Oppenheimer approximation.
Optionally accounting for the interaction with external electric fields within
the semi-classical dipole approximation, WavePacket can be used to simulate
experiments involving tailored light pulses in photo-induced physics or
chemistry. Being highly versatile and offering visualization of quantum
dynamics 'on the fly', WavePacket is well suited for teaching or research
projects in atomic, molecular and optical physics as well as in physical or
theoretical chemistry. Building on the previous Part I which dealt with closed
quantum systems and discrete variable representations, the present Part II
focuses on the dynamics of open quantum systems, with Lindblad operators
modeling dissipation and dephasing. This part also describes the WavePacket
function for optimal control of quantum dynamics, building on rapid
monotonically convergent iteration methods. Furthermore, two different
approaches to dimension reduction implemented in WavePacket are documented
here. In the first one, a balancing transformation based on the concepts of
controllability and observability Gramians is used to identify states that are
neither well controllable nor well observable. Those states are either
truncated or averaged out. In the other approach, the H2-error for a given
reduced dimensionality is minimized by H2 optimal model reduction techniques,
utilizing a bilinear iterative rational Krylov algorithm.
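For orientation, the standard Lindblad form of the Liouville-von Neumann equation for an open quantum system is given below. The symbols are the conventional ones and are assumptions here, since the abstract does not fix a notation: \rho is the density operator, H the Hamiltonian, and L_k the Lindblad operators modeling dissipation and dephasing.

```latex
\frac{\mathrm{d}\rho}{\mathrm{d}t}
  = -\frac{i}{\hbar}\,[H, \rho]
  + \sum_k \left( L_k \,\rho\, L_k^{\dagger}
  - \frac{1}{2} \left\{ L_k^{\dagger} L_k ,\, \rho \right\} \right)
```

The commutator term generates the closed-system dynamics; the sum over k adds the coupling to the environment while preserving the trace of \rho.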
Coalition Formation and Combinatorial Auctions; Applications to Self-organization and Self-management in Utility Computing
In this paper we propose a two-stage protocol for resource management in a
hierarchically organized cloud. The first stage exploits spatial locality for
the formation of coalitions of supply agents; the second stage, a combinatorial
auction, is based on a modified proxy-based clock algorithm and has two phases,
a clock phase and a proxy phase. The clock phase supports price discovery; in
the second phase a proxy conducts multiple rounds of a combinatorial auction
for the package of services requested by each client. The protocol strikes a
balance between low-cost services for cloud clients and a decent profit for the
service providers. We also report the results of an empirical investigation of
the combinatorial auction stage of the protocol. Comment: 14 pages
CONCISE: Compressed 'n' Composable Integer Set
Bit arrays, or bitmaps, are used to significantly speed up set operations in
several areas, such as data warehousing, information retrieval, and data
mining, to cite a few. However, bitmaps usually use a large storage space, thus
requiring compression. Nevertheless, there is a space-time tradeoff among
compression schemes. The Word Aligned Hybrid (WAH) bitmap compression trades
some space to allow for bitwise operations without first decompressing bitmaps.
WAH has been recognized as the most efficient scheme in terms of computation
time. In this paper we present CONCISE (Compressed 'n' Composable Integer Set),
a new scheme that performs significantly better than WAH.
In particular, when compared to WAH, our algorithm is able to reduce the
required memory by up to 50% while offering similar or better performance in terms of
computation time. Further, we show that CONCISE can be efficiently used to
manipulate bitmaps representing sets of integral numbers in lieu of well-known
data structures such as arrays, lists, hashtables, and self-balancing binary
search trees. Extensive experiments over synthetic data show the effectiveness
of our approach. Comment: Preprint submitted to Information Processing Letters, 7 pages
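As background for why bitmaps speed up set operations, here is a minimal sketch of an uncompressed integer set stored as a bitmap. It implements neither WAH nor CONCISE; it only shows that intersection and union reduce to single bitwise operations, which is the property that compressed schemes try to keep while using less memory. The helper names `to_bitmap` and `to_sorted_list` are hypothetical.

```python
def to_bitmap(values):
    """Encode a collection of non-negative integers as one Python int,
    where bit i is set if and only if i is in the set."""
    bitmap = 0
    for v in values:
        bitmap |= 1 << v
    return bitmap


def to_sorted_list(bitmap):
    """Decode a bitmap back into a sorted list of its members."""
    members = []
    position = 0
    while bitmap:
        if bitmap & 1:
            members.append(position)
        bitmap >>= 1
        position += 1
    return members


a = to_bitmap([1, 3, 5, 64])
b = to_bitmap([3, 4, 5, 100])

intersection = to_sorted_list(a & b)  # members present in both sets
union = to_sorted_list(a | b)         # members present in either set
```

Note that a single large element such as 100 forces the bitmap to span at least 101 bits, most of them zero. Sparse sets therefore waste space in this representation, which motivates compression.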
Towards Optimal Distributed Node Scheduling in a Multihop Wireless Network through Local Voting
In a multihop wireless network, it is crucial but challenging to schedule
transmissions in an efficient and fair manner. In this paper, a novel
distributed node scheduling algorithm, called Local Voting, is proposed. This
algorithm tries to semi-equalize the load (defined as the ratio of the queue
length over the number of allocated slots) through slot reallocation based on
local information exchange. The algorithm stems from the finding that the
shortest delivery time or delay is obtained when the load is semi-equalized
throughout the network. In addition, we prove that, with Local Voting, the
network system converges asymptotically towards the optimal scheduling.
Moreover, through extensive simulations, the performance of Local Voting is
further investigated in comparison with several representative scheduling
algorithms from the literature. Simulation results show that the proposed
algorithm achieves better performance than the other distributed algorithms in
terms of average delay, maximum delay, and fairness. Despite being distributed,
the performance of Local Voting is also found to be very close to a centralized
algorithm that is deemed to have the optimal performance.
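The load definition above (queue length divided by the number of allocated slots) can be made concrete with a small sketch. This is not the Local Voting update rule from the paper: it assumes a single neighborhood in which every node can exchange information with every other, and it moves one slot per step from the least loaded node to the most loaded one. The function names are hypothetical.

```python
def load(queue_length, slots):
    """Load as defined in the abstract: queue length over allocated slots."""
    return queue_length / slots


def rebalance_once(queues, slots):
    """Simplified balancing step (an assumption of this sketch).

    Moves one slot from the least loaded node to the most loaded node,
    provided the donor keeps at least one slot. Returns a new slot list.
    """
    loads = [load(q, s) for q, s in zip(queues, slots)]
    donor = min(range(len(loads)), key=loads.__getitem__)
    receiver = max(range(len(loads)), key=loads.__getitem__)
    new_slots = list(slots)
    if donor != receiver and new_slots[donor] > 1:
        new_slots[donor] -= 1
        new_slots[receiver] += 1
    return new_slots


queues = [12, 2, 4]
slots = [2, 2, 2]                 # loads before: 6.0, 1.0, 2.0
new_slots = rebalance_once(queues, slots)
new_loads = [load(q, s) for q, s in zip(queues, new_slots)]
```

After one step the spread between the largest and smallest load shrinks, which is the direction of change that load semi-equalization is meant to produce.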
A Multi-Code Analysis Toolkit for Astrophysical Simulation Data
The analysis of complex multiphysics astrophysical simulations presents a
unique and rapidly growing set of challenges: reproducibility, parallelization,
and vast increases in data size and complexity chief among them. In order to
meet these challenges, and in order to open up new avenues for collaboration
between users of multiple simulation platforms, we present yt (available at
http://yt.enzotools.org/), an open source, community-developed astrophysical
analysis and visualization toolkit. Analysis and visualization with yt are
oriented around physically relevant quantities rather than quantities native to
astrophysical simulation codes. While originally designed for handling Enzo's
structured adaptive mesh refinement (AMR) data, yt has been extended to work
with several different simulation methods and simulation codes including Orion,
RAMSES, and FLASH. We report on its methods for reading, handling, and
visualizing data, including projections, multivariate volume rendering,
multi-dimensional histograms, halo finding, light cone generation and
topologically-connected isocontour identification. Furthermore, we discuss the
underlying algorithms yt uses for processing and visualizing data, and its
mechanisms for parallelization of analysis tasks. Comment: 18 pages, 6 figures, emulateapj format. Resubmitted to Astrophysical
Journal Supplement Series with revisions from referee. yt can be found at
http://yt.enzotools.org
An efficient MPI/OpenMP parallelization of the Hartree-Fock method for the second generation of Intel Xeon Phi processor
Modern OpenMP threading techniques are used to convert the MPI-only
Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two
separate implementations are considered, which differ in whether key data
structures (the density and Fock matrices) are shared or replicated among threads. All
implementations are benchmarked on a supercomputer of 3,000 Intel Xeon Phi
processors. With 64 cores per processor, scaling numbers are reported on up to
192,000 cores. The hybrid MPI/OpenMP implementation reduces the memory
footprint by approximately 200 times compared to the legacy code. The
MPI/OpenMP code was shown to run up to six times faster than the original for a
range of molecular system sizes. Comment: SC17 conference paper, 12 pages, 7 figures