Sub-matrix updates for the Continuous-Time Auxiliary Field algorithm
We present a sub-matrix update algorithm for the continuous-time auxiliary
field method that allows the simulation of large lattice and impurity problems.
The algorithm takes optimal advantage of modern CPU architectures by
consistently using matrix instead of vector operations, resulting in a speedup
of a factor of and thereby allowing access to larger systems and
lower temperature. We illustrate the power of our algorithm with the example of a
cluster dynamical mean field simulation of the Néel transition in the
three-dimensional Hubbard model, where we show momentum-dependent self-energies
for clusters with up to 100 sites.
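The benefit of matrix over vector operations can be illustrated with a generic linear-algebra sketch. This is not the sub-matrix algorithm of the paper: it only contrasts k successive rank-one (Sherman-Morrison) updates of an inverse matrix with one blocked rank-k (Woodbury) update that yields the same result through matrix-matrix products. The function names and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def sherman_morrison_sequential(A_inv, U, V):
    """Update inv(A) to inv(A + U V^T) one column pair at a time (vector operations)."""
    A_inv = A_inv.copy()
    for i in range(U.shape[1]):
        u = U[:, i]
        v = V[:, i]
        Au = A_inv @ u            # matrix-vector product
        vA = v @ A_inv            # vector-matrix product
        A_inv -= np.outer(Au, vA) / (1.0 + v @ Au)
    return A_inv

def woodbury_block(A_inv, U, V):
    """Update inv(A) to inv(A + U V^T) in one step (matrix operations)."""
    k = U.shape[1]
    AU = A_inv @ U                # n x k matrix-matrix product
    VA = V.T @ A_inv              # k x n matrix-matrix product
    S = np.eye(k) + V.T @ AU      # small k x k system
    return A_inv - AU @ np.linalg.solve(S, VA)
```

Both functions return the same inverse up to rounding; the blocked version replaces k outer-product updates with a few matrix-matrix products, which typically make better use of CPU caches.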
Monte Carlo simulations of ordering in ferromagnetic-antiferromagnetic bilayers
Monte Carlo simulations have been used to study phase transitions on coupled
anisotropic ferro/antiferromagnetic (FM/AFM) films of classical Heisenberg
spins. We consider films of different thicknesses, with fully compensated
exchange across the FM/AFM interface. We find indications of a phase transition
on each film, occurring at different temperatures. It appears that both
transition temperatures depend on the film thickness. Comment: Revtex, 4 pages, 4 figures
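A Monte Carlo simulation of classical Heisenberg spins can be sketched as follows. The abstract does not name the update scheme, so this minimal example assumes single-spin Metropolis updates on one film with a uniform nearest-neighbour exchange J, periodic boundaries in the plane, and open boundaries across the thickness. It omits the anisotropy and the FM/AFM interface coupling studied in the paper, and all names are hypothetical.

```python
import numpy as np

def random_unit_vectors(rng, shape):
    """Draw spins uniformly on the unit sphere; `shape` is a tuple of lattice sizes."""
    v = rng.normal(size=shape + (3,))
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def neighbor_sum(spins, x, y, z):
    """Sum of nearest-neighbour spins: periodic in x and y, open in z (film thickness)."""
    Lx, Ly, Lz, _ = spins.shape
    total = (spins[(x + 1) % Lx, y, z] + spins[(x - 1) % Lx, y, z]
             + spins[x, (y + 1) % Ly, z] + spins[x, (y - 1) % Ly, z])
    if z + 1 < Lz:
        total = total + spins[x, y, z + 1]
    if z - 1 >= 0:
        total = total + spins[x, y, z - 1]
    return total

def metropolis_sweep(spins, J, T, rng):
    """One sweep of single-spin Metropolis updates for H = -J * sum_<ij> S_i . S_j."""
    Lx, Ly, Lz, _ = spins.shape
    accepted = 0
    for _ in range(Lx * Ly * Lz):
        x, y, z = rng.integers(Lx), rng.integers(Ly), rng.integers(Lz)
        proposal = random_unit_vectors(rng, ())
        # Energy change when the chosen spin is replaced by the proposal
        dE = -J * np.dot(proposal - spins[x, y, z], neighbor_sum(spins, x, y, z))
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[x, y, z] = proposal
            accepted += 1
    return accepted
```

Here J > 0 favours ferromagnetic alignment and J < 0 antiferromagnetic alignment; T is measured in units where the Boltzmann constant is 1.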
GT4Py: High Performance Stencils for Weather and Climate Applications using Python
All major weather and climate applications are currently developed using
languages such as Fortran or C++. This is typical in the domain of high
performance computing (HPC), where efficient execution is an important concern.
Unfortunately, this approach leads to implementations that intermix
optimizations for specific hardware architectures with the high-level numerical
methods that are typical for the domain. This leads to code that is verbose,
difficult to extend and maintain, and difficult to port to different hardware
architectures. Here, we propose a different strategy based on GT4Py (GridTools
for Python). GT4Py is a Python framework to write weather and climate
applications that includes a high-level embedded domain specific language (DSL)
to write stencil computations. The toolchain integrated in GT4Py enables
automatic code generation to obtain the performance of state-of-the-art C++ and
CUDA implementations. The separation of concerns between the mathematical
definitions and the actual implementations allows for performance portability
of the computations on a wide range of computing architectures, while being
embedded in Python allows easy access to the tools of the Python ecosystem to
enhance the productivity of the scientists and facilitate integration in
complex workflows. Here, the initial release of GT4Py is described, providing
an overview of the current state of the framework and performance results
showing how GT4Py can outperform pure Python implementations by orders of
magnitude. Comment: 12 pages
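For readers unfamiliar with the term, a stencil computation updates each grid point from a fixed pattern of neighbouring points. The sketch below uses plain NumPy, not the GT4Py DSL, whose syntax is not reproduced here. It computes a five-point Laplacian twice: once with explicit Python loops and once with array slicing. The function names are hypothetical.

```python
import numpy as np

def laplacian_loops(field):
    """Five-point Laplacian on interior points, written as explicit loops."""
    nx, ny = field.shape
    out = np.zeros_like(field)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            out[i, j] = (field[i - 1, j] + field[i + 1, j]
                         + field[i, j - 1] + field[i, j + 1]
                         - 4.0 * field[i, j])
    return out

def laplacian_sliced(field):
    """Same stencil expressed with array slices; boundary points stay zero."""
    out = np.zeros_like(field)
    out[1:-1, 1:-1] = (field[:-2, 1:-1] + field[2:, 1:-1]
                       + field[1:-1, :-2] + field[1:-1, 2:]
                       - 4.0 * field[1:-1, 1:-1])
    return out
```

The two functions agree on every grid point. The loop version shows the neighbour pattern most directly, while the sliced version states the same pattern without per-point Python overhead.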