Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
Analytic, first-principles performance modeling of distributed-memory
applications is difficult due to a wide spectrum of random disturbances caused
by the application and the system. These disturbances (commonly called "noise")
destroy the assumptions of regularity that one usually employs when
constructing simple analytic models. Despite numerous efforts to quantify,
categorize, and reduce such effects, a comprehensive quantitative understanding
of their performance impact is not available, especially for long delays that
have global consequences for the parallel application. In this work, we
investigate various traces collected from synthetic benchmarks that mimic real
applications on simulated and real message-passing systems in order to pinpoint
the mechanisms behind delay propagation. We analyze how the propagation speed
of idle waves emanating from injected delays depends on the execution and
communication properties of the application, and study how such delays decay
under increased noise levels and how they interact with each other. We also
show how fine-grained noise can make a system immune to the adverse effects of
propagating idle waves. Our results contribute to a better
understanding of the collective phenomena that manifest themselves in
distributed-memory parallel applications.
Comment: 10 pages, 9 figures; title change
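The idle-wave mechanism described in this abstract can be illustrated with a toy model. The sketch below is not the authors' simulator; it assumes an open chain of ranks, a fixed execution time per iteration, zero-cost next-neighbour communication, and a rule that a rank may start an iteration only after it and its direct neighbours have finished the previous one. All names (`simulate`, `wavefront`, `t_exec`, `delay_rank`) are hypothetical.

```python
def simulate(n_ranks, n_iters, t_exec, delay_rank, delay):
    """Return finish[k][i]: time at which rank i completes iteration k.

    Toy assumption: a rank starts iteration k once it and its direct
    neighbours have completed iteration k-1 (communication itself is free).
    A single one-off delay is injected on `delay_rank` in iteration 0.
    """
    prev = [0.0] * n_ranks
    finish = []
    for k in range(n_iters):
        cur = []
        for i in range(n_ranks):
            lo, hi = max(i - 1, 0), min(i + 1, n_ranks - 1)
            start = max(prev[lo:hi + 1])  # wait for self and neighbours
            extra = delay if (k == 0 and i == delay_rank) else 0.0
            cur.append(start + t_exec + extra)
        finish.append(cur)
        prev = cur
    return finish


def wavefront(finish, t_exec):
    """Per iteration, count ranks that lag behind the undisturbed schedule."""
    return [
        sum(1 for t in row if t > (k + 1) * t_exec + 1e-12)
        for k, row in enumerate(finish)
    ]


if __name__ == "__main__":
    result = simulate(n_ranks=11, n_iters=4, t_exec=1.0, delay_rank=5, delay=5.0)
    print(wavefront(result, 1.0))  # prints [1, 3, 5, 7]
```

In this noise-free toy setting the disturbance spreads by one rank per iteration in each direction and does not decay; the decay under noise studied in the paper is outside what this sketch models.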
Spectral properties of the 2D Holstein t-J model
Employing the Lanczos algorithm in combination with a kernel polynomial
moment expansion (KPM) and the maximum entropy method (MEM), we show a way of
calculating charge and spin excitations in the Holstein t-J model, including
the full quantum nature of phonons. To analyze polaron band formation we
evaluate the hole spectral function for a wide range of electron-phonon
coupling strengths. For the first time, we present results for the optical
conductivity of the 2D Holstein t-J model.
Comment: 2 pages, LaTeX. Submitted to Physica C, Proc. Int. Conf. on M2HTSC
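For readers unfamiliar with the Lanczos recursion named in this abstract, the following is a minimal sketch of the basic three-term iteration on a small dense symmetric matrix. It is only an illustration under simplifying assumptions: the papers apply the method to very large sparse Hamiltonians and combine it with further techniques (KPM, MEM) that are not shown here. The function names and the example matrix are hypothetical.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def lanczos(a, v0, m):
    """Run up to m Lanczos steps on the symmetric matrix `a`.

    Returns (alphas, betas): the diagonal and off-diagonal entries of the
    tridiagonal matrix built in the Krylov space generated from v0.
    No reorthogonalization is performed (fine for tiny examples only).
    """
    norm = math.sqrt(sum(x * x for x in v0))
    q = [x / norm for x in v0]
    q_prev = [0.0] * len(q)
    beta = 0.0
    alphas, betas = [], []
    for j in range(m):
        w = matvec(a, q)
        alpha = sum(wi * qi for wi, qi in zip(w, q))
        # Three-term recurrence: remove components along q and q_prev.
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(x * x for x in w))
        if j == m - 1 or beta < 1e-12:  # done, or Krylov space exhausted
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas


if __name__ == "__main__":
    a = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
    print(lanczos(a, [1.0, 0.0, 0.0], 3))
```

The eigenvalues of the small tridiagonal matrix defined by `alphas` and `betas` approximate the extremal eigenvalues of `a`, which is what makes the method attractive for ground-state calculations.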
Polaronic effects in strongly coupled electron-phonon systems: Exact diagonalization results for the 2D Holstein t-J model
Ground-state and dynamical properties of the 2D Holstein t-J model are
examined by means of direct Lanczos diagonalization, using a truncation method
of the phononic Hilbert space. The single-hole spectral function shows the
formation of a narrow hole-polaron band as the electron-phonon coupling
increases, where the polaronic band collapse is favoured by strong Coulomb
correlations. In the two-hole sector, the hole-hole correlations unambiguously
indicate the existence of inter-site bipolaronic states. At quarter-filling, a
polaronic superlattice is formed in the adiabatic strong-coupling regime.
Comment: 3 pages, LaTeX, 6 Postscript figures, Proc. Int. Conf. on Strongly
Correlated Electron Systems, Zuerich, August 1996, accepted for publication
in Physica
On the stability of polaronic superlattices in strongly coupled electron-phonon systems
We investigate the interplay of electron-phonon (EP) coupling and strong
electronic correlations in the frame of the two-dimensional (2D) Holstein t-J
model (HtJM), focusing on polaronic ordering phenomena for the quarter-filled
band case. The use of direct Lanczos diagonalization on finite lattices allows
us to include the effects of quantum phonon fluctuations in the calculation of
spin/charge structure factors and hole-phonon correlation functions. In the
adiabatic strong-coupling regime we found evidence for "self-localization" of
polaronic carriers in a charge-modulated structure, a type of
superlattice solidification reminiscent of those observed in the nickel
perovskites.
Comment: 2 pages, LaTeX. Submitted to Physica C, Proc. Int. Conf. on M2HTSC
Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory
New algorithms and optimization techniques are needed to counter the
accelerating trend towards bandwidth-starved multicore chips. It is well known
that the performance of stencil codes can be improved by temporal blocking,
lessening the pressure on the memory interface. We introduce a new pipelined
approach that makes explicit use of shared caches in multicore environments and
minimizes synchronization and boundary overhead. For clusters of shared-memory
nodes we demonstrate how temporal blocking can be employed successfully in a
hybrid shared/distributed-memory environment.
Comment: 9 pages, 6 figures
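The idea of temporal blocking mentioned in this abstract can be sketched on a 1D three-point Jacobi stencil. Note that this is a simple overlapped (redundant-halo) variant chosen for brevity, not the pipelined multicore scheme the paper introduces; the function names, the tile size, and the averaging stencil are all assumptions made for illustration.

```python
def jacobi_sweep(u):
    """One sweep of a 3-point averaging stencil; end values are held fixed."""
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def naive_sweeps(u, steps):
    """Reference: stream the whole array through memory once per time step."""
    for _ in range(steps):
        u = jacobi_sweep(u)
    return u


def blocked_sweeps(u, steps, tile):
    """Advance each spatial tile by `steps` time steps before moving on.

    Each tile is loaded together with a halo of width `steps`. Values at the
    artificial halo edge become stale, but staleness travels inwards by only
    one cell per step, so the tile interior is still exact after `steps` steps.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo, hi = max(start - steps, 0), min(end + steps, n)
        local = u[lo:hi]  # tile plus halo, small enough to stay in cache
        for _ in range(steps):
            local = jacobi_sweep(local)
        out[start:end] = local[start - lo:end - lo]
    return out


if __name__ == "__main__":
    data = [float((7 * i) % 11) for i in range(40)]
    ref = naive_sweeps(data, 3)
    blk = blocked_sweeps(data, 3, 8)
    print(max(abs(a - b) for a, b in zip(ref, blk)))
```

The blocked version touches main memory once per tile instead of once per time step, at the cost of recomputing the halo cells; avoiding that redundant work is one motivation for pipelined schemes such as the one described in the abstract.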
LIKWID: Lightweight Performance Tools
Exploiting the performance of today's microprocessors requires intimate
knowledge of the microarchitecture as well as an awareness of the ever-growing
complexity in thread and cache topology. LIKWID is a set of command-line
utilities that addresses four key problems: probing the thread and cache
topology of a shared-memory node, enforcing thread-core affinity on a program,
measuring performance counter metrics, and microbenchmarking for reliable upper
performance bounds. Moreover, it includes an mpirun wrapper allowing for
portable thread-core affinity in MPI and hybrid MPI/threaded applications. To
demonstrate the capabilities of the tool set we show the influence of thread
affinity on performance using the well-known OpenMP STREAM triad benchmark, use
hardware counter tools to study the performance of a stencil code, and finally
show how to detect bandwidth problems on ccNUMA-based compute nodes.
Comment: 12 pages
Spectral properties of the 2D Holstein polaron
The two-dimensional Holstein model is studied by means of direct Lanczos
diagonalization preserving the full dynamics and quantum nature of phonons. We
present numerically exact results for the single-particle spectral function, the
polaronic quasiparticle weight, and the optical conductivity. The polaron band
dispersion is derived both from exact diagonalization of small lattices and
analytic calculation of the polaron self-energy.
Comment: 8 pages, revtex, 6 figures
Parallelization Strategies for Density Matrix Renormalization Group Algorithms on Shared-Memory Systems
Shared-memory parallelization (SMP) strategies for density matrix
renormalization group (DMRG) algorithms enable the treatment of complex systems
in solid state physics. We present two different approaches by which
parallelization of the standard DMRG algorithm can be accomplished in an
efficient way. The methods are illustrated with DMRG calculations of the
two-dimensional Hubbard model and the one-dimensional Holstein-Hubbard model on
contemporary SMP architectures. The parallelized code shows good scalability up
to at least eight processors and allows us to solve problems which exceed the
capability of sequential DMRG calculations.
Comment: 18 pages, 9 figures