
    Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study

    Analytic, first-principles performance modeling of distributed-memory applications is difficult due to a wide spectrum of random disturbances caused by the application and the system. These disturbances (commonly called "noise") destroy the assumptions of regularity that one usually employs when constructing simple analytic models. Despite numerous efforts to quantify, categorize, and reduce such effects, a comprehensive quantitative understanding of their performance impact is not available, especially for long delays that have global consequences for the parallel application. In this work, we investigate various traces collected from synthetic benchmarks that mimic real applications on simulated and real message-passing systems in order to pinpoint the mechanisms behind delay propagation. We analyze how the propagation speed of idle waves emanating from injected delays depends on the execution and communication properties of the application, and we study how such delays decay under increased noise levels and how they interact with each other. We also show how fine-grained noise can make a system immune to the adverse effects of propagating idle waves. Our results contribute to a better understanding of the collective phenomena that manifest themselves in distributed-memory parallel applications. Comment: 10 pages, 9 figures; title change
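The idea of an idle wave spreading from a single injected delay can be illustrated with a toy model. The sketch below is not the authors' benchmark or simulator. It assumes a noise-free, bulk-synchronous chain of ranks with open boundaries, in which each rank may start a step only after it and its two nearest neighbours have finished the previous one. The function names `simulate` and `wave_front` and all parameters are hypothetical.

```python
def simulate(n_ranks, n_steps, t_exec, delay_rank, delay):
    """Return finish[step][rank]: the time at which each rank completes each step.

    Assumption: a rank starts a step once it and both nearest neighbours
    have finished the previous step (open chain, no noise).
    """
    finish = []
    prev = [0.0] * n_ranks
    for step in range(n_steps):
        cur = []
        for i in range(n_ranks):
            left = prev[i - 1] if i > 0 else prev[i]
            right = prev[i + 1] if i < n_ranks - 1 else prev[i]
            start = max(left, prev[i], right)
            # The one-off delay is injected on a single rank in the first step only.
            extra = delay if (step == 0 and i == delay_rank) else 0.0
            cur.append(start + t_exec + extra)
        finish.append(cur)
        prev = cur
    return finish


def wave_front(finish, t_exec):
    """For each step, list the ranks that lag behind the undisturbed schedule."""
    fronts = []
    for step, row in enumerate(finish):
        ideal = (step + 1) * t_exec
        fronts.append([i for i, f in enumerate(row) if f > ideal + 1e-12])
    return fronts


if __name__ == "__main__":
    times = simulate(n_ranks=11, n_steps=4, t_exec=1.0, delay_rank=5, delay=5.0)
    for step, ranks in enumerate(wave_front(times, 1.0)):
        print(step, ranks)
```

In this simplified setting the set of delayed ranks grows by one rank per step in each direction and never shrinks; the decay and noise-induced damping discussed in the abstract would require adding random per-step perturbations, which the sketch deliberately omits.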

    Spectral properties of the 2D Holstein t-J model

    Employing the Lanczos algorithm in combination with a kernel polynomial moment expansion (KPM) and the maximum entropy method (MEM), we show a way of calculating charge and spin excitations in the Holstein t-J model, including the full quantum nature of phonons. To analyze polaron band formation we evaluate the hole spectral function for a wide range of electron-phonon coupling strengths. For the first time, we present results for the optical conductivity of the 2D Holstein t-J model. Comment: 2 pages, LaTeX. Submitted to Physica C, Proc. Int. Conf. on M2HTSC

    Polaronic effects in strongly coupled electron-phonon systems: Exact diagonalization results for the 2D Holstein t-J model

    Ground-state and dynamical properties of the 2D Holstein t-J model are examined by means of direct Lanczos diagonalization, using a truncation method of the phononic Hilbert space. The single-hole spectral function shows the formation of a narrow hole-polaron band as the electron-phonon coupling increases, where the polaronic band collapse is favoured by strong Coulomb correlations. In the two-hole sector, the hole-hole correlations unambiguously indicate the existence of inter-site bipolaronic states. At quarter-filling, a polaronic superlattice is formed in the adiabatic strong-coupling regime. Comment: 3 pages, LaTeX, 6 Postscript figures, Proc. Int. Conf. on Strongly Correlated Electron Systems, Zuerich, August 1996, accepted for publication in Physica
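As a rough illustration of what Lanczos diagonalization does, the following sketch computes the ground-state energy of a small open tight-binding chain. This is a stand-in model chosen because its spectrum is known in closed form; it is not the Holstein t-J Hamiltonian and involves no phonon truncation. The helper names (`apply_chain`, `lanczos`, `count_below`, `lowest_eigenvalue`) are hypothetical, and the lowest eigenvalue of the tridiagonal matrix is found by Sturm-sequence bisection purely to keep the example free of external libraries.

```python
import math


def apply_chain(v, t=1.0):
    """Multiply v by the hopping Hamiltonian of an open chain (amplitude -t)."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        if i > 0:
            out[i] -= t * v[i - 1]
        if i < n - 1:
            out[i] -= t * v[i + 1]
    return out


def lanczos(matvec, start, steps):
    """Return diagonal (alpha) and off-diagonal (beta) of the Lanczos tridiagonal matrix."""
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]
    v_prev = [0.0] * len(v)
    alpha, beta = [], []
    b = 0.0
    for _ in range(steps):
        w = matvec(v)
        a = sum(x * y for x, y in zip(w, v))
        # Three-term recurrence: remove components along the last two Lanczos vectors.
        w = [wi - a * vi - b * pi for wi, vi, pi in zip(w, v, v_prev)]
        alpha.append(a)
        b = math.sqrt(sum(x * x for x in w))
        if b < 1e-12:  # Krylov space exhausted, stop early
            break
        beta.append(b)
        v_prev, v = v, [x / b for x in w]
    return alpha, beta[:len(alpha) - 1]


def count_below(alpha, beta, x):
    """Count eigenvalues of the tridiagonal matrix that are smaller than x."""
    count, d = 0, 1.0
    for i, a in enumerate(alpha):
        off = beta[i - 1] ** 2 if i > 0 else 0.0
        if d == 0.0:
            d = 1e-300  # avoid division by zero in the Sturm recurrence
        d = a - x - off / d
        if d < 0.0:
            count += 1
    return count


def lowest_eigenvalue(alpha, beta):
    """Bisect for the smallest eigenvalue inside a Gershgorin-type bound."""
    bound = max(abs(a) for a in alpha) + 2.0 * max([abs(b) for b in beta] + [0.0])
    lo, hi = -bound, bound
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n = 6
    alpha, beta = lanczos(apply_chain, [1.0 / (i + 1) for i in range(n)], n)
    print(lowest_eigenvalue(alpha, beta), -2.0 * math.cos(math.pi / (n + 1)))
```

For an open chain of n sites with hopping amplitude t = 1, the exact ground-state energy is -2 cos(π/(n+1)), which gives a direct check on the Lanczos result. Production codes for the models in these abstracts would additionally need a many-body basis, a truncated phonon space, and usually some form of reorthogonalization.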

    On the stability of polaronic superlattices in strongly coupled electron-phonon systems

    We investigate the interplay of electron-phonon (EP) coupling and strong electronic correlations in the framework of the two-dimensional (2D) Holstein t-J model (HtJM), focusing on polaronic ordering phenomena for the quarter-filled band case. The use of direct Lanczos diagonalization on finite lattices allows us to include the effects of quantum phonon fluctuations in the calculation of spin/charge structure factors and hole-phonon correlation functions. In the adiabatic strong-coupling regime we found evidence for "self-localization" of polaronic carriers in a $(\pi,\pi)$ charge-modulated structure, a type of superlattice solidification reminiscent of those observed in the nickel perovskites La$_{2-x}$Sr$_{x}$NiO$_{4+y}$. Comment: 2 pages, LaTeX. Submitted to Physica C, Proc. Int. Conf. on M2HTSC

    Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory

    New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach that makes explicit use of shared caches in multicore environments and minimizes synchronization and boundary overhead. For clusters of shared-memory nodes we demonstrate how temporal blocking can be employed successfully in a hybrid shared/distributed-memory environment. Comment: 9 pages, 6 figures
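The basic idea of temporal blocking can be sketched on a 1D three-point Jacobi stencil. The example below uses overlapped tiling with redundant halo computation, a simpler variant than the pipelined, shared-cache approach described in the abstract. It only demonstrates that several time steps can be applied to one tile at a time while reproducing the result of plain sweeps. The function names and the averaging stencil are assumptions made for illustration.

```python
def sweep(u):
    """One Jacobi sweep of a 3-point averaging stencil; both end points stay fixed."""
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def naive(u, steps):
    """Apply `steps` full sweeps; every sweep streams the whole array."""
    for _ in range(steps):
        u = sweep(u)
    return u


def blocked(u, steps, block):
    """Temporal blocking via overlapped tiles.

    Each tile is extended by `steps` halo points per side, all time steps are
    applied to the tile while it would stay in cache, and only the interior
    (which the halo has protected from boundary errors) is written back.
    """
    n = len(u)
    out = list(u)
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        a = max(lo - steps, 0)  # left edge of tile including halo
        b = min(hi + steps, n)  # right edge of tile including halo
        tile = u[a:b]
        for _ in range(steps):
            tile = sweep(tile)
        out[lo:hi] = tile[lo - a:hi - a]
    return out


if __name__ == "__main__":
    grid = [float(i * i % 7) for i in range(40)]
    same = all(abs(x - y) < 1e-12
               for x, y in zip(blocked(grid, 3, 8), naive(grid, 3)))
    print(same)
```

The halo width equals the number of blocked time steps because the error from an artificially frozen tile edge moves inward by one grid point per sweep. In compiled code the benefit comes from each tile being loaded from memory once for several updates, which a pure-Python sketch cannot show in terms of speed.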

    LIKWID: Lightweight Performance Tools

    Exploiting the performance of today's microprocessors requires intimate knowledge of the microarchitecture as well as an awareness of the ever-growing complexity in thread and cache topology. LIKWID is a set of command line utilities that addresses four key problems: probing the thread and cache topology of a shared-memory node, enforcing thread-core affinity on a program, measuring performance counter metrics, and microbenchmarking for reliable upper performance bounds. Moreover, it includes an mpirun wrapper allowing for portable thread-core affinity in MPI and hybrid MPI/threaded applications. To demonstrate the capabilities of the tool set we show the influence of thread affinity on performance using the well-known OpenMP STREAM triad benchmark, use hardware counter tools to study the performance of a stencil code, and finally show how to detect bandwidth problems on ccNUMA-based compute nodes. Comment: 12 pages

    Spectral properties of the 2D Holstein polaron

    The two-dimensional Holstein model is studied by means of direct Lanczos diagonalization preserving the full dynamics and quantum nature of phonons. We present numerically exact results for the single-particle spectral function, the polaronic quasiparticle weight, and the optical conductivity. The polaron band dispersion is derived both from exact diagonalization of small lattices and analytic calculation of the polaron self-energy. Comment: 8 pages, revtex, 6 figures

    Parallelization Strategies for Density Matrix Renormalization Group Algorithms on Shared-Memory Systems

    Shared-memory parallelization (SMP) strategies for density matrix renormalization group (DMRG) algorithms enable the treatment of complex systems in solid state physics. We present two different approaches by which parallelization of the standard DMRG algorithm can be accomplished in an efficient way. The methods are illustrated with DMRG calculations of the two-dimensional Hubbard model and the one-dimensional Holstein-Hubbard model on contemporary SMP architectures. The parallelized code shows good scalability up to at least eight processors and allows us to solve problems which exceed the capability of sequential DMRG calculations. Comment: 18 pages, 9 figures