857 research outputs found

    ELSI: A Unified Software Interface for Kohn-Sham Electronic Structure Solvers

    Full text link
    Solving the electronic structure from a generalized or standard eigenproblem is often the bottleneck in large scale calculations based on Kohn-Sham density-functional theory. This problem must be addressed by essentially all current electronic structure codes, based on similar matrix expressions, and by high-performance computation. We here present a unified software interface, ELSI, to access different strategies that address the Kohn-Sham eigenvalue problem. Currently supported algorithms include the dense generalized eigensolver library ELPA, the orbital minimization method implemented in libOMM, and the pole expansion and selected inversion (PEXSI) approach with lower computational complexity for semilocal density functionals. The ELSI interface aims to simplify the implementation and optimal use of the different strategies, by offering (a) a unified software framework designed for the electronic structure solvers in Kohn-Sham density-functional theory; (b) reasonable default parameters for a chosen solver; (c) automatic conversion between input and internal working matrix formats, and in the future (d) recommendation of the optimal solver depending on the specific problem. Comparative benchmarks are shown for system sizes up to 11,520 atoms (172,800 basis functions) on distributed memory supercomputing architectures.Comment: 55 pages, 14 figures, 2 table

    Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

    Get PDF
    The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism

    pPython Performance Study

    Full text link
    pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. pPython follows a SPMD (single program multiple data) model of computation. pPython runs on a single-node (e.g., a laptop) running Windows, Linux, or MacOS operating systems or on any combination of heterogeneous systems that support Python, including on a cluster through a Slurm scheduler interface so that pPython can be executed in a massively parallel computing environment. It is interesting to see what performance pPython can achieve compared to the traditional socket-based MPI communication because of its unique file-based messaging implementation. In this paper, we present the point-to-point and collective communication performances of pPython and compare them with those obtained by using mpi4py with OpenMPI. For large messages, pPython demonstrates comparable performance as compared to mpi4py.Comment: arXiv admin note: substantial text overlap with arXiv:2208.1490

    A Preemption-Based Meta-Scheduling System for Distributed Computing

    Get PDF
    This research aims at designing and building a scheduling framework for distributed computing systems with the primary objectives of providing fast response times to the users, delivering high system throughput and accommodating maximum number of applications into the systems. The author claims that the above mentioned objectives are the most important objectives for scheduling in recent distributed computing systems, especially Grid computing environments. In order to achieve the objectives of the scheduling framework, the scheduler employs arbitration of application-level schedules and preemption of executing jobs under certain conditions. In application-level scheduling, the user develops a schedule for his application using an execution model that simulates the execution behavior of the application. Since application-level scheduling can seriously impede the performance of the system, the scheduling framework developed in this research arbitrates between different application-level schedules corresponding to different applications to provide fair system usage for all applications and balance the interests of different applications. In this sense, the scheduling framework is not a classical scheduling system, but a meta-scheduling system that interacts with the application-level schedulers. Due to the large system dynamics involved in Grid computing systems, the ability to preempt executing jobs becomes a necessity. The meta-scheduler described in this dissertation employs well defined scheduling policies to preempt and migrate executing applications. In order to provide the users with the capability to make their applications preemptible, a user-level check-pointing library called SRS (Stop-Restart Software) was also developed by this research. The SRS library is different from many user-level check-pointing libraries since it allows reconfiguration of applications between migrations. This reconfiguration can be achieved by changing the processor configuration and/or data distribution. The experimental results provided in this dissertation demonstrates the utility of the metascheduling framework for distributed computing systems. And lastly, the metascheduling framework was put to practical use by building a Grid computing system called GradSolve. GradSolve is a flexible system and it allows the application library writers to upload applications with different capabilities into the system. GradSolve is also unique with respect to maintaining traces of the execution of the applications and using the traces for subsequent executions of the application
    corecore