45 research outputs found

Multicore Architecture-aware Scientific Applications

Author: Srinivasa Avinash
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011

Modern high-performance systems are becoming increasingly complex and powerful due to advancements in processor and memory architecture. To keep up with this increasing complexity, applications have to be augmented with certain capabilities to fully exploit such systems. These may be at the application level, such as static or dynamic adaptations, or at the system level, such as strategies that override some of the default operating system policies; the main objective is to improve the computational performance of the application. The current work proposes two such capabilities for multi-threaded scientific applications, in particular a large-scale physics application computing ab-initio nuclear structure. The first uses a middleware tool to invoke dynamic adaptations in the application, so that it can adjust to changing computational resource availability at run time. The second is a strategy for effective placement of data in main memory, to optimize memory access latencies and bandwidth. When included, these capabilities were found to have a significant impact on application performance, resulting in average speedups of as much as two to four times.

Digital Repository @ Iowa State University (ISU)

Crossref

UNT Digital Library

Multicore Architecture-aware Scientific Applications

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

Performance analysis and middleware assisted adaptation for quantum chemistry computations

Author: Seshagiri Lakshminarasimhan
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2009

Quantum chemistry applications such as General Atomic and Molecular Electronic Structure System (GAMESS) that can execute on a complex peta-scale parallel computing environment have a large number of input parameters that affect the overall performance. The application characteristics vary according to the input parameters, because the usage of resources such as network bandwidth, I/O, and main memory differs with them. Effective execution of applications that share such resources in a parallel computing environment requires some sort of adaptive mechanism to enable efficient usage of these resources. The adaptation adjusts the most computationally intensive part of the application, thus leading to sizable gains. GAMESS, used for ab-initio molecular quantum chemistry calculations, utilizes NICAN (Network Information Conveyer and Application Notification) to make adaptations dynamically so as to improve the application performance in heavy load conditions. The adaptation mechanism can modify the application execution in a very simple yet effective manner. In this work, we have explored methods to expand the structure of NICAN to include other input parameters based on which the application performance can be controlled. The application performance has been analyzed on different architectures to obtain fine-grained performance data, and a tuning strategy has been identified. A generic database framework has been incorporated in the existing NICAN mechanism.

Digital Repository @ Iowa State University (ISU)

Non-uniform Memory Affinity Strategy in Multi-Threaded Sparse Matrix Computations

Author: Sosonkina Masha
Srinivasa Avinash
Publication venue: Iowa State University Digital Repository
Publication date: 01/09/2011

As the core counts on modern multi-processor systems increase, so does the memory contention, with all the processes/threads trying to access the main memory simultaneously. This is typical of UMA (Uniform Memory Access) architectures with a single physical memory bank, leading to poor scalability in multi-threaded applications. To palliate this problem, modern systems are moving increasingly towards Non-Uniform Memory Access (NUMA) architectures, in which the physical memory is split into several (typically two or four) banks. Each memory bank is associated with a set of cores, enabling threads to operate from their own physical memory banks while retaining the concept of a shared virtual address space. However, accessing shared data structures from the remote memory banks may become increasingly slow. This paper proposes a way to determine and pin certain parts of the shared data to specific memory banks, thus minimizing remote accesses. To achieve this, the existing application code has to be supplied with the proposed interface to set up and distribute the shared data appropriately among memory banks. Experiments with the NAS benchmark as well as with a realistic large-scale application calculating ab-initio nuclear structure have been performed. Speedups of up to 3.5 times were observed with the proposed approach compared with the default memory placement policy.
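The effect of pinning shared data to the memory bank of the threads that use it can be illustrated with a toy model. The sketch below is not the interface proposed in the paper; it only counts remote accesses on a hypothetical two-bank machine under two placements. All names (`node_of_thread`, `remote_accesses`, `block_accesses`) and all sizes are assumptions made for illustration.

```python
# Toy model (hypothetical, not the paper's interface): count remote memory
# accesses under two page placements on a small NUMA machine.

def node_of_thread(thread, threads_per_node):
    """Map a thread id to the NUMA node (memory bank) it runs on."""
    return thread // threads_per_node

def remote_accesses(page_owner_node, accesses, threads_per_node):
    """Count accesses whose page lives on a different node than the thread.

    accesses: list of (thread, page) pairs.
    page_owner_node: dict mapping page -> node holding that page.
    """
    return sum(
        1
        for thread, page in accesses
        if page_owner_node[page] != node_of_thread(thread, threads_per_node)
    )

def block_accesses(num_threads, pages_per_thread):
    """Each thread touches only its own contiguous block of pages."""
    return [
        (t, t * pages_per_thread + p)
        for t in range(num_threads)
        for p in range(pages_per_thread)
    ]

# Assumed machine: 8 threads, 4 per node, so 2 memory banks.
num_threads, threads_per_node, pages_per_thread = 8, 4, 16
num_pages = num_threads * pages_per_thread
accesses = block_accesses(num_threads, pages_per_thread)

# Placement A: every page ends up on node 0, as can happen when a single
# thread initialises all shared data.
single_node = {page: 0 for page in range(num_pages)}

# Placement B: each page is pinned to the node of the thread that uses it.
pinned = {
    page: node_of_thread(page // pages_per_thread, threads_per_node)
    for page in range(num_pages)
}

remote_single = remote_accesses(single_node, accesses, threads_per_node)
remote_pinned = remote_accesses(pinned, accesses, threads_per_node)
print(remote_single, remote_pinned)
```

In this model, the threads on the second node pay a remote access for every page under placement A and for none under placement B. Real speedups depend on access patterns, bandwidth, and latency that the model ignores.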

Digital Repository @ Iowa State University (ISU)

Adaptations in electronic structure calculations in heterogeneous environments

Author: Talamudupula Sai Kiran
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011

Modern quantum chemistry deals with electronic structure calculations of unprecedented complexity and accuracy. They demand the full power of high-performance computing and must be in tune with the given architecture for superior efficiency. To make such applications resource-aware, it is desirable to enable their static and dynamic adaptations using some external software (middleware), which may monitor both system availability and application needs, rather than mix science with system-related calls inside the application. The present work investigates scientific application interlinking with middleware based on the example of the computational chemistry package GAMESS and the middleware NICAN. The existing synchronous model is limited by the possible delays due to the middleware processing time under the sustainable runtime system conditions. The proposed asynchronous and hybrid models aim at overcoming this limitation. When linked with NICAN, the fragment molecular orbital (FMO) method is capable of adapting its fragment scheduling policy statically and dynamically based on the computing platform conditions. Significant execution time and throughput gains have been obtained due to such static adaptations when the compute nodes have very different core counts. Dynamic adaptations are based on the main memory availability at run time. NICAN prompts FMO to postpone scheduling certain fragments if there is not enough memory for their immediate execution. Hence, FMO may be able to complete the calculations, whereas without such adaptations it aborts.
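The dynamic adaptation described above, postponing fragments that do not currently fit in memory instead of aborting, can be sketched as a simple queue. This is an illustrative sketch only, not the NICAN or GAMESS API: the function `schedule`, the `free_memory` callback standing in for a middleware memory report, and the fragment names and sizes are all hypothetical.

```python
from collections import deque

def schedule(fragments, free_memory, max_steps=100):
    """Run fragments in order, postponing any that do not fit in memory.

    fragments: list of (name, memory_needed) pairs.
    free_memory: callable step -> memory available at that step
                 (a stand-in for what a middleware might report).
    Returns (execution_order, names_still_pending).
    """
    pending = deque(fragments)
    order = []
    step = 0
    while pending and step < max_steps:
        name, need = pending.popleft()
        if need <= free_memory(step):
            order.append(name)            # enough memory: execute now
        else:
            pending.append((name, need))  # postpone rather than abort
        step += 1
    return order, [name for name, _ in pending]

# Assumed scenario: memory is scarce for the first three steps, then frees up.
fragments = [("A", 8), ("B", 2), ("C", 4)]
order, left = schedule(fragments, lambda step: 4 if step < 3 else 16)
print(order, left)
```

Here the largest fragment is deferred until memory becomes available, so all three fragments complete. The `max_steps` bound is a design choice of this sketch that keeps the loop finite if memory never frees up.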

Digital Repository @ Iowa State University (ISU)

Crossref

UNT Digital Library

Argonne Leadership Computing Facility 2011 annual report : Shaping future supercomputing.

Author: Coffey R.
Drugan C. (LCF)
Messina P.
Papka M.
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 16/08/2012

The ALCF's Early Science Program aims to prepare key applications for the architecture and scale of Mira and to solidify libraries and infrastructure that will pave the way for other future production applications. Two billion core-hours have been allocated to 16 Early Science projects on Mira. The projects, in addition to promising delivery of exciting new science, are all based on state-of-the-art, petascale, parallel applications. The project teams, in collaboration with ALCF staff and IBM, have undertaken intensive efforts to adapt their software to take advantage of Mira's Blue Gene/Q architecture, which, in a number of ways, is a precursor to future high-performance-computing architecture. The Argonne Leadership Computing Facility (ALCF) enables transformative science that solves some of the most difficult challenges in biology, chemistry, energy, climate, materials, physics, and other scientific realms. Users partnering with ALCF staff have reached research milestones previously unattainable, due to the ALCF's world-class supercomputing resources and expertise in computation science. In 2011, the ALCF's commitment to providing outstanding science and leadership-class resources was honored with several prestigious awards. Research on multiscale brain blood flow simulations was named a Gordon Bell Prize finalist. Intrepid, the ALCF's BG/P system, ranked No. 1 on the Graph 500 list for the second consecutive year. The next-generation BG/Q prototype again topped the Green500 list. Skilled experts at the ALCF enable researchers to conduct breakthrough science on the Blue Gene system in key ways. The Catalyst Team matches project PIs with experienced computational scientists to maximize and accelerate research in their specific scientific domains. The Performance Engineering Team facilitates the effective use of applications on the Blue Gene system by assessing and improving the algorithms used by applications and the techniques used to implement those algorithms. 
The Data Analytics and Visualization Team lends expertise in tools and methods for high-performance post-processing of large datasets, interactive data exploration, batch visualization, and production visualization. The Operations Team ensures that system hardware and software work reliably and optimally; system tools are matched to the unique system architectures and scale of ALCF resources; the entire system software stack works smoothly together; and I/O performance issues, bug fixes, and requests for system software are addressed. The User Services and Outreach Team offers frontline services and support to existing and potential ALCF users. The team also provides marketing and outreach to users, DOE, and the broader community.

Crossref

UNT Digital Library