15 research outputs found

    OPTIMIZING PERFORMANCE ON MASSIVELY PARALLEL COMPUTERS USING A REMOTE MEMORY ACCESS PROGRAMMING MODEL

    No full text
    Parallel programming models are of paramount importance because they affect both the performance delivered by massively parallel systems and the productivity of the programmer seeking that performance. Advancements in networks, multicore chips, and related technology continue to improve the efficiency of modern supercomputers. However, the average application efficiency is a small fraction of the peak system efficiency. This research proposes techniques for optimizing application performance on supercomputers using the remote memory access (RMA) parallel programming model. The growing gaps between CPU-network and CPU-memory timescales are fundamental problems that require attention in the design of communication models as well as scalable parallel algorithms. This research validates the RMA model because of its simplicity, its good hardware support on modern networks, and its possession of certain characteristics important for reducing the performance gap between system peak and application performance. The effectiveness of these optimizations is evaluated in the context of parallel linear algebra kernels. The current approach differs from the other linear algebra algorithms by the explicit use of shared memory and remote memory access communication rather than message passing. It is suitable for clusters and scalable shared memory systems. The experimental results on large-scale systems (Linux-Infiniband cluster, Cray XT) demonstrate consistent performance advantages over the ScaLAPACK suite, the leading implementation of parallel linear algebra algorithms used today.

    SRUMMA: a matrix multiplication algorithm suitable for clusters and scalable shared memory systems

    No full text
    This paper describes a novel parallel algorithm that implements a dense matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon’s algorithm. It is suitable for clusters and scalable shared memory systems. The current approach differs from the other parallel matrix multiplication algorithms by the explicit use of shared memory and remote memory access (RMA) communication rather than message passing. The experimental results on clusters (IBM SP, Linux-Myrinet) and shared memory systems (SGI Altix, Cray X1) demonstrate consistent performance advantages over pdgemm from the ScaLAPACK/PBBLAS suite, the leading implementation of parallel matrix multiplication algorithms used today. In the best case on the SGI Altix, the new algorithm performs 20 times better than pdgemm for a matrix size of 1000 on 128 processors. The impact of zero-copy nonblocking RMA communications and shared memory communication on matrix multiplication performance on clusters is also investigated.
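    The block-wise structure the abstract describes can be illustrated with a serial sketch. The following is a minimal Python/NumPy illustration, not the actual SRUMMA implementation: in the parallel algorithm the two block loads are one-sided RMA gets (possibly non-blocking, to overlap with the multiply), whereas here they are plain array slices.

```python
import numpy as np

def blocked_matmul(A, B, block):
    """Serial sketch of the block-wise structure of SRUMMA-style
    matrix multiplication: each block of C is accumulated from pairs
    of A and B blocks. In the parallel algorithm those blocks would be
    fetched from remote processes with one-sided RMA get operations."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for j in range(0, n, block):
            for k in range(0, n, block):
                # In SRUMMA these two loads are remote gets; here, slices.
                C[i:i+block, j:j+block] += (
                    A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
                )
    return C
```

    Choosing the block size to match the network transfer granularity is what lets the parallel version hide communication behind the local multiply.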

    Exploiting non-blocking remote memory access communication in scientific benchmarks

    No full text
    This paper describes a comparative performance study of MPI and Remote Memory Access (RMA) communication models in the context of four scientific benchmarks: NAS MG, NAS CG, SUMMA matrix multiplication, and Lennard-Jones molecular dynamics on clusters with the Myrinet network. It is shown that RMA communication delivers a consistent performance advantage over MPI; in some cases an improvement of as much as 50% was achieved. Benefits of using non-blocking RMA for overlapping computation and communication are discussed.
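    The overlap pattern behind non-blocking RMA can be illustrated with plain Python threads standing in for non-blocking gets. This is a hedged sketch of the pipelining idea only; the benchmarks themselves use RMA primitives, not threads, and `fetch` is a hypothetical stand-in.

```python
import threading
import time

def fetch(block_id, out):
    # Stand-in for a non-blocking RMA get: delivers data after a delay.
    time.sleep(0.01)
    out[block_id] = [block_id] * 4

def pipelined_sum(nblocks):
    """Overlap sketch: while computing on block i, the get for block
    i+1 is already in flight, so communication latency is hidden
    behind computation instead of serialized with it."""
    buf = {}
    t = threading.Thread(target=fetch, args=(0, buf))
    t.start()                          # prefetch the first block
    total = 0
    for i in range(nblocks):
        t.join()                       # wait for the outstanding get
        if i + 1 < nblocks:            # issue the next get immediately
            t = threading.Thread(target=fetch, args=(i + 1, buf))
            t.start()
        total += sum(buf[i])           # compute on the block just received
    return total
```

    The same issue-wait-compute pipeline is what a non-blocking RMA benchmark expresses with an initiate call followed later by a wait.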

    Optimizing Performance on Linux Clusters Using Advanced Communication Protocols: How 10+ . . .

    No full text
    Advancements in high-performance networks (Quadrics, Infiniband, or Myrinet) continue to improve the efficiency of modern clusters. However, the average application efficiency is a small fraction of the peak system efficiency. This paper describes techniques for optimizing application performance on Linux clusters using Remote Memory Access communication protocols. The effectiveness of these optimizations is presented in the context of an application kernel, dense matrix multiplication. The result was over 10 teraflops on an HP Linux cluster whose LINPACK performance was measured at 8.6 teraflops.

    Spatial Prediction of Soil Micronutrients using Supervised Self-Organizing Maps

    No full text
    Supplementing optimal quantities of nutrients in soil is considered one of the challenging tasks faced by farmers, mainly because of the lack of information on the status of soil nutrients. Quantitative estimation of soil elements across agricultural farms is a hard problem to address. Although several studies have proposed spatially predicting soil nutrients using geostatistical, computational, or AI techniques, most of these methods either lack sufficient accuracy or perform well only on datasets similar to the model-building dataset. In this study, we propose supervised Self-Organizing Maps (xyf-SOM) for the first time to quantitatively and spatially predict soil micronutrients, viz. Boron, Iron, Manganese, Copper, and Zinc. Soil nutrient data (2594 samples) pertaining to Alappuzha District, Kerala, India, collected during 2019–20 were used for the study. Geo-environmental predictors generated from remote sensing data, such as topography, vegetation, land surface temperature, and precipitation, were used as explanatory variables. The prediction accuracy was compared with Regression Kriging and Random Forest based spatial prediction. The results showed that the supervised Self-Organizing Map predictions achieved significantly higher and more consistent prediction accuracy for all micronutrients than the geostatistical and random forest predictions. The models were validated with a test dataset as well as with an independent dataset. The prediction model was applied to a data grid with a 200 × 200 m spatial interval, and the prediction results were converted and visualized in a geospatial framework.
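    The supervised SOM idea can be sketched minimally: each map unit keeps both an X (predictor) codebook and a Y (nutrient) codebook, the winning unit is found on X, both codebooks are updated, and prediction returns the winner's Y codebook. This is an illustrative toy on a 1-D grid of units, not the study's model; a real xyf-SOM (e.g. the R kohonen package) uses a 2-D grid with a neighbourhood function, and all names here are assumptions.

```python
import numpy as np

def train_xyf_som(X, y, n_units=10, epochs=30, lr=0.5, seed=0):
    """Toy supervised SOM: BMU chosen on X, both X and Y codebooks
    pulled toward each sample with a decaying learning rate."""
    rng = np.random.default_rng(seed)
    wx = X[rng.choice(len(X), n_units)].astype(float)  # X codebooks
    wy = y[rng.choice(len(y), n_units)].astype(float)  # Y codebooks
    for epoch in range(epochs):
        a = lr * (1 - epoch / epochs)       # decaying learning rate
        for xi, yi in zip(X, y):
            bmu = np.argmin(((wx - xi) ** 2).sum(axis=1))
            wx[bmu] += a * (xi - wx[bmu])
            wy[bmu] += a * (yi - wy[bmu])
    return wx, wy

def predict(wx, wy, X):
    # Prediction = Y codebook of the best-matching unit on X.
    return np.array([wy[np.argmin(((wx - xi) ** 2).sum(axis=1))]
                     for xi in X])
```

    In the study's setting, X would hold the geo-environmental predictors and y a micronutrient concentration; the map effectively partitions the predictor space and returns a local estimate per cell.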

    A High-Performance Event Service for HPC Applications

    No full text
    Event services based on publish-subscribe architectures are well established components of distributed computing applications. Recently, an event service has been proposed as part of the Common Component Architecture (CCA) for high-performance computing applications. In this paper we describe our experiences investigating implementation options for the CCA event service that exploit interprocess communications mechanisms commonly used on HPC platforms. The aim of our work is to create an event service that supports the well-known software engineering advantages of publish-subscribe architectures, and provides performance levels approaching those achievable using more primitive message-passing mechanisms such as MPI.
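    The publish-subscribe pattern at the core of such an event service can be sketched in-process. The class and method names below are illustrative assumptions, not the CCA event service API; a CCA-style HPC implementation would additionally move events between processes over MPI or RMA transports.

```python
from collections import defaultdict

class EventService:
    """Minimal in-process publish-subscribe sketch: subscribers
    register a callback under a topic; publish delivers the event
    to every callback registered for that topic."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, event):
        for cb in self._subs[topic]:
            cb(event)
```

    The software-engineering appeal is exactly this decoupling: publishers never name their consumers, so components can be composed and replaced independently.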

    Multilevel Parallelism in Computational Chemistry using Common Component Architecture and Global Arrays

    No full text
    The development of complex scientific applications for high-end systems is a challenging task. Addressing the complexity of the involved software and algorithms is becoming increasingly difficult and requires appropriate software engineering approaches to address interoperability, maintenance, and software composition challenges. At the same time, the requirements for performance and scalability to thousand-processor configurations magnify the difficulties facing the scientific programmer, due to the variable levels of parallelism available in different algorithms or functional modules of the application. This paper demonstrates how the Common Component Architecture (CCA) and Global Arrays (GA) can be used in the context of computational chemistry to express and manage multilevel parallelism through the use of processor groups. For example, a numerical Hessian calculation using three levels of parallelism in the NWChem computational chemistry package outperformed the original, single-level-parallel version of the NWChem code by 90% when running on 256 processors.
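    Processor-group formation for multilevel parallelism can be sketched as simple rank partitioning. This is a minimal sketch under stated assumptions: the round-robin grouping and the mapping comments are illustrative, not the paper's actual decomposition.

```python
def split_into_groups(nprocs, ngroups):
    """Partition world ranks into disjoint processor groups so that
    coarse-grained tasks (e.g. one Hessian displacement each) can run
    concurrently, one per group, while each group parallelizes its own
    energy evaluation internally. With Global Arrays this corresponds
    roughly to GA_Pgroup_create; with MPI, to
    MPI_Comm_split(color=rank % ngroups)."""
    groups = [[] for _ in range(ngroups)]
    for rank in range(nprocs):
        groups[rank % ngroups].append(rank)
    return groups
```

    The speedup the abstract reports comes from this structure: independent tasks no longer wait for the whole machine, and each level of parallelism runs at the granularity that suits it.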

    Molecular Docking and Dynamics Simulation Study of Telomerase Inhibitors as Potential Anti-Cancer Agents

    No full text
    The genomic identity of normal cells is protected by telomeres, and chromosomal instability is sometimes observed due to telomere shortening over successive cell divisions. Reports indicate that telomere length is crucial in determining telomerase activity, which in turn can lead to cancer initiation. Telomere length regulation has accordingly been identified as a plausible strategy for cancer diagnostics and treatment. In the present manuscript, we explored the telomerase inhibitory activity of catechin analogues and their oligomers using computational methods. The structural properties of the different ligands discussed in the manuscript were computed using density functional theory. The conformational effects of the different chromene subunits, such as the 2R,3R conformations, were explored using computational methods. The stereochemical contributions to receptor binding, such as the intra-ligand π-interactions of these ligands, were also investigated. We propose that these stereochemical aspects of catechins and their oligomers are the most vital factor deciding effective binding with the N-terminal domain of telomerase, which is an efficient strategy in cancer therapy.