71 research outputs found

Load Balancing Regular Meshes on SMPs with MPI

Author: Kale Vivek

Domain decomposition for regular meshes on parallel computers has traditionally been performed by attempting to partition the work exactly among the available processors (now cores). However, these strategies often do not account for inherent system noise, which can hinder the scalability of MPI applications on emerging petascale machines with 10,000+ nodes. In this work, we propose a tunable hybrid static/dynamic scheduling strategy that can be incorporated into current MPI implementations of mesh codes. Applying this strategy to a 3D Jacobi algorithm, we achieve performance gains of at least 16% on 64 SMP nodes.
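
The abstract above does not spell out how the static and dynamic parts are combined, so the following is only a minimal sketch of one plausible reading: a tunable fraction of the loop iterations is pre-assigned to workers in contiguous blocks, and the remainder is handed out in small chunks from a shared queue. The function name `hybrid_schedule`, the `static_fraction` and `chunk` parameters, and the use of Python threads in place of MPI ranks or cores are all assumptions made for illustration, not the author's implementation.

```python
import queue
import threading


def hybrid_schedule(n_items, n_workers, static_fraction, work, chunk=4):
    """Run work(i) for every i in range(n_items) and return, per item,
    the id of the worker that executed it (hypothetical helper)."""
    if not 0.0 <= static_fraction <= 1.0:
        raise ValueError("static_fraction must lie in [0, 1]")

    # Static part: each worker owns one contiguous block of equal size.
    per_worker = int(n_items * static_fraction) // n_workers
    n_static = per_worker * n_workers

    # Dynamic part: the remaining items are queued in small chunks.
    dynamic = queue.Queue()
    for start in range(n_static, n_items, chunk):
        dynamic.put(range(start, min(start + chunk, n_items)))

    owner = [None] * n_items

    def worker(wid):
        # 1) Process the pre-assigned block (no synchronization needed).
        for i in range(wid * per_worker, (wid + 1) * per_worker):
            work(i)
            owner[i] = wid
        # 2) Pull leftover chunks until the shared queue is empty.
        while True:
            try:
                block = dynamic.get_nowait()
            except queue.Empty:
                return
            for i in block:
                work(i)
                owner[i] = wid

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return owner


if __name__ == "__main__":
    done = []
    owner = hybrid_schedule(100, 4, 0.5, done.append)
    print(sorted(done) == list(range(100)))  # every item ran exactly once
```

In this sketch, setting `static_fraction` to 1.0 gives a purely static partition and 0.0 a purely dynamic one; intermediate values trade locality of the pre-assigned blocks against the ability of idle workers to absorb delays caused by noise.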

Illinois Digital Environment for Access to Learning and Scholarship Repository

Hybrid static/dynamic scheduling for already optimized dense matrix factorization

Author: Donfack Simplice
Grigori Laura
Gropp William D.
Kale Vivek
Publication date: 08/10/2011

We present the use of a hybrid static/dynamic scheduling strategy of the task dependency graph for direct methods used in dense numerical linear algebra. This strategy provides a balance of data locality, load balance, and low dequeue overhead. We show that using this scheduling in communication-avoiding dense factorization leads to significant performance gains. On a 48-core AMD Opteron NUMA machine, our experiments show that we can achieve up to 64% improvement over a version of CALU that uses fully dynamic scheduling, and up to 30% improvement over the version of CALU that uses fully static scheduling. On a 16-core Intel Xeon machine, our hybrid static/dynamic scheduling approach is up to 8% faster than the versions of CALU that use fully static or fully dynamic scheduling. Our algorithm leads to speedups over the corresponding routines for computing LU factorization in well-known libraries. On the 48-core AMD NUMA machine, our best implementation is up to 110% faster than MKL, while on the 16-core Intel Xeon machine, it is up to 82% faster than MKL. Our approach also shows significant speedups compared with PLASMA on both of these systems.

arXiv.org e-Print Archive

HAL-CentraleSupelec

Illinois Digital Environment for Access to Learning and Scholarship Repository

HAL-Rennes 1

Low-overhead scheduling for improving performance of scientific applications

Author: Kale Vivek

Application performance can degrade significantly due to node-local load imbalances during application execution on a large number of SMP nodes. These imbalances can arise from the machine, the operating system, or the application itself. Although dynamic load balancing within a node can mitigate imbalances, such load balancing is challenging because of its impact on data movement and synchronization overhead. We developed a series of scheduling strategies that mitigate imbalances without incurring high overhead. Our strategies provide performance gains for various HPC codes and perform better than widely known scheduling strategies such as OpenMP guided scheduling. Our scheme and methodology allow applications to scale to next-generation clusters of SMPs with minimal application programmer intervention. We expect these techniques to be increasingly useful for future machines approaching exascale.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Performance Analysis of the Lattice Boltzmann Model Beyond Navier-Stokes

Author: Gropp William
Hammond Jeff
Kale Vivek
Kaxiras Efthimios
Randles Amanda Peters
Publication venue: Institute of Electrical and Electronics Engineers
Publication date: 05/09/2013

The lattice Boltzmann method is increasingly important in facilitating large-scale fluid dynamics simulations. To date, these simulations have been built on discretized velocity models of up to 27 neighbors. Recent work has shown that higher-order approximations of the continuum Boltzmann equation enable not only recovery of the Navier-Stokes hydrodynamics, but also simulations for a wider range of Knudsen numbers, which is especially important in micro- and nanoscale flows. These higher-order models have significant impact on both the communication and computational complexity of the application. We present a performance study of the higher-order models as compared to the traditional ones, on both the IBM Blue Gene/P and Blue Gene/Q architectures. We study the tradeoffs of many optimization methods, such as the use of deep halo level ghost cells that, alongside hybrid programming models, reduce the impact of extended models and enable efficient modeling of extreme regimes of computational fluid dynamics.
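
To make the link between the velocity model and halo depth concrete, the sketch below enumerates the standard 3D lattice velocity sets commonly labeled D3Q19 and D3Q27 (the count of 27 includes the zero "rest" vector) and computes how many ghost-cell layers a single streaming step would need. The helper names `velocity_set` and `halo_depth`, and the longer-range `extended` set, are hypothetical illustrations; the extended set is not the higher-order model studied in the paper.

```python
from itertools import product


def velocity_set(max_sq_speed, reach=1):
    """Lattice vectors with integer components in [-reach, reach]
    whose squared length does not exceed max_sq_speed."""
    span = range(-reach, reach + 1)
    return [v for v in product(span, repeat=3)
            if sum(c * c for c in v) <= max_sq_speed]


def halo_depth(velocities):
    """Ghost-cell layers needed so one streaming step stays local:
    the largest absolute component over all velocity vectors."""
    return max(abs(c) for v in velocities for c in v)


d3q19 = velocity_set(2)   # rest + 6 face + 12 edge directions
d3q27 = velocity_set(3)   # adds the 8 corner directions

# Hypothetical longer-range set (an assumption for illustration only).
extended = velocity_set(9, reach=3)

if __name__ == "__main__":
    print(len(d3q19), len(d3q27), halo_depth(d3q27), halo_depth(extended))
```

Under these assumptions, any velocity set that reaches more than one lattice spacing requires a proportionally deeper halo, which is one way to see why extended models raise communication cost.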

Harvard University - DASH

Limb reconstruction system as a primary and definitive mode of fixation in open fractures of long bones

Author: Argekar Harshad Ganesh
Goregaonkar Arvind Balkrishna
Kale Abhijit Bhimrao
Patole Vivek Vishwanath
Sharan Sudhir
Publication venue: Medip Academy
Publication date: 22/02/2017

Background: Management of open fractures of long bones by traditional systems is very complex. The limb reconstruction system (LRS) is considered very effective, offering rigid stabilization of fracture fragments with easy access for soft tissue care. The aim of the study was to determine the efficacy of LRS for treatment of open fractures of long bones. Methods: This prospective study included 30 cases of both sexes aged 11 to 60 years. Patients with closed fractures of long bones and fractures treated conservatively were excluded from the study. Clinical and radiological evaluation was performed at presentation and at specific intervals to assess signs of bone union and associated complications. Results: The mean age of the patients who participated in the study was 35.6 years, with male predominance (93.3%). All patients (100%) were injured in road traffic accidents. 50% of the cases were Grade 2 fractures. The most common complication encountered was pin tract infection, seen in 8 cases. We had good results in 24 patients, moderate in 5, and poor in 1 patient using modified Anderson and Hutchinson’s criteria. Conclusions: LRS is an alternative to the traditional system of fixation in the primary management of open fractures of long bones. It is less cumbersome for the patient and also reduces the financial burden. It is a definitive single-stage procedure.

Crossref

International Journal of Research in Orthopaedics

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory

Author: Balaji Pavan
Barrett Brian
Brightwell Ron
Buntinas Darius
Dinan James
Gropp William
Hoefler Torsten
Kale Vivek
Thakur Rajeev
Publication date: 18/06/2018

Hybrid parallel programming with the message passing interface (MPI) for internode communication in conjunction with a shared-memory programming model to manage intranode parallelism has become a dominant approach to scalable parallel programming. While this model provides a great deal of flexibility and performance potential, it saddles programmers with the complexity of utilizing two parallel programming systems in the same application. We introduce an MPI-integrated shared-memory programming model that is incorporated into MPI through a small extension to the one-sided communication interface. We discuss the integration of this interface with the MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency. We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40% to the communication component of a five-point stencil solver.

Repository for Publications and Research Data

RERO DOC Digital Library

Cannot make do without you: Outsourcing by knowledge-intensive new firms in supplier networks

Author: Ajay Bhalla
Siri Terjesen
Publication venue: Elsevier BV
Publication date: 01/02/2013

How do new firms operating in dynamic environments organize their operations? Building on transaction cost theory and the resource-based view, and using case study data from ten biotechnology start-ups and twenty of their suppliers, this research reveals that new firms outsourcing to highly embedded suppliers are likely to secure access to a wider supplier network, attain best-in-class operational knowledge, and avoid supplier opportunism while facing low levels of relationship-specific investments. New firms outsourcing to suppliers at the network periphery are more likely to realize cost efficiencies, but they expose themselves to opportunism, uncertainty, and higher levels of relationship-specific investments while gaining only low levels of operational knowledge. We propose that new firms build five outsourcing competencies to realize benefits.

City Research Online

Crossref

Development and evaluation of introgression lines with yield enhancing genes of the Indian mega-variety of rice, MTU1010

Author: Aleena D.
Anila M.
Ayyappadass M.
Balachandran S. M.
Chaitra K.
Dilip Kumar T.
Fiyaz Abdul
Hajira S. K.
Harika G.
Jena Kshirod K.
Kale Ravindra
Kim Sung-Ryul
Kousik M. B. V. N.
Laxmi Prasanna B.
Mastanbee S. K.
Punniakotti E.
Rekha G.
Senguttuvel P.
Sinha Pragya
Sundaram R. M.
Swapnil K.
Vivek G.
Publication date: 06/11/2023

MTU1010 is an early-maturing and high-yielding mega rice variety widely grown over an area of 3 Mha. It is characterised by limited grain number and panicle branching. To improve the grain number in MTU1010, an IRRI breeding line, IR121055-2-10-5, was utilized as donor to transfer the yield-enhancing genes Gn1a and OsSPL14 (associated with increased grain number and better panicle branching, respectively) into MTU1010 by Marker-Assisted Backcross Breeding (MABB). At each backcross generation, foreground selection was carried out with Gn1a- and OsSPL14-specific molecular markers, whilst background selection was done with a set of SSR markers polymorphic between IR121055-2-10-5 and MTU1010. With the use of a gene-specific marker, homozygous BC2F2 plants carrying the yield-enhancing gene were identified and advanced through the pedigree method of selection to BC2F6. The ten best-performing lines were selected and evaluated in replicated station trials for yield-contributing traits, where grain number and branching per panicle exhibited a highly significant and positive correlation with single plant yield. Three promising lines, namely RP6353-5-8-13-24, RP6353-26-13-39-5 and RP6353-32-12-8-16, with higher grain number and yield than MTU1010 were identified and nominated for evaluation in the Initial Varietal Trial-Aerobic (IVT-Aerobic) of the All India Crop Improvement Programme on Rice (AICRP), of which RP6353-26-13-39-5 (IET28674) was promoted for further testing.

CGSpace