233 research outputs found

Performance Improvements of Common Sparse Numerical Linear Algebra Computations

Author: Luszczek Piotr Rafal
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/01/2003

Manufacturers of computer hardware are able to sustain an unprecedented pace of progress in the computing speed of their products, partly through increased clock rates but also through ever more complicated chip designs. With new processor families appearing every few years, it is increasingly hard to achieve high performance rates in sparse matrix computations. This research proposes new methods for sparse matrix factorizations and applies generalizations of known concepts from related disciplines in an iterative code. The proposed solutions and extensions are implemented in ways that tend to deliver efficiency while retaining the ease of use of existing solutions. The implementations are thoroughly timed and analyzed using a commonly accepted set of test matrices. The tests were conducted on modern processors that seem to have gained an appreciable level of popularity and are fairly representative of a wider range of processor types that are available on the market now or will be in the near future. The new factorization technique formally introduced in the early chapters is later shown to be quite competitive with state-of-the-art software currently available. Although not superior in all cases (as probably no single approach could be), the new factorization algorithm exhibits a few promising features. In addition, an all-embracing optimization effort is applied to an iterative algorithm that stands out for its robustness. This also gives satisfactory results on the tested computing platforms in terms of performance improvement. The same set of test matrices is used to enable an easy comparison between both investigated techniques, even though they are customarily treated separately in the literature. Possible extensions of the presented work are discussed. They range from easily conceivable merging with existing solutions to rather more evolved schemes dependent on hard-to-predict progress in theoretical and algorithmic research.

University of Tennessee, Knoxville: Trace

CiteSeerX

GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement

Authors: Anzt H., Dongarra J., Heuveline Vincent, Luszczek P.
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2011

In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that may be used to leverage the computing power of SIMD accelerators like GPUs in the iterative solution of linear equation systems. Although they use very different approaches for this purpose, they share the basic idea of compensating for the convergence properties of an inferior numerical algorithm by a more efficient usage of the provided computing power. In this paper, we analyze the potential of combining both techniques. To this end, we derive a mixed precision iterative refinement algorithm using a block-asynchronous iteration as an error correction solver, and compare its performance with a pure implementation of a block-asynchronous iteration and an iterative refinement method using double precision for the error correction solver. For matrices from the University of Florida Matrix Collection, we report the convergence behaviour and provide the total solver runtime using different GPU architectures.

KITopen
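
The structure of mixed precision iterative refinement described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' GPU implementation: plain synchronous Jacobi sweeps stand in for the block-asynchronous error correction solver, single precision is simulated on the CPU by rounding through `struct`, and all function names (`to_f32`, `jacobi_f32`, `mixed_precision_refine`) are hypothetical.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def jacobi_f32(A, r, sweeps=20):
    """Approximately solve A c = r with Jacobi sweeps in simulated single precision.

    Assumption: plain synchronous Jacobi is used here as a simple stand-in
    for the block-asynchronous iteration named in the abstract.
    """
    n = len(A)
    A32 = [[to_f32(v) for v in row] for row in A]
    r32 = [to_f32(v) for v in r]
    c = [0.0] * n
    for _ in range(sweeps):
        new_c = []
        for i in range(n):
            s = r32[i]
            for j in range(n):
                if j != i:
                    s = to_f32(s - to_f32(A32[i][j] * c[j]))
            new_c.append(to_f32(s / A32[i][i]))
        c = new_c
    return c

def residual(A, x, b):
    """Residual b - A x, accumulated in double precision."""
    n = len(x)
    return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(len(b))]

def mixed_precision_refine(A, b, tol=1e-12, max_outer=50):
    """Outer refinement loop in double precision, corrections in single precision."""
    x = [0.0] * len(b)
    for outer in range(max_outer):
        r = residual(A, x, b)                  # high-precision residual
        if max(abs(v) for v in r) <= tol:
            return x, outer
        c = jacobi_f32(A, r)                   # low-precision error correction
        x = [xi + ci for xi, ci in zip(x, c)]  # high-precision update
    return x, max_outer

# Small diagonally dominant test system with exact solution [1, 2, 3].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x, outer_iterations = mixed_precision_refine(A, b)
```

The point of the design is that the cheap, inaccurate inner solver only ever sees the residual equation, so its rounding error is corrected by the next outer step rather than accumulating in the solution.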

Benthic Biomonitoring in Arctic Tundra Streams: A Community-Based Approach in Iqaluit, Nunavut, Canada

Authors: Luszczek C.E., Medeiros A.S., Quinlan R., Shirley J.
Publication venue: The Arctic Institute of North America
Publication date: 09/03/2011

Recent residential, commercial, and industrial development in the catchments of several Arctic streams has heightened the need to assess these freshwater systems accurately. It was imperative to develop methods that would be both effective at judging the ecological condition of tundra streams and suitable for use by local groups. An investigation of two streams influenced by urbanization in Iqaluit, Nunavut, was carried out between July and August of each year from 2007 to 2009. Simple summary metrics (e.g., Shannon Index) and multivariate analysis (DCA, RDA) both demonstrated biological impairment in the benthic community at site locations downstream of urbanized portions of a local stream. This impairment was characterized by a loss of diversity and a dramatic shift of the benthic community to one dominated by chironomids from the subfamily Orthocladiinae. Elevated levels of total nitrogen (TN), total phosphorus (TP), and several metals (Zn, Sr, Rb, Al, Co, Fe) were also found to be significantly related to benthic assemblages within these disturbed areas. This investigation also addressed taxonomic sufficiency, indicating that while family-level taxonomic identifications were sensitive enough to differentiate between pristine and impacted stream sites, a more precise taxonomic identification of the dominant benthos taxa (Insecta: Diptera: Chironomidae) to sub-family/tribe level identified a significant shift towards pollution-tolerant taxa. This higher taxonomic resolution will allow for the adaptation of protocols and the use of simple summary metrics to be effective for a community-based biomonitoring program in Arctic tundra streams.

University of Calgary Journal Hosting

Regulatory Immunotherapy in Bone Marrow Transplantation

Authors: Luszczek Wioleta, Morales-Tirado Vanessa, Pillai Asha, van der Merwe Marié
Publication venue: TheScientificWorldJOURNAL
Publication date: 01/01/2011

Every year, individuals receive hematopoietic stem cell transplantation (HSCT) to eradicate malignant and nonmalignant disease. The immunobiology of allotransplantation is an area of ongoing discovery, from the recipient's conditioning treatment prior to the transplant to the donor cell populations responsible for engraftment, graft-versus-host disease, and graft-versus-tumor effect. In this review, we focus on donor-type immunoregulatory T cells, namely, natural killer T cells (NKT) and regulatory T cells (Treg), and their current and potential roles in tolerance induction after allogeneic HSCT.

University of Memphis Digital Commons

Crossref

Directory of Open Access Journals

PubMed Central

Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs

Authors: Dongarra Jack, Kurzak Jakub, Luszczek Piotr, Mary Théo, Tomov Stanimire, Yamazaki Ichitaro
Publication venue: Association for Computing Machinery (ACM)
Publication date: 15/11/2015

A low-rank approximation of a dense matrix plays an important role in many applications. To compute such an approximation, a common approach uses the QR factorization with column pivoting (QRCP). Though the reliability and efficiency of QRCP have been demonstrated, this deterministic approach requires costly communication at each step of the factorization. Since such communication is becoming increasingly expensive on modern computers, an alternative approach based on random sampling, which can be implemented using communication-optimal kernels, is becoming attractive. To study its potential, in this paper, we compare the performance of random sampling with that of QRCP on an NVIDIA Kepler GPU. Our performance results demonstrate that random sampling can be up to 12.8× faster than the deterministic approach for computing an approximation of the same accuracy. We also present the parallel scaling of random sampling over multiple GPUs on a single compute node, showing a speedup of 3.8× over three Kepler GPUs. These results demonstrate the potential of random sampling as an excellent computational tool for many applications, and its potential is likely to grow on emerging computers with increasing communication costs.

Crossref
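
The random sampling idea in the abstract above can be illustrated with a basic randomized range finder: multiply the matrix by a random Gaussian matrix, orthonormalize the result, and project. The sketch below is a pure-Python toy under stated assumptions, not the paper's GPU code; the QRCP baseline is not shown, Gram-Schmidt is used for orthonormalization only because it is short, and the names `orthonormal_columns` and `randomized_low_rank` and the `oversample` parameter are hypothetical.

```python
import random

def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def orthonormal_columns(Y, drop_tol=1e-10):
    """Orthonormalize the columns of Y with modified Gram-Schmidt.

    Columns that are numerically dependent on earlier ones are dropped.
    """
    basis = []
    for col in transpose(Y):
        v = list(col)
        norm0 = sum(t * t for t in v) ** 0.5
        for _ in range(2):  # second pass improves numerical orthogonality
            for q in basis:
                proj = sum(a * b for a, b in zip(q, v))
                v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(t * t for t in v) ** 0.5
        if norm0 > 0.0 and norm > drop_tol * norm0:
            basis.append([t / norm for t in v])
    return transpose(basis)

def randomized_low_rank(A, rank, oversample=2, seed=0):
    """Return Q (orthonormal columns) and B = Q^T A such that A is roughly Q B."""
    rng = random.Random(seed)
    n = len(A[0])
    # Gaussian sampling matrix with rank + oversample columns.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(rank + oversample)]
             for _ in range(n)]
    Y = matmul(A, omega)            # sample the column space of A
    Q = orthonormal_columns(Y)      # orthonormal basis for the sampled space
    B = matmul(transpose(Q), A)     # project A onto that basis
    return Q, B

# A 6 x 5 matrix of exact rank 2, built from two outer products.
u1, v1 = [1, 2, 3, 4, 5, 6], [1, 0, -1, 2, 1]
u2, v2 = [1, -1, 1, -1, 1, -1], [0, 1, 1, 0, 2]
A = [[float(u1[i] * v1[j] + u2[i] * v2[j]) for j in range(5)] for i in range(6)]

Q, B = randomized_low_rank(A, rank=2)
approx = matmul(Q, B)
max_error = max(abs(A[i][j] - approx[i][j]) for i in range(6) for j in range(5))
```

The dominant cost is the two matrix products, which is what makes the approach map well onto kernels with little communication; the orthonormalization acts only on a tall, thin matrix.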

HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi

Authors: Dongarra Jack, Gates Mark, Haidar Azzam, Jia Yulu, Kabir Khairul, Luszczek Piotr, Tomov Stanimire
Publication venue: Hindawi Limited
Publication date: 01/01/2015

This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore systems with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library that incorporates the developments presented here and, more broadly, provides the DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. The LAPACK compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.

Crossref

Directory of Open Access Journals

The University of Manchester - Institutional Repository

LU Factorization with Partial Pivoting for a Multicore System with Accelerators

Authors: J. Dongarra, J. Kurzak, M. Faverge, P. Luszczek
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date

Crossref

Proposed Consistent Exception Handling for the BLAS and LAPACK

Authors: Demmel James, Dongarra Jack, Gates Mark, Henry Greg, Langou Julien, Li Xiaoye, Luszczek Piotr, Pereira Weslley, Riedy Jason, Rubio-González Cindy
Publication venue
Publication date: 19/07/2022

Numerical exceptions, which may be caused by overflow, operations like division by 0 or sqrt(-1), or convergence failures, are unavoidable in many cases, in particular when software is used on unforeseen and difficult inputs. As more aspects of society become automated, e.g., self-driving cars, health monitors, and cyber-physical systems more generally, it is becoming increasingly important to design software that is resilient to exceptions and that responds to them in a consistent way. Consistency is needed to allow users to build higher-level software that is also resilient and consistent (and so on recursively). In this paper we explore the design space of consistent exception handling for the widely used BLAS and LAPACK linear algebra libraries, pointing out a variety of instances of inconsistent exception handling in the current versions, and propose a new design that balances consistency, complexity, ease of use, and performance. Some compromises are needed, because there are preexisting inconsistencies that are outside our control, including in or between existing vendor BLAS implementations, different programming languages, and even compilers for the same programming language. User requests from our surveys are also quite diverse. We also propose our design as a possible model for other numerical software, and welcome comments on our design choices.

arXiv.org e-Print Archive
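
The kind of inconsistency the abstract above refers to can be reproduced without any linear algebra library. The sketch below is a plain-Python illustration, not the design proposed in the paper and not BLAS code: a comparison-based "largest absolute value" routine returns different answers depending on where a NaN sits in the input, while a variant that checks for NaN explicitly reports it regardless of position. Both function names are hypothetical.

```python
import math

def naive_max_abs(values):
    """Largest |v| found with comparisons only.

    Every comparison against NaN is False, so a NaN in the first position
    is kept as the answer while a NaN anywhere else is silently skipped.
    """
    best = abs(values[0])
    for v in values[1:]:
        if abs(v) > best:
            best = abs(v)
    return best

def consistent_max_abs(values):
    """Largest |v|, returning NaN whenever any input is NaN."""
    best = 0.0
    for v in values:
        if math.isnan(v):
            return math.nan   # propagate the exception regardless of position
        best = max(best, abs(v))
    return best

nan_first = [math.nan, 1.0, 2.0]
nan_middle = [1.0, math.nan, 2.0]

inconsistent = (naive_max_abs(nan_first), naive_max_abs(nan_middle))
consistent = (consistent_max_abs(nan_first), consistent_max_abs(nan_middle))
```

Here the naive routine reports NaN for the first input and an ordinary number for the second, even though both contain the same values; the second routine trades one extra check per element for an answer that does not depend on ordering.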