45 research outputs found
Parallel algorithms for two processors precedence constraint scheduling
The final publication is available at link.springer.com. Peer reviewed. Postprint (author's final draft).
A Block Minorization--Maximization Algorithm for Heteroscedastic Regression
The computation of the maximum likelihood (ML) estimator for heteroscedastic
regression models is considered. Traditional Newton algorithms for this
problem require matrix multiplications and inversions, which are bottlenecks in
modern Big Data contexts. A new Big Data-appropriate minorization--maximization
(MM) algorithm is proposed for computing the ML estimator. The MM
algorithm is proved to generate monotonically increasing sequences of
likelihood values and to converge to a stationary point of the
log-likelihood function. A distributed and parallel implementation of the MM
algorithm is presented, and the MM algorithm is shown to have a different time
complexity from the Newton algorithm. Simulation studies demonstrate that the MM
algorithm improves upon the computation time of the Newton algorithm in some
practical scenarios where the number of observations is large.
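The monotone-ascent property described above can be illustrated with a toy block scheme. This is not the paper's actual MM minorizer: for a simple two-group heteroscedastic model, alternately maximizing the log-likelihood over the regression coefficients (weighted least squares) and over the group variances yields a nondecreasing likelihood sequence, the same guarantee the abstract states for the MM algorithm. All names and the simulated data are illustrative.

```python
import numpy as np

# simulated data: linear model with two noise-variance groups (illustrative)
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
groups = rng.integers(0, 2, size=n)          # which variance group each row is in
sig_true = np.where(groups == 0, 0.5, 2.0)   # heteroscedastic noise scales
y = X @ beta_true + rng.normal(scale=sig_true)

def loglik(beta, sig2):
    """Gaussian log-likelihood with per-group variances sig2[0], sig2[1]."""
    r = y - X @ beta
    v = sig2[groups]
    return -0.5 * np.sum(np.log(2 * np.pi * v) + r**2 / v)

beta = np.zeros(p)
sig2 = np.ones(2)
lls = []
for _ in range(30):
    # beta block: weighted least squares, the exact maximizer given sig2
    w = 1.0 / sig2[groups]
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    # variance block: group-wise mean squared residual, exact maximizer given beta
    r2 = (y - X @ beta) ** 2
    sig2 = np.array([r2[groups == g].mean() for g in (0, 1)])
    lls.append(loglik(beta, sig2))

# each block is maximized exactly, so the likelihood never decreases
assert all(b >= a - 1e-9 for a, b in zip(lls, lls[1:]))
```

Because each block update is an exact maximizer (rather than a minorizer), the scheme avoids the large matrix inversions only in the variance block; the paper's MM construction goes further than this sketch.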
FooPar: A Functional Object Oriented Parallel Framework in Scala
We present FooPar, an extension for highly efficient parallel computing in
the multi-paradigm programming language Scala. Scala offers concise and clean
syntax and integrates functional programming features. Our framework FooPar
combines these features with parallel computing techniques. FooPar is designed
to be modular and supports easy access to different communication backends for
distributed-memory architectures as well as to high-performance math libraries. In
this article we use it to parallelize matrix-matrix multiplication and show its
scalability through an isoefficiency analysis. In addition, results of an
empirical analysis on two supercomputers are given. We achieve close-to-optimal
performance with respect to theoretical peak performance. Based on this result we
conclude that FooPar gives full access to Scala's design features without
suffering performance drops compared to implementations purely based on C and
MPI.
Increasing the Efficiency of Sparse Matrix-Matrix Multiplication with a 2.5D Algorithm and One-Sided MPI
Matrix-matrix multiplication is a basic operation in linear algebra and an
essential building block for a wide range of algorithms in various scientific
fields. Theory and implementation for the dense, square matrix case are
well-developed. If matrices are sparse, with application-specific sparsity
patterns, the optimal implementation remains an open question. Here, we explore
the performance of communication-reducing 2.5D algorithms and one-sided MPI
communication in the context of linear scaling electronic structure theory. In
particular, we extend the DBCSR sparse matrix library, which is the basic
building block for linear scaling electronic structure theory and low scaling
correlated methods in CP2K. The library is specifically designed to efficiently
perform block-sparse matrix-matrix multiplication of matrices with a relatively
large occupation. Here, we compare the performance of the original
implementation based on Cannon's algorithm and MPI point-to-point
communication, with an implementation based on MPI one-sided communications
(RMA), in both a 2D and a 2.5D approach. The 2.5D approach trades memory and
auxiliary operations for reduced communication, which can lead to a speedup if
communication is dominant. The 2.5D algorithm is somewhat easier to implement
with one-sided communications. A detailed description of the implementation is
provided, also for non-ideal processor topologies, since this is important for
actual applications. Given the importance of the precise sparsity pattern, and
even of the actual matrix data, which determines the effective fill-in upon
multiplication, the tests are performed within the CP2K package with
application benchmarks. Results show a substantial boost in performance for the
RMA-based 2.5D algorithm, up to 1.80x, which is observed to increase with the
number of processes involved in the parallelization.
In Proceedings of PASC '17, Lugano, Switzerland, June 26-28, 2017, 10 pages, 4 figures.
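The Cannon baseline that the RMA/2.5D variants are compared against can be simulated serially: blocks are skewed once and then circulate by nearest-neighbour shifts, with one local multiply-accumulate per step. This numpy sketch is our own illustration of that communication pattern, not DBCSR code.

```python
import numpy as np

def cannon_matmul(A, B, p=2):
    """Serially simulate Cannon's algorithm on a p x p process grid."""
    # split both operands into a p x p grid of blocks
    Ab = [np.hsplit(r, p) for r in np.vsplit(A, p)]
    Bb = [np.hsplit(r, p) for r in np.vsplit(B, p)]
    # initial skew: row i of A shifted left by i, column j of B shifted up by j
    Ab = [[Ab[i][(j + i) % p] for j in range(p)] for i in range(p)]
    Bb = [[Bb[(i + j) % p][j] for j in range(p)] for i in range(p)]
    Cb = [[np.zeros((A.shape[0] // p, B.shape[1] // p)) for _ in range(p)]
          for _ in range(p)]
    for _ in range(p):
        # local multiply-accumulate on every simulated "process"
        for i in range(p):
            for j in range(p):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]
        # shift A blocks left by one, B blocks up by one (nearest-neighbour comm)
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    return np.block(Cb)

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
B = rng.normal(size=(6, 6))
assert np.allclose(cannon_matmul(A, B, p=3), A @ B)
```

The 2.5D variant discussed in the abstract replicates this layout across c stacked grids, trading the extra memory for fewer shift rounds per grid.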
Parallel algorithms for boundary value problems
A general approach to solving boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: a local step, in which all P available processors work in parallel, and a global step, in which one processor solves a tridiagonal linear system of order P. The main advantages of this approach are twofold. First, it is very flexible, especially in the local step, so the algorithm can be used with any number of processors and with any SIMD or MIMD machine. Second, the communication complexity is very small, so the approach can be used just as easily on shared-memory machines. Several examples of using this strategy are discussed.
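The global step above solves a tridiagonal system of order P; a minimal sketch of such a solve is the Thomas algorithm (illustrative, not the paper's code):

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-, b = main, c = super-diagonal."""
    n = len(b)
    cp, dp = np.empty(n), np.empty(n)
    # forward elimination
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # back substitution
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# example: a 5x5 diagonally dominant tridiagonal system (P = 5)
n = 5
a = np.full(n, -1.0); a[0] = 0.0      # sub-diagonal (a[0] unused)
b = np.full(n, 4.0)                   # main diagonal
c = np.full(n, -1.0); c[-1] = 0.0     # super-diagonal (c[-1] unused)
d = np.arange(1.0, n + 1)
x = thomas(a, b, c, d)
```

The solve costs O(P) serial work, which is why a single processor handling the global step is not a bottleneck when P is modest.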
Distributed Evaluation of an Iterative Function for All Object Pairs on a SIMD Hypercube
An efficient distributed algorithm for evaluating an iterative function on all pairwise combinations of C objects on a SIMD hypercube is presented. The algorithm achieves uniform load distribution and minimal, completely local interprocessor communication.
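The pairing pattern can be sketched as a ring of shifts, where round k hands each node a visiting copy of object (i + k) mod C via one neighbour-to-neighbour transfer. This is our simplification of the paper's hypercube embedding, and the load balance here is only approximate, not the uniform distribution the paper achieves.

```python
from itertools import combinations

def all_pairs_rounds(objects):
    """Ring-shift schedule: in round k, node i holds its resident object and a
    visiting copy of object (i + k) mod C, obtained by one neighbour shift."""
    C = len(objects)
    evaluated = []
    visiting = list(range(C))                   # visiting[i]: index held at node i
    for _ in range(1, C):
        visiting = visiting[1:] + visiting[:1]  # shift visiting copies by one node
        for i in range(C):
            j = visiting[i]
            if i < j:                           # evaluate each unordered pair once
                evaluated.append((objects[i], objects[j]))
    return evaluated

objs = list("ABCDE")
pairs = all_pairs_rounds(objs)
# every unordered pair is produced exactly once
assert sorted(pairs) == sorted(combinations(objs, 2))
```

Each round involves only one transfer between neighbouring nodes, which mirrors the "completely local communication" property claimed in the abstract.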
Parallelizing algorithms in ada on clementina II : Face recognition system
In the Laboratory of Research and Development on Computer Science of the National University of La Plata, a face recognition system has been developed. This article describes a series of tests based on parallel processing, with the objective of optimizing the response times of that system, which was developed in the Ada programming language on the SGI Origin 2000 parallel architecture known as Clementina II. The results obtained are then analyzed.
Topic: Concurrent programming. Red de Universidades con Carreras en Informática (RedUNCI).
Singular value decomposition on SIMD hypercube and shuffle-exchange computers
This paper reports several parallel singular value decomposition (SVD) algorithms for hypercube and shuffle-exchange SIMD computers. Unlike previously published hypercube SVD algorithms, which map a column pair of a matrix onto a single processor, the algorithms presented in this paper map a matrix column pair onto a column of processors. In this way, a further reduction in time complexity is achieved. The paper also introduces the concept of two-dimensional shuffle-exchange networks, and corresponding SVD algorithms for one-dimensional and two-dimensional shuffle-exchange computers are developed.
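A serial sketch of the column-pair style of SVD algorithm described above is one-sided Jacobi, where the inner unit of work, orthogonalizing one column pair, is exactly what parallel variants distribute over processors. This is an illustrative reconstruction, not the paper's algorithm.

```python
import numpy as np

def jacobi_svd(A, tol=1e-12, sweeps=30):
    """One-sided Jacobi SVD: rotate column pairs until mutually orthogonal.
    Each sweep visits every column pair -- the unit of work that parallel
    schemes assign to processors (or columns of processors)."""
    U = A.astype(float).copy()
    n = U.shape[1]
    for _ in range(sweeps):
        off = 0.0
        for i in range(n - 1):
            for j in range(i + 1, n):
                a = U[:, i] @ U[:, i]
                b = U[:, j] @ U[:, j]
                g = U[:, i] @ U[:, j]
                off = max(off, abs(g))
                if abs(g) < tol:
                    continue
                # Jacobi rotation zeroing the (i, j) entry of U^T U
                zeta = (b - a) / (2.0 * g)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + np.hypot(1.0, zeta))
                c = 1.0 / np.hypot(1.0, t)
                s = c * t
                U[:, [i, j]] = U[:, [i, j]] @ np.array([[c, s], [-s, c]])
        if off < tol:
            break
    # once columns are orthogonal, singular values are the column norms
    sigma = np.linalg.norm(U, axis=0)
    return np.sort(sigma)[::-1]

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 4))
assert np.allclose(jacobi_svd(A), np.linalg.svd(A, compute_uv=False))
```

Because rotations on disjoint column pairs commute, up to n/2 pairs can be processed concurrently per step, which is the parallelism the hypercube and shuffle-exchange mappings exploit.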
Multinode broadcast in hypercubes and rings with randomly distributed length of packets
By Emmanouel A. Varvarigos and Dimitri P. Bertsekas. Includes bibliographical references (p. 19-20). Research supported by the NSF (NSF-DDM-8903385) and the ARO (DAAL03-86-K-0171).