Search CORE


HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON

Author: Chen Longxiang
Chen Zizhong
Ge Rong
Li Dong
Tan Li
Zong Ziliang
Publication venue: e-Publications@Marquette
Publication date: 01/01/2014

Improving the energy efficiency of high performance scientific applications has become a pressing concern. Software-controlled hardware techniques based on Dynamic Voltage and Frequency Scaling (DVFS) have proven widely effective. Although DVFS benefits green computing, DVFS itself can incur non-negligible overhead when it issues a large number of frequency switches. In this paper, we propose a strategy that achieves optimal energy savings for distributed matrix multiplication by algorithmically trading more computation and communication at a time, adaptively and within user-specified memory costs, for fewer DVFS switches; this strategy saves 7.5% more energy on average than a classic strategy. Moreover, we leverage a high performance communication scheme, pipeline broadcast, to fully exploit network bandwidth. Overall, the integrated approach achieves substantial energy savings (up to 51.4%) and performance gains (28.6% on average) compared to ScaLAPACK pdgemm() on a cluster with an Ethernet switch, and outperforms ScaLAPACK and DPLASMA pdgemm() by 33.3% and 32.7% on average, respectively, on a cluster with an InfiniBand switch.
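The central trade-off above (fewer DVFS switches in exchange for more memory per step) can be illustrated with a toy cost model. This is an assumption-laden sketch, not the authors' algorithm: it assumes each communicate/compute phase pair costs exactly two frequency switches (for example, lowering the frequency for a broadcast and raising it for the local multiply), and that grouping `group_size` panels into one phase pair requires buffering that many panels. The names `plan_dvfs`, `largest_group_within_memory`, and `group_size` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DvfsPlan:
    """Cost summary for one way of grouping panel iterations (toy model)."""
    group_size: int     # panels handled per communicate/compute phase pair (hypothetical knob)
    switches: int       # total DVFS frequency changes issued
    panel_buffers: int  # panels buffered at once, i.e. the memory cost


def plan_dvfs(num_panels: int, group_size: int) -> DvfsPlan:
    """Assumes each phase pair (communicate, then compute) costs two frequency
    switches; grouping `group_size` panels per phase pair reduces the number of
    phase pairs at the price of buffering `group_size` panels."""
    if num_panels < 1 or group_size < 1:
        raise ValueError("num_panels and group_size must be positive")
    group_size = min(group_size, num_panels)
    phases = math.ceil(num_panels / group_size)
    return DvfsPlan(group_size, 2 * phases, group_size)


def largest_group_within_memory(num_panels: int, max_buffers: int) -> DvfsPlan:
    """Pick the largest grouping that fits a user-specified buffer budget;
    in this model that also minimizes the number of switches."""
    return plan_dvfs(num_panels, max(1, min(max_buffers, num_panels)))


if __name__ == "__main__":
    for k in (1, 4, 16):
        print(plan_dvfs(64, k))
```

Under this model, 64 panels need 128 switches when processed one at a time but only 8 when grouped 16 at a time; the memory budget caps how far the grouping can go.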

epublications@Marquette

Algebraic Methods in the Congested Clique

Author: Aho Alfred V.
Björklund Andreas
Czumaj Artur
Furman M. E.
Holzer Stephan
James
Nesetril Jaroslav
Tiskin Alexandre
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

In this work, we use algebraic methods for studying distance computation and subgraph detection tasks in the congested clique model. Specifically, we adapt parallel matrix multiplication implementations to the congested clique, obtaining an $O(n^{1-2/\omega})$-round matrix multiplication algorithm, where $\omega < 2.3728639$ is the exponent of matrix multiplication. In conjunction with known techniques from centralised algorithmics, this gives significant improvements over the previous best upper bounds in the congested clique model. The highlight results include:
-- triangle and 4-cycle counting in $O(n^{0.158})$ rounds, improving upon the $O(n^{1/3})$ triangle detection algorithm of Dolev et al. [DISC 2012],
-- a $(1 + o(1))$-approximation of all-pairs shortest paths in $O(n^{0.158})$ rounds, improving upon the $\tilde{O}(n^{1/2})$-round $(2 + o(1))$-approximation algorithm of Nanongkai [STOC 2014], and
-- computing the girth in $O(n^{0.158})$ rounds, which is the first non-trivial solution in this model.
In addition, we present a novel constant-round combinatorial algorithm for detecting 4-cycles.

Comment: This work is a merger of arxiv:1412.2109 and arxiv:1412.266
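As a quick check on the stated bounds: since $1 - 2/\omega$ grows with $\omega$, the bound $\omega < 2.3728639$ gives $1 - 2/\omega < 1 - 2/2.3728639 \approx 0.1571$, consistent with the $O(n^{0.158})$ round counts. The triangle-counting result rests on a standard centralised reduction: in a simple undirected graph with adjacency matrix $A$, the number of triangles is $\operatorname{tr}(A^3)/6$, because each triangle yields six closed walks of length 3 (three start vertices, two directions). The sequential sketch below illustrates only that reduction, using a naive cubic matrix product; it is not the congested clique algorithm, which distributes the product across the $n$ nodes. The function names are illustrative.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive O(n^3) product; a fast (omega < 3) algorithm could replace it."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_triangles(adj: Matrix) -> int:
    """Triangles in a simple undirected graph = trace(A^3) / 6."""
    a3 = matmul(matmul(adj, adj), adj)
    trace = sum(a3[i][i] for i in range(len(adj)))
    return trace // 6


def adjacency(n: int, edges: Iterable[Tuple[int, int]]) -> Matrix:
    """Symmetric 0/1 adjacency matrix for an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


if __name__ == "__main__":
    k4 = adjacency(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
    print(count_triangles(k4))  # K4 has 4 triangles
```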

arXiv.org e-Print Archive

Crossref

MPG.PuRe

A Hybrid Decomposition Parallel Implementation of the Car-Parrinello Method

Author: Andersen
Andreoni
Angelopoulos
Bachelet
Ballone
Brocks
Brommer
Car
Clarke
Gupta
Hannes Jónsson
Hohenberg
Hohl
Hoover
James Wiggs
King-Smith
Kleinman
Kohn
Littlefield
Marinescu
Nelson
Nosé
Payne
Ryckaert
Troullier
Wiggs
Williams
Štich
Publication venue: Elsevier BV
Publication date: 14/11/1994

We have developed a flexible hybrid decomposition parallel implementation of the first-principles molecular dynamics algorithm of Car and Parrinello. The code allows the problem to be decomposed either spatially, over the electronic orbitals, or any combination of the two. Performance statistics for 32, 64, 128 and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel supercomputers and comparison with the performance of an optimized code running the smaller systems on the Cray Y-MP and C90 are presented.

Comment: Accepted by Computer Physics Communications, LaTeX, 34 pages without figures; 15 figures available in PostScript form via WWW at http://www-theory.chem.washington.edu/~wiggs/hyb_figures.htm
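A hybrid decomposition of this kind can be pictured as a two-dimensional process grid: one axis splits the electronic orbitals, the other splits real space. The sketch below is a hypothetical illustration of that idea, not the authors' code; the parameters `orbital_groups`, `n_orbitals`, and `n_planes`, and the choice to split real space into contiguous grid planes, are assumptions. Setting `orbital_groups = 1` gives a purely spatial decomposition and `orbital_groups = nprocs` a purely orbital one.

```python
from dataclasses import dataclass
from typing import Tuple


def block_range(total: int, parts: int, index: int) -> Tuple[int, int]:
    """Half-open [start, stop) range of block `index` when `total` items are
    split into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(total, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


@dataclass
class Assignment:
    rank: int
    orbitals: Tuple[int, int]  # [first, stop) orbital indices owned by this rank
    planes: Tuple[int, int]    # [first, stop) real-space grid planes owned (assumed layout)


def hybrid_assignment(rank: int, nprocs: int, orbital_groups: int,
                      n_orbitals: int, n_planes: int) -> Assignment:
    """Map a rank onto an (orbital group, spatial group) grid of processes."""
    if not 0 <= rank < nprocs:
        raise ValueError("rank out of range")
    if nprocs % orbital_groups != 0:
        raise ValueError("orbital_groups must divide nprocs")
    spatial_groups = nprocs // orbital_groups
    og, sg = divmod(rank, spatial_groups)  # row = orbital group, column = spatial group
    return Assignment(rank,
                      block_range(n_orbitals, orbital_groups, og),
                      block_range(n_planes, spatial_groups, sg))


if __name__ == "__main__":
    for r in range(8):
        print(hybrid_assignment(r, nprocs=8, orbital_groups=2, n_orbitals=64, n_planes=32))
```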

arXiv.org e-Print Archive

Crossref