Search CORE

12,024 research outputs found

Distributed Computing in the Asynchronous LOCAL model

Author: A Castañeda
B Awerbuch
D Peleg
D Peleg
J Suomela
Joel Rybicki
M Naor
MJ Fischer
N Linial
R Cole
Publication venue
Publication date: 22/10/2019
Field of study

The LOCAL model is among the main models for studying locality in the framework of distributed network computing. This model is however subject to pertinent criticisms, including the facts that all nodes wake up simultaneously, perform in lock steps, and are failure-free. We show that relaxing these hypotheses to some extent does not hurt local computing. In particular, we show that, for any construction task

T

associated to a locally checkable labeling (LCL), if

T

is solvable in

t

rounds in the LOCAL model, then

T

remains solvable in

O(t)

rounds in the asynchronous LOCAL model. This improves the result by Casta\~neda et al. [SSS 2016], which was restricted to 3-coloring the rings. More generally, the main contribution of this paper is to show that, perhaps surprisingly, asynchrony and failures in the computations do not restrict the power of the LOCAL model, as long as the communications remain synchronous and failure-free

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

Author: Ghysels Pieter
Li Xiaoye S.
Napov Artem
Rouet Francois-Henry
Williams Samuel
Publication venue
Publication date: 25/02/2015
Field of study

We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

DI-fusion

Efficient Algorithms for Scheduling Moldable Tasks

Author: Loiseau Patrick
Wu Xiaohu
Publication venue
Publication date: 19/11/2021
Field of study

We study the problem of scheduling

n

independent moldable tasks on

m

processors that arises in large-scale parallel computations. When tasks are monotonic, the best known result is a

(\frac{3}{2}+\epsilon)

-approximation algorithm for makespan minimization with a complexity linear in

n

and polynomial in

\log{m}

and

\frac{1}{\epsilon}

where

\epsilon

is arbitrarily small. We propose a new perspective of the existing speedup models: the speedup of a task

T_{j}

is linear when the number

p

of assigned processors is small (up to a threshold

\delta_{j}

) while it presents monotonicity when

p

ranges in

[\delta_{j}, k_{j}]

; the bound

k_{j}

indicates an unacceptable overhead when parallelizing on too many processors. For a given integer

\delta\geq 5

, let

u=\left\lceil \sqrt[2]{\delta} \right\rceil-1

. In this paper, we propose a

\frac{1}{\theta(\delta)} (1+\epsilon)

-approximation algorithm for makespan minimization with a complexity

\mathcal{O}(n\log{\frac{n}{\epsilon}}\log{m})

where

\theta(\delta) = \frac{u+1}{u+2}\left( 1- \frac{k}{m} \right)

(

m\gg k

). As a by-product, we also propose a

\theta(\delta)

-approximation algorithm for throughput maximization with a common deadline with a complexity

\mathcal{O}(n^{2}\log{m})

arXiv.org e-Print Archive

Towards a Queueing-Based Framework for In-Network Function Computation

Author: Banerjee Siddhartha
Gupta Piyush
Shakkottai Sanjay
Publication venue
Publication date: 01/01/2012
Field of study

We seek to develop network algorithms for function computation in sensor networks. Specifically, we want dynamic joint aggregation, routing, and scheduling algorithms that have analytically provable performance benefits due to in-network computation as compared to simple data forwarding. To this end, we define a class of functions, the Fully-Multiplexible functions, which includes several functions such as parity, MAX, and k th -order statistics. For such functions we exactly characterize the maximum achievable refresh rate of the network in terms of an underlying graph primitive, the min-mincut. In acyclic wireline networks, we show that the maximum refresh rate is achievable by a simple algorithm that is dynamic, distributed, and only dependent on local information. In the case of wireless networks, we provide a MaxWeight-like algorithm with dynamic flow splitting, which is shown to be throughput-optimal

arXiv.org e-Print Archive

CiteSeerX

Scalable Task-Based Algorithm for Multiplication of Block-Rank-Sparse Matrices

Author: Baruch E.
Cannon L. E.
Choi J
Choi J.
Choi J.
Solomonik E.
Szabo A.
van de Geijn R. A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/10/2015
Field of study

A task-based formulation of Scalable Universal Matrix Multiplication Algorithm (SUMMA), a popular algorithm for matrix multiplication (MM), is applied to the multiplication of hierarchy-free, rank-structured matrices that appear in the domain of quantum chemistry (QC). The novel features of our formulation are: (1) concurrent scheduling of multiple SUMMA iterations, and (2) fine-grained task-based composition. These features make it tolerant of the load imbalance due to the irregular matrix structure and eliminate all artifactual sources of global synchronization.Scalability of iterative computation of square-root inverse of block-rank-sparse QC matrices is demonstrated; for full-rank (dense) matrices the performance of our SUMMA formulation usually exceeds that of the state-of-the-art dense MM implementations (ScaLAPACK and Cyclops Tensor Framework).Comment: 8 pages, 6 figures, accepted to IA3 2015. arXiv admin note: text overlap with arXiv:1504.0504

arXiv.org e-Print Archive

Crossref

Datacenter Traffic Control: Understanding Techniques and Trade-offs

Author: Noormohammadpour Mohammad
Raghavendra Cauligi S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/12/2017
Field of study

Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it deems necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters which have been receiving increasing attention recently and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

arXiv.org e-Print Archive

Crossref

OSF Preprints

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

FigShare