Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers

Author: Dinh David
Simhadri Harsha Vardhan
Tang Yuan
Publication date: 14/02/2016

The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, its two composition constructs, "$\parallel$" (parallel) and "$;$" (serial), are insufficient for expressing "partial dependencies" or "partial parallelism" in a program. We propose a new dataflow composition construct "$\leadsto$" to express partial dependencies in algorithms in a processor- and cache-oblivious way, thus extending the Nested Parallel (NP) model to the Nested Dataflow (ND) model. We redesign several divide-and-conquer algorithms, ranging from dense linear algebra to dynamic programming, in the ND model and prove that they all have optimal span while retaining optimal cache complexity. We propose the design of runtime schedulers that map ND programs to multicore processors with multiple levels of possibly shared caches (i.e., Parallel Memory Hierarchies) and provide theoretical guarantees on their ability to preserve locality and load balance. For this, we adapt space-bounded (SB) schedulers for the ND model. We show that our algorithms have increased "parallelizability" in the ND model, and that SB schedulers can use the extra parallelizability to achieve asymptotically optimal bounds on cache misses and running time on a greater number of processors than in the NP model. The running time for the algorithms in this paper is $O\left(\frac{\sum_{i=0}^{h-1} Q^{*}(\mathsf{t};\sigma\cdot M_i)\cdot C_i}{p}\right)$, where $Q^{*}$ is the cache complexity of task $\mathsf{t}$, $C_i$ is the cost of a cache miss at the level-$i$ cache, which has size $M_i$, $\sigma\in(0,1)$ is a constant, and $p$ is the number of processors in an $h$-level cache hierarchy.
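
A minimal sketch, not taken from the paper, of why partial dependencies can shorten the span: it models a two-phase computation as a task DAG with hypothetical costs and compares a fork-join composition, where the serial "$;$" makes every second-phase task wait for all of the first phase, with a dataflow-style version in which each task waits only for the predecessor it actually reads. The function `span`, the task names, and the costs are illustrative assumptions; this does not implement the paper's $\leadsto$ construct or its schedulers.

```python
def span(costs, deps):
    """Return the span (critical-path length) of a task DAG.

    costs: dict mapping each task to its work
    deps:  dict mapping a task to the tasks it must wait for
    """
    finish = {}

    def finish_time(task):
        # A task starts once all of its predecessors have finished.
        if task not in finish:
            start = max((finish_time(d) for d in deps.get(task, [])), default=0)
            finish[task] = start + costs[task]
        return finish[task]

    return max(finish_time(t) for t in costs)


# Hypothetical two-phase computation: B1 reads only A1's output, B2 only A2's.
costs = {"A1": 1, "A2": 3, "B1": 3, "B2": 1}

# Fork-join, (A1 || A2) ; (B1 || B2): the serial ";" makes every B task
# wait for every A task, even those it does not read.
fork_join = {"B1": ["A1", "A2"], "B2": ["A1", "A2"]}

# Dataflow-style: each B task waits only for the A task it depends on.
dataflow = {"B1": ["A1"], "B2": ["A2"]}

print("work:", sum(costs.values()))               # 8 in both versions
print("fork-join span:", span(costs, fork_join))  # 6
print("dataflow span:", span(costs, dataflow))    # 4
```

Both versions perform the same 8 units of work, but the fork-join barrier lengthens the span from 4 to 6; expressing such partial dependencies is the role the abstract gives to $\leadsto$.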

arXiv.org e-Print Archive

Crossref
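
As a worked reading of the running-time bound in the Nested Dataflow abstract above, the sketch below evaluates the expression inside the $O(\cdot)$ for a toy hierarchy, ignoring the hidden constant. The cache-complexity function `toy_q_star`, the cache sizes, the miss costs, $\sigma = 0.5$, and $p = 4$ are all hypothetical values chosen for illustration.

```python
def running_time_bound(q_star, sizes, miss_costs, p, sigma=0.5):
    """Evaluate sum_{i=0}^{h-1} Q*(t; sigma*M_i) * C_i / p, ignoring the O() constant.

    q_star:     function giving the task's cache complexity for a cache of size m
    sizes:      cache sizes M_0, ..., M_{h-1}
    miss_costs: miss costs C_0, ..., C_{h-1}
    p:          number of processors
    sigma:      constant in (0, 1)
    """
    if len(sizes) != len(miss_costs):
        raise ValueError("need one size M_i and one cost C_i per level")
    if not 0 < sigma < 1 or p < 1:
        raise ValueError("require 0 < sigma < 1 and p >= 1")
    total = sum(q_star(sigma * m) * c for m, c in zip(sizes, miss_costs))
    return total / p


# Hypothetical task whose cache complexity is 4096 / m for a cache of size m.
def toy_q_star(m):
    return 4096 / m


sizes = [8, 64, 512]       # hypothetical M_0, M_1, M_2
miss_costs = [1, 10, 100]  # hypothetical C_0, C_1, C_2
print(running_time_bound(toy_q_star, sizes, miss_costs, p=4))  # 976.0
```

Here the per-level terms are $1024 \cdot 1 + 128 \cdot 10 + 16 \cdot 100 = 3904$, divided by $p = 4$: each level contributes the misses incurred on a cache of size $\sigma M_i$, weighted by that level's miss cost.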

An efficient genetic algorithm for large-scale transmit power control of dense and robust wireless networks in harsh industrial environments

Author: De Pessemier Toon
Gong Xu
Joseph Wout
Martens Luc
Plets David
Tanghe Emmeric
Publication venue: Elsevier BV
Publication date: 01/01/2018

The industrial wireless local area network (IWLAN) is increasingly dense, due not only to the penetration of wireless applications into shop floors and warehouses, but also to the rising need for redundancy for robust wireless coverage. Instead of simply powering on all access points (APs), there is an unavoidable need to dynamically control the transmit power of APs on a large scale, in order to minimize interference and adapt the coverage to the latest shadowing effects of dominant obstacles in an industrial indoor environment. To fulfill this need, this paper formulates a transmit power control (TPC) model that enables both powering APs on or off and calibrating the transmit power of each AP that is powered on. This TPC model uses an empirical one-slope path loss model that accounts for three-dimensional obstacle shadowing effects, enabling accurate yet simple coverage prediction. An efficient genetic algorithm (GA), named GATPC, is designed to solve this TPC model even on a large scale. To this end, it leverages repair-mechanism-based population initialization, crossover, and mutation, as well as parallelism and dedicated speedup measures. GATPC was experimentally validated in a small-scale IWLAN deployed in a real industrial indoor environment. It was further demonstrated numerically and benchmarked at both small and large scales with respect to the effectiveness and scalability of TPC. Moreover, a sensitivity analysis was performed to reveal the produced interference and the qualification rate of GATPC as a function of the target coverage percentage, as well as the number and placement direction of dominant obstacles. (C) 2018 Elsevier B.V. All rights reserved.
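
To make "one-slope path loss model" concrete, here is a simplified sketch of the kind of coverage prediction such a model enables: received power is transmit power minus $PL(d) = PL_0 + 10\,n\log_{10}(d/d_0)$ plus any obstacle losses, and a test point counts as covered when its best received power meets a threshold. The parameter values ($PL_0 = 40$ dB at $d_0 = 1$ m, exponent $n = 2.5$, a -70 dBm threshold), the function names, and the optional `obstacle_loss` hook are hypothetical; the sketch does not reproduce the paper's calibrated model or its three-dimensional shadowing computation.

```python
import math


def path_loss_db(d, pl0=40.0, n=2.5, d0=1.0, obstacle_db=0.0):
    """One-slope model: PL(d) = PL0 + 10*n*log10(d/d0) + obstacle losses (dB)."""
    d = max(d, d0)  # the model is only applied beyond the reference distance
    return pl0 + 10.0 * n * math.log10(d / d0) + obstacle_db


def coverage_fraction(aps, points, threshold_dbm=-70.0, obstacle_loss=None):
    """Fraction of test points whose best received power meets the threshold.

    aps:           list of ((x, y, z), tx_dbm); tx_dbm=None means the AP is off
    points:        list of (x, y, z) receiver locations
    obstacle_loss: optional function (ap_pos, rx_pos) -> extra loss in dB
    """
    covered = 0
    for rx in points:
        best = -math.inf
        for pos, tx_dbm in aps:
            if tx_dbm is None:
                continue  # a powered-off AP contributes no signal
            extra = obstacle_loss(pos, rx) if obstacle_loss else 0.0
            pl = path_loss_db(math.dist(pos, rx), obstacle_db=extra)
            best = max(best, tx_dbm - pl)
        if best >= threshold_dbm:
            covered += 1
    return covered / len(points)


points = [(0, 0, 3), (50, 0, 3), (1000, 0, 3)]
print(coverage_fraction([((0, 0, 3), 20.0)], points))  # 2 of 3 points covered
print(coverage_fraction([((0, 0, 3), None)], points))  # 0.0: AP switched off
```

A TPC search such as the one described above would vary the per-AP `tx_dbm` values (including `None` for "off") and score each configuration with a predictor of this kind.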

Ghent University Academic Bibliography

Improving parallel program performance using critical path analysis

Author: Bic Lubomir
Gajski Daniel D.
Kwan Andrew W.
Publication venue: eScholarship, University of California
Publication date: 01/01/1989

A programming tool that performs critical path analysis for parallel programs has been developed. This tool determines the critical path for the program as scheduled onto a parallel computer with P processing elements, the critical path for the program expressed as a data flow graph (when maximal parallelism can be expressed), and the minimum number of processing elements (P_opt) needed to obtain maximum program speedup. Experiments were performed using several versions of a Gaussian elimination program to examine how speedup varied with changes in granularity and critical path length. These experiments showed that when the available number of processing elements P < P_opt, increasing granularity improved program speedup more than reducing the data flow graph's critical path length did, whereas when P ≥ P_opt, increasing granularity degraded program speedup while reducing critical path length improved it.
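
To make the quantities in this abstract concrete, here is a minimal sketch that is not the authors' tool: it computes the critical path of a task DAG with hypothetical costs, simulates a simple non-preemptive FIFO list scheduler on P processors, and estimates P_opt as the smallest P whose simulated makespan reaches the critical-path length. The scheduling policy, function names, and example graph are assumptions; a different scheduler could give a different P_opt.

```python
import heapq


def critical_path(costs, deps):
    """Longest path through the task DAG (time with unlimited processors)."""
    finish = {}

    def f(task):
        if task not in finish:
            finish[task] = costs[task] + max((f(d) for d in deps.get(task, [])), default=0)
        return finish[task]

    return max(f(t) for t in costs)


def list_schedule(costs, deps, p):
    """Makespan of a simple non-preemptive FIFO list schedule on p processors."""
    waiting = {t: len(deps.get(t, [])) for t in costs}  # unfinished predecessors
    succs = {t: [] for t in costs}
    for t, preds in deps.items():
        for d in preds:
            succs[d].append(t)
    ready = [t for t in sorted(costs) if waiting[t] == 0]
    running = []  # min-heap of (finish_time, task)
    now = 0
    while ready or running:
        # Start ready tasks while processors are free.
        while ready and len(running) < p:
            t = ready.pop(0)
            heapq.heappush(running, (now + costs[t], t))
        # Advance to the next task completion and release its successors.
        now, done = heapq.heappop(running)
        for s in succs[done]:
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
    return now


def p_opt(costs, deps):
    """Smallest p whose simulated makespan reaches the critical-path length."""
    cp = critical_path(costs, deps)
    for p in range(1, len(costs) + 1):  # p = number of tasks always reaches cp
        if list_schedule(costs, deps, p) == cp:
            return p


# Hypothetical graph: four independent tasks feeding one final task.
costs = {"a": 2, "b": 2, "c": 2, "d": 2, "e": 1}
deps = {"e": ["a", "b", "c", "d"]}
print(critical_path(costs, deps))                             # 3
print([list_schedule(costs, deps, p) for p in (1, 2, 3, 4)])  # [9, 5, 5, 3]
print(p_opt(costs, deps))                                     # 4
```

With a total work of 9 and a critical path of 3, the maximum speedup for this graph is 3, and in this simulation it is first reached at P = 4.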

eScholarship - University of California

A Parallel Distributed Strategy for Arraying a Scattered Robot Swarm

Author: Fekete Sandor P.
Hemmer Michael
Krupke Dominik
McLurkin James
Zhou Yu
Publication date: 12/05/2015

We consider the problem of organizing a scattered group of $n$ robots in two-dimensional space, with geometric maximum distance $D$ between robots. The communication graph of the swarm is connected, but there is no central authority for organizing it. We want to arrange the robots into a sorted and equally spaced array between the robots with the lowest and highest labels, while maintaining a connected communication network. In this paper, we describe a distributed method that accomplishes these goals without central control, while also keeping time, travel distance, and communication cost to a minimum. We proceed in a number of stages (leader election, initial path construction, subtree contraction, geometric straightening, and distributed sorting), none of which requires a central authority, while still achieving the best possible parallelization. The overall arraying is performed in $O(n)$ time, $O(n^2)$ individual messages, and $O(nD)$ travel distance. The implementations of sorting and navigation use communication messages of fixed size and are a practical solution for large populations of low-cost robots.
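
The target configuration described in this abstract can be stated precisely: in label order, the $k$-th of $n$ robots should end up at $p_{lo} + \frac{k}{n-1}(p_{hi} - p_{lo})$, where $p_{lo}$ and $p_{hi}$ are the positions of the robots with the lowest and highest labels. The sketch below computes only that end state, centrally; it is not the paper's distributed, multi-stage method, and the function name and input format are hypothetical.

```python
def array_targets(positions):
    """Equally spaced, label-sorted targets on the segment between the robots
    with the lowest and highest labels.

    positions: dict mapping robot label -> (x, y)
    returns:   dict mapping robot label -> target (x, y)
    """
    labels = sorted(positions)
    n = len(labels)
    if n < 2:
        return dict(positions)
    (x0, y0) = positions[labels[0]]
    (x1, y1) = positions[labels[-1]]
    return {
        label: (x0 + k * (x1 - x0) / (n - 1), y0 + k * (y1 - y0) / (n - 1))
        for k, label in enumerate(labels)
    }


# Hypothetical swarm: label 1 sits at (0, 0) and label 5 at (8, 0).
swarm = {3: (5, 5), 1: (0, 0), 2: (7, -2), 5: (8, 0), 4: (9, 3)}
print(array_targets(swarm))
# {1: (0.0, 0.0), 2: (2.0, 0.0), 3: (4.0, 0.0), 4: (6.0, 0.0), 5: (8.0, 0.0)}
```

The endpoint robots keep their positions, so the array spans exactly the segment between them; the hard part the paper addresses is reaching this layout without a central coordinator while keeping the swarm connected.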

arXiv.org e-Print Archive

Crossref