Load balancing fictions, falsehoods and fallacies
Abstract Effective use of a parallel computer requires that a calculation be carefully divided among the processors. This load balancing problem appears in many guises and has been a fervent area of research for the past decade or more. Although great progress has been made, and useful software tools developed, a number of challenges remain. It is the conviction of the author that these challenges will be easier to address if we first come to terms with some significant shortcomings in our current perspectives. This paper tries to identify several areas in which the prevailing point of view is either mistaken or insufficient. The goal is to motivate new ideas and directions for this important field.
Load Duration and Probability Based Design of Wood Structural Members
Methods are presented for calculating limit state probabilities of engineered wood structural members, considering load duration effects due to stochastic dead and snow load. These methods are used to conduct reliability studies of existing wood design criteria. When realistic load processes are considered, it is found that the importance of load duration and gradual damage accumulation has been somewhat overstated. One possible probability-based design method that should be useful in future code development work is also presented.
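As a rough, hedged illustration of the kind of limit state probability such studies estimate, the Monte Carlo sketch below checks a single-year limit state R < D + S; the distributions and parameters are assumptions made purely for illustration, and the duration-of-load damage accumulation that the paper actually analyzes is deliberately omitted to keep the sketch short.

```python
import numpy as np

# Minimal Monte Carlo sketch of a limit state probability estimate.
# All distributions and parameters are illustrative assumptions, not
# values from the paper; duration-of-load damage accumulation is omitted.
rng = np.random.default_rng(0)
n = 200_000

dead = rng.normal(1.05, 0.10, n)                   # dead load (assumed Normal)
snow = rng.gumbel(0.8, 0.25, n)                    # annual-max snow load (assumed Gumbel)
resistance = rng.lognormal(np.log(3.0), 0.15, n)   # member resistance (assumed lognormal)

# Limit state g = R - (D + S); failure when g < 0.
p_fail = np.mean(resistance < dead + snow)
print(f"Estimated annual limit state probability: {p_fail:.2e}")
```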
Exploiting flexibly assignable work to improve load balance
In many applications of parallel computing, distribution of the data unambiguously implies distribution of work among processors. But there are exceptions where some tasks can be assigned to one of several processors without altering the total volume of communication. In this paper, we study the problem of exploiting this flexibility in the assignment of tasks to improve load balance. We first model the problem in terms of network flow and use combinatorial techniques for its solution. Our parametric search algorithms use maximum flow algorithms for probing on a candidate optimal solution value. We describe two algorithms to solve the assignment problem with log W_T and |P| probe calls, where W_T and |P|, respectively, denote the total workload and the number of processors. We also define augmenting paths and cuts for this problem, and show that any algorithm based on augmenting paths can be used to find an optimal solution for the task assignment problem. We then consider a continuous version of the problem, and formulate it as a linearly constrained optimization problem, i.e., min ‖Ax‖_∞ subject to Bx = d. To avoid solving an intractable ∞-norm optimization problem, we show that in this case minimizing the 2-norm is sufficient to minimize the ∞-norm, which reduces the problem to the well-studied linearly constrained least squares problem. The continuous version of the problem has the advantage of being easily amenable to parallelization.
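As a rough sketch of the final reduction described above, the snippet below solves the equality-constrained least-squares problem min ‖Ax‖₂ subject to Bx = d through its KKT system; the matrices are random placeholders, and nothing here is drawn from the paper's implementation, which rests on the stated result that the 2-norm minimizer also minimizes the ∞-norm for this problem class.

```python
import numpy as np

# Sketch: minimize ||A x||_2 subject to B x = d via the KKT system
#   [ 2 A^T A   B^T ] [ x      ]   [ 0 ]
#   [ B         0   ] [ lambda ] = [ d ]
# A, B, d are small random placeholders, not data from the paper.
rng = np.random.default_rng(1)
A = rng.random((6, 4))   # maps assignments to per-processor workloads (placeholder)
B = rng.random((2, 4))   # linear constraints on the assignment (placeholder)
d = rng.random(2)

n, m = A.shape[1], B.shape[0]
K = np.block([[2 * A.T @ A, B.T],
              [B, np.zeros((m, m))]])
rhs = np.concatenate([np.zeros(n), d])
x = np.linalg.solve(K, rhs)[:n]

print("constraint residual ||Bx - d||:", np.linalg.norm(B @ x - d))
print("objective ||Ax||_2            :", np.linalg.norm(A @ x))
```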
A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L.
Abstract Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a distributed breadth-first search (BFS) scheme that scales for random graphs with up to three billion vertices and 30 billion edges. Scalability was tested on IBM BlueGene/L with 32,768 nodes at the Lawrence Livermore National Laboratory. Scalability was obtained through a series of optimizations, in particular, those that ensure scalable use of memory. We use 2D (edge) partitioning of the graph instead of conventional 1D (vertex) partitioning to reduce communication overhead. For Poisson random graphs, we show that the expected size of the messages is scalable for both 2D and 1D partitionings. Finally, we have developed efficient collective communication functions for the 3D torus architecture of BlueGene/L that also take advantage of the structure in the problem. The performance and characteristics of the algorithm are measured and reported.
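To make the frontier-expansion structure concrete, here is a minimal serial, level-synchronous BFS sketch; it illustrates the pattern that a distributed, 2D-partitioned implementation parallelizes (with the per-level frontier exchange replaced by collective communication), and is not the BlueGene/L code itself. The toy graph is an assumption for illustration.

```python
def bfs_levels(adj, source):
    """Serial level-synchronous BFS: expand the whole frontier each step.

    `adj` maps a vertex to an iterable of its neighbors. A distributed
    version partitions vertices/edges across processors and exchanges
    the next frontier with collective communication at every level.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

# Toy usage (illustrative graph, not from the paper).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2}
```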
Tolerating the Community Detection Resolution Limit with Edge Weighting
Communities of vertices within a giant network such as the World-Wide Web are
likely to be vastly smaller than the network itself. However, Fortunato and
Barthélemy have proved that modularity maximization algorithms for
community detection may fail to resolve communities with fewer than √(L/2)
edges, where L is the number of edges in the entire network.
This resolution limit leads modularity maximization algorithms to have
notoriously poor accuracy on many real networks. Fortunato and Barthélemy's
argument can be extended to networks with weighted edges as well, and we derive
this corollary argument. We conclude that weighted modularity algorithms may
fail to resolve communities with less than √(Wε/2) total edge
weight, where W is the total edge weight in the network and ε is the
maximum weight of an inter-community edge. If ε is small, then small
communities can be resolved.
Given a weighted or unweighted network, we describe how to derive new edge
weights in order to achieve a low ε; we modify the "CNM" community
detection algorithm to maximize weighted modularity; and we show that the
resulting algorithm has greatly improved accuracy. In experiments with an
emerging community standard benchmark, we find that our simple CNM variant is
competitive with the most accurate community detection methods yet proposed.
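As a small, hedged illustration of the objective these weighted algorithms maximize, the sketch below computes the weighted modularity of a fixed partition from an edge list; the toy graph is an assumption for illustration, and this is not the paper's modified CNM algorithm or its edge-weighting scheme.

```python
def weighted_modularity(edges, community):
    """Weighted modularity Q of a fixed partition.

    `edges`: iterable of (u, v, w) for an undirected weighted graph.
    `community`: dict mapping each vertex to its community label.
    Q = intra/W - sum_c (K_c / 2W)^2, where W is the total edge weight,
    intra is the weight of intra-community edges, and K_c is the total
    strength (weighted degree) of community c.
    """
    W = sum(w for _, _, w in edges)
    strength = {}
    intra = 0.0
    for u, v, w in edges:
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        if community[u] == community[v]:
            intra += w
    K = {}
    for node, k in strength.items():
        c = community[node]
        K[c] = K.get(c, 0.0) + k
    return intra / W - sum((kc / (2 * W)) ** 2 for kc in K.values())

# Toy example: two triangles joined by one light inter-community edge,
# so the maximum inter-community edge weight (the ε above) is small.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 0.1)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(weighted_modularity(edges, community), 4))   # ≈ 0.4836
```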
Parallel Shortest Path Algorithms for Solving Large-Scale Instances
We present an experimental study of parallel algorithms for solving the single source
shortest path problem with non-negative edge weights (NSSP) on large-scale graphs.
We implement Meyer and Sanders' Δ-stepping algorithm and report performance results on the Cray MTA-2, a multithreaded parallel architecture. The MTA-2 is a
high-end shared memory system offering two unique features that aid the efficient implementation of irregular parallel graph algorithms: the ability to exploit fine-grained
parallelism, and low-overhead synchronization primitives. Our implementation exhibits
remarkable parallel speedup when compared with a competitive sequential algorithm,
for low-diameter sparse graphs. For instance, Δ-stepping on a directed scale-free graph
of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors
of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the
first performance results of a parallel NSSP problem on realistic graph instances in the
order of billions of vertices and edges.
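For readers who want the shape of the algorithm being parallelized, here is a minimal sequential sketch of Δ-stepping's bucket structure with light/heavy edge separation; the graph, the Δ value, and the data structures are illustrative assumptions and say nothing about the multithreaded MTA-2 implementation studied in the paper.

```python
import math
from collections import defaultdict

def delta_stepping(adj, source, delta):
    """Sequential sketch of Delta-stepping.

    `adj` maps u -> list of (v, weight) with non-negative weights.
    Tentative distances are kept in buckets of width `delta`; light
    edges (weight <= delta) are relaxed repeatedly inside a bucket,
    heavy edges once per settled vertex. Parallel implementations
    process each bucket's relaxation requests concurrently.
    """
    dist = defaultdict(lambda: math.inf)
    buckets = defaultdict(set)

    def relax(v, d):
        if d < dist[v]:
            if dist[v] < math.inf:
                buckets[int(dist[v] // delta)].discard(v)
            dist[v] = d
            buckets[int(d // delta)].add(v)

    relax(source, 0.0)
    while buckets:
        i = min(buckets)                    # smallest non-empty bucket index
        settled = set()
        while buckets.get(i):               # light-edge phases for bucket i
            frontier = buckets.pop(i)
            settled |= frontier
            for u in frontier:
                for v, w in adj[u]:
                    if w <= delta:
                        relax(v, dist[u] + w)
        for u in settled:                   # heavy edges relaxed once
            for v, w in adj[u]:
                if w > delta:
                    relax(v, dist[u] + w)
        buckets = defaultdict(set, {k: s for k, s in buckets.items() if s})
    return dict(dist)

# Toy usage (illustrative graph).
adj = {0: [(1, 0.5), (2, 3.0)], 1: [(2, 0.5)], 2: []}
print(delta_stepping(adj, 0, delta=1.0))    # {0: 0.0, 1: 0.5, 2: 1.0}
```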