
    Load balancing fictions, falsehoods and fallacies

    Effective use of a parallel computer requires that a calculation be carefully divided among the processors. This load balancing problem appears in many guises and has been a fervent area of research for the past decade or more. Although great progress has been made, and useful software tools developed, a number of challenges remain. It is the conviction of the author that these challenges will be easier to address if we first come to terms with some significant shortcomings in our current perspectives. This paper tries to identify several areas in which the prevailing point of view is either mistaken or insufficient. The goal is to motivate new ideas and directions for this important field.

    Load Duration and Probability Based Design of Wood Structural Members

    Methods are presented for calculating limit state probabilities of engineered wood structural members, considering load duration effects due to stochastic dead and snow loads. These methods are used to conduct reliability studies of existing wood design criteria. When realistic load processes are considered, the importance of load duration and gradual damage accumulation is found to have been somewhat overstated. A probability-based design method that should be useful in future code development work is also presented.
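
    The abstract does not spell out the damage model, but the overall shape of a load-duration reliability calculation can be sketched with Monte Carlo simulation. Everything below (the linear damage-accumulation rule standing in for calibrated Gerhards-type cumulative damage models, the lognormal resistance and snow-load distributions, and every parameter value) is an illustrative assumption, not the paper's calibration:

```python
import numpy as np

rng = np.random.default_rng(0)

N_SAMPLES = 20_000  # Monte Carlo replications (illustrative)
YEARS = 50          # service life considered (illustrative)
DEAD = 0.3          # constant dead load as a fraction of nominal capacity (assumed)

failures = 0
for _ in range(N_SAMPLES):
    # Short-term member resistance: lognormal with assumed parameters.
    resistance = rng.lognormal(mean=0.0, sigma=0.25)
    damage = 0.0
    for _ in range(YEARS):
        # Annual-maximum snow load as a lognormal pulse (assumed process).
        snow = rng.lognormal(mean=-1.2, sigma=0.5)
        stress_ratio = (DEAD + snow) / resistance
        if stress_ratio >= 1.0:
            # Immediate overload failure.
            damage = 1.0
        elif stress_ratio > 0.5:
            # Toy linear damage accumulation above a stress threshold.
            damage += 0.02 * (stress_ratio - 0.5)
        if damage >= 1.0:
            break
    failures += damage >= 1.0

print(f"estimated {YEARS}-year limit state probability: {failures / N_SAMPLES:.4f}")
```

    Comparing the failure fraction with and without the damage-accumulation branch is one way to probe how much load duration actually matters, which is the comparison the abstract alludes to.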

    Exploiting flexibly assignable work to improve load balance

    In many applications of parallel computing, distribution of the data unambiguously implies distribution of work among processors. But there are exceptions where some tasks can be assigned to one of several processors without altering the total volume of communication. In this paper, we study the problem of exploiting this flexibility in the assignment of tasks to improve load balance. We first model the problem in terms of network flow and use combinatorial techniques for its solution. Our parametric search algorithms use maximum-flow computations to probe a candidate optimal solution value. We describe two algorithms that solve the assignment problem with log W_T and |P| probe calls, where W_T and |P| denote the total workload and the number of processors, respectively. We also define augmenting paths and cuts for this problem, and show that any algorithm based on augmenting paths can be used to find an optimal solution for the task assignment problem. We then consider a continuous version of the problem, and formulate it as a linearly constrained optimization problem, i.e., min ||Ax||_∞ s.t. Bx = d. To avoid solving an intractable ∞-norm optimization problem, we show that in this case minimizing the 2-norm is sufficient to minimize the ∞-norm, which reduces the problem to the well-studied linearly constrained least squares problem. The continuous version of the problem has the advantage of being easily amenable to parallelization.
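
    The parametric search with max-flow probes can be sketched directly with networkx. This is a minimal illustration, assuming integer workloads and that a flexibly assignable task's work may be split among its allowed processors (the flow relaxation); the instance at the bottom is hypothetical, not from the paper:

```python
import networkx as nx

def feasible(tasks, procs, fixed_load, cap):
    """Probe: can the flexible work be routed so that no processor's total
    load exceeds `cap`?  Modeled as a max-flow feasibility question."""
    G = nx.DiGraph()
    total = 0
    for t, (w, allowed) in tasks.items():
        G.add_edge("src", ("task", t), capacity=w)
        total += w
        for p in allowed:
            G.add_edge(("task", t), ("proc", p), capacity=w)
    for p in procs:
        # Headroom of processor p under the candidate bound `cap`.
        G.add_edge(("proc", p), "snk", capacity=max(0, cap - fixed_load[p]))
    value, _ = nx.maximum_flow(G, "src", "snk")
    return value == total

def min_bottleneck(tasks, procs, fixed_load):
    """Binary search on the optimal bound: O(log W_T) max-flow probes,
    mirroring the paper's parametric search."""
    lo = max(fixed_load[p] for p in procs)
    hi = lo + sum(w for w, _ in tasks.values())  # always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(tasks, procs, fixed_load, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Hypothetical instance: two processors with fixed loads 2 and 6,
# three flexibly assignable tasks with their allowed processor sets.
tasks = {"a": (4, ["p0", "p1"]), "b": (3, ["p0"]), "c": (5, ["p0", "p1"])}
print(min_bottleneck(tasks, ["p0", "p1"], {"p0": 2, "p1": 6}))  # -> 10
```

    The continuous version reduces the same idea to linear algebra: minimizing ||Ax||_2 subject to Bx = d (a standard equality-constrained least squares problem) also minimizes ||Ax||_∞ in the case the paper identifies.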

    A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L.

    Many emerging large-scale data science applications require searching large graphs distributed across multiple memories and processors. This paper presents a distributed breadth-first search (BFS) scheme that scales for random graphs with up to 3 billion vertices and 30 billion edges. Scalability was tested on IBM BlueGene/L with 32,768 nodes at the Lawrence Livermore National Laboratory. Scalability was obtained through a series of optimizations, in particular those that ensure scalable use of memory. We use 2D (edge) partitioning of the graph instead of conventional 1D (vertex) partitioning to reduce communication overhead. For Poisson random graphs, we show that the expected size of the messages is scalable for both 2D and 1D partitionings. Finally, we have developed efficient collective communication functions for the 3D torus architecture of BlueGene/L that also take advantage of the structure in the problem. The performance and characteristics of the algorithm are measured and reported.
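
    The 2D partitioning can be pictured by writing level-synchronous BFS as blocked matrix-vector products over a logical processor grid. The sketch below is a serial simulation of that formulation, not BlueGene/L code; the grid shape, the dense 0/1 adjacency matrix, and the ring example are all assumptions, and the comments mark where the column-wise "expand" and row-wise "fold" collectives would run on a real machine:

```python
import numpy as np

def bfs_2d(A, source, pr=2, pc=2):
    """Level-synchronous BFS over a pr x pc processor grid, simulated
    serially.  A is a dense 0/1 adjacency matrix; each grid processor
    (i, j) owns the (i, j) block of A, i.e., 2D (edge) partitioning."""
    n = A.shape[0]
    assert n % pr == 0 and n % pc == 0, "toy code assumes divisible blocks"
    rb, cb = n // pr, n // pc
    level = np.full(n, -1)
    level[source] = 0
    frontier = np.zeros(n, dtype=np.int64)
    frontier[source] = 1
    step = 0
    while frontier.any():
        reached = np.zeros(n, dtype=np.int64)
        for i in range(pr):
            for j in range(pc):
                # EXPAND: processor (i, j) would obtain its frontier slice
                # via an allgather along grid column j (here: a slice).
                f = frontier[j * cb:(j + 1) * cb]
                blk = A[i * rb:(i + 1) * rb, j * cb:(j + 1) * cb]
                # FOLD: local mat-vec results would be OR-reduced along
                # grid row i (here: +=).
                reached[i * rb:(i + 1) * rb] += blk @ f
        step += 1
        frontier = ((reached > 0) & (level < 0)).astype(np.int64)
        level[frontier > 0] = step
    return level

# Ring of 8 vertices: BFS levels from vertex 0 should be 0 1 2 3 4 3 2 1.
n = 8
A = np.zeros((n, n), dtype=np.int64)
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1
print(bfs_2d(A, 0))
```

    The appeal of the 2D layout shows up in the communication pattern: each processor only ever exchanges frontier data within its grid row and column, rather than with all other processors.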

    Tolerating the Community Detection Resolution Limit with Edge Weighting

    Communities of vertices within a giant network such as the World-Wide Web are likely to be vastly smaller than the network itself. However, Fortunato and Barthélemy have proved that modularity maximization algorithms for community detection may fail to resolve communities with fewer than √(L/2) edges, where L is the number of edges in the entire network. This resolution limit leads modularity maximization algorithms to have notoriously poor accuracy on many real networks. Fortunato and Barthélemy's argument extends to networks with weighted edges as well, and we derive this corollary. We conclude that weighted modularity algorithms may fail to resolve communities with less than √(Wε/2) total edge weight, where W is the total edge weight in the network and ε is the maximum weight of an inter-community edge. If ε is small, then small communities can be resolved. Given a weighted or unweighted network, we describe how to derive new edge weights in order to achieve a low ε; we modify the "CNM" community detection algorithm to maximize weighted modularity, and show that the resulting algorithm has greatly improved accuracy. In experiments with an emerging standard benchmark for community detection, we find that our simple CNM variant is competitive with the most accurate community detection methods yet proposed.
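
    networkx's greedy_modularity_communities is an implementation of the CNM greedy agglomeration and accepts edge weights, so the recipe can be approximated directly. The triangle-count weighting below is an illustrative stand-in for the paper's weighting scheme, chosen only because inter-community edges tend to close few triangles and therefore stay light, keeping ε small:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

def triangle_weights(G):
    """Weight each edge by 1 plus the number of common neighbors of its
    endpoints (triangles it closes).  This is an assumed reweighting, not
    the paper's; any scheme that keeps inter-community edges light works."""
    for u, v in G.edges():
        G[u][v]["weight"] = 1 + len(set(G[u]) & set(G[v]))
    return G

G = triangle_weights(nx.karate_club_graph())
# greedy_modularity_communities implements CNM-style agglomeration and
# maximizes weighted modularity when given a weight attribute.
comms = greedy_modularity_communities(G, weight="weight")
print([sorted(c) for c in comms])
print("weighted modularity:", modularity(G, comms, weight="weight"))
```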

    Parallel Shortest Path Algorithms for Solving Large-Scale Instances

    We present an experimental study of parallel algorithms for solving the single-source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs. We implement Meyer and Sanders' Δ-stepping algorithm and report performance results on the Cray MTA-2, a multithreaded parallel architecture. The MTA-2 is a high-end shared-memory system offering two unique features that aid the efficient implementation of irregular parallel graph algorithms: the ability to exploit fine-grained parallelism, and low-overhead synchronization primitives. Our implementation exhibits remarkable parallel speedup when compared with a competitive sequential algorithm, for low-diameter sparse graphs. For instance, Δ-stepping on a directed scale-free graph of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the first performance results for a parallel NSSP algorithm on realistic graph instances on the order of billions of vertices and edges.
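
    For reference, a sequential sketch of Δ-stepping is below; the bucket width Δ and the toy graph are assumptions, and the MTA-2 implementation parallelizes the per-bucket relaxations that appear here as ordinary loops:

```python
import math

def delta_stepping(adj, source, delta):
    """Sequential sketch of Meyer and Sanders' delta-stepping SSSP.
    adj: {u: [(v, w), ...]} with non-negative weights.  Tentative distances
    are kept in buckets of width `delta`; light edges (w <= delta) are
    relaxed repeatedly within a bucket, heavy edges once per bucket."""
    dist = {v: math.inf for v in adj}
    buckets = {}

    def relax(v, d):
        if d < dist[v]:
            if dist[v] < math.inf:
                old = int(dist[v] // delta)
                if old in buckets:
                    buckets[old].discard(v)
            dist[v] = d
            buckets.setdefault(int(d // delta), set()).add(v)

    relax(source, 0.0)
    while buckets:
        i = min(buckets)                    # smallest non-empty bucket index
        settled = set()
        while buckets.get(i):               # light edges may refill bucket i
            frontier = buckets.pop(i)
            settled |= frontier
            for u in frontier:
                for v, w in adj[u]:
                    if w <= delta:          # light edge
                        relax(v, dist[u] + w)
        for u in settled:                   # heavy edges land in later buckets
            for v, w in adj[u]:
                if w > delta:
                    relax(v, dist[u] + w)
        buckets.pop(i, None)                # drop the (now empty) bucket
    return dist

adj = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.5)], 2: []}
print(delta_stepping(adj, 0, delta=2.0))  # {0: 0.0, 1: 1.0, 2: 2.5}
```

    The choice of Δ trades work for parallelism: a small Δ behaves like Dijkstra's algorithm, while a large Δ behaves like Bellman-Ford, with more wasted re-relaxations but larger buckets to process concurrently.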