21 research outputs found
Network-Oblivious Algorithms
A framework is proposed for the design and analysis of network-oblivious algorithms, namely algorithms that can run unchanged, yet efficiently, on a variety of machines characterized by different degrees of parallelism and communication capabilities. The framework prescribes that a network-oblivious algorithm be specified on a parallel model of computation where the only parameter is the problem's input size, and then evaluated on a model with two parameters, capturing parallelism granularity and communication latency. It is shown that for a wide class of network-oblivious algorithms, optimality in the latter model implies optimality in the decomposable bulk synchronous parallel model, which is known to effectively describe a wide and significant class of parallel platforms. The proposed framework can be regarded as an attempt to port the notion of obliviousness, well established in the context of cache hierarchies, to the realm of parallel computation. Its effectiveness is illustrated by providing optimal network-oblivious algorithms for a number of key problems. Some limitations of the oblivious approach are also discussed.
Deterministic Simulations of Shared Memory on Bounded Degree Networks
The Parallel Random Access Machine (PRAM) is an abstract parallel machine consisting of a synchronous collection of processors connected to a shared memory of cells. The essential feature of the PRAM is that the processors can access any n-tuple of distinct cells in a single machine cycle. While the PRAM is an attractive and widely used framework for the design and analysis of parallel algorithms, it does not reflect the constraints of realistic multiprocessors. This thesis explores the problem of efficient deterministic simulations of PRAM computations on bounded degree networks of processors, a model of parallel machines closer to what can be built in practice. It is shown that an arbitrary step of a PRAM with n processors and m cells of shared memory can be simulated in (log( log /log log + log log log (log log - log log log )) time in the worst case on an n-node bounded degree network with a particular expander-based structure. This simulation is more efficient than all deterministic simulations previously known, with respect to both time and space. In the case where is polylogarithmic in , the worst-case time to simulate a single PRAM step is at most (log log log ) which is within a factor of [...] Overall, these results suggest that, in principle at least, it is feasible to provide the abstraction of a shared memory on distributed models of parallel computation with only modest degradation in performance in the worst case.
Improved Bounds for the Token Distribution Problem
The problem of packet routing on bounded degree networks is considered. An algorithm is presented that can route packets in (log time on a particular -node expander-based network provided that no more than packets share the same source or destination.
Deterministic Simulations of PRAMs on Bounded Degree Networks
The problem of simulating a PRAM with n processors and memory size m on an n-node bounded degree network is considered. A scheme is presented which simulates an arbitrary PRAM step in time in the worst case on an expander-based network. By extending a previously established lower bound, it is shown that the proposed simulation is optimal whenever for some and some
Mesh
We study the complexity of routing a set of messages with multiple destinations (multicast routing) on an n-node square mesh under the store-and-forward model. A standard argument proves that Ω(√(cn)) time is required to route n messages, where each message is generated by a distinct node and at most c messages are to be delivered to any individual node. The obvious approach of simply replicating each message into the appropriate number of unicast (single-destination) messages and routing these independently does not yield an optimal algorithm. We provide both randomized and deterministic algorithms for multicast routing, which use constant-size buffers at each node. The randomized algorithm attains optimal performance, while the deterministic algorithm is slower by a factor of O(log² n). We also describe an optimal deterministic algorithm that, however, requires large buffers of size O(c).
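The "obvious approach" mentioned in the abstract, replicating every multicast message into unicast copies, can be sketched as follows. This is only an illustration of that baseline under assumed data structures (a dictionary from source node to destination set; the function names are hypothetical), not any of the algorithms the paper proposes. It also computes c, the largest number of messages addressed to a single node.

```python
from collections import Counter

def replicate_to_unicast(multicasts):
    """Expand {source: set_of_destinations} into a list of (source, destination) pairs."""
    return [(src, dst) for src, dsts in multicasts.items() for dst in sorted(dsts)]

def max_destination_load(unicasts):
    """Return c: the largest number of messages delivered to any single node."""
    loads = Counter(dst for _, dst in unicasts)
    return max(loads.values(), default=0)

# Hypothetical instance: three sources on a small mesh, nodes named by integers.
example = {0: {1, 2}, 1: {2}, 3: {0, 2}}
unicasts = replicate_to_unicast(example)
c = max_destination_load(unicasts)
```

Here five unicast copies are produced from three multicast messages, and node 2 receives three of them, so c = 3 for this instance.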
Deterministic Branch-And-Bound On Distributed Memory Machines
The branch-and-bound problem involves determining the leaf of minimum cost in a cost-labelled, heap-ordered tree, subject to the constraint that only the root is known initially and that the children of a node are revealed only by visiting their parent. We present the first efficient deterministic algorithm to solve the branch-and-bound problem for a tree T of constant degree on a p-processor distributed-memory Optically Connected Parallel Computer (OCPC). Let c* be the cost of the minimum-cost leaf in T, and let n and h be the number of nodes and the height, respectively, of the subtree T* ⊆ T of nodes whose cost is at most c*. When accounting for both computation and communication costs, our algorithm runs in time O(n/p + h (max{p, log n log p})²) for general values of n, and can be made to run in time O((n/p + h log⁴ p) log log p) for n polynomial in p. For large ranges of the relevant parameters, our algorithm is provab..
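To make the problem statement concrete, here is a minimal sequential best-first sketch. It is not the parallel OCPC algorithm of the paper. It assumes the tree is exposed through two hypothetical callbacks, `cost(node)` and `children(node)`, and that the tree is heap-ordered (no child costs less than its parent). Under that assumption the first leaf removed from the priority queue is a minimum-cost leaf, and only nodes of cost at most that of the returned leaf are visited.

```python
import heapq

def best_first_min_leaf(root, cost, children):
    """Return (leaf, leaf_cost, nodes_visited) for a heap-ordered tree."""
    frontier = [(cost(root), 0, root)]
    counter = 1   # tie-breaker so that nodes themselves are never compared
    visited = 0
    while frontier:
        c, _, node = heapq.heappop(frontier)
        visited += 1
        kids = list(children(node))   # children are revealed only by visiting
        if not kids:
            return node, c, visited   # first leaf popped has minimum cost
        for kid in kids:
            heapq.heappush(frontier, (cost(kid), counter, kid))
            counter += 1
    raise ValueError("tree has no nodes")

# Hypothetical heap-ordered tree used only for illustration.
tree = {"r": ["a", "b"], "a": ["c", "d"], "b": ["e"], "c": [], "d": [], "e": []}
costs = {"r": 1, "a": 2, "b": 5, "c": 4, "d": 3, "e": 6}
result = best_first_min_leaf("r", costs.__getitem__, tree.__getitem__)
```

In this instance the search visits r, a and d, which are exactly the three nodes whose cost does not exceed the optimum of 3.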
Implementing Shared Memory on Mesh-Connected Computers and on the Fat-Tree
We present deterministic upper and lower bounds on the slowdown required to simulate an (n, m)-PRAM on a variety of networks. The upper bounds are based on a novel scheme that exploits the splitting and combining of messages. This scheme can be implemented on an n-node d-dimensional mesh (for constant d) and on an n-leaf pruned butterfly and attains the smallest worst-case slowdown to date for such interconnections, namely, O(n^(1/d) (log(m/n))^(1−1/d)) for the d-dimensional mesh (with constant d) and O(√(n log(m/n))) for the pruned butterfly. In fact, the simulation on the pruned butterfly is the first PRAM simulation scheme on an area-universal network. Finally, we prove restricted and unrestricted lower bounds on the slowdown of any deterministic PRAM simulation on an arbitrary network, formulated in terms of the bandwidth properties of the interconnection as expressed by its decomposition tree.
Deterministic Parallel Backtrack Search
The backtrack search problem involves visiting all the nodes of an arbitrary binary tree given a pointer to its root, subject to the constraint that the children of a node are revealed only after their parent is visited. We present a fast, deterministic backtrack search algorithm for a p-processor COMMON CRCW-PRAM, which visits any n-node tree of height h in time O((n/p + h)(log log log p)²). This upper bound compares favourably with a natural Ω(n/p + h) lower bound for this problem. Our approach embodies novel, efficient techniques for dynamically assigning tree-nodes to processors to ensure that the work is shared equitably among them. Key words: Backtrack search. Load balancing. PRAM model. Parallel algorithms. 1 Introduction Several algorithmic techniques, such as those employed for solving many optimization problems, are based on the systematic exploration of a tree, whose internal nodes correspond to partial solutions (growing progressively more refined with increasing depth)..
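The n/p + h lower bound can be illustrated with a small simulation. The sketch below is a naive greedy schedule, not the algorithm of the paper, and it ignores all PRAM coordination costs. In each synchronous step up to p already-revealed nodes are visited, which reveals their children. The `children` callback and the example tree are assumptions made for illustration, and height is counted here as the number of nodes on the longest root-to-leaf path.

```python
from collections import deque
from math import ceil

def greedy_parallel_visit(root, children, p):
    """Visit a tree with p processors; return (nodes_visited, synchronous_steps)."""
    frontier = deque([root])
    visited = steps = 0
    while frontier:
        # Each of the p processors takes at most one revealed node per step.
        batch = [frontier.popleft() for _ in range(min(p, len(frontier)))]
        visited += len(batch)
        for node in batch:
            frontier.extend(children(node))   # children revealed by the visit
        steps += 1
    return visited, steps

def complete_binary_children(i, n=7):
    """Children of node i in a complete binary tree with nodes numbered 1..n."""
    return [c for c in (2 * i, 2 * i + 1) if c <= n]

n, h = 7, 3
two_proc = greedy_parallel_visit(1, complete_binary_children, 2)
four_proc = greedy_parallel_visit(1, complete_binary_children, 4)
```

With two processors the seven-node tree takes four steps, matching ceil(n/p); with four processors it takes three steps, matching the height. Neither quantity can be beaten, since each step visits at most p nodes and a node cannot be visited before its parent.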