1,643 research outputs found
Task allocation in a distributed computing system
A conceptual framework is examined for task allocation in distributed systems. Application and computing system parameters critical to task allocation decision processes are discussed. Task allocation techniques are addressed which focus on achieving a balance in the load distribution among the system's processors. Equalization of computing load among the processing elements is the goal. Examples of system performance are presented for specific applications. Both static and dynamic allocation of tasks are considered and system performance is evaluated using different task allocation methodologies
Processor allocation strategies for modified hypercubes
Parallel processing has been widely accepted to be the future in high speed computing. Among the various parallel architectures proposed/implemented, the hypercube has shown a lot of promise because of its poweful properties, like regular topology, fault tolerance, low diameter, simple routing, and ability to efficiently emulate other architectures. The major drawback of the hypercube network is that it can not be expanded in practice because the number of communication ports for each processor grows as the logarithm of the total number of processors in the system. Therefore, once a hypercube supercomputer of a certain dimensionality has been built, any future expansions can be accomplished only by replacing the VLSI chips. This is an undesirable feature and a lot of work has been under progress to eliminate this stymie, thus providing a platform for easier expansion.
Modified hypercubes (MHs) have been proposed as the building blocks of hypercube-based systems supporting incremental growth techniques without introducing extra resources for individual hypercubes.
However, processor allocation on MHs proves to be a challenge due to a slight deviation in their topology from that of the standard hypercube network. This thesis addresses the issue of processor allocation on MHs and proposes various strategies which are based, partially or entirely, on table look-up approaches. A study of the various task allocation strategies for standard hypercubes is conducted and their suitability for MHs is evaluated. It is shown that the proposed strategies have a perfect subcube recognition ability and a superior performance. Existing processor allocation strategies for pure hypercube networks are demonstrated to be ineffective for MHs, in the light of their inability to recognize all available subcubes. A comparative analysis that involves the buddy strategy and the new strategies is carried out using simulation results
Isomorphic Strategy for Processor Allocation in k-Ary n-Cube Systems
Due to its topological generality and flexibility, the k-ary n-cube architecture has been actively researched for various applications. However, the processor allocation problem has not been adequately addressed for the k-ary n-cube architecture, even though it has been studied extensively for hypercubes and meshes. The earlier k-ary n-cube allocation schemes based on conventional slice partitioning suffer from internal fragmentation of processors. In contrast, algorithms based on job-based partitioning alleviate the fragmentation problem but require higher time complexity. This paper proposes a new allocation scheme based on isomorphic partitioning, where the processor space is partitioned into higher dimensional isomorphic subcubes. The proposed scheme minimizes the fragmentation problem and is general in the sense that any size request can be supported and the host architecture need not be isomorphic. Extensive simulation study reveals that the proposed scheme significantly outperforms earlier schemes in terms of mean response time for practical size k-ary and n-cube architectures. The simulation results also show that reduction of external fragmentation is more substantial than internal fragmentation with the proposed scheme
Isomorphic Strategy for Processor Allocation in k-Ary n-Cube Systems
Due to its topological generality and flexibility, the k-ary n-cube architecture has been actively researched for various applications. However, the processor allocation problem has not been adequately addressed for the k-ary n-cube architecture, even though it has been studied extensively for hypercubes and meshes. The earlier k-ary n-cube allocation schemes based on conventional slice partitioning suffer from internal fragmentation of processors. In contrast, algorithms based on job-based partitioning alleviate the fragmentation problem but require higher time complexity. This paper proposes a new allocation scheme based on isomorphic partitioning, where the processor space is partitioned into higher dimensional isomorphic subcubes. The proposed scheme minimizes the fragmentation problem and is general in the sense that any size request can be supported and the host architecture need not be isomorphic. Extensive simulation study reveals that the proposed scheme significantly outperforms earlier schemes in terms of mean response time for practical size k-ary and n-cube architectures. The simulation results also show that reduction of external fragmentation is more substantial than internal fragmentation with the proposed scheme
Recommended from our members
Executing matrix multiply on a process oriented data flow machine
The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on partitioning and mapping of processes.In PODS arrays are partitioned by simply assigning consecutive elements to each processing element (PE) equally. Since PODS uses single assignment, there will be only one producer of each element. This producing PE owns that element and will perform the necessary computations to assign it. Using this approach the filling loop is distributed across the PEs. This simple partitioning and mapping scheme provides excellent results for executing scientific code on MIMD machines. In this way PODS allows MIMD machines to exploit vector and data parallelism easily while still providing the flexibility of MIMD over SIMD for multi-user systems.In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code-blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis for many important scientific algorithms such as: LU decomposition, convolution, and the Fast-Fourier Transform.The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the work load evenly across the PEs. The key result is that PODS can scale matrix multiply in a near linear fashion until there is little or no work to be performed for each PE. Then overhead and message passing become a major component of the execution time. With larger problems (e.g., >/=16k data points) this limit would be reached at around 256 PEs
Optimal processor assignment for pipeline computations
The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered
Redundancy management for efficient fault recovery in NASA's distributed computing system
The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
- …