718 research outputs found
The effect of real workloads and stochastic workloads on the performance of allocation and scheduling algorithms in 2D mesh multicomputers
The performance of the existing non-contiguous processor allocation strategies has been traditionally carried out by means of simulation based on a stochastic workload model to generate a stream of incoming jobs. To validate the performance of the existing algorithms, there has been a need to evaluate the algorithms' performance based on a real workload trace. In this paper, we evaluate the performance of several well-known processor allocation and job scheduling strategies based on a real workload trace and compare the results against those obtained from using a stochastic workload. Our results reveal that the conclusions reached on the relative performance merits of the allocation strategies when a real workload trace is used are in general compatible with those obtained when a stochastic workload is used
An efficient processor allocation strategy that maintains a high degree of contiguity among processors in 2D mesh connected multicomputers
Two strategies are used for the allocation of jobs to processors connected by mesh topologies: contiguous allocation and non-contiguous allocation. In non-contiguous allocation, a job request can be split into smaller parts that are allocated to non-adjacent free sub-meshes rather than always waiting until a single sub-mesh of the requested size and shape is available. Lifting the contiguity condition is expected to reduce processor fragmentation and increase system utilization. However, the distances traversed by messages can be long, and as a result the communication overhead, especially contention, is increased. The extra communication overhead depends on how the allocation request is partitioned and assigned to free sub-meshes. This paper presents a new Non-contiguous allocation algorithm, referred to as Greedy-Available-Busy-List (GABL for short), which can decrease the communication overhead among processors allocated to a given job. The simulation results show that the new strategy can reduce the communication overhead and substantially improve performance in terms of parameters such as job turnaround time and system utilization. Moreover, the results reveal that the Shortest-Service-Demand-First (SSD) scheduling strategy is much better than the First-Come-First-Served (FCFS) scheduling strategy
Arithmetic on a Distributed-Memory Quantum Multicomputer
We evaluate the performance of quantum arithmetic algorithms run on a
distributed quantum computer (a quantum multicomputer). We vary the node
capacity and I/O capabilities, and the network topology. The tradeoff of
choosing between gates executed remotely, through ``teleported gates'' on
entangled pairs of qubits (telegate), versus exchanging the relevant qubits via
quantum teleportation, then executing the algorithm using local gates
(teledata), is examined. We show that the teledata approach performs better,
and that carry-ripple adders perform well when the teleportation block is
decomposed so that the key quantum operations can be parallelized. A node size
of only a few logical qubits performs adequately provided that the nodes have
two transceiver qubits. A linear network topology performs acceptably for a
broad range of system sizes and performance parameters. We therefore recommend
pursuing small, high-I/O bandwidth nodes and a simple network. Such a machine
will run Shor's algorithm for factoring large numbers efficiently.Comment: 24 pages, 10 figures, ACM transactions format. Extended version of
Int. Symp. on Comp. Architecture (ISCA) paper; v2, correct one circuit error,
numerous small changes for clarity, add reference
Computer architecture evaluation for structural dynamics computations: Project summary
The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies
Advanced software techniques for space shuttle data management systems Final report
Airborne/spaceborn computer design and techniques for space shuttle data management system
ATAMM enhancement and multiprocessing performance evaluation
The algorithm to architecture mapping model (ATAAM) is a Petri net based model which provides a strategy for periodic execution of a class of real-time algorithms on multicomputer dataflow architecture. The execution of large-grained, decision-free algorithms on homogeneous processing elements is studied. The ATAAM provides an analytical basis for calculating performance bounds on throughput characteristics. Extension of the ATAMM as a strategy for cyclo-static scheduling provides for a truly distributed ATAMM multicomputer operating system. An ATAAM testbed consisting of a centralized graph manager and three processors is described using embedded firmware on 68HC11 microcontrollers
Efficient processor management strategies for multicomputer systems
Multicomputers are cost-effective alternatives to the conventional supercomputers. Contemporary processor management schemes tend to underutilize the processors and leave many of the processors in the system idle while jobs are waiting for execution;Instead of designing faster processors or interconnection networks, a substantial performance improvement can be obtained by implementing better processor management strategies. This dissertation studies the performance issues related to the processor management schemes and proposes several ways to enhance the multicomputer systems by means of processor management. The proposed schemes incorporate the concepts of size-reduction, non-contiguous allocation, as well as job migration. Job scheduling using a bypass-queue is also studied. All the proposed schemes are proven effective in improving the system performance via extensive simulations. Each proposed scheme has different implementation cost and constraints. In order to take advantage of these schemes, judicious selection of system parameters is important and is discussed
Parallel discrete event simulation: A shared memory approach
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models
A performance analysis method for distributed real-time robotic systems: A case study of remote teleoperation
Robot coordination and control systems for remote teleoperation applications are by necessity implemented on distributed computers. Modeling and performance analysis of these distributed robotic systems is difficult, but important for economic system design. Performance analysis methods originally developed for conventional distributed computer systems are often unsatisfactory for evaluating real-time systems. The paper introduces a formal model of distributed robotic control systems; and a performance analysis method, based on scheduling theory, which can handle concurrent hard-real-time response specifications. Use of the method is illustrated by a case of remote teleoperation which assesses the effect of communication delays and the allocation of robot control functions on control system hardware requirements
- âŠ