13 research outputs found

    Non-contiguous processor allocation strategy for 2D mesh connected multicomputers based on sub-meshes available for allocation

    Get PDF
    Contiguous allocation of parallel jobs usually suffers from the degrading effects of fragmentation as it requires that the allocated processors be contiguous and has the same topology as the network topology connecting these processors. In non-contiguous allocation, a job can execute on multiple disjoint smaller sub-meshes rather than always waiting until a single sub-mesh of the requested size is available. Lifting the contiguity condition in non-contiguous allocation is expected to reduce processor fragmentation and increase processor utilization. However, the communication overhead is increased because the distances traversed by messages can be longer. The extra communication overhead depends on how the allocation request is partitioned and allocated to free sub-meshes. In this paper, a new non-contiguous processor allocation strategy, referred to as Greedy-Available-Busy-List, is suggested for the 2D mesh network, and is compared using simulation against the well-known non-contiguous and contiguous allocation strategies. To show the performance improved by proposed strategy, we conducted simulation runs under the assumption of wormhole routing and all-to-all communication pattern. The results show that the proposed strategy can reduce the communication overhead and improve performance substantially in terms of turnaround times of jobs and finish times

    Methods for Precise Submesh Allocation

    Get PDF

    An efficient processor allocation strategy that maintains a high degree of contiguity among processors in 2D mesh connected multicomputers

    Get PDF
    Two strategies are used for the allocation of jobs to processors connected by mesh topologies: contiguous allocation and non-contiguous allocation. In non-contiguous allocation, a job request can be split into smaller parts that are allocated to non-adjacent free sub-meshes rather than always waiting until a single sub-mesh of the requested size and shape is available. Lifting the contiguity condition is expected to reduce processor fragmentation and increase system utilization. However, the distances traversed by messages can be long, and as a result the communication overhead, especially contention, is increased. The extra communication overhead depends on how the allocation request is partitioned and assigned to free sub-meshes. This paper presents a new Non-contiguous allocation algorithm, referred to as Greedy-Available-Busy-List (GABL for short), which can decrease the communication overhead among processors allocated to a given job. The simulation results show that the new strategy can reduce the communication overhead and substantially improve performance in terms of parameters such as job turnaround time and system utilization. Moreover, the results reveal that the Shortest-Service-Demand-First (SSD) scheduling strategy is much better than the First-Come-First-Served (FCFS) scheduling strategy

    Efficient processor management strategies for multicomputer systems

    Get PDF
    Multicomputers are cost-effective alternatives to the conventional supercomputers. Contemporary processor management schemes tend to underutilize the processors and leave many of the processors in the system idle while jobs are waiting for execution;Instead of designing faster processors or interconnection networks, a substantial performance improvement can be obtained by implementing better processor management strategies. This dissertation studies the performance issues related to the processor management schemes and proposes several ways to enhance the multicomputer systems by means of processor management. The proposed schemes incorporate the concepts of size-reduction, non-contiguous allocation, as well as job migration. Job scheduling using a bypass-queue is also studied. All the proposed schemes are proven effective in improving the system performance via extensive simulations. Each proposed scheme has different implementation cost and constraints. In order to take advantage of these schemes, judicious selection of system parameters is important and is discussed

    Redundancy management for efficient fault recovery in NASA's distributed computing system

    Get PDF
    The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance

    GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics

    Full text link
    We present the newly developed code, GAMER (GPU-accelerated Adaptive MEsh Refinement code), which has adopted a novel approach to improve the performance of adaptive mesh refinement (AMR) astrophysical simulations by a large factor with the use of the graphic processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing TVD scheme for the hydrodynamic solver, and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented in GPU, by which hundreds of patches can be advanced in parallel. The computational overhead associated with the data transfer between CPU and GPU is carefully reduced by utilizing the capability of asynchronous memory copies in GPU, and the computing time of the ghost-zone values for each patch is made to diminish by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster system. We measure the performance of the code by performing purely-baryonic cosmological simulations in different hardware implementations, in which detailed timing analyses provide comparison between the computations with and without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively.Comment: 60 pages, 22 figures, 3 tables. More accuracy tests are included. Accepted for publication in ApJ

    Efficient processor allocation strategies for mesh-connected multicomputers

    Get PDF
    Abstract Efficient processor allocation and job scheduling algorithms are critical if the full computational power of large-scale multicomputers is to be harnessed effectively. Processor allocation is responsible for selecting the set of processors on which parallel jobs are executed, whereas job scheduling is responsible for determining the order in which the jobs are executed. Many processor allocation strategies have been devised for mesh-connected multicomputers and these can be divided into two main categories: contiguous and non-contiguous. In contiguous allocation, jobs are allocated distinct contiguous processor sub-meshes for the duration of their execution. Such a strategy could lead to high processor fragmentation which degrades system performance in terms of, for example, the turnaround time and system utilisation. In non-contiguous allocation, a job can execute on multiple disjoint smaller sub-meshes rather than waiting until a single sub-mesh of the requested size and shape is available. Although non-contiguous allocation increases message contention inside the network, lifting the contiguity condition can reduce processor fragmentation and increase system utilisation. Processor fragmentation can be of two types: internal and external. The former occurs when more processors are allocated to a job than it requires while the latter occurs when there are free processors enough in number to satisfy another job request, but they are not allocated to it because they are not contiguous. A lot of efforts have been devoted to reducing fragmentation, and a number of contiguous allocation strategies have been devised to recognize complete sub-meshes during allocation. Most of these strategies have been suggested for 2D mesh-connected multicomputers. However, although the 3D mesh has been the underlying network topology for a number of important multicomputers, there has been relatively little activity with regard to designing similar strategies for such a network. The very few contiguous allocation strategies suggested for the 3D mesh achieve complete sub-mesh recognition ability only at the expense of a high allocation overhead (i.e., allocation and de-allocation time). Furthermore, the allocation overhead in the existing contiguous strategies often grows with system size. The main challenge is therefore to devise an efficient contiguous allocation strategy that can exhibit good performance (e.g., a low job turnaround time and high system utilisation) with a low allocation overhead. The first part of the research presents a new contiguous allocation strategy, referred to as Turning Busy List (TBL), for 3D mesh-connected multicomputers. The TBL strategy considers only those available free sub-meshes which border from the left of those already allocated sub-meshes or which have their left boundaries aligned with that of the whole mesh network. Moreover TBL uses an efficient scheme to facilitate the detection of such available sub-meshes while maintaining a low allocation overhead. This is achieved through maintaining a list of allocated sub-meshes in order to efficiently determine the processors that can form an allocation sub-mesh for a new allocation request. The new strategy is able to identify a free sub-mesh of the requested size as long as it exists in the mesh. Results from extensive simulations under various operating loads reveal that TBL manages to deliver competitive performance (i.e., low turnaround times and high system utilisation) with a much lower allocation overhead compared to other well-known existing strategies. Most existing non-contiguous allocation strategies that have been suggested for the mesh suffer from several problems that include internal fragmentation, external fragmentation, and message contention inside the network. Furthermore, the allocation of processors to job requests is not based on free contiguous sub-meshes in these existing strategies. The second part of this research proposes a new non-contiguous allocation strategy, referred to as Greedy Available Busy List (GABL) strategy that eliminates both internal and external fragmentation and alleviates the contention in the network. GABL combines the desirable features of both contiguous and non-contiguous allocation strategies as it adopts the contiguous allocation used in our TBL strategy. Moreover, GABL is flexible enough in that it could be applied to either the 2D or 3D mesh. However, for the sake of the present study, the new non-contiguous allocation strategy is discussed for the 2D mesh and compares its performance against that of well-known non-contiguous allocation strategies suggested for this network. One of the desirable features of GABL is that it can maintain a high degree of contiguity between processors compared to the previous allocation strategies. This, in turn, decreases the number of sub-meshes allocated to a job, and thus decreases message distances, resulting in a low inter-processor communication overhead. The performance analysis here indicates that the new proposed strategy has lower turnaround time than the previous non-contiguous allocation strategies for most considered cases. Moreover, in the presence of high message contention due to heavy network traffic, GABL exhibits superior performance in terms of the turnaround time over the previous contiguous and non-contiguous allocation strategies. Furthermore, GABL exhibits a high system utilisation as it manages to eliminate both internal and external fragmentation. The performance of many allocation strategies including the ones suggested above, has been evaluated under the assumption that job execution times follow an exponential distribution. However, many measurement studies have convincingly demonstrated that the execution times of certain computational applications are best characterized by heavy-tailed job execution times; that is, many jobs have short execution times and comparatively few have very long execution times. Motivated by this observation, the final part of this thesis reviews the performance of several contiguous allocation strategies, including TBL, in the context of heavy-tailed distributions. This research is the first to analyze the performance impact of heavy-tailed job execution times on the allocation strategies suggested for mesh-connected multicomputers. The results show that the performance of the contiguous allocation strategies degrades sharply when the distribution of job execution times is heavy-tailed. Further, adopting an appropriate scheduling strategy, such as Shortest-Service-Demand (SSD) as opposed to First-Come-First-Served (FCFS), can significantly reduce the detrimental effects of heavy-tailed distributions. Finally, while the new contiguous allocation strategy (TBL) is as good as the best competitor of the previous contiguous allocation strategies in terms of job turnaround time and system utilisation, it is substantially more efficient in terms of allocation overhead

    COMPUTER NETWORK ANALYSIS AND OPTIMISATION

    Get PDF
    This thesis presents a study and analysis of the major influences on network cost and their related performance. New methods have been devised to find solutions to network optimisation problems particular to the AT&T ISTEL networks in Europe and these are presented together with examples of their successful commercial application. Network performance is seen by the user in terms of network availability and traffic delay times. The network performance is influenced by many parameters, the dominating influences typically being the number of users accessing the network, the type of traffic demands they place upon it and the particular network configuration itself. The number of possible network configurations available to a network designer is vast if the full range of currently available equipment is taken into account. The aim of this research has been to assist in the selection of most suitable network designs for optimum performance and cost. This thesis looks at the current differing network technologies, their performance characteristics and the issues pertinent to any network design and optimisation procedures. A distinction is made between the network equipment providing user 'access' and that which constitutes the cross country, or *core\ data transport medium. This partitioning of the problem is exploited with the analysis concentrating on each section separately. The access side of the AT&T ISTEL - UK network is used as a basis for an analysis of the general access network. The aim is to allow network providers to analyse the root cause of excessive delay problems and find where small adjustments to access configurations might lead to real performance improvements from a user point of view. A method is developed to allow statistical estimates of performance and quality of service for typical access network configurations. From this a general method for the optimisation of cost expenditure and performance improvement is proposed. The optimisation of both circuit switched and packet switched computer networks is shown to be difficult and is normally tackled by the use of complex procedures on mainframe computers. The new work carried out in this study takes a fresh look at the basic properties of networks in order to develop a new heuristic method for the design and optimisation of circuit switched core networks on a personal computer platform. A fully functional design system was developed that implements time division multiplexed core network design. The system uses both a new heuristic method for improving the quality of the designs and a new 'speed up' algorithm for reducing times to find feasible routes, thereby dramatically improving overall design times. The completed system has since been used extensively to assist in the design of commercial networks across Europe.AT&T ISTEL Ltd., Redditch, Englan

    Proceedings, MSVSCC 2015

    Get PDF
    The Virginia Modeling, Analysis and Simulation Center (VMASC) of Old Dominion University hosted the 2015 Modeling, Simulation, & Visualization Student capstone Conference on April 16th. The Capstone Conference features students in Modeling and Simulation, undergraduates and graduate degree programs, and fields from many colleges and/or universities. Students present their research to an audience of fellow students, faculty, judges, and other distinguished guests. For the students, these presentations afford them the opportunity to impart their innovative research to members of the M&S community from academic, industry, and government backgrounds. Also participating in the conference are faculty and judges who have volunteered their time to impart direct support to their students’ research, facilitate the various conference tracks, serve as judges for each of the tracks, and provide overall assistance to this conference. 2015 marks the ninth year of the VMASC Capstone Conference for Modeling, Simulation and Visualization. This year our conference attracted a number of fine student written papers and presentations, resulting in a total of 51 research works that were presented. This year’s conference had record attendance thanks to the support from the various different departments at Old Dominion University, other local Universities, and the United States Military Academy, at West Point. We greatly appreciated all of the work and energy that has gone into this year’s conference, it truly was a highly collaborative effort that has resulted in a very successful symposium for the M&S community and all of those involved. Below you will find a brief summary of the best papers and best presentations with some simple statistics of the overall conference contribution. Followed by that is a table of contents that breaks down by conference track category with a copy of each included body of work. Thank you again for your time and your contribution as this conference is designed to continuously evolve and adapt to better suit the authors and M&S supporters. Dr.Yuzhong Shen Graduate Program Director, MSVE Capstone Conference Chair John ShullGraduate Student, MSVE Capstone Conference Student Chai

    Sixth Biennial Report : August 2001 - May 2003

    No full text
    corecore