55 research outputs found

    Efficient processor management strategies for multicomputer systems

    Multicomputers are cost-effective alternatives to conventional supercomputers. However, contemporary processor management schemes tend to underutilize the processors, leaving many of them idle while jobs wait for execution. Instead of designing faster processors or interconnection networks, a substantial performance improvement can be obtained by implementing better processor management strategies. This dissertation studies the performance issues related to processor management schemes and proposes several ways to enhance multicomputer systems through processor management. The proposed schemes incorporate the concepts of size reduction, non-contiguous allocation, and job migration. Job scheduling using a bypass queue is also studied. Extensive simulations show that all the proposed schemes are effective in improving system performance. Each scheme has different implementation costs and constraints, so judicious selection of system parameters is important in order to take advantage of them; this selection is also discussed.
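    The abstract does not spell out the bypass-queue policy, so the Python sketch below only illustrates the general idea under an assumed policy. Under strict FCFS, a large job at the head of the queue blocks every job behind it; with a bypass queue, a job that cannot start yet is set aside so that smaller jobs behind it can use the idle processors. The `Job` class and both function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    size: int  # number of processors requested (hypothetical field)


def schedule_fcfs(queue: deque, free: int):
    """Strict FCFS: start jobs in order and stop at the first one that does not fit."""
    started = []
    while queue and queue[0].size <= free:
        job = queue.popleft()
        free -= job.size
        started.append(job.name)
    return started, free


def schedule_with_bypass(queue: deque, free: int):
    """Assumed bypass policy: a job that cannot start is moved to a bypass
    queue, and scheduling continues with the jobs behind it."""
    started = []
    bypass = deque()
    while queue:
        job = queue.popleft()
        if job.size <= free:
            free -= job.size
            started.append(job.name)
        else:
            bypass.append(job)
    queue.extend(bypass)  # bypassed jobs wait for the next round, order preserved
    return started, free


jobs = [Job("A", 12), Job("B", 4), Job("C", 2)]
print(schedule_fcfs(deque(jobs), free=8))         # ([], 8): A blocks B and C
print(schedule_with_bypass(deque(jobs), free=8))  # (['B', 'C'], 2)
```

    Without a limit on how often a job may be bypassed, a large job could starve; a practical scheme would likely bound this, and the dissertation's own policy may differ.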

    An efficient processor allocation strategy that maintains a high degree of contiguity among processors in 2D mesh connected multicomputers

    Two strategies are used to allocate jobs to processors connected by mesh topologies: contiguous allocation and non-contiguous allocation. In non-contiguous allocation, a job request can be split into smaller parts that are allocated to non-adjacent free sub-meshes, rather than always waiting until a single sub-mesh of the requested size and shape becomes available. Lifting the contiguity condition is expected to reduce processor fragmentation and increase system utilization. However, messages may have to traverse long distances, which increases the communication overhead, especially contention. The extra communication overhead depends on how the allocation request is partitioned and assigned to free sub-meshes. This paper presents a new non-contiguous allocation algorithm, referred to as Greedy-Available-Busy-List (GABL), which can decrease the communication overhead among the processors allocated to a given job. Simulation results show that the new strategy can reduce the communication overhead and substantially improve performance in terms of parameters such as job turnaround time and system utilization. Moreover, the results reveal that the Shortest-Service-Demand-First (SSD) scheduling strategy performs much better than the First-Come-First-Served (FCFS) scheduling strategy.
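    To show why a shortest-demand-first order can beat FCFS, here is a deliberately simplified Python sketch. It assumes that service demand is processors requested multiplied by runtime, that all jobs arrive at time 0, and that jobs run one after another; a real multicomputer runs several jobs at once, so this only illustrates the ordering effect on mean turnaround time, not the paper's simulation model. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    processors: int
    runtime: float

    @property
    def service_demand(self) -> float:
        # Assumption: service demand = processors requested x runtime
        return self.processors * self.runtime


def mean_turnaround(order):
    """Simplified model: all jobs arrive at t = 0 and run one after another."""
    clock = 0.0
    total = 0.0
    for job in order:
        clock += job.runtime  # this job finishes after all earlier jobs
        total += clock        # turnaround = finish time - arrival time (0)
    return total / len(order)


def fcfs(jobs):
    return list(jobs)  # keep arrival order


def ssd(jobs):
    return sorted(jobs, key=lambda j: j.service_demand)  # smallest demand first


jobs = [Job("A", 16, 10.0), Job("B", 4, 2.0), Job("C", 8, 1.0)]
print(mean_turnaround(fcfs(jobs)))  # (10 + 12 + 13) / 3, about 11.67
print(mean_turnaround(ssd(jobs)))   # (2 + 3 + 13) / 3 = 6.0
```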

    Processor allocator for chip multiprocessors

    Chip MultiProcessor (CMP) architectures consisting of many cores connected through a Network-on-Chip (NoC) are becoming the main computing platforms for research and computer centers, and in the future for commercial solutions. To use CMPs effectively, the operating system is an important factor: it should support a multiuser environment in which many parallel jobs are executed simultaneously. This is handled by the processor management system of the operating system, which consists of two components: the Job Scheduler (JS) and the Processor Allocator (PA). The JS is responsible for job scheduling, i.e. selecting the next job to be executed, while the PA is responsible for processor allocation, i.e. selecting a set of processors for the job chosen by the JS. In this thesis, the PA architecture for the NoC-based CMP is explored. The idea of implementing the PA in hardware and integrating it on one die together with the processing elements of the CMP is presented. Such an approach requires the PA to be fast as well as area- and energy-efficient, because it is only a small component of the CMP. The architecture of a hardware PA is presented. The main factor determining its structure is the type of processor allocation algorithm employed inside it. Thus, all important allocation techniques are investigated in depth and new schemes are proposed. All of them are compared using an experimental system. The PA driven by the described allocation techniques is synthesized on an FPGA, and its key energy, area, and performance parameters are extracted. The proposed CMP uses a NoC as its interconnection architecture. Therefore, all main NoC structures are studied and tested. The most important parameters, such as topology, flow control, and routing algorithms, are presented and discussed. An energy model is proposed and described for the proposed NoC structures. Finally, the synthesized PAs and NoCs are evaluated in a simulation system in which a NoC-based CMP is created. The experimental environment took energy and traffic-balance characteristics into consideration. As a result, the most efficient PA and NoC for the CMP are presented.
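    The division of labour between the JS and the PA can be sketched in a few lines of Python. This is a software toy under stated assumptions, not the thesis's hardware PA: the JS is assumed to use FCFS, the PA ignores topology and simply hands out the lowest-numbered idle processors, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Dict, Optional, Set


class ProcessorAllocator:
    """Toy PA: picks processor IDs from a free pool (topology ignored)."""

    def __init__(self, num_processors: int):
        self.free: Set[int] = set(range(num_processors))

    def allocate(self, count: int) -> Optional[Set[int]]:
        if count > len(self.free):
            return None  # not enough idle processors
        chosen = set(sorted(self.free)[:count])  # lowest-numbered free processors
        self.free -= chosen
        return chosen

    def release(self, procs: Set[int]) -> None:
        self.free |= procs


class JobScheduler:
    """Toy FCFS JS: decides which job goes next; the PA decides where it runs."""

    def __init__(self):
        self.queue = deque()

    def submit(self, job_id: str, count: int) -> None:
        self.queue.append((job_id, count))

    def dispatch(self, pa: ProcessorAllocator) -> Dict[str, Set[int]]:
        started = {}
        while self.queue:
            job_id, count = self.queue[0]
            procs = pa.allocate(count)
            if procs is None:
                break  # head job waits until processors are released
            self.queue.popleft()
            started[job_id] = procs
        return started


pa = ProcessorAllocator(8)
js = JobScheduler()
for job_id, count in [("j1", 3), ("j2", 4), ("j3", 2)]:
    js.submit(job_id, count)
print(js.dispatch(pa))  # j1 gets {0, 1, 2}, j2 gets {3, 4, 5, 6}; j3 waits
pa.release({0, 1, 2})   # j1 finishes
print(js.dispatch(pa))  # j3 gets {0, 1}
```

    In a NoC-based CMP the PA would also have to respect the mesh topology, which is where the allocation algorithms studied in the thesis come in.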

    Methods for Precise Submesh Allocation

    Non-contiguous processor allocation strategy for 2D mesh connected multicomputers based on sub-meshes available for allocation

    Contiguous allocation of parallel jobs usually suffers from the degrading effects of fragmentation, because it requires that the allocated processors be contiguous and have the same topology as the network connecting them. In non-contiguous allocation, a job can execute on multiple disjoint smaller sub-meshes rather than always waiting until a single sub-mesh of the requested size is available. Lifting the contiguity condition is expected to reduce processor fragmentation and increase processor utilization. However, the communication overhead increases because messages may have to traverse longer distances. The extra communication overhead depends on how the allocation request is partitioned and allocated to free sub-meshes. In this paper, a new non-contiguous processor allocation strategy, referred to as Greedy-Available-Busy-List, is proposed for the 2D mesh network and compared by simulation against well-known non-contiguous and contiguous allocation strategies. To evaluate the performance improvement of the proposed strategy, simulations were run under the assumption of wormhole routing and an all-to-all communication pattern. The results show that the proposed strategy can reduce the communication overhead and substantially improve performance in terms of job turnaround times and finish times.
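    The abstract does not describe GABL in enough detail to reproduce it, so the Python sketch below uses a different, deliberately simple greedy rule, which is an assumption and not GABL: try the whole request as one contiguous sub-mesh with first fit, and if that fails, halve the longer side and allocate the halves recursively. It illustrates the fragmentation problem described above, where enough processors are free but no single sub-mesh of the requested shape is, and shows how splitting the request lets the job start. All function names are hypothetical.

```python
from typing import List, Optional, Tuple

Mesh = List[List[bool]]            # mesh[y][x] is True when processor (x, y) is busy
Piece = Tuple[int, int, int, int]  # (x, y, width, height) of an allocated sub-mesh


def is_free(mesh: Mesh, x: int, y: int, w: int, h: int) -> bool:
    return all(not mesh[r][c] for r in range(y, y + h) for c in range(x, x + w))


def first_fit(mesh: Mesh, w: int, h: int) -> Optional[Tuple[int, int]]:
    """Contiguous allocation: first base (x, y) whose w x h sub-mesh is free."""
    for y in range(len(mesh) - h + 1):
        for x in range(len(mesh[0]) - w + 1):
            if is_free(mesh, x, y, w, h):
                return (x, y)
    return None


def mark(mesh: Mesh, piece: Piece, busy: bool) -> None:
    x, y, w, h = piece
    for r in range(y, y + h):
        for c in range(x, x + w):
            mesh[r][c] = busy


def _alloc(mesh: Mesh, w: int, h: int, pieces: List[Piece]) -> bool:
    pos = first_fit(mesh, w, h)
    if pos is not None:
        piece = (pos[0], pos[1], w, h)
        mark(mesh, piece, True)
        pieces.append(piece)
        return True
    if w == 1 and h == 1:
        return False  # no free processor left
    if w > h:  # halve the longer side (ties split the height)
        half = w // 2
        return _alloc(mesh, half, h, pieces) and _alloc(mesh, w - half, h, pieces)
    half = h // 2
    return _alloc(mesh, w, half, pieces) and _alloc(mesh, w, h - half, pieces)


def allocate_noncontiguous(mesh: Mesh, w: int, h: int) -> Optional[List[Piece]]:
    """Try w x h contiguously; otherwise split recursively into smaller pieces.
    Returns the pieces, or None (with the mesh unchanged) if too few are free."""
    pieces: List[Piece] = []
    if _alloc(mesh, w, h, pieces):
        return pieces
    for piece in pieces:
        mark(mesh, piece, False)  # roll back a partial allocation
    return None


# 4 x 4 mesh with rows 1 and 3 busy: 8 free processors, but no free 2 x 2 block
mesh = [[y in (1, 3)] * 4 for y in range(4)]
print(first_fit(mesh, 2, 2))               # None: contiguous allocation must wait
print(allocate_noncontiguous(mesh, 2, 2))  # [(0, 0, 2, 1), (2, 0, 2, 1)]
```

    Because the pieces may end up far apart, a real strategy also has to keep them close to limit the communication overhead that the abstract discusses; this sketch ignores distance entirely.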

    Isomorphic Strategy for Processor Allocation in k-Ary n-Cube Systems

    Due to its topological generality and flexibility, the k-ary n-cube architecture has been actively researched for various applications. However, the processor allocation problem has not been adequately addressed for the k-ary n-cube architecture, even though it has been studied extensively for hypercubes and meshes. Earlier k-ary n-cube allocation schemes based on conventional slice partitioning suffer from internal fragmentation of processors. In contrast, algorithms based on job-based partitioning alleviate the fragmentation problem but have higher time complexity. This paper proposes a new allocation scheme based on isomorphic partitioning, in which the processor space is partitioned into higher-dimensional isomorphic subcubes. The proposed scheme minimizes fragmentation and is general in the sense that requests of any size can be supported and the host architecture need not be isomorphic. An extensive simulation study reveals that the proposed scheme significantly outperforms earlier schemes in terms of mean response time for k-ary n-cubes with practical values of k and n. The simulation results also show that, with the proposed scheme, the reduction in external fragmentation is more substantial than the reduction in internal fragmentation.
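    The abstract does not define slice partitioning in detail. As a rough, assumed illustration of what internal fragmentation means in a k-ary n-cube, the Python sketch below supposes a hypothetical allocator that can only grant whole k-ary j-subcubes (k**j nodes each), so every request is rounded up to the next power of k and the surplus processors sit idle inside the job's partition. The function names are hypothetical and this is not the paper's scheme.

```python
def nodes(k: int, n: int) -> int:
    """A k-ary n-cube has k**n processors."""
    return k ** n


def subcube_allocation(request: int, k: int) -> int:
    """Hypothetical rule: only whole k-ary j-subcubes (k**j nodes) are handed out,
    so a request is rounded up to the smallest power of k that covers it."""
    if request < 1 or k < 2:
        raise ValueError("need request >= 1 and k >= 2")
    size = 1
    while size < request:
        size *= k
    return size


def internal_fragmentation(request: int, k: int) -> int:
    """Processors allocated to the job but left unused by it."""
    return subcube_allocation(request, k) - request


print(nodes(4, 3))                    # 64
print(subcube_allocation(17, 4))      # 64: a 17-node job gets the whole 4-ary 3-cube
print(internal_fragmentation(17, 4))  # 47
print(internal_fragmentation(16, 4))  # 0
```

    Supporting requests of any size, as the isomorphic scheme is described as doing, is aimed at avoiding exactly this kind of waste; the sketch does not model that scheme itself.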

    Hardware Implementation Of Processor Allocator For Mesh Connected Chip Multiprocessors

    Advances in semiconductor process technology and the current demand for highly parallel computing have led to the advent of Chip Multiprocessors (CMPs). A CMP integrates two or more independent processor cores, each able to read and execute program instructions, onto a single integrated circuit die. CMPs are the main computing platforms for research and development in parallel and high-performance computing environments. They offer low inter-core communication latencies because the processor cores reside on a single chip. The Operating System (OS) plays a key role in using a CMP effectively: it should support a multi-user environment in which jobs are executed in parallel on different cores. This is handled by the processor management system of the OS, which consists of the Job Scheduler (JS) and the Processor Allocator (PA). The JS orders the jobs in a queue according to the scheduling policy employed, thereby determining which job is executed next. The PA selects an appropriate set of processors to execute the job scheduled by the JS. Efficient PA design is crucial to harnessing the full computational power of a CMP in large parallel computing systems. This thesis deals with the processor allocation part of the processor management system; its aim is the hardware implementation of a PA for a mesh-connected CMP. The PA is implemented, and a synthesis report showing the amount of logic utilized is presented. Many contiguous and non-contiguous allocation strategies have been proposed for mesh networks in recent years. In this hardware implementation, the Improvised First Fit algorithm is used to select the set of processors that executes an incoming job. It is a contiguous allocation algorithm with complete sub-mesh recognition ability, and it uses a bit-map approach. The JS is assumed to employ a First Come First Serve (FCFS) policy to schedule the jobs. This thesis also serves as a basis for hardware implementations of PAs that use other allocation algorithms in different topologies.
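    The abstract names the Improvised First Fit algorithm but does not give its steps, so the Python sketch below is only a generic bitmap-based first-fit search of the same flavour, not the thesis's algorithm. Each mesh row is stored as an integer whose bit x marks processor (x, y) as busy, and every possible base position is examined, which is what provides complete sub-mesh recognition. The class and method names are hypothetical, and rotated placements of a request are not tried.

```python
from typing import Optional, Tuple


class MeshBitmap:
    """Busy map of a width x height mesh: bit x of rows[y] is 1 when (x, y) is busy."""

    def __init__(self, width: int, height: int):
        self.width, self.height = width, height
        self.rows = [0] * height

    @staticmethod
    def _mask(x: int, w: int) -> int:
        return ((1 << w) - 1) << x  # w consecutive bits starting at column x

    def is_free(self, x: int, y: int, w: int, h: int) -> bool:
        m = self._mask(x, w)
        return all((self.rows[r] & m) == 0 for r in range(y, y + h))

    def first_fit(self, w: int, h: int) -> Optional[Tuple[int, int]]:
        # Examining every base position gives complete sub-mesh recognition.
        for y in range(self.height - h + 1):
            for x in range(self.width - w + 1):
                if self.is_free(x, y, w, h):
                    return (x, y)
        return None

    def allocate(self, w: int, h: int) -> Optional[Tuple[int, int]]:
        pos = self.first_fit(w, h)
        if pos is not None:
            x, y = pos
            for r in range(y, y + h):
                self.rows[r] |= self._mask(x, w)  # mark the sub-mesh busy
        return pos

    def release(self, x: int, y: int, w: int, h: int) -> None:
        for r in range(y, y + h):
            self.rows[r] &= ~self._mask(x, w)  # mark the sub-mesh free again


mesh = MeshBitmap(4, 4)
print(mesh.allocate(2, 2))  # (0, 0)
print(mesh.allocate(3, 2))  # (0, 2)
print(mesh.allocate(2, 2))  # (2, 0)
print(mesh.allocate(2, 2))  # None: only (3, 2) and (3, 3) are free
```

    One reason bitmap schemes suit a hardware PA is that the free test reduces to bitwise AND operations on rows of the map; requests would be fed to `allocate` from the FCFS queue that the thesis assumes.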

    On the Potential of NoC Virtualization for Multicore Chips
