Chip multiprocessors are widely used today. The cores in a chip can be homogeneous or heterogeneous. This paper proposes a scheduling algorithm for heterogeneous multiprocessors with multiple functional units of varying speed in each processor. Instructions that can be scheduled in parallel are considered. An optimization function is developed to allocate the processes to the processors so that the overall execution time is minimized. The proposed model is simulated for a chosen example and verified to give a 46% improvement in performance.
INTRODUCTION
Chip multiprocessors (CMPs) can be homogeneous or heterogeneous. In a homogeneous CMP, all the processors belong to the same family and have the same speed. In a heterogeneous CMP, the processors belong to different families and their processing speeds differ. Each processor in a CMP can have multiple functional units for the same operation, and the speed of the individual functional units may vary. Any program segment can be viewed as a serial part followed by <parallel part, serial part> pairs, so that it ends with a serial part. Scheduling the parallel part on symmetric (homogeneous) processors can be achieved with any scheduling algorithm. Scheduling the parallel part on asymmetric (heterogeneous) processors is harder because the processing speeds differ. It is logical to schedule every type of instruction on the processor that executes it in minimum time. However, this may not be feasible in practice, depending on the instruction mix.
The model in [1] describes the scheduling of processes with dependencies on asymmetric processors. This paper considers processes without data dependencies, which can be scheduled in parallel on a CMP. It is also assumed that each processor in the CMP has multiple functional units of varying speed for each instruction type. This paper proposes an optimization function that determines the schedule of instructions in the parallel part of any program segment on an asymmetric CMP. The solution to this function gives the schedule. The rest of the paper is organized as follows. Section 2 gives the motivation, Section 3 the proposed model, Section 4 the simulation, Section 5 the conclusion and Section 6 the references.
MOTIVATION
Consider an asymmetric CMP system with four processors belonging to f families. Let the parallel part of the program segment contain four load instructions, six store instructions, four multiply instructions and twenty floating point operations.
Table 1 gives the execution time, in units of time, of each type of instruction on the four processors. Table 2 gives the optimal schedule, whose total time is 140 units of time. From Table 2 it can be seen that three floating point instructions are scheduled on processor three, two load instructions are scheduled on processor four, and so on. The time for the schedule is obtained as follows. Load instructions take 3*0 = 0 time units on processor one, 4*0 = 0 on processor two, 7*2 = 14 on processor three and 5*2 = 10 on processor four. Store instructions take 12 time units on processor one and zero on the other processors. Multiply instructions take 2*6 = 12 time units on processor one, zero on processor two, 1*4 = 4 on processor three and 1*10 = 10 on processor four. Floating point operations take 105, 120, 39 and 120 time units on processors 1, 2, 3 and 4 respectively. The total time is the maximum of the per-processor sums, which is 10 + 0 + 10 + 120 = 140 time units on processor four.
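The per-processor arithmetic above can be checked with a short sketch. The name `busy_time` and its layout are illustrative assumptions; the numbers are the per-type times quoted in the text for the schedule of Table 2.

```python
# Time spent by each processor on each instruction type under the
# schedule of Table 2, as quoted in the text (units of time).
# List positions are processors one to four; the names are illustrative.
busy_time = {
    "load":     [3 * 0, 4 * 0, 7 * 2, 5 * 2],
    "store":    [12, 0, 0, 0],
    "multiply": [2 * 6, 0, 1 * 4, 1 * 10],
    "float":    [105, 120, 39, 120],
}

# Sum the contributions of all instruction types for each processor.
per_processor = [sum(column) for column in zip(*busy_time.values())]

# The schedule finishes when the most heavily loaded processor finishes.
total_time = max(per_processor)
slowest = per_processor.index(total_time) + 1  # 1-based processor number

print(per_processor)        # [129, 120, 57, 140]
print(total_time, slowest)  # 140 4
```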
The same instruction mix, if allotted by the following strategy, gives a total execution time of 260 units of time: allot each instruction type to the processor that has the minimum execution time for it. According to this rule, Load is allotted to processor 1 and takes 4*5 = 20 units of time. Store is allotted to processor 1 and takes 6*2 = 12 units of time. Multiply is allotted to processor 3 and takes 4*4 = 16 units of time. Floating point is allotted to processor 3 and takes 20*13 = 260 units of time. Store needs to follow load on the same processor.
Table 3
A performance improvement of 46% is seen using the proposed model, since (260 - 140)/260 is approximately 0.46. This is the motivation of this paper.
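As a quick check of the 46% figure, the sketch below compares the two totals quoted in this section. The variable names are illustrative assumptions.

```python
# Totals quoted in the text (units of time).
minimum_time_rule = 260  # each type allotted to its fastest processor
optimal_schedule = 140   # schedule of Table 2

# Relative reduction in execution time.
improvement = (minimum_time_rule - optimal_schedule) / minimum_time_rule

print(f"{improvement:.0%}")  # 46%
```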
PROPOSED MODEL
Let there be n instruction types numbered 1, 2, ..., n. Let there be m processors numbered $P_1, \ldots, P_m$. The objective function can be written as

$$\min \max \Bigl( \max\bigl( x_{111}\, t(1,1,1), \ldots, x_{1,1,a(1,1)}\, t(1,1,a(1,1)) \bigr),\ \max\bigl( x_{121}\, t(1,2,1),\ x_{122}\, t(1,2,2), \ldots, x_{1,2,a(1,2)}\, t(1,2,a(1,2)) \bigr),\ \max\bigl( x_{131}\, t(1,3,1),\ x_{132}\, t(1,3,2),\ x_{133}\, t(1,3,3), \ldots, x_{1,3,a(1,3)}\, t(1,3,a(1,3)) \bigr),\ \ldots \Bigr) \qquad (4)$$

Solving (4) for integer solutions gives the task allocation. This allocation ensures that the speed of the processors is utilized to the maximum extent.
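A minimal sketch of how an objective of this min-max form could be solved for small cases follows. It is not the procedure used in this paper, which relies on the MS-Excel Solver; it is a plain search over candidate finish times, chosen for brevity. It rests on assumptions that are not stated in the surviving text: that x with indices (i, j, k) is the number of type-j instructions placed on functional unit k of processor i, that t(i, j, k) is the integer time of one such instruction on that unit, and that the counts for each type must add up to the number of instructions of that type. Under those assumptions the objective is a maximum over all units, so each instruction type can be treated separately. The function names and the sample times are hypothetical.

```python
def min_finish_time(unit_times, count):
    """Smallest T such that `count` identical instructions fit on the units.

    unit_times: positive integer time per instruction on each functional
    unit (all units of all processors for one instruction type).
    A unit with time t can complete T // t instructions within T.
    """
    # The optimum is always some multiple c * t of a unit time.
    candidates = sorted({c * t for t in unit_times for c in range(count + 1)})
    for T in candidates:
        if sum(T // t for t in unit_times) >= count:
            return T
    raise ValueError("no functional unit available")


def allocate(unit_times, count):
    """Return (T, x), where x[k] instructions are placed on unit k."""
    T = min_finish_time(unit_times, count)
    x, remaining = [], count
    for t in unit_times:
        take = min(T // t, remaining)  # never exceed what fits within T
        x.append(take)
        remaining -= take
    return T, x


def schedule(times_by_type, counts):
    """Solve each instruction type separately, then take the overall max."""
    result = {name: allocate(times, counts[name])
              for name, times in times_by_type.items()}
    overall = max(T for T, _ in result.values())
    return overall, result


# Hypothetical data: two load units and three multiply units.
times_by_type = {"load": [2, 3], "multiply": [4, 4, 6]}
counts = {"load": 5, "multiply": 4}

overall, result = schedule(times_by_type, counts)
print(result["load"])      # (6, [3, 2])
print(result["multiply"])  # (8, [2, 2, 0])
print(overall)             # 8
```

The candidate search is exact for identical instructions because the finish time of any unit is a whole multiple of its per-instruction time. A general integer programming solver would be needed if further constraints coupled the instruction types.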
SIMULATIONS
The proposed model allocates the tasks in the parallel segment of a program. The allocation is done by solving the optimization function given in (4). The optimization function can be solved using the MS-Excel Solver package. Simulating the proposed model does not require a system simulation, as the model only performs task allocation. Consider the following system. There are four processors and four types of instructions, namely load, store, multiply and floating point operation. The number of functional units for these processors for the four types of instructions is given by matrix A. Solving (5) ... P with a value of 27 cycles. The solution to the problem is hence max(4, 3, 8, 27), which is 27 cycles.
CONCLUSION
This paper proposes an algorithm to allocate the tasks in the parallel segment of a program among the processors of a heterogeneous CMP. The algorithm allocates tasks based on the speed of computation on each processor. An optimization function is developed for this purpose, and its solution gives the allocation.
