ABSTRACT
INTRODUCTION
Parallel computing is used to solve large problems efficiently. The scheduling techniques we discuss might be used by an algorithm that optimizes the code produced when algorithms are parallelized. Threads can be used to migrate tasks dynamically [1]. The parallelizing algorithm would produce fragments of sequential code, and the optimizer would schedule these fragments so that the program runs in the shortest time. Another use of these techniques is in the design of high-performance computing systems. A researcher might want to construct a parallel algorithm that runs in the shortest possible time on some arbitrary computing system, increasing efficiency and decreasing turnaround time. Parallel computing systems are
NOTATION TABLE:
Priority Assignment and Start Time Computation Phase
Computation of the b-level of the DAG is used for the initial scheduling [17]. The following instructions are used to compute the initial scheduling cost of the task graph: In the scheduling process, the b-level of a node usually remains constant until the node has been scheduled. The procedure computes the b-levels and builds a scheduling list in descending order of b-level. The quantitative behavior of the proposed strategy depends on the topology of the target system. This observation might lead to the conclusion that b-level ordering gives the best results in all experiments. The algorithm employs the ALAP (As Late As Possible) start time attribute, which measures how far a node's start time can be delayed without increasing the schedule length. The procedure for computing the ALAP time is as follows:
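As a minimal sketch, assuming the usual list-scheduling definitions (a node's weight is its computation cost, an edge's weight is its communication cost, the b-level is the longest path from a node to an exit node including the node's own weight, and the critical-path length is the largest b-level), both attributes can be computed by visiting the DAG in reverse topological order. The function names and the dictionary-based graph format are illustrative assumptions, and `graphlib` needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def _children_and_order(weights, edges):
    """Build child lists and a topological order (parents before children)."""
    children = {v: [] for v in weights}
    preds = {v: set() for v in weights}
    for (u, v), comm in edges.items():
        children[u].append((v, comm))
        preds[v].add(u)
    order = list(TopologicalSorter(preds).static_order())
    return children, order


def b_levels(weights, edges):
    """b-level(n) = w(n) + max over children c of (comm(n, c) + b-level(c))."""
    children, order = _children_and_order(weights, edges)
    b = {}
    for n in reversed(order):  # every child is visited before its parents
        b[n] = weights[n] + max((comm + b[c] for c, comm in children[n]), default=0)
    return b


def alap_times(weights, edges):
    """ALAP(n) = min(CP, min over children c of ALAP(c) - comm(n, c)) - w(n)."""
    children, order = _children_and_order(weights, edges)
    cp_length = max(b_levels(weights, edges).values())  # critical-path length
    alap = {}
    for n in reversed(order):
        min_ft = cp_length
        for c, comm in children[n]:
            min_ft = min(min_ft, alap[c] - comm)
        alap[n] = min_ft - weights[n]
    return alap


# Hypothetical four-task DAG: a -> b, a -> c, b -> d, c -> d.
weights = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
print(b_levels(weights, edges))
print(alap_times(weights, edges))
```

For this example the b-levels are a=10, b=6, c=4, d=2 and the ALAP times are a=0, b=4, c=6, d=8. With these definitions ALAP(n) equals the critical-path length minus b-level(n), which is a convenient sanity check; a task with ALAP time 0 lies on the critical path.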
Tasks are allocated to the processors in the distributed computing environment according to the priorities of the nodes. The ALAP time of each task is computed, and a list of tasks is then constructed in ascending order of ALAP time. Ties are broken by considering the ALAP times of the tasks' predecessors.
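The list construction can be sketched as a sort on a compound key. The text does not say exactly how the predecessors' ALAP times break a tie, so the rule below is an assumption: the ascending-sorted ALAP times of each task's predecessors are compared, and the smaller sequence goes first. The function name and input format are hypothetical.

```python
def alap_priority_list(alap, preds):
    """Return tasks in ascending ALAP order.

    Tie-break (assumed rule): compare the ascending-sorted ALAP times of each
    task's predecessors, smaller sequence first. Remaining ties keep input order
    because Python's sort is stable.
    """
    def key(task):
        pred_alaps = sorted(alap[p] for p in preds.get(task, ()))
        return (alap[task], pred_alaps)

    return sorted(alap, key=key)


# Hypothetical ALAP times: x and y tie at 5, but y's predecessor has the smaller ALAP time.
alap = {"s1": 0, "s2": 1, "x": 5, "y": 5, "z": 9}
preds = {"x": ["s2"], "y": ["s1"], "z": ["x", "y"]}
print(alap_priority_list(alap, preds))  # ['s1', 's2', 'y', 'x', 'z']
```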
PRAM MODEL
The PRAM model provides a robust design paradigm. A PRAM is composed of P processors, each with its own unmodifiable program, and a single shared memory composed of a sequence of words, each capable of containing an arbitrary integer [5]. The PRAM model is an extension of the familiar RAM model of sequential computation used in algorithm analysis; it has a read-only input tape and a write-only output tape. Each instruction in the instruction stream is carried out by all processors simultaneously and requires unit time, regardless of the number of processors. The Parallel Random Access Machine (PRAM) model of computation consists of a number of processors operating in lock-step and communicating by reading and writing locations in a shared memory in an efficient and systematic manner [13]. In this model each processor has a flag that controls whether it is active in the execution of an instruction; inactive processors do not participate in the execution of instructions. The processor id can be used to distinguish processor behavior while executing the common program.
The operation of a synchronous PRAM can result in simultaneous access by multiple processors to the same location in shared memory. The full processing power of the model is obtained with Concurrent Read Concurrent Write (CRCW) operation, which allows both concurrent reads and concurrent writes to shared memory locations. The PRAM is a baseline model of concurrency and an explicit model that specifies the operations performed at each step [11]. Many algorithms for other models (such as the network model) can be derived directly from PRAM algorithms [12].
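To make lock-step execution, activity flags and id-dependent behavior concrete, here is a small simulation of a standard PRAM tree reduction; it is a textbook example, not an algorithm from this paper. The sketch assumes exclusive reads and writes only (EREW), one processor per input element, and splits each step into a read phase and a write phase so that every processor sees the memory state from the start of the step. With P processors the sum takes ⌈log2 P⌉ rounds.

```python
def pram_sum(values):
    """Sum `values` on a simulated EREW PRAM with one processor per input cell.

    Returns (total, number_of_lock_step_rounds).
    """
    memory = list(values)  # shared memory: one word per cell
    p = len(memory)        # processor ids are 0 .. p-1
    if p == 0:
        return 0, 0
    stride, rounds = 1, 0
    while stride < p:
        # Each processor sets its activity flag from its id and the current round.
        active = [pid for pid in range(p) if pid % (2 * stride) == 0 and pid + stride < p]
        # Read phase: all active processors read their operands in the same step.
        operands = {pid: (memory[pid], memory[pid + stride]) for pid in active}
        # Write phase: each active processor writes only its own cell (exclusive write).
        for pid, (left, right) in operands.items():
            memory[pid] = left + right
        stride *= 2
        rounds += 1
    return memory[0], rounds


print(pram_sum([3, 1, 4, 1, 5, 9, 2, 6]))  # (31, 3)
```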
Classification of the CRCW PRAM model (by how concurrent writes to the same location are resolved):
1. In the Common CRCW PRAM, all processors writing to the same location must write the same value.
2. In the Arbitrary CRCW PRAM, one of the writing processors, chosen arbitrarily, succeeds in writing.
3. In the Priority CRCW PRAM, processors have priorities associated with them, and the highest-priority processor succeeds in writing.
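The three write-resolution rules can be sketched as a function that takes all writes issued in one synchronous step and decides what each memory cell ends up holding. Assumptions: a request is a (processor id, address, value) tuple, and for the Priority variant a lower processor id means higher priority (a common convention, not stated in the text).

```python
import random


def crcw_write(requests, policy, rng=None):
    """Resolve the writes issued in one synchronous CRCW PRAM step.

    requests: list of (processor_id, address, value) tuples.
    policy: "common", "arbitrary" or "priority".
    Returns {address: value actually stored}.
    """
    writers_by_addr = {}
    for pid, addr, value in requests:
        writers_by_addr.setdefault(addr, []).append((pid, value))

    rng = rng or random.Random()
    stored = {}
    for addr, writers in writers_by_addr.items():
        if policy == "common":
            values = {value for _, value in writers}
            if len(values) != 1:
                raise ValueError(f"Common CRCW violated at address {addr}: {values}")
            stored[addr] = values.pop()
        elif policy == "arbitrary":
            stored[addr] = rng.choice(writers)[1]  # any single writer may win
        elif policy == "priority":
            stored[addr] = min(writers, key=lambda w: w[0])[1]  # lowest id wins (assumed)
        else:
            raise ValueError(f"unknown policy: {policy!r}")
    return stored


step = [(2, 0, "b"), (0, 0, "a"), (1, 5, "x")]
print(crcw_write(step, "priority"))  # {0: 'a', 5: 'x'}
```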
PROPOSED MODEL FOR TASK PARTITIONING IN DISTRIBUTED ENVIRONMENT SCHEDULING:
The task partitioning strategy is a key factor in determining the efficiency and speedup of a parallel computing system. The process is partitioned into subtasks whose sizes are determined by the run-time performance of each server [9]. In this way, the number of tasks assigned to a server is proportional to the performance of that server in the distributed computing system. The inter-process communication cost among tasks is a very important factor in improving the performance of the system [6]. The scheduler schedules the tasks and analyzes the performance of the system. The inter-process communication cost estimation criterion in the proposed model is the key factor for improving speedup and turnaround time [8]. The C.P. (Call Procedure) is used to dispatch tasks according to the capability of the machines. In this model, the server machine is assumed to be made up of n heterogeneous processing elements forming a cluster. In designing a parallel algorithm, the main goal is to achieve as much parallelism as possible. Partitioning is the process of dividing the computation and the data into different computational parts.
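The rule that each server's share of tasks is proportional to its run-time performance can be sketched as follows. Assumptions: each server's performance is summarized as one positive number (for example, a measured benchmark throughput), and fractional shares are rounded with the largest-remainder method so the counts always add up to the total. Neither the speed measure nor the rounding method comes from the text, and the function name is hypothetical.

```python
def proportional_partition(num_tasks, speeds):
    """Split num_tasks among servers in proportion to their speeds.

    Uses the largest-remainder method so the counts always sum to num_tasks.
    """
    if num_tasks < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need num_tasks >= 0 and positive speeds")
    total = sum(speeds)
    quotas = [num_tasks * s / total for s in speeds]
    counts = [int(q) for q in quotas]  # whole part of each share
    leftover = num_tasks - sum(counts)
    # Hand the remaining tasks to the servers with the largest fractional parts.
    by_remainder = sorted(range(len(speeds)), key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical cluster: the last two servers are twice as fast as the first.
print(proportional_partition(100, [1.0, 2.0, 2.0]))  # [20, 40, 40]
```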
Currently, most research on integrated-circuit or logic optimization is based on a single PC, so this paper adds the C.P. (Call Procedure) to the optimization to improve the speed of logic optimization.
This model splits both computation and data into small tasks [14]. The proposed model satisfies the following basic requirements of partitioning:
• There are at least one order of magnitude more primitive tasks than processors on the target machine; otherwise later design options may be too constrained.
• Redundant data storage and redundant computation are minimized, which makes it possible to achieve large scalability for high-performance computation.
• Primitive tasks are roughly the same size, so that work stays balanced among the processors.
• The number of tasks is an increasing function of the problem size, which avoids the constraint that it is impossible to use more processors to solve larger problem instances.
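Three of the four requirements above can be checked mechanically for a candidate decomposition; whether storage and computation are redundant depends on the algorithm itself and is not checked. The sketch assumes a hypothetical `make_tasks(n)` function that returns the list of primitive task sizes for problem size n, and it treats "roughly the same size" as a largest-to-smallest ratio of at most 2 and "one order of magnitude" as at least ten tasks per processor. These thresholds are illustrative assumptions.

```python
def check_partition(make_tasks, problem_sizes, num_processors, max_size_ratio=2.0):
    """Check a decomposition against three of the partitioning requirements.

    problem_sizes must be given in ascending order.
    """
    task_lists = [make_tasks(n) for n in problem_sizes]
    counts = [len(tasks) for tasks in task_lists]
    largest = task_lists[-1]
    return {
        # At least an order of magnitude more primitive tasks than processors.
        "enough_tasks": counts[-1] >= 10 * num_processors,
        # Primitive tasks of roughly the same size.
        "balanced_sizes": max(largest) <= max_size_ratio * min(largest),
        # Number of tasks is an increasing function of problem size.
        "scales_with_problem": all(a < b for a, b in zip(counts, counts[1:])),
    }


def chunked(n, chunk=64):
    """Hypothetical decomposition: split n data elements into chunks of 64."""
    return [min(chunk, n - start) for start in range(0, n, chunk)]


def one_per_processor(n, p=8):
    """Hypothetical decomposition: exactly one block per processor."""
    return [n // p] * p


print(check_partition(chunked, [1024, 4096, 16384], num_processors=8))
print(check_partition(one_per_processor, [1024, 4096, 16384], num_processors=8))
```

For 8 processors the chunked decomposition passes all three checks, while one block per processor fails both the task-count check and the scaling check.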
The model assumes an I/O element associated with each processor in the system. The processing time can be represented with the help of a Gantt chart. The connectivity of the processing elements can be represented by an undirected graph called the scheduler machine graph [7]. The C.P. (Call Procedure) is used to assign tasks dynamically. A task can be assigned to a processing element for execution while that processing element is communicating with another processing element. The program completion cost can be computed as:
$$\text{Total cost} = \text{communication cost} + \text{execution cost}$$
where
$$\text{execution cost} = \text{schedule length}$$
$$\text{communication cost} = \left|\{(w, \mu) : (w, \mu) \in A,\ \mathrm{proc}(w) \neq \mathrm{proc}(\mu)\}\right|$$
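The cost definition can be evaluated directly from a finished schedule. A minimal sketch, assuming a schedule is given as a mapping from each task to a (processor, start time) pair, that each task's execution time is known, and that every arc whose endpoints run on different processors contributes one unit of communication cost (the usual reading, in which arcs inside a single processor cost nothing). The function and argument names are hypothetical.

```python
def total_cost(schedule, durations, arcs):
    """Return (total, communication, execution) cost of a schedule.

    schedule:  {task: (processor, start_time)}
    durations: {task: execution_time}
    arcs:      iterable of (w, mu) pairs from the arc set A
    """
    # Execution cost = schedule length = latest finish time over all tasks.
    execution = max(start + durations[t] for t, (_, start) in schedule.items())
    # Communication cost = number of arcs whose endpoints are on different processors.
    communication = sum(1 for w, mu in arcs if schedule[w][0] != schedule[mu][0])
    return communication + execution, communication, execution


schedule = {"a": (0, 0), "b": (0, 2), "c": (1, 3), "d": (0, 5)}
durations = {"a": 2, "b": 3, "c": 1, "d": 2}
arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(total_cost(schedule, durations, arcs))  # (9, 2, 7)
```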
The algorithm used for the proposed model is an optimal algorithm for scheduling interval-ordered tasks on m processors. Given a task graph G = (V, A) and m processors, the algorithm generates a schedule f that maps each task v ∈ V to a processor P_v and a starting time t_v. The communication time between processors P_i and P_j may be defined as:
Proposed Algorithm for Inter-Process Communication Amongst the Tasks:
In this algorithm, the task graph is generated, and the edge-cut gain parameter is used to calculate the communication cost among the tasks [9].
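$$\text{gain} = \alpha \cdot \text{gain}_{\text{edgecut}} + (1 - \alpha) \cdot \text{gain}_{\text{workload}}$$
$$\text{gain}_{\text{edgecut}} = \frac{\text{edgecut\_factor}}{\text{old\_edgecut}}, \qquad \text{edgecut\_factor} = \text{old\_edgecut} - \text{new\_edgecut}$$
where α sets the percentage of the total gain that comes from the edge-cut gain and from the workload-balance gain.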
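The weighted gain can be evaluated as below. This is a sketch under stated assumptions: the workload-balance gain is not defined in the text, so it is passed in as a precomputed number; the weight is called `alpha` in the code; and an old edge cut of zero is treated as giving no edge-cut gain, to avoid dividing by zero. The function name is hypothetical.

```python
def weighted_gain(old_edgecut, new_edgecut, workload_gain, alpha):
    """Total gain = alpha * edge-cut gain + (1 - alpha) * workload-balance gain."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    edgecut_factor = old_edgecut - new_edgecut  # positive when the cut shrinks
    edgecut_gain = edgecut_factor / old_edgecut if old_edgecut else 0.0
    return alpha * edgecut_gain + (1 - alpha) * workload_gain


# Hypothetical move that reduces the edge cut from 20 to 15 (edge-cut gain 0.25).
print(weighted_gain(20, 15, workload_gain=0.1, alpha=0.7))
```

Here the result is 0.7 × 0.25 + 0.3 × 0.1, approximately 0.205. Setting `alpha` to 1 leaves only the edge-cut gain, which matches the remark that a larger weight gives the edge-cut gain a larger share of the total.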
The larger the weight, the greater the share of the edge-cut gain in the total gain of the communication cost. Consider swapping the task scheduled on processor node n_i at time τ1 with the task scheduled on node n_j at time τ2. When tasks are swapped among different processors, the following holds:
Fact(2): total comm(i, j, τ), where 1 ≤ i, j ≤ P
The effect of the above operation is to swap all tasks scheduled on node n_i at time τ1 with the tasks scheduled on node n_j at time τ2, for all τ1, τ2 ≥ τ.
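The single-slot swap and its many-slot form can be sketched on a schedule represented as a mapping from task to (processor, start time). Our reading of the operations is an interpretation: `swap` exchanges the tasks in slot (n_i, τ1) with those in slot (n_j, τ2), and `total_comm` exchanges every task on n_i and n_j whose start time is at least τ, keeping start times unchanged. Both names and the representation are hypothetical, and neither function checks that the resulting schedule still respects precedence constraints.

```python
def swap(schedule, i, t1, j, t2):
    """Exchange the task(s) on processor i at time t1 with those on processor j at time t2."""
    result = dict(schedule)
    for task, (proc, start) in schedule.items():
        if (proc, start) == (i, t1):
            result[task] = (j, t2)
        elif (proc, start) == (j, t2):
            result[task] = (i, t1)
    return result


def total_comm(schedule, i, j, tau):
    """Exchange every task on processors i and j that starts at or after tau."""
    result = dict(schedule)
    for task, (proc, start) in schedule.items():
        if start >= tau and proc == i:
            result[task] = (j, start)
        elif start >= tau and proc == j:
            result[task] = (i, start)
    return result


schedule = {"a": (0, 0), "b": (0, 2), "c": (1, 2), "d": (1, 4)}
print(swap(schedule, 0, 2, 1, 4))     # b and d change places
print(total_comm(schedule, 0, 1, 2))  # b, c and d switch processors; a (start 0) stays
```

In this example the tail exchange gives the same schedule as two single-slot swaps, (n_0, 2) with (n_1, 2) followed by (n_1, 4) with (n_0, 4), illustrating how one such operation can be expressed as several swaps.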
Fact(3): The following operation is equivalent to several swap operations:
Speedup is the ratio of the serial execution time of the program to its parallel execution time. In our experiment, the results were estimated on around 30 heterogeneous processors, added successively. As the number of nodes increases, the speedup increases up to a certain improvement level; beyond this level the speedup does not increase further even if more processors are added. The serial fraction e is useful for computing the parallel overhead generated by executing the algorithm in the distributed computing environment:
$$e = \frac{1/\psi - 1/P}{1 - 1/P}$$
where ψ is the speedup factor and P is the number of processors used for parallel execution in the distributed heterogeneous environment.
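Speedup and serial fraction can be computed from measured run times. A minimal sketch, assuming the serial fraction is the Karp-Flatt experimentally determined serial fraction e = (1/ψ − 1/P)/(1 − 1/P) for P > 1; the run times in the example are made-up numbers, not results from our experiment. A serial fraction that grows with P points to rising parallel overhead, which is one way to explain a speedup that stops improving as processors are added.

```python
def speedup(serial_time, parallel_time):
    """psi = serial execution time / parallel execution time."""
    return serial_time / parallel_time


def serial_fraction(psi, p):
    """Karp-Flatt metric e = (1/psi - 1/p) / (1 - 1/p), defined for p > 1."""
    if p <= 1:
        raise ValueError("need at least two processors")
    return (1 / psi - 1 / p) / (1 - 1 / p)


# Hypothetical timings (seconds): one serial run and parallel runs on p processors.
serial_time = 100.0
parallel_times = {2: 55.0, 4: 35.0, 8: 25.0}
for p, t in parallel_times.items():
    psi = speedup(serial_time, t)
    print(f"p={p}: speedup={psi:.2f}, serial fraction={serial_fraction(psi, p):.3f}")
```

With these timings the speedup keeps rising while the serial fraction also rises (from 0.1 at two processors to about 0.143 at eight), so each added processor buys less.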
CONCLUSION AND FUTURE WORK:
In this paper, we proposed a new model for estimating the cost of communication among the various nodes at execution time. The improvement ratio of the iterations is also discussed. Our contribution is an edge-cut inter-process communication factor, which is highly important for assigning tasks to heterogeneous systems according to the processing capabilities of the processors on the network. The model can also adapt to changing hardware constraints. Future researchers can improve the gain percentage for inter-process communication.
REFERENCES:

