Introduction
Including multiple cores on a single chip has become the dominant mechanism for scaling processor performance. Exponential growth in the number of cores on a single processor is expected to lead, in a short time, to mainstream computers with hundreds of cores. Scalable implementations of parallel algorithms will be necessary to achieve improved single-application performance on such processors. In addition, memory access will continue to be an important limiting factor on performance, and heterogeneous systems may make use of cores with varying capabilities and performance characteristics. An appropriate programming model can address scalability and expose data locality while making it possible to migrate application code between processors with different parallel architectures and variable numbers and kinds of cores. Processors with a large number of cores, such as GPUs, are already available to consumers today, and the scalable programming models developed for them are also applicable to current and future multi-core CPUs.
A multi-core processor (or chip-level multiprocessor, CMP) combines two or more independent cores (normally CPUs) into a single package composed of a single integrated circuit (IC), called a die, or of several dies packaged together. A dual-core processor contains two cores, and a quad-core processor contains four cores. A multi-core microprocessor implements multiprocessing in a single physical package. A processor with all cores on a single die is called a monolithic processor. Cores in a multi-core device may share a single coherent cache at the highest on-device cache level (e.g. L2 for the Intel Core 2) or may have separate caches (e.g. current AMD dual-core processors).
The cores also share the same interconnect to the rest of the system. Each "core" independently implements optimizations such as superscalar execution, pipelining, and multithreading. A system with n cores is used most effectively when it is presented with n or more threads concurrently. The most commercially significant (or at least the most 'obvious') multi-core processors are those used in personal computers (primarily from Intel and AMD) and game consoles (e.g., the eight-core Cell processor in the PS3 and the three-core Xenon processor in the Xbox 360). In this context, "multi" typically means a relatively small number of cores. However, the technology is widely used in other areas as well, especially in embedded processors such as network processors and digital signal processors, and in GPUs. The performance gained by using a multi-core processor depends on the problem being solved and the algorithms used, as well as on their implementation in software (Amdahl's law). For so-called "embarrassingly parallel" problems, a dual-core processor with two cores at 2 GHz may perform very nearly as quickly as a single core at 4 GHz [1]. Other problems, though, may not yield so much speedup. All of this assumes, however, that the software has been designed to take advantage of the available parallelism. If it has not, there will be no speedup at all for that application. However, the processor will multitask better, since it can run two programs at once, one on each core.
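The dependence of the achievable speedup on the parallel share of a program can be made concrete with Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The following is a minimal sketch; the function name amdahl_speedup and the sample fractions are illustrative assumptions, not values from the cited work.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of the work that can run in parallel (0..1).
    cores: number of identical cores (illustrative assumption).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Sample fractions chosen for illustration only.
    for p in (0.0, 0.5, 0.9, 1.0):
        print(f"p={p:.1f}: 2 cores -> {amdahl_speedup(p, 2):.2f}x, "
              f"100 cores -> {amdahl_speedup(p, 100):.2f}x")
```

With p = 1 (an embarrassingly parallel problem) two cores give a factor of 2, which corresponds to the 2 GHz dual-core versus 4 GHz single-core comparison; with p = 0 (software not designed for parallelism) the factor is 1 regardless of the number of cores.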
Scheduling
2.1 Scheduling Algorithms for Different Topologies
Current multi-core architectures take a variety of approaches to memory handling and to the interconnection between cores. Here we do not currently consider memory at all, but assume the need for data forwarding from one processing node to the next. This approach is not typically used in current Intel architectures, but it is typical for specialized media processors and is also applicable to NUMA architectures. Interconnects come as switch, bus, or point-to-point approaches; we model all of them as switches and point-to-point links at this time. Furthermore, the architectures that we use in the simulations are inspired by the topologies of the Intel Core 2, AMD's Opteron, the STI Cell BE and NVIDIA's GPUs. Since we intend to examine the effects of topologies and not real-world processor capabilities, we have provided all models with the same total processing capacity. The processing capacity is, however, distributed over different numbers of processing units (heterogeneous in the case of the Cell-inspired topology). Similarly, the bandwidth of interconnects is modelled in abstract performance numbers according to their role in the multi-core architecture that inspired each topology.
Developing high-level scheduling algorithms for arbitrary inter-dependent workloads is a future goal, along with real-world implementations. A one-for-all algorithm appears to be out of the question, and there is no reason to expect that a real-world implementation should be able to adapt dynamically to wide ranges of different hardware. In any case, here we use strategies that are inspired by packing strategies for passive operating system resources, but extended with functionality to check link availability. The selected results use the following strategies:
2.1.1 First Fit (FF):
Assign every processing unit an index. For every media operation that requires a certain amount of processing capacity, start searching at the processing unit with index 0 for available processing capacity. If the unit with index 0 does not have sufficient resources, search breadth-first (BFS) through the topological neighbours of the unit until a unit with sufficient capacity is found or the scheduling fails.
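The First Fit search can be sketched as follows. This is a minimal illustration under stated assumptions: the topology is given as an adjacency mapping, capacities are abstract numbers, and the four-unit ring is hypothetical. The link-availability check mentioned above is omitted for brevity.

```python
from collections import deque


def bfs_order(neighbours, start):
    """Yield unit indices in breadth-first order, beginning at start."""
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        yield unit
        for nxt in neighbours[unit]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


def first_fit(free, neighbours, demand):
    """Place one media operation; return the chosen unit, or None on failure.

    free: list of remaining capacity per unit (modified in place).
    The search always starts at index 0, as First Fit prescribes.
    """
    for unit in bfs_order(neighbours, 0):
        if free[unit] >= demand:
            free[unit] -= demand
            return unit
    return None


# Hypothetical ring of four units with equal abstract capacity (assumption).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
free = [25, 25, 25, 25]
placements = [first_fit(free, ring, d) for d in (20, 10, 10, 20, 30)]
print(placements)  # [0, 1, 1, 3, None]
```

The last operation fails because no single unit has 30 units of capacity left, even though the total free capacity would suffice; this fragmentation is what the alternative strategies try to manage.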
Next Fit (NF):
Similar to FF, but instead of starting at index 0 (the first processing unit) for every new job, NF starts its search (in BFS order) from the node where the previous media operation was successfully scheduled. If subsequent media operations are interconnected, this can achieve a high degree of packing and save interconnection bandwidth.
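A minimal sketch of Next Fit, under the same modelling assumptions (adjacency mapping, abstract capacities, no link-availability check). The class name NextFitScheduler and the four-unit line topology are hypothetical.

```python
from collections import deque


def bfs_order(neighbours, start):
    """Yield unit indices in breadth-first order, beginning at start."""
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        yield unit
        for nxt in neighbours[unit]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


class NextFitScheduler:
    """Next Fit: each search starts where the previous placement succeeded."""

    def __init__(self, capacities, neighbours):
        self.free = list(capacities)
        self.neighbours = neighbours
        self.last = 0  # before any placement, behave like First Fit

    def place(self, demand):
        for unit in bfs_order(self.neighbours, self.last):
            if self.free[unit] >= demand:
                self.free[unit] -= demand
                self.last = unit  # only successful placements move the start
                return unit
        return None


# Hypothetical line topology 0 - 1 - 2 - 3 (assumption).
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scheduler = NextFitScheduler([25, 25, 25, 25], line)
print([scheduler.place(d) for d in (20, 10, 10, 10)])  # [0, 1, 1, 2]
```

Consecutive operations stay on or next to the unit used last, which is where the bandwidth saving for interconnected operations comes from.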
Random Start (RS):
In contrast to NF, RS keeps a separate index for each conference, from which the search for a suitable processing unit starts whenever a new media operation belonging to that conference arrives. This is meant to achieve better clustering, and thus less interconnection bandwidth consumption, than NF. For a newly starting conference, a random processing unit is chosen.
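A minimal sketch of Random Start follows. It assumes that the per-conference index is updated to the unit of the last successful placement for that conference, which the description above does not state explicitly; the class name, the injectable random generator and the ring topology are likewise hypothetical, and link-availability checks are omitted.

```python
import random
from collections import deque


def bfs_order(neighbours, start):
    """Yield unit indices in breadth-first order, beginning at start."""
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        yield unit
        for nxt in neighbours[unit]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


class RandomStartScheduler:
    """Random Start: one search start index per conference."""

    def __init__(self, capacities, neighbours, rng=None):
        self.free = list(capacities)
        self.neighbours = neighbours
        self.rng = rng or random.Random()
        self.start = {}  # conference id -> unit where its next search begins

    def place(self, conference, demand):
        if conference not in self.start:
            # A newly starting conference gets a random processing unit.
            self.start[conference] = self.rng.randrange(len(self.free))
        for unit in bfs_order(self.neighbours, self.start[conference]):
            if self.free[unit] >= demand:
                self.free[unit] -= demand
                self.start[conference] = unit  # assumption: NF-like update
                return unit
        return None


# Hypothetical ring of four units (assumption); seeded for repeatable runs.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
scheduler = RandomStartScheduler([25, 25, 25, 25], ring, rng=random.Random(0))
for conference, demand in (("A", 10), ("B", 10), ("A", 10), ("B", 10)):
    print(conference, "->", scheduler.place(conference, demand))
```

Operations of the same conference cluster around that conference's own start unit, independently of where other conferences were placed in between.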
Worst Start (WS):
WS is similar to RS, but instead of randomly choosing a processing unit for a newly starting conference, WS starts at the processing unit with the highest remaining free capacity (hence the name, by analogy with worst fit). This is done in the hope that the highest initial packing can be achieved for the new session. We have also tested several other strategies known from classical memory management, where they place data elements in memory, such as plain first fit, best fit and worst fit. Because these do not take bandwidth into account, we have used them only as a basis and as benchmarks.
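A minimal sketch of Worst Start, under the same assumptions as for the other strategies: adjacency-mapping topology, abstract capacities, a per-conference index that follows the last successful placement, and no link-availability check. The class name, the tie-breaking rule (lowest index wins) and the capacities are hypothetical.

```python
from collections import deque


def bfs_order(neighbours, start):
    """Yield unit indices in breadth-first order, beginning at start."""
    seen = {start}
    queue = deque([start])
    while queue:
        unit = queue.popleft()
        yield unit
        for nxt in neighbours[unit]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


class WorstStartScheduler:
    """Worst Start: a new conference starts at the unit with most free capacity."""

    def __init__(self, capacities, neighbours):
        self.free = list(capacities)
        self.neighbours = neighbours
        self.start = {}  # conference id -> unit where its next search begins

    def place(self, conference, demand):
        if conference not in self.start:
            # Ties are broken in favour of the lowest index (assumption).
            self.start[conference] = max(
                range(len(self.free)), key=self.free.__getitem__
            )
        for unit in bfs_order(self.neighbours, self.start[conference]):
            if self.free[unit] >= demand:
                self.free[unit] -= demand
                self.start[conference] = unit
                return unit
        return None


# Hypothetical heterogeneous ring with a total capacity of 100 (assumption).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
scheduler = WorstStartScheduler([10, 40, 30, 20], ring)
jobs = (("A", 15), ("B", 20), ("A", 20), ("B", 15))
print([scheduler.place(c, d) for c, d in jobs])  # [1, 2, 1, 3]
```

Conference A starts on the unit with the most free capacity and stays there, while conference B starts on the next-largest unit and spills to a neighbour only when that unit is full.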
3. Scheduling
3.1. Parallel Test Tasks Scheduling
Parallel test technology, in which multiple test tasks for more than one unit under test (UUT) are executed simultaneously [1], is rooted in parallel management theory. SCMT (Single Core Multi-Thread) only simulates a multi-mission environment, relying on its computing speed and on the limits of human perception [2], and is therefore unsuited to the strict real-time requirements of a parallel test architecture. The multi-processor architecture currently used to solve this problem increases the test cost and the complexity of the automatic test system (ATS). With the development of MCP (Multi-Core Platform) technology, this survey discusses the application of multi-core technology to parallel testing and presents a mission scheduling strategy based on workload.
Parallel Test System Based on Multi Core Processor
Unlike a multi-processor architecture, an MCP implements two or more "execution cores" in one processor chip. This not only reduces the complexity and cost of the ATS effectively, but also supports rich parallelism at both the hardware level and the thread level [7]. Each execution core has its own execution units and architectural resources, and connects to the front-side bus interface through arbitration logic; the cores communicate through a synchronization mechanism. On a multi-core platform, threads run on independent execution cores [11]. Because the complexity of test task decomposition and parallel scheduling restricts the application of multi-core technology to parallel testing, most programs are still compiled in a sequential model, which cannot exploit the parallel processing ability of the MCP. The bottleneck of a multi-core platform is therefore software, as shown in figure 1. To obtain better performance on an MCP, the tester should decompose the test flow into independent missions before scheduling. This survey presents three decomposition methods, based on tasks, data processing and test flow respectively: mission decomposition, data decomposition and data flow decomposition. Table 1 introduces these three methods in detail.
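Of the three methods, data decomposition is the simplest to illustrate: the same operation is applied to disjoint parts of the measurement data, and the partial results are combined. The sketch below is an assumption-laden illustration, not the method of the surveyed work; the function names and the sum-of-squares workload are hypothetical. Note that in CPython, threads do not execute CPU-bound Python code on several cores at once, so the example shows the structure of the decomposition rather than a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(samples, parts):
    """Split samples into at most `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(samples), parts)
    chunks, start = [], 0
    for k in range(parts):
        end = start + size + (1 if k < extra else 0)
        if end > start:
            chunks.append(samples[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Hypothetical per-chunk processing step (e.g. signal energy)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(samples, workers=4):
    """Process each chunk in its own worker and combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunked(samples, workers)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), workers=3))  # 285
```

Mission decomposition would instead submit different functions to the pool, and data flow decomposition would chain stages so that one stage consumes the output of the previous one.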
.1 Parallel Test Task Description
In a parallel test system, the missions that constitute a test have complex relationships. This structure can be described with a DAG (Directed Acyclic Graph) from graph theory, expressed as a two-tuple G = (W, E), where W = {(w11, w12, …, w1j), (w21, w22, …, w2j), …, (wi1, wi2, …, wij)} and E = {e1, e2, …, ei, …}. Each wij is a node of the DAG and represents test j of UUT i; its value is the workload of that test, written xij. Each ei is a directed edge of the DAG and represents the preparation time and the precedence relationship between missions; it can also be written as a pair (w1, w2), where w1 is the predecessor mission of w2.
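The DAG description lends itself to a small workload-based scheduling sketch. The code below is one possible list-scheduling variant under stated assumptions, not the strategy of the surveyed work: nodes are keyed by (i, j), the workload xij is used as the execution time, preparation times on edges are ignored, ready missions with the largest workload are placed first, and all cores are identical. The function name list_schedule and the sample values are hypothetical.

```python
import heapq
from collections import defaultdict


def list_schedule(workload, edges, cores):
    """Return {mission: (core, start, finish)} for a DAG of test missions.

    workload: {(i, j): x_ij}, edges: iterable of (predecessor, successor).
    """
    successors = defaultdict(list)
    pending = {task: 0 for task in workload}  # unfinished predecessors
    for before, after in edges:
        successors[before].append(after)
        pending[after] += 1

    earliest = {task: 0 for task in workload}  # when all predecessors are done
    ready = [(-workload[t], t) for t in workload if pending[t] == 0]
    heapq.heapify(ready)  # largest workload first
    core_free = [0] * cores
    schedule = {}

    while ready:
        _, task = heapq.heappop(ready)
        # Pick the core on which this mission can start soonest.
        core = min(range(cores), key=lambda c: max(core_free[c], earliest[task]))
        start = max(core_free[core], earliest[task])
        finish = start + workload[task]
        core_free[core] = finish
        schedule[task] = (core, start, finish)
        for nxt in successors[task]:
            earliest[nxt] = max(earliest[nxt], finish)
            pending[nxt] -= 1
            if pending[nxt] == 0:
                heapq.heappush(ready, (-workload[nxt], nxt))

    if len(schedule) != len(workload):
        raise ValueError("edges contain a cycle; the graph is not a DAG")
    return schedule


# Two UUTs with two tests each; values are illustrative only.
workload = {(1, 1): 3, (1, 2): 2, (2, 1): 4, (2, 2): 1}
edges = [((1, 1), (1, 2)), ((2, 1), (2, 2))]
plan = list_schedule(workload, edges, cores=2)
print(max(finish for _, _, finish in plan.values()))  # 5
```

On two cores the four missions finish after 5 time units, compared with 10 if they ran back to back on a single core.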
Conclusion
The proposed mission decomposition, data decomposition and data flow decomposition methods are effective. The parallel mission scheduling takes test resources, computing resources and test mission priority into account together, and optimizes the mission scheduling process. The experiment shows that parallel testing benefits significantly from multi-core technology. For the generalized resource scheduling problem considered above, it is not clear which variation of list scheduling will perform best. Our intuition is that scheduling subtasks by considering all resource types together will result in bounded suboptimal solutions. The input parameters are given to the simulator either as fixed values or as a range of values with a minimum and a maximum. Subtask execution times, communication latencies, communication transfer rates, data item amounts and data set amounts are specified to the simulator as ranges of values.
5. References

