918 research outputs found

    Advanced list scheduling heuristic for task scheduling with communication contention for parallel embedded systems

    No full text
    WOSInternational audienceModern embedded systems tend to use multiple cores or processors for processing parallel applications. This paper indeed aims at task scheduling with communication contention for parallel embedded systems and proposes three advanced techniques to improve the list scheduling heuristic. Five groups of node levels (two existing groups and three new groups) are firstly used as node priorities to generate node lists. Then the critical child technique improves the selection of a processor in the scheduling process. Finally, the communication delay technique enlarges the idle time intervals on communication links. We also propose an advanced dynamic list scheduling heuristic by combining the three techniques. Experimental results show that the combined advanced dynamic heuristic is efficient to shorten the schedule length for most of the randomly generated DAGs in the cases of medium and high communication. Our method accelerates an application up to 80% in the case of high communication and can also reduce the use of hardware resources

    A List Scheduling Heuristic with New Node Priorities and Critical Child Technique for Task Scheduling with Communication Contention

    Get PDF
    Task scheduling is becoming an important aspect for parallel programming of modern embedded systems. In this chapter, the application to be scheduled is modeled as a Directed Acyclic Graph (DAG), and the architecture targets parallel embedded systems composed of multiple processors interconnected by buses and/or switches. This chapter presents new list scheduling heuristics with communication contention. Furthermore, we define new node priorities (top level and bottom level) to sort nodes, and propose an advanced technique named critical child to select a processor to execute a node. Experimental results show that the proposed method is effective to reduce the schedule length, and the runtime performance is greatly improved in the cases of medium and high communication. Since the communication cost is increasing from medium to high in modern applications like digital communication and video compression, the proposed method is well-adapted for scheduling these applications over parallel embedded systems

    Link contention-constrained scheduling and mapping of tasks and messages to a network of heterogeneous processors

    Get PDF
    In this paper, we consider the problem of scheduling and mapping precedence-constrained tasks to a network of heterogeneous processors. In such systems, processors are usually physically distributed, implying that the communication cost is considerably higher than in tightly coupled multiprocessors. Therefore, scheduling and mapping algorithms for such systems must schedule the tasks as well as the communication traffic by treating both the processors and communication links as important resources. We propose an algorithm that achieves these objectives and adapts its tasks scheduling and mapping decisions according to the given network topology. Just like tasks, messages are also scheduled and mapped to suitable links during the minimization of the finish times of tasks. Heterogeneity of processors is exploited by scheduling critical tasks to the fastest processors. Our extensive experimental study has demonstrated that the proposed algorithm is efficient, robust, and yields consistent performance over a wide range of scheduling parameters.published_or_final_versio

    CASCH: a tool for computer-aided scheduling

    Get PDF
    A software tool called Computer-Aided Scheduling (CASCH) for parallel processing on distributed-memory multiprocessors in a complete parallel programming environment is presented. A compiler automatically converts sequential applications into parallel codes to perform program parallelization. The parallel code that executes on a target machine is optimized by CASCH through proper scheduling and mapping.published_or_final_versio

    Using utilization profiles in allocation and partitioning for multiproscessor systems

    Get PDF
    technical reportThe problems of multiprocessor partitioning and program allocation are interdependent and critical to the performance of multiprocessor systems?? Minimizing resource partitions for parallel programs on partitionable multiprocessors facilitates greater processor utilization and throughput?? The pro cessing resource requirements of parallel programs vary during program execution and are allocation dependent?? Optimal resource utilization requires that resource requirements be modeled as variable over time?? This paper investigates the use of program pro les in allocating programs and parti tioning multiprocessor systems?? An allocation method is discussed?? The goals of this method are to minimize program execution time minimize the total number of processors used characterize variation in processor requirements over the lifetime of a program to accurately predict the impact on run time of the number of processors available at any point in time and to minimize uctuations in processor requirements to facilitate e cient sharing of processors between partitions on a partitionable multiprocessor?? An application to program partitioning is discussed that improves partition run times compared to other methods?

    Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems

    No full text
    International audienceWe introduce a new parallelization framework for scientific computing based on BDSC, an efficient automatic scheduling algorithm for parallel programs in the presence of resource constraints on the number of processors and their local memory size. BDSC extends Yang and Gerasoulis's Dominant Sequence Clus-tering (DSC) algorithm; it uses sophisticated cost models and addresses both shared and distributed parallel memory architectures. We describe BDSC, its integration within the PIPS compiler infrastructure and its application to the parallelization of four well-known scientific applications: Harris, ABF, equake and IS. Our experiments suggest that BDSC's focus on efficient resource man-agement leads to significant parallelization speedups on both shared and dis-tributed memory systems, improving upon DSC results, as shown by the com-parison of the sequential and parallelized versions of these four applications running on both OpenMP and MPI frameworks
    • …
    corecore