
    Computing the expected makespan of task graphs in the presence of silent errors

    Applications structured as Directed Acyclic Graphs (DAGs) of tasks correspond to a general model of parallel computation that occurs in many domains, including popular scientific workflows. DAG scheduling has received an enormous amount of attention, and several list-scheduling heuristics have been proposed and shown to be effective in practice. Many of these heuristics make scheduling decisions based on path lengths in the DAG. At large scale, however, compute platforms and thus tasks are subject to various types of failures whose probabilities of occurrence are no longer negligible. Failures that have recently received increasing attention are "silent errors," which cause a task to produce incorrect results even though it ran to completion. Silent errors are tolerated by checking the validity of the results and re-executing the task from scratch when a result is invalid. The execution time of a task then becomes a random variable, and so do path lengths. Unfortunately, computing the expected makespan of a DAG (and, equivalently, computing expected path lengths in a DAG) is a computationally difficult problem. Consequently, designing effective scheduling heuristics is predicated on computing accurate approximations of the expected makespan. In this work we propose an algorithm that computes a first-order approximation of the expected makespan of a DAG when tasks are subject to silent errors. We compare our approximation to previously proposed ones for three classes of application graphs from the field of numerical linear algebra. Our evaluations quantify approximation error with respect to a ground truth computed via a brute-force Monte Carlo method. We find that our approximation outperforms previously proposed approaches, leading to large reductions in approximation error for low (and realistic) failure rates, while executing much faster.
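    To make the ground-truth computation concrete, here is a minimal Monte Carlo sketch of the expected makespan of a DAG under silent errors. The four-task DAG, the task weights, and the per-execution error probability are invented illustration values, and the re-execution model (a geometric number of attempts until verification passes) is an assumption; the paper's first-order approximation itself is not implemented here.

```python
import random

# Toy DAG: task -> list of predecessors. Names, weights, and the error
# probability are made-up illustration values, not from the paper.
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
work = {"a": 10.0, "b": 5.0, "c": 8.0, "d": 3.0}  # error-free task times
p_err = 0.05  # probability that one execution suffers a silent error

def sampled_duration(w, p):
    """Time for one task: re-execute from scratch until verification passes.
    The number of executions is geometric with success probability 1 - p."""
    t = w
    while random.random() < p:
        t += w  # invalid result detected: run the task again
    return t

def sampled_makespan():
    finish = {}
    for v in ["a", "b", "c", "d"]:  # any topological order works
        start = max((finish[u] for u in preds[v]), default=0.0)
        finish[v] = start + sampled_duration(work[v], p_err)
    return max(finish.values())

# Brute-force Monte Carlo estimate of the expected makespan.
n = 100_000
print(sum(sampled_makespan() for _ in range(n)) / n)
```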

    On Characterizing the Data Access Complexity of Programs

    Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational complexity will no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will be increasingly important. The problem of developing lower bounds for data access complexity has been modeled using the formalism of Hong & Kung's red/blue pebble game for computational directed acyclic graphs (CDAGs). However, previously developed approaches to lower-bound analysis for the red/blue pebble game are very limited in effectiveness when applied to CDAGs of real programs, whose computations are composed of multiple sub-computations with differing DAG structure. We address this problem by developing an approach for effectively composing lower bounds based on graph decomposition. We also develop a static analysis algorithm to derive the asymptotic data-access lower bounds of programs, as a function of the problem size and cache size.
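    As an illustration of what the red/blue pebble game measures, the sketch below counts the slow-memory loads incurred when one schedule of a tiny CDAG runs with a fixed number of fast-memory slots under an LRU policy. The graph, the capacity S, and the schedule are invented, stores are ignored for brevity, and the result is an upper bound for that one schedule; the paper's contribution is schedule-independent lower bounds obtained by graph decomposition, which this toy does not implement.

```python
from collections import OrderedDict

# Tiny computational DAG (CDAG): vertex -> operands it reads; vertices
# with no operands are inputs that start in slow ("blue") memory.
cdag = {
    "x0": [], "x1": [], "x2": [], "x3": [],
    "t0": ["x0", "x1"], "t1": ["x2", "x3"],
    "y":  ["t0", "t1"],
}
S = 3  # number of "red" pebbles, i.e. fast-memory slots

def loads(order):
    """Slow-memory loads for one schedule under LRU eviction (stores are
    ignored for brevity). A Hong & Kung-style analysis instead lower-bounds
    the cost over *all* schedules."""
    fast = OrderedDict()  # values currently holding a red pebble
    io = 0

    def make_room():
        if len(fast) >= S:
            fast.popitem(last=False)  # evict the least recently used value

    for v in order:
        for u in cdag[v]:             # every operand needs a red pebble
            if u in fast:
                fast.move_to_end(u)
            else:
                io += 1               # reload the operand from slow memory
                make_room()
                fast[u] = True
        make_room()                   # a red pebble for the result of v
        fast[v] = True
        if not cdag[v]:
            io += 1                   # inputs start blue: loading costs I/O
    return io

print(loads(["x0", "x1", "t0", "x2", "x3", "t1", "y"]))  # 5 with S = 3
```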

    Resource-constrained project scheduling.

    Resource-constrained project scheduling involves the scheduling of project activities subject to precedence and resource constraints in order to meet the objective(s) in the best possible way. The area covers a wide variety of problem types. The objective of this paper is to provide a survey of what we believe are important recent developments in the area. Our main focus will be on the recent progress made in, and the encouraging computational experience gained with, the use of optimal solution procedures for the basic resource-constrained project scheduling problem (RCPSP) and important extensions. The RCPSP involves scheduling a project to minimize its duration subject to zero-lag finish-start precedence constraints of the PERT/CPM type and constant availability constraints on the required set of renewable resources. We discuss recent striking advances in dealing with this problem using a new depth-first branch-and-bound procedure, elaborating on the effective and efficient branching scheme, bounding calculations and dominance rules, and discuss the potential of using truncated branch-and-bound. We derive a set of conclusions from the research on optimal solution procedures for the basic RCPSP and subsequently illustrate how effective and efficient branching rules and several of the strong dominance and bounding arguments can be extended to a rich and realistic variety of related problems. The preemptive resource-constrained project scheduling problem (PRCPSP) relaxes the nonpreemption condition of the RCPSP, thus allowing activities to be interrupted at integer points in time and resumed later without additional penalty cost. The generalized resource-constrained project scheduling problem (GRCPSP) extends the RCPSP to the case of precedence-diagramming types of precedence constraints (minimal finish-start, start-start, start-finish, and finish-finish precedence relations), activity ready times, deadlines, and variable resource availabilities. The resource-constrained project scheduling problem with generalized precedence relations (RCPSP-GPR) allows for start-start, finish-start and finish-finish constraints with minimal and maximal time lags. The MAX-NPV problem aims at scheduling project activities in order to maximize the net present value of the project in the absence of resource constraints. The resource-constrained project scheduling problem with discounted cash flows (RCPSP-DC) aims at the same non-regular objective in the presence of resource constraints. The resource availability cost problem (RACP) aims at determining the cheapest resource availability amounts for which a feasible solution exists that does not violate the project deadline. In the discrete time/cost trade-off problem (DTCTP) the duration of an activity is a discrete, non-increasing function of the amount of a single nonrenewable resource committed to it. In the discrete time/resource trade-off problem (DTRTP) the duration of an activity is a discrete, non-increasing function of the amount of a single renewable resource. Each activity must then be scheduled in one of its possible execution modes. In addition to time/resource trade-offs, the multi-mode project scheduling problem (MRCPSP) allows for resource/resource trade-offs and constraints on renewable, nonrenewable and doubly-constrained resources. We report on recent computational results and end with overall conclusions and suggestions for future research.
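    To ground the basic RCPSP, here is a minimal depth-first branch-and-bound sketch: it branches over which eligible activity to schedule next, decodes each choice with a serial schedule-generation step (earliest precedence- and resource-feasible start), and prunes branches that cannot beat the incumbent. The five-activity instance and the single-resource capacity are invented illustration values; the procedures surveyed above add far stronger bounding calculations and dominance rules.

```python
# Toy RCPSP instance: one renewable resource of capacity 4.
# Durations, demands, and precedences are made-up illustration values.
CAP = 4
dur = {"a": 3, "b": 2, "c": 2, "d": 4, "e": 1}
dem = {"a": 2, "b": 3, "c": 2, "d": 2, "e": 3}
preds = {"a": [], "b": [], "c": ["a"], "d": ["a"], "e": ["b", "c"]}

def earliest_start(act, finish, usage):
    """Earliest precedence- and resource-feasible start (serial SGS step)."""
    t = max((finish[p] for p in preds[act]), default=0)
    while any(usage.get(u, 0) + dem[act] > CAP for u in range(t, t + dur[act])):
        t += 1
    return t

best = [float("inf")]  # incumbent makespan

def branch(finish, usage):
    if len(finish) == len(dur):
        best[0] = min(best[0], max(finish.values()))
        return
    for act in dur:
        if act in finish or any(p not in finish for p in preds[act]):
            continue                     # already scheduled or not eligible
        s = earliest_start(act, finish, usage)
        if s + dur[act] >= best[0]:
            continue                     # bound: cannot improve the incumbent
        finish[act] = s + dur[act]
        for u in range(s, s + dur[act]):
            usage[u] = usage.get(u, 0) + dem[act]
        branch(finish, usage)
        for u in range(s, s + dur[act]):  # undo before trying the next branch
            usage[u] -= dem[act]
        del finish[act]

branch({}, {})
print("optimal makespan:", best[0])
```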

    Easier Parallel Programming with Provably-Efficient Runtime Schedulers

    Over the past decade, processor manufacturers have pivoted from increasing uniprocessor performance to multicore architectures. However, utilizing this computational power has proved challenging for software developers. Many concurrency platforms and languages have emerged to address parallel programming challenges, yet writing correct and performant parallel code retains a reputation as one of the hardest tasks a programmer can undertake. This dissertation studies how runtime scheduling systems can be used to make parallel programming easier. We address the difficulty of writing parallel data structures, automatically finding shared-memory bugs, and reproducing non-deterministic synchronization bugs. Each of the systems presented depends on a novel runtime system that provides strong theoretical performance guarantees and performs well in practice.
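    The canonical provably-efficient runtime scheduler is randomized work stealing, and the sketch below shows its structure: each worker owns a deque, pops its newest local task, and steals the oldest task from a random victim when idle. This is a generic illustration, not the dissertation's runtime; the task shape and worker count are invented, and Python's GIL means it demonstrates the mechanism rather than real parallel speedup.

```python
import random
import threading
from collections import deque

NWORKERS = 4
deques = [deque() for _ in range(NWORKERS)]
locks = [threading.Lock() for _ in range(NWORKERS)]
pending = [0]                      # outstanding tasks, for termination
pending_lock = threading.Lock()
done = threading.Event()
results = []

def spawn(w, task):
    with pending_lock:
        pending[0] += 1
    with locks[w]:
        deques[w].append(task)     # owner pushes at the "hot" end

def finish_one():
    with pending_lock:
        pending[0] -= 1
        if pending[0] == 0:
            done.set()

def make_task(depth):
    def task(w):
        if depth > 0:              # spawn two children into the local deque
            spawn(w, make_task(depth - 1))
            spawn(w, make_task(depth - 1))
        else:
            results.append(1)      # leaf unit of work
        finish_one()
    return task

def worker(w):
    while not done.is_set():
        task = None
        with locks[w]:
            if deques[w]:
                task = deques[w].pop()          # owner pops newest (LIFO)
        if task is None:
            v = random.randrange(NWORKERS)      # idle: try to steal
            with locks[v]:
                if v != w and deques[v]:
                    task = deques[v].popleft()  # thief takes oldest (FIFO)
        if task:
            task(w)

spawn(0, make_task(4))
threads = [threading.Thread(target=worker, args=(w,)) for w in range(NWORKERS)]
for t in threads: t.start()
for t in threads: t.join()
print("leaf tasks executed:", len(results))     # 2**4 = 16
```

    The owner-LIFO/thief-FIFO discipline is what the theory relies on: local LIFO execution keeps the working set cache-resident, while FIFO steals grab the largest pending subtrees, which is how work stealing achieves its provable time and space bounds.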

    Parallel computing in combinatorial optimization


    Development and demonstration of an on-board mission planner for helicopters

    Mission management tasks can be distributed within a planning hierarchy, where each level of the hierarchy addresses a scope of action, an associated time scale or planning horizon, and requirements for plan-generation response time. The current work is focused on the far-field planning subproblem, with a scope and planning horizon encompassing the entire mission and with a required response time of about two minutes. The far-field planning problem is posed as a constrained optimization problem, and algorithms and structural organizations are proposed for the solution. Algorithms are implemented in a developmental environment, and performance is assessed with respect to optimality and feasibility for the intended application and in comparison with alternative algorithms. This is done for the three major components of far-field planning: goal planning, waypoint path planning, and timeline management. It appears feasible to meet performance requirements on a 10 MIPS flyable processor (dedicated to far-field planning) using a heuristically guided simulated annealing technique for the goal planner, a modified A* search for the waypoint path planner, and a speed scheduling technique developed for this project.
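    As an illustration of the waypoint path-planning component, here is plain A* search on a small occupancy grid with a Manhattan-distance heuristic. The grid, the unit move costs, and the start/goal cells are invented, and this is textbook A* rather than the modified variant the project developed.

```python
import heapq

GRID = [             # '#' marks an obstacle cell
    "....#...",
    "..#.#.#.",
    "..#...#.",
    "....##..",
]
ROWS, COLS = len(GRID), len(GRID[0])

def astar(start, goal):
    def h(p):  # admissible Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start, [start])]  # (f, g, cell, path)
    seen = {}                                    # best g found per cell
    while open_heap:
        f, g, pos, path = heapq.heappop(open_heap)
        if pos == goal:
            return path
        if seen.get(pos, float("inf")) <= g:
            continue                             # dominated entry, skip
        seen[pos] = g
        r, c = pos
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < ROWS and 0 <= nc < COLS and GRID[nr][nc] != "#":
                heapq.heappush(open_heap, (g + 1 + h((nr, nc)), g + 1,
                                           (nr, nc), path + [(nr, nc)]))
    return None  # no route between start and goal

print(astar((0, 0), (3, 7)))
```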

    Effective And Efficient Preemption Placement For Cache Overhead Minimization In Hard Real-Time Systems

    Schedulability analysis for real-time systems has been the subject of prominent research over the past several decades. One of the key foundations of schedulability analysis is an accurate worst-case execution time (WCET) for each task. In preemption-based real-time systems, the cache-related preemption delay (CRPD) can represent a significant component (up to 44% as documented in the research literature) of variability in overall task WCET. Several methods have been employed to calculate CRPD with significant levels of pessimism that may result in a task set being erroneously declared non-schedulable. Furthermore, they do not take into account that CRPD cost is inherently a function of where preemptions actually occur. Our approach for computing CRPD via loaded cache blocks (LCBs) is more accurate in the sense that the cache state reflects which cache blocks are reloaded and the specific program locations where they are reloaded. Limited preemption models attempt to minimize preemption overhead (CRPD) by reducing the number of allowed preemptions and/or allowing preemption only at program locations where the CRPD effect is minimized. These algorithms rely heavily on accurate CRPD measurements or estimation models in order to identify an optimal set of preemption points. Our approach improves the effectiveness of limited optimal preemption point placement algorithms by calculating the LCBs for each pair of adjacent preemptions to more accurately model task WCET and maximize schedulability as compared to existing preemption point placement approaches. We utilize a dynamic programming technique to develop an optimal preemption point placement algorithm. Lastly, we demonstrate, using a case study, improved task set schedulability and optimal preemption point placement via our new LCB characterization. We propose a new CRPD metric, called loaded cache blocks (LCB), which accurately characterizes the CRPD a real-time task may be subjected to due to the preemptive execution of higher-priority tasks. We show how to integrate our new LCB metric into our newly developed algorithms that automatically place preemption points supporting linear control flow graphs (CFGs) for limited preemption scheduling applications. We extend the derivation of loaded cache blocks, originally proposed for linear CFGs, to conditional CFGs, and show how to integrate the revised LCB metric into our algorithms that automatically place preemption points in conditional CFGs. For future work, we will verify the correctness of our framework through other measurable physical and hardware constraints. We also plan to complete our work on developing a generalized framework that can be seamlessly integrated into real-time schedulability analysis.
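    A minimal sketch of dynamic-programming preemption point placement on a linear CFG, in the spirit of the approach above: dp[i] is the least total preemption cost when a preemption is placed at point i, subject to every non-preemptive region fitting within a bound Q. The block WCETs, the per-point costs (standing in for LCB-based CRPD values), and Q are invented illustration values.

```python
INF = float("inf")
b  = [2, 4, 3, 5, 2, 4]      # WCET of basic blocks 1..n
xi = [0, 3, 1, 4, 2, 1, 0]   # CRPD cost of preempting at point i (0..n)
Q  = 8                       # longest allowed non-preemptive region

n = len(b)
dp = [INF] * (n + 1)         # dp[i]: min CRPD with a preemption at point i
dp[0] = 0                    # point 0 is the task start (free)
choice = [None] * (n + 1)
for i in range(1, n + 1):
    region = 0
    for j in range(i - 1, -1, -1):   # previous preemption at point j
        region += b[j]               # blocks j+1..i run non-preemptively
        if region > Q:
            break
        cand = dp[j] + (xi[i] if i < n else 0)  # task end is free too
        if cand < dp[i]:
            dp[i], choice[i] = cand, j

# Recover the selected preemption points (excluding start and end).
pts, i = [], n
while i and choice[i] is not None:
    i = choice[i]
    if i:
        pts.append(i)
print("total CRPD:", dp[n], "preempt after blocks:", sorted(pts))
```

    On this instance the sketch selects preemption points after blocks 2 and 4, splitting the task into regions of length 6, 8, and 6, each within Q = 8, at a total CRPD of 3.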

    Advances in parallel programming for electronic design automation

    The continued miniaturization of the technology node increases not only the chip capacity but also the circuit design complexity. How does one efficiently design a chip with millions or billions of transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. To boost the performance of EDA tools, one promising direction is parallel computing. In this dissertation, we explore different parallel computing approaches, from CPUs to GPUs to distributed computing, for EDA applications. Nowadays multi-core processors are prevalent from mobile devices to laptops to desktops, and it is natural for software developers to utilize the available cores to maximize the performance of their applications. Therefore, in this dissertation we first focus on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to facilitate programming parallel applications, and has been successfully applied to an EDA timing analysis tool. We demonstrate Cpp-Taskflow's programming model and interface, software architecture, and execution flow. Then, we improve Cpp-Taskflow in several aspects. First, we enhance Cpp-Taskflow's usability by restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, which makes it easier for users to construct large and complex parallel patterns. Third, we add a new task type in Cpp-Taskflow to let users control the graph execution flow. This feature empowers the graph model with the ability to describe complex control flow. Aside from the above enhancements, we have designed a new scheduler to adaptively manage the threads based on available parallelism. The new scheduler uses a simple and effective strategy which can not only prevent resources from being underutilized, but also mitigate resource over-subscription. We have evaluated the new scheduler on both micro-benchmarks and a very-large-scale integration (VLSI) application, and the results show that the new scheduler achieves good performance and is very energy-efficient. Next we study the applicability of heterogeneous computing, specifically the graphics processing unit (GPU), to EDA. We demonstrate how to use the GPU to accelerate VLSI placement, and we show that the GPU can bring substantial performance gains to VLSI placement. Finally, as design sizes keep increasing, a more scalable solution will be distributed computing. We introduce a distributed power grid analysis framework built on top of DtCraft. This framework allows users to flexibly partition the design and automatically deploy the computations across several machines. In addition, we propose a job scheduler that can efficiently utilize cluster resources to improve the framework's performance.
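    For readers unfamiliar with the task-graph model that Cpp-Taskflow provides for C++, the Python sketch below captures the idea with a toy dependency-counting executor: tasks are emplaced, precedence edges are declared, and a task runs once all of its predecessors have finished. This illustrates the model only; it is not Cpp-Taskflow's actual API, and the diamond-shaped example graph is invented.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

class TaskGraph:
    """Toy dependency-counting executor illustrating the task-graph model."""
    def __init__(self):
        self.preds = {}   # task -> count of unfinished predecessors
        self.succs = {}   # task -> successors unblocked when it finishes
        self.lock = threading.Lock()

    def emplace(self, fn):
        self.preds[fn], self.succs[fn] = 0, []
        return fn

    def precede(self, a, b):          # edge a -> b: run a before b
        self.succs[a].append(b)
        self.preds[b] += 1

    def run(self, workers=4):
        finished = threading.Event()
        remaining = [len(self.preds)]
        with ThreadPoolExecutor(workers) as pool:
            def execute(t):
                t()
                with self.lock:
                    remaining[0] -= 1
                    ready = [s for s in self.succs[t]
                             if not self.preds.__setitem__(s, self.preds[s] - 1)
                             and self.preds[s] == 0]
                    if remaining[0] == 0:
                        finished.set()
                for s in ready:        # submit newly unblocked successors
                    pool.submit(execute, s)
            for t in [t for t, d in self.preds.items() if d == 0]:
                pool.submit(execute, t)  # source tasks start immediately
            finished.wait()

# Usage: a diamond A -> (B, C) -> D; B and C may run in either order,
# D always runs last.
g = TaskGraph()
A = g.emplace(lambda: print("A"))
B = g.emplace(lambda: print("B"))
C = g.emplace(lambda: print("C"))
D = g.emplace(lambda: print("D"))
g.precede(A, B); g.precede(A, C); g.precede(B, D); g.precede(C, D)
g.run()
```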