22 research outputs found

    Introduction to the scheduling problem

    Full text link

    Hybrid scheduling algorithms in cloud computing: a review

    Get PDF
    Cloud computing is one of the emerging fields in computer science due to its several advancements like on-demand processing, resource sharing, and pay per use. There are several cloud computing issues like security, quality of service (QoS) management, data center energy consumption, and scaling. Scheduling is one of the several challenging problems in cloud computing, where several tasks need to be assigned to resources to optimize the quality of service parameters. Scheduling is a well-known NP-hard problem in cloud computing. This will require a suitable scheduling algorithm. Several heuristics and meta-heuristics algorithms were proposed for scheduling the user's task to the resources available in cloud computing in an optimal way. Hybrid scheduling algorithms have become popular in cloud computing. In this paper, we reviewed the hybrid algorithms, which are the combinations of two or more algorithms, used for scheduling in cloud computing. The basic idea behind the hybridization of the algorithm is to take useful features of the used algorithms. This article also classifies the hybrid algorithms and analyzes their objectives, quality of service (QoS) parameters, and future directions for hybrid scheduling algorithms

    High-Level Synthesis for Embedded Systems

    Get PDF

    Scheduling Tool for the Nevada Test and Training Range

    Get PDF
    Presently, the 57th Wing Scheduler at Nellis AFB schedules daily mission requests to the Nevada Test and Training Range (NTTR) airspace manually. The process is time consuming and may lead to suboptimal range resource allocations. The goal of this study is to provide the scheduler with an automated scheduling approach that will improve range scheduling efficiency. The tool developed uses range request data from units at Nellis AFB to produce daily mission schedules for a month long scheduling horizon with Microsoft VBA code and a commercial Integer Program (IP) solver. Under our current understanding of scheduler priorities, we formulate the problem with three different sets of objective function coefficients to maximize the utility of the schedules. We then analyze the differences in the schedules produced from a test data set comprised of requests from May 2017. Based on our analysis and results from testing, we recommend that the 57th Wing Scheduler employ one of our priority- based formulations to find fast and good feasible starting solutions to their monthly schedule build. This study demonstrates that using an IP approach to scheduling missions to the NTTR is feasible and has utility to the scheduling organization

    Pipeline scheduling with input port constraints for an FPGA-based biochemical simulator

    Get PDF
    This paper discusses design methodology of high-throughput arithmetic pipeline modules for an FPGA-based biochemical simulator. Since limitation of data-input bandwidth caused by port constraints often has a negative impact on pipeline scheduling results, we propose a priority assignment method of input data which enables efficient arithmetic pipeline scheduling under given input port constraints. Evaluation results with frequently used rate-law functions in biochemical models revealed that the proposed method achieved shorter latency compared to ASAP and ALAP scheduling with random input orders, reducing hardware costs by 17.57% and by 27.43% on average, respectively.The original publication is available at www.springerlink.co

    Scheduling Network Traffic for Grid Purposes

    Get PDF

    Loop pipelining with resource and timing constraints

    Get PDF
    Developing efficient programs for many of the current parallel computers is not easy due to the architectural complexity of those machines. The wide variety of machine organizations often makes it more difficult to port an existing program than to reprogram it completely. Therefore, powerful translators are necessary to generate effective code and free the programmer from concerns about the specific characteristics of the target machine. This work focuses on techniques to be used by an important class of translators, whose objective is to transform sequential programs into equivalent more parallel programs. The transformations are performed at instruction level in order to exploit low level parallelism and increase memory locality.Most of the current applications are programmed in languages which do not allow us to express parallelism between high-level sentences (as Pascal, C or Fortran). Furthermore, a lot of applications written ten or more years ago are still used today, and it is not feasible to rewrite such applications for many reasons (not only technical reasons, but also economic ones). Translators enable programmers to write the application in a familiar sequential programming language, without concerning their selves with the architecture of the target machine. Current compilers for parallel architectures not only translate a program written on a high-level language to the appropriate machine language, but also perform some transformations in the final code in order to execute the program in a more parallel way. The transformations improve the performance in the execution of the program by making use of the knowledge that the compiler has about the machine architecture. The semantics of the program remain intact after any transformation.Experiments show that limiting parallelization to basic blocks not included in loops limits maximum speedup. This is because loops often comprise a large portion of the parallelism available to be exploited in a program. For this reason, a lot of effort has been devoted in the recent years to parallelize loop execution. Several parallel computer architectures and compilation techniques have been proposed to exploit such a parallelism at different granularities. Multiprocessors exploit coarse grained parallelism by distributing entire loop iterations to different processors. Systems oriented to the high-level synthesis (HLS) of VLSI circuits, superscalar processors and very long instruction word (VLIW) processors exploit fine-grained parallelism at instruction level. This work addresses fine-grained parallelization of loops addressed to the HLS of VLSI circuits. Two algorithms are proposed for resource constraints and for timing constraints. An algorithm to reduce the number of registers required to execute a loop in a given architecture is also proposed

    Application-Specific Memory Subsystems

    Get PDF
    The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. These cache hierarchies are designed to be general-purpose in that they strive to provide the best possible performance across a wide range of applications. However, such a memory subsystem does not necessarily provide the best possible performance for a particular application. Although general-purpose memory subsystems are desirable when the work-load is unknown and the memory subsystem must remain fixed, when this is not the case a custom memory subsystem may be beneficial. For example, in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) designed to run a particular application, a custom memory subsystem optimized for that application would be desirable. In addition, when there are tunable parameters in the memory subsystem, it may make sense to change these parameters depending on the application being run. Such a situation arises today with FPGAs and, to a lesser extent, GPUs, and it is plausible that general-purpose computers will begin to support greater flexibility in the memory subsystem in the future. In this dissertation, we first show that it is possible to create application-specific memory subsystems that provide much better performance than a general-purpose memory subsystem. In addition, we show a way to discover such memory subsystems automatically using a superoptimization technique on memory address traces gathered from applications. This allows one to generate a custom memory subsystem with little effort. We next show that our memory subsystem superoptimization technique can be used to optimize for objectives other than performance. As an example, we show that it is possible to reduce the number of writes to the main memory, which can be useful for main memories with limited write durability, such as flash or Phase-Change Memory (PCM). Finally, we show how to superoptimize memory subsystems for streaming applications, which are a class of parallel applications. In particular, we show that, through the use of ScalaPipe, we can author and deploy streaming applications targeting FPGAs with superoptimized memory subsystems. ScalaPipe is a domain-specific language (DSL) embedded in the Scala programming language for generating streaming applications that can be implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we are able to demonstrate actual performance improvements using the superoptimized memory subsystem with applications implemented in hardware
    corecore