Search CORE

36 research outputs found

Distributed dispatchers for partially clairvoyant schedulers

Author: Yellajyosula Kiran S.
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2003
Field of study

This work focuses on the empirical evaluation of distributed dispatching strategies on shared and distributed memory architectures for hard real-time systems. The dispatching model accommodates process parameter variability and analyzes the effect of variable execution times.;Hard real-time systems are modeled in the E-T-C scheduling framework and dispatched if a valid schedule exists. We examine the dispatchability of Partially Clairvoyant schedules of different sizes and varying deadlines under reasonable assumptions. The effect of scaling up the number of processors used by the dispatcher is also studied. The results validate the superiority of the distributed strategies over sequential dispatching and scalability of the distributed strategies. Certain system limitations which lead to Loss of Dispatchability in the experiments were pointed out.;The model finds applications in diverse areas like safety critical systems, robotics and machine control, real-time data management, and this approach is targeted at powering up the controllers

The Research Repository @ WVU (West Virginia University)

A Path Relinking Method for the Joint Online Scheduling and Capacity Allocation of DL Training Workloads in GPU as a Service Systems

Author: Arezoo Jahani
Danilo Ardagna
Edoardo Amaldi
Federica Filippini
Marco Lattuada
Michele Ciavotta
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

The Deep Learning (DL) paradigm gained remarkable popularity in recent years. DL models are used to tackle increasingly complex problems, making the training process require considerable computational power. The parallel computing capabilities offered by modern GPUs partially fulfill this need, but the high costs related to GPU as a Service solutions in the cloud call for efficient capacity planning and job scheduling algorithms to reduce operational costs via resource sharing. In this work, we jointly address the online capacity planning and job scheduling problems from the perspective of cloud end-users. We present a Mixed Integer Linear Programming (MILP) formulation, and a path relinking-based method aiming at optimizing operational costs by (i) rightsizing Virtual Machine (VM) capacity at each node, (ii) partitioning the set of GPUs among multiple concurrent jobs on the same VM, and (iii) determining a due-date-aware job schedule. An extensive experimental campaign attests the effectiveness of the proposed approach in practical scenarios: costs savings up to 97% are attained compared with first-principle methods based on, e.g., Earliest Deadline First, cost reductions up to 20% are obtained with respect to a previously proposed Hierarchical Method and up to 95% against a dynamic programming-based method from the literature. Scalability analyses show that systems with up to 100 nodes and 450 concurrent jobs can be managed in less than 7 seconds. The validation in a prototype cloud environment shows a deviation below 5% between real and predicted costs

Archivio istituzionale della ricerca - Politecnico di Milano

A real-time distributed resource-granting task scheduler for a virtual infrastructure management system

Author: Yang Yunlu, M. Eng. Massachusetts Institute of Technology
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2012
Field of study

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 79-80).This M.Eng. thesis introduces a distributed real-time task scheduler that is designed and implemented to replace the role of a lock server for a cloud service that manages data centers that contain virtual machines. A round-robin scheduling algorithm is proposed that orders tasks based on their needs of locks, serializing tasks that conflict in their lock use. Tasks are required to obtain all locks before execution. The distributed part of the scheduler is based on Zookeeper, an open-source data service for distributed applications. The scheduler includes a failover scheme that relies on task states stored on Zookeeper. The design is highly available-the scheduler will keep scheduling tasks as long as at least one instance is running. Experimental results show that the scheduler cuts down overall task completion time by approximately a half compared to a lock server on both benchmarks used.by Yunlu Yang.M.Eng

DSpace@MIT

Energy-aware scheduling in heterogeneous computing systems

Author: Iturriaga Santiago
Publication venue: Udelar. FI.
Publication date: 01/01/2013
Field of study

In the last decade, the grid computing systems emerged as useful provider of the computing power required for solving complex problems. The classic formulation of the scheduling problem in heterogeneous computing systems is NP-hard, thus approximation techniques are required for solving real-world scenarios of this problem. This thesis tackles the problem of scheduling tasks in a heterogeneous computing environment in reduced execution times, considering the schedule length and the total energy consumption as the optimization objectives. An efficient multithreading local search algorithm for solving the multi-objective scheduling problem in heterogeneous computing systems, named MEMLS, is presented. The proposed method follows a fully multi-objective approach, applying a Pareto-based dominance search that is executed in parallel by using several threads. The experimental analysis demonstrates that the new multithreading algorithm outperforms a set of fast and accurate two-phase deterministic heuristics based on the traditional MinMin. The new ME-MLS method is able to achieve significant improvements in both makespan and energy consumption objectives in reduced execution times for a large set of testbed instances, while exhibiting very good scalability. The ME-MLS was evaluated solving instances comprised of up to 2048 tasks and 64 machines. In order to scale the dimension of the problem instances even further and tackle large-sized problem instances, the Graphical Processing Unit (GPU) architecture is considered. This line of future work has been initially tackled with the gPALS: a hybrid CPU/GPU local search algorithm for efficiently tackling a single-objective heterogeneous computing scheduling problem. The gPALS shows very promising results, being able to tackle instances of up to 32768 tasks and 1024 machines in reasonable execution times.En la última década, los sistemas de computación grid se han convertido en útiles proveedores de la capacidad de cálculo necesaria para la resolución de problemas complejos. En su formulación clásica, el problema de la planificación de tareas en sistemas heterogéneos es un problema NP difícil, por lo que se requieren técnicas de resolución aproximadas para atacar instancias de tamaño realista de este problema. Esta tesis aborda el problema de la planificación de tareas en sistemas heterogéneos, considerando el largo de la planificación y el consumo energético como objetivos a optimizar. Para la resolución de este problema se propone un algoritmo de búsqueda local eficiente y multihilo. El método propuesto se trata de un enfoque plenamente multiobjetivo que consiste en la aplicación de una búsqueda basada en dominancia de Pareto que se ejecuta en paralelo mediante el uso de varios hilos de ejecución. El análisis experimental demuestra que el algoritmo multithilado propuesto supera a un conjunto de heurísticas deterministas rápidas y e caces basadas en el algoritmo MinMin tradicional. El nuevo método, ME-MLS, es capaz de lograr mejoras significativas tanto en el largo de la planificación y como en consumo energético, en tiempos de ejecución reducidos para un gran número de casos de prueba, mientras que exhibe una escalabilidad muy promisoria. El ME-MLS fue evaluado abordando instancias de hasta 2048 tareas y 64 máquinas. Con el n de aumentar la dimensión de las instancias abordadas y hacer frente a instancias de gran tamaño, se consideró la utilización de la arquitectura provista por las unidades de procesamiento gráfico (GPU). Esta línea de trabajo futuro ha sido abordada inicialmente con el algoritmo gPALS: un algoritmo híbrido CPU/GPU de búsqueda local para la planificación de tareas en en sistemas heterogéneos considerando el largo de la planificación como único objetivo. La evaluación del algoritmo gPALS ha mostrado resultados muy prometedores, siendo capaz de abordar instancias de hasta 32768 tareas y 1024 máquinas en tiempos de ejecución razonables

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

A Study of Time and Energy Efficient Algorithms for Parallel and Heterogeneous Computing

Author: Ojiaku JK
Publication venue
Publication date
Field of study

This PhD project is motivated by the need to develop and achieve better and energy efficient computing through the use of parallelism and heterogeneous systems. Our contribution consists of both theoretical aspects, as well as in-depth and comprehensive empirical studies that aim to provide more insight into parallel and heterogeneous computing. Our first problem is a theoretical problem that focuses on the scheduling of a special category of jobs known as deteriorating jobs. These kind of jobs will require more effort to complete them if postponed to a later time. They are intended to model several industrial processes including steel production, fire-fighting and financial management. We study the problem in the context of parallel machine scheduling in an online setting where jobs have arbitrary release times. Our main results show that List Scheduling is

(1+b_{max})

-competitive and that no deterministic algorithm is better than

(1+b_{max})^{1-\frac{1}{m}}

, where

b_{max}

is the largest deteriorating rate. We also extend our results to online deterministic algorithms and show that no deterministic online algorithm is better than

(1+b_{max})

-competitive. Our next study concerns the scheduling of

n

jobs with precedence constraints on

m

parallel machines. We are interested in the precedence constraint known as chain precedence constraint where each job can have at most one predecessor and at most one successor. The jobs are modelled as directed acyclic graphs where nodes represent the jobs and edges represent the precedence constraints between jobs. The jobs have a strict deadline that must be met. The parallel machines are considered to be unrelated and a communication network connects each pair of machines. Execution of the jobs on the machines as well as communication across the network incurs costs in the form of time and energy. These costs are given by cost matrices that covers processing and communication. The goal is to construct a feasible schedule that minimizes the total energy required to execute the chain of jobs on the machines, such that all deadlines are met. We present a dynamic programming solution to the problem that leads to a pseudo polynomial time algorithm with running time

O(nm^2d_{max})

, where

d_{max}

is the largest deadline. We show that the algorithm computes an optimal schedule where one exists. We then proceed to a similar problem that involves the scheduling of jobs to minimize flow time plus energy. This problem is based on a dynamic speed scaling heuristic in literature that is able to adjust the speed of a processor based on the number of \emph{active jobs}, called AJC. We present a comprehensive empirical study that consists of several job selection, speed selection and processor allocation heuristics. We also consider both single processor and multi processor settings. Our main goal is to investigate the viability of designing a fixed-speed counterpart for AJC, that is not as computationally intensive as AJC, while being very simple. We also evaluate the performance of this fixed speed heuristic and compare it with that of AJC. Our fourth and final study involves the use of graphics processing unit (GPU) as an accelerator for compute intensive tasks. The GPU has become a very popular multi processor for heterogeneous computing both from an economical point of view and performance standpoint. Firstly, we contribute to the development of a Bioinformatics tool, called GapsMis, by implementing a heterogeneous version that uses graphics processors for acceleration. GapsMis is a tool designed for the alignment of sequences, like protein and DNA sequences, and allows for the insertion of gaps in the alignment. Then we present a case study that aims to highlight the various aspects, including benefits and challenges, involved in developing heterogeneous applications that is vendor-agnostic. In order to do this we select four algorithms as case studies including GapsMis and the algorithm presented in our second problem. The other two algorithms are based on the Velocity-Verlet integration and the Fruchterman-Reingold force-based method for graph layout. We make use of the Open Computing Language (OpenCL) and C++ for implementation of the algorithms on a range of graphics processors from Advanced Micro Devices (AMD) and NVIDIA Corporation. We evaluate several factors that can affect performance of these applications on each hardware. We also compare the performance of our algorithms in a multi-GPU setting and against single and multi-core CPU implementations. Furthermore, several metrics are defined to capture several aspects of performance including execution time of application kernel(s), execution time of application including communication times, throughput, power and energy consumption

University of Liverpool Repository

Online scheduling for real-time multitasking on reconfigurable hardware devices

Author: Wassi-Leupi Guy
Publication venue
Publication date: 01/01/2011
Field of study

Nowadays the ever increasing algorithmic complexity of embedded applications requires the designers to turn towards heterogeneous and highly integrated systems denoted as SoC (System-on-a-Chip). These architectures may embed CPU-based processors, dedicated datapaths as well as recon gurable units. However, embedded SoCs are submitted to stringent requirements in terms of speed, size, cost, power consumption, throughput, etc. Therefore, new computing paradigms are required to ful l the constraints of the applications and the requirements of the architecture. Recon gurable Computing is a promising paradigm that provides probably the best trade-o between these requirements and constraints. Dynamically recon gurable architectures are their key enabling technology. They enable the hardware to adapt to the application at runtime. However, these architectures raise new challenges in SoC design. For example, on one hand, designing a system that takes advantage of dynamic recon guration is still very time consuming because of the lack of design methodologies and tools. On the other hand, scheduling hardware tasks di ers from classical software tasks scheduling on microprocessor or multiprocessors systems, as it bears a further complicated placement problem. This thesis deals with the problem of scheduling online real-time hardware tasks on Dynamically Recon gurable Hardware Devices (DRHWs). The problem is addressed from two angles : (i) Investigating novel algorithms for online real-time scheduling/placement on DRHWs. (ii) Scheduling/Placement algorithms library for RTOS-driven Design Space Exploration (DSE). Regarding the first point, the thesis proposes two main runtime-aware scheduling and placement techniques and assesses their suitability for online real-time scenarios. The first technique discusses the impact of synthesizing, at design time, several shapes and/or sizes per hardware task (denoted as multi-shape task), in order to ease the online scheduling process. The second technique combines a looking-ahead scheduling approach with a slots-based recon gurable areas management that relies on a 1D placement. The results show that in both techniques, the scheduling and placement quality is improved without signi cantly increasing the algorithm time complexity. Regarding the second point, in the process of designing SoCs embedding recon gurable parts, new design paradigms tend to explore and validate as early as possible, at system level, the architectural design space. Therefore, the RTOS (Real-Time Operating System) services that manage the recon gurable parts of the SoC can be re fined. In such a context, gathering numerous hardware tasks scheduling and placement algorithms of various complexity vs performance trade-o s in a kind of library is required. In this thesis, proposed algorithms in addition to some existing ones are purposely implemented in C++ language, in order to insure the compatibility with any C++/SystemC based SoC design methodology.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

Bucks New University: Bucks Knowledge Archive

OpenGrey Repository

Group-Based Parallel Multi-scheduling Methods for Grid Computing

Author: Abraham Goodhead Tomvie
Publication venue
Publication date: 01/01/2016
Field of study

Coventry University Pure Portal

Real-Time Wireless Sensor-Actuator Networks for Cyber-Physical Systems

Author: Saifullah Abusayeed
Publication venue: Washington University Open Scholarship
Publication date: 01/09/2014
Field of study

A cyber-physical system (CPS) employs tight integration of, and coordination between computational, networking, and physical elements. Wireless sensor-actuator networks provide a new communication technology for a broad range of CPS applications such as process control, smart manufacturing, and data center management. Sensing and control in these systems need to meet stringent real-time performance requirements on communication latency in challenging environments. There have been limited results on real-time scheduling theory for wireless sensor-actuator networks. Real-time transmission scheduling and analysis for wireless sensor-actuator networks requires new methodologies to deal with unique characteristics of wireless communication. Furthermore, the performance of a wireless control involves intricate interactions between real-time communication and control. This thesis research tackles these challenges and make a series of contributions to the theory and system for wireless CPS. (1) We establish a new real-time scheduling theory for wireless sensor-actuator networks. (2) We develop a scheduling-control co-design approach for holistic optimization of control performance in a wireless control system. (3) We design and implement a wireless sensor-actuator network for CPS in data center power management. (4) We expand our research to develop scheduling algorithms and analyses for real-time parallel computing to support computation-intensive CPS

Washington University St. Louis: Open Scholarship

Integrating Consumer Flexibility in Smart Grid and Mobility Systems - An Online Optimization and Online Mechanism Design Approach

Author: Ströhle Philipp
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014
Field of study

Consumer flexibility may provide an important lever to align supply and demand in service systems. However, harnessing dispersed flexibility endowments in the presence of self-interested agents requires appropriate incentive structures. This thesis quantifies the potential value of consumers\u27 flexibility in smart grid and mobility systems. In order to include incentives, online optimization approaches are augmented with methods from online mechanism design

KITopen

Energy-Aware Real-Time Scheduling on Heterogeneous and Homogeneous Platforms in the Era of Parallel Computing

Author: Bhuiyan Ashik Ahmed
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2021
Field of study

Multi-core processors increasingly appear as an enabling platform for embedded systems, e.g., mobile phones, tablets, computerized numerical controls, etc. The parallel task model, where a task can execute on multiple cores simultaneously, can efficiently exploit the multi-core platform\u27s computational ability. Many computation-intensive systems (e.g., self-driving cars) that demand stringent timing requirements often evolve in the form of parallel tasks. Several real-time embedded system applications demand predictable timing behavior and satisfy other system constraints, such as energy consumption. Motivated by the facts mentioned above, this thesis studies the approach to integrating the dynamic voltage and frequency scaling (DVFS) policy with real-time embedded system application\u27s internal parallelism to reduce the worst-case energy consumption (WCEC), an essential requirement for energy-constrained systems. First, we propose an energy-sub-optimal scheduler, assuming the per-core speed tuning feature for each processor. Then we extend our solution to adapt the clustered multi-core platform, where at any given time, all the processors in the same cluster run at the same speed. We also present an analysis to exploit a task\u27s probabilistic information to improve the average-case energy consumption (ACEC), a common non-functional requirement of embedded systems. Due to the strict requirement of temporal correctness, the majority of the real-time system analysis considered the worst-case scenario, leading to resource over-provisioning and cost. The mixed-criticality (MC) framework was proposed to minimize energy consumption and resource over-provisioning. MC scheduling has received considerable attention from the real-time system research community, as it is crucial to designing safety-critical real-time systems. This thesis further addresses energy-aware scheduling of real-time tasks in an MC platform, where tasks with varying criticality levels (i.e., importance) are integrated into a common platform. We propose an algorithm GEDF-VD for scheduling MC tasks with internal parallelism in a multiprocessor platform. We also prove the correctness of GEDF-VD, provide a detailed quantitative evaluation, and reported extensive experimental results. Finally, we present an analysis to exploit a task\u27s probabilistic information at their respective criticality levels. Our proposed approach reduces the average-case energy consumption while satisfying the worst-case timing requirement

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)