72 research outputs found

    Game theoretic analysis of the slurm scheduler model

    In the context of High Performance Computing, scheduling is a necessary tool to ensure an acceptable quality of service for the many users of the available processing power. The scheduling process can vary from a simple First Come, First Served model to a wide variety of more complex implementations that tend to satisfy specific requirements from each group of users. Slurm is an open source, fault-tolerant, and highly scalable cluster management system for large and small Linux clusters [1]. MareNostrum 4, a High Performance Computer, uses it to manage the execution of jobs sent to it by a variety of users [2]. Previous work has taken an algorithmic approach that attempts to directly reduce queuing times, among other costs [3][4]. We consider that there is utility in also looking at the problem from a Game Theoretic perspective to define clearly the mechanics involved in the system, and also those that define the influx of tasks that the scheduler manages. We model the Slurm scheduling mechanism using Game Theoretic concepts, tools, and reasonable simplifications in an attempt to formally characterize and study it. We identify variables that play a significant role in the scheduling process and also experiment with changes in the model that could make users behave in a way that would improve overall quality of service. We recognize that the complexity of the models might make them difficult to analyze theoretically, so we use real usage data from BSC-CNS users to measure performance. The real usage data is extracted from Autosubmit [5], a workflow manager developed at the Earth Science Department at BSC-CNS. This is a convenient choice, given that we also attempt to measure the influence that an external agent (e.g. a workflow manager) could have on the overall quality of service if it imposes restrictions, and the nature of these restrictions.

    A new priority rule cloud scheduling technique that utilizes gaps to increase the efficiency of jobs distribution

    In recent years, the concept of cloud computing has been gaining traction to provide dynamically increasing access to shared computing resources (software and hardware) via the internet. It’s no secret that cloud computing’s ability to supply mission-critical services has made job scheduling a hot subject in the industry right now. However, the efficient utilization of these cloud resources has been a challenge, often resulting in wastage or degraded service performance due to poor scheduling. To solve this issue, existing research has focused on queue-based job scheduling techniques, where jobs are scheduled based on specific deadlines or job lengths. Numerous researchers have also worked on improving existing Priority Rule (PR) cloud schedulers by developing dynamic scheduling algorithms, but these have fallen short on user-satisfaction criteria such as flow time, makespan, and total tardiness. These limitations of current PR scheduler implementations are mainly caused by blocking by jobs at the head of the queue, and they lead to poor performance of cloud-based mobile applications and other cloud services. To address this issue, the main objective of this research is to improve the existing PR cloud schedulers by developing a new dynamic scheduling algorithm that manipulates the gaps in the cloud job schedule. In this thesis, a Priority-Based Fair Scheduling (PBFS) algorithm is first introduced to schedule jobs so that they get access to the required resources at optimal times. Then, a backfilling strategy called Shortest Gap Priority-Based Fair Scheduling (SG-PBFS) is proposed that attempts to manipulate the gaps in the schedule of cloud jobs.
Finally, the performance evaluation demonstrates that the proposed SG-PBFS algorithm outperforms SG-SJF, SG-LJF, SG-FCFS, SG-EDF, and SG-(MAX-MIN) in terms of flow time, makespan time, and total tardiness. The experimental results show that for 500 jobs, SG-PBFS flow time, makespan time, and tardiness time are 9%, 4%, and 7% less than those of PBFS, respectively.
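The three criteria named in this abstract (flow time, makespan, total tardiness) have standard textbook definitions, which the following minimal sketch computes for a single-machine First Come, First Served schedule. It is not the PBFS or SG-PBFS algorithm from the thesis; the `Job` fields, the helper names, and the three sample jobs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    release: float  # arrival time of the job
    length: float   # processing time
    due: float      # deadline used for tardiness


def fcfs_schedule(jobs):
    """Run jobs on one machine in arrival order; return {name: completion time}."""
    clock = 0.0
    completion = {}
    for job in sorted(jobs, key=lambda j: j.release):
        start = max(clock, job.release)  # wait for the machine or for the job
        clock = start + job.length
        completion[job.name] = clock
    return completion


def metrics(jobs, completion):
    """Return (total flow time, makespan, total tardiness) for a schedule."""
    flow_time = sum(completion[j.name] - j.release for j in jobs)
    makespan = max(completion.values())
    total_tardiness = sum(max(0.0, completion[j.name] - j.due) for j in jobs)
    return flow_time, makespan, total_tardiness


# Hypothetical workload: (name, release, length, due)
jobs = [Job("a", 0, 4, 5), Job("b", 1, 2, 4), Job("c", 2, 1, 4)]
completion = fcfs_schedule(jobs)
result = metrics(jobs, completion)
print(result)
```

Any other scheduling policy can be compared on the same footing by swapping `fcfs_schedule` for a function that returns completion times in the same dictionary form.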

    Adaptive Resource Relocation in Virtualized Heterogeneous Clusters

    Cluster computing has recently gone through an evolution from single processor systems to multicore/multi-socket systems. This has resulted in lowering the cost/performance ratio of the compute machines. Compute farms that host these machines tend to become heterogeneous over time due to incremental extensions, hardware upgrades and/or nodes being purchased for users with particular needs. This heterogeneity is not surprising given the wide range of processor, memory and network technologies that become available and the relatively small price difference between these various options. Different CPU architectures, memory capacities, communication and I/O interfaces of the participating compute nodes present many challenges to job scheduling and often result in under- or over-utilization of the compute resources. In general, it is not feasible for application programmers to specifically optimize their programs for such a set of differing compute nodes, due to the difficulty and time-intensiveness of such a task. The trend of heterogeneous compute farms has coincided with a resurgence in virtualization technology. Virtualization technology is receiving widespread adoption, mainly due to the benefits of server consolidation and isolation, load balancing, security and fault tolerance. Virtualization has also generated considerable interest in the High Performance Computing (HPC) community, due to the resulting high availability, fault tolerance, cluster partitioning and accommodation of conflicting user requirements. However, the HPC community is still wary of the potential overheads associated with virtualization, as it results in slower network communications and disk I/O, which need to be addressed. The live migration feature, available in most virtualization technologies, can be leveraged to improve the throughput of a heterogeneous compute farm (HC) used for HPC applications.
For this we mitigated the slow network communication in Xen, an open source virtual machine monitor. We present a detailed analysis of the communication framework of Xen and propose communication configurations that give a 50% improvement over the conventional Xen network configuration. From a detailed study of the migration facility in Xen, we propose an improvement in the live migration facility specifically targeting HPC applications. This optimization gives around 50% improvement over the default migration facility of Xen. In this thesis, we also investigate resource scheduling in a heterogeneous compute farm from the perspective of dynamic resource re-mapping. Our approach is to profile each job in the compute farm at runtime, and propose a better resource mapping compared to the initial allocation. We then migrate the job(s) to the best-suited homogeneous sub-cluster to improve the overall throughput of the HC. For this, we develop a novel heterogeneity- and virtualization-aware profiling framework, which is able to predict the CPU and communication characteristics of high performance scientific applications. The prediction accuracy of our performance estimation model is over 80%. The framework implementation is lightweight, with an overhead of 3%. Our experiments show that we are able to improve the throughput of the compute farm by 25% and that the time saved by the HC with our framework is over 30%. The framework can be readily extended to HCs supporting a cloud computing environment.

    The Impact of Novel Computing Architectures on Large-Scale Distributed Web Information Retrieval Systems

    Web search engines are the most popular means of interaction with the Web. Realizing a search engine that scales to the size of the Web presents many challenges. Fast crawling technology is needed to gather the Web documents. Indexing has to process hundreds of gigabytes of data efficiently. Queries have to be handled quickly, at a rate of thousands per second. As a solution, within a datacenter, services are built up from clusters of common homogeneous PCs. However, Information Retrieval (IR) has to face issues raised by the growing amount of Web data, as well as the number of new users. In response to these issues, cost-effective specialized hardware is available nowadays. In our opinion, this hardware is ideal for migrating distributed IR systems to computer clusters comprising heterogeneous processors in order to respond to their need for computing power. Toward this end, we introduce K-model, a computational model to properly evaluate algorithms designed for such hardware. We study the impact of K-model rules on algorithm design. To evaluate the benefits of using K-model in evaluating algorithms, we compare the complexity of a solution built using our properly designed techniques with that of the existing ones. Although in theory the competitors are more efficient than ours, empirically K-model proves its value, because our solutions have been shown to be faster than the state-of-the-art implementations.

    Improved self-management of datacenter systems applying machine learning

    Autonomic Computing is a Computer Science and Technologies research area that originated in the mid-2000s. It focuses on the optimization and improvement of complex distributed computing systems through self-control and self-management. As distributed computing systems grow in complexity, like multi-datacenter systems in cloud computing, system operators and architects need more help to understand, design and optimize these systems manually, even more so when the systems are distributed around the world and belong to different entities and authorities. Self-management lets these distributed computing systems improve their resource and energy management, a very important issue when obtaining, running or maintaining resources has a cost. Here we propose to improve Autonomic Computing techniques for resource management by applying modeling and prediction methods from Machine Learning and Artificial Intelligence. Machine Learning methods can find accurate models of system behaviors, often with intelligible explanations for them, and can also predict and infer system states and values. These models obtained from automatic learning have the advantage of being easily updated after workload or configuration changes by re-taking examples and re-training the predictors. By employing automatic modeling and predictive abilities, we can find new methods for making "intelligent" decisions and discovering new information and knowledge from systems. This thesis moves from the state of the art, where management is based on administrators' expertise, well-known data, ad-hoc algorithms and models, and elements studied from the point of view of a single computing machine, to a novel state of the art where management is driven by models learned from the system itself, providing useful feedback and making up for incomplete, missing or uncertain data, from the point of view of a global network of datacenters.
- First of all, we cover the scenario where the decision maker works knowing all pieces of information from the system: how much each job will consume, what the desired quality of service is and will be, what the deadlines for the workload are, etc. All of this focuses on each component and policy of each element involved in executing these jobs.
- Then we focus on the scenario where, instead of fixed oracles that provide us information from an expert formula or set of conditions, machine learning is used to create these oracles. Here we look at components and specific details while some part of the information is not known and must be learned and predicted.
- We reduce the problem of optimizing resource allocations and requirements for virtualized web-services to a mathematical problem, indicating each factor, variable and element involved, as well as all the constraints the scheduling process must attend to. The scheduling problem can be modeled as a Mixed Integer Linear Program. Here we face the scenario of a full datacenter, and we further introduce some information prediction.
- We complement the model by expanding the predicted elements, studying the main resources (that is, CPU, memory and I/O) that can suffer from noise, inaccuracy or unavailability. Once learned predictors for certain components let the decision making improve, the system can become more "expert-knowledge independent", and research can focus on a scenario where all the elements provide noisy, uncertain or private information. We also introduce new factors into the management optimization, since context and costs may change for each datacenter, turning the model into a "multi-datacenter" one.
- Finally, we review the cost of placing datacenters depending on green energy sources, and distribute the load according to green energy availability.
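To make the Mixed Integer Linear Program statement concrete, here is a generic placement formulation of the kind the abstract alludes to. It is an illustrative sketch only, not the model from the thesis: the symbols $x_{ij}$, $y_j$, $c_j$, $r_{ik}$ and $C_{jk}$ are all assumptions introduced here.

```latex
% Assumed notation (not from the thesis):
%   i indexes jobs (or VMs), j indexes hosts, k indexes resources (CPU, memory, I/O)
%   x_{ij} = 1 if job i is placed on host j;  y_j = 1 if host j is powered on
%   c_j = cost of running host j;  r_{ik} = demand of job i for resource k
%   C_{jk} = capacity of host j for resource k
\begin{align}
\min_{x,\,y} \quad & \sum_{j} c_j \, y_j \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1 && \forall i \\
& \sum_{i} r_{ik} \, x_{ij} \le C_{jk} \, y_j && \forall j,\ \forall k \\
& x_{ij} \in \{0,1\}, \quad y_j \in \{0,1\} && \forall i,\ \forall j
\end{align}
```

The first constraint places every job on exactly one host; the second keeps each host within its capacity and forces $y_j = 1$ whenever host $j$ receives any job with nonzero demand. When the demands $r_{ik}$ are predicted rather than known, as in the later scenarios of the abstract, the same structure applies with estimated values in their place.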

    Computer Science & Technology Series : XXI Argentine Congress of Computer Science. Selected papers

    CACIC’15 was the 21st Congress in the CACIC series. It was organized by the School of Technology at the UNNOBA (North-West of Buenos Aires National University) in Junín, Buenos Aires. The Congress included 13 Workshops with 131 accepted papers, 4 Conferences, 2 invited tutorials, different meetings related to Computer Science Education (Professors, PhD students, Curricula) and an International School with 6 courses. CACIC 2015 was organized following the traditional Congress format, with 13 Workshops covering a diversity of dimensions of Computer Science Research. Each topic was supervised by a committee of 3-5 chairs from different Universities. The call for papers attracted a total of 202 submissions. An average of 2.5 review reports were collected for each paper, for a grand total of 495 review reports that involved about 191 different reviewers. A total of 131 full papers, involving 404 authors and 75 Universities, were accepted and 24 of them were selected for this book. Red de Universidades con Carreras en Informática (RedUNCI)

    Simulation and optimization model for the construction of electrical substations

    Electrical substations are among the most complex construction projects. An electrical substation is an auxiliary station of an electricity generation, transmission and distribution system where voltage is transformed from high to low or the reverse using transformers. Construction of an electrical substation includes civil works and electromechanical works. The scope of civil works includes construction of several buildings/components divided into parallel and overlapping working phases that require a variety of resources, are generally quite costly, and consume a considerable amount of time. Therefore, construction of substations faces complicated time-cost-resource optimization problems. In addition, the construction industry has become progressively more competitive over the years, hence the need to keep finding ways to enhance construction performance. To address these problems, this dissertation takes the first steps and introduces a simulation and optimization model for the execution processes of civil works for an electrical substation, based on an Excel database file for input data entry. The input data include the bill of quantities, maximum available resources, production rates, unit cost of resources and indirect cost. The model is built in AnyLogic software using the discrete event simulation method. The model is divided into three zones working in parallel with each other. Each zone includes a group of buildings related to the same construction area. Each zone-model describes the execution process schedule for each building in the zone, the time consumed, the percentage of utilization of equipment and manpower crews, the amount of materials consumed and the total direct and indirect cost. The model is then optimized, mainly to minimize the project duration, using a parameter variation experiment and genetic algorithm Java code implemented on the AnyLogic platform.
The model uses allocated resource parameters as decision variables and available resources as constraints. The model is verified on real case studies in Egypt, and sensitivity analysis studies are incorporated. The model is also validated using a real case study and proves its efficiency by attaining a reduction in model time units between the simulation and optimization experiments of 10.25% and a reduction in total cost of 4.7%. Also, when the optimization results are compared with the actual data of the case study, the model attains a reduction in time and cost of 13.6% and 6.3% respectively. An analysis to determine the effect of each resource on the reduction in cost is also presented.

    Scheduling of pipeline construction projects using simulation

    Repetitive projects represent a large percentage of construction projects. They usually have immense importance for a nation’s economy and future. Highways, tunnels, infrastructure networks, high-rise buildings, housing projects, pipeline networks, airport runways, railways, bridges, sewer mains and mass transit systems are all considered projects of a repetitive nature. Research that started to serve industrial purposes for the military efforts in World War II has been revised and improved to be employed for repetitive construction projects. Obtaining an optimum schedule that is achievable, feasible, and comprehensible to all involved parties, while maintaining minimum overall cost and duration, has been an important objective. Another main objective was to maintain an optimal formation of various types of crews and equipment that would avoid idle periods as well as work stoppages. Various mathematical models from the literature are presented as examples to show their limitations. This research presents a simulation-based scheduling model for pipeline construction projects. The model was developed with a simulation software package called “AnyLogic”; this software supports discrete event, agent-based and system dynamics simulation, presents an easy graphical user interface and utilizes Java coding. The model consists of various types of pre-programmed objects that were used and connected together to model the different stages of the project and the resources involved within them. The model also contains a simulation experiment that is used to provide the visual presentation of the construction process, including the layout of the project and all kinds of utilized resources moving within it. The final part of the model is the optimization module. This module holds the definition of the optimization objective, the optimization parameters and the constraints.
This module runs the simulation experiment for numerous trials while changing the parameters to reach the optimal solution, which is the optimal schedule for the project. This simulation model would aid planners in scheduling, tracking and controlling the construction operations over the lifetime of the project. It would be an important tool for top management to visualize the impact of their decisions.
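The optimization loop described above (run the simulation repeatedly while varying parameters, then keep the best schedule) can be sketched as an exhaustive parameter sweep. This is a toy Python illustration, not AnyLogic code and not the model from the thesis: the `simulate` stand-in, its production rates, its cost figures, and the two decision variables are all invented for the example, and the objective (shortest duration, with cost as tie-breaker) is an assumption.

```python
from itertools import product


def simulate(crews, excavators, sections=20):
    """Toy stand-in for one simulation run; returns (duration_days, cost).

    All rates and prices below are invented for illustration.
    """
    excavation_days = sections / (excavators * 2.0)  # 2 sections/day per excavator
    laying_days = sections / (crews * 1.0)           # 1 section/day per crew
    duration = max(excavation_days, laying_days) + 1.0  # slower stage plus a 1-day lag
    daily_cost = crews * 800 + excavators * 1500 + 500  # direct plus indirect cost
    return duration, duration * daily_cost


def parameter_variation(max_crews, max_excavators):
    """Try every resource combination within the availability limits.

    Returns the best trial as (duration, cost, crews, excavators); tuples
    compare lexicographically, so duration is minimized first, then cost.
    """
    trials = []
    for crews, excavators in product(range(1, max_crews + 1),
                                     range(1, max_excavators + 1)):
        duration, cost = simulate(crews, excavators)
        trials.append((duration, cost, crews, excavators))
    return min(trials)


best = parameter_variation(max_crews=4, max_excavators=3)
print(best)
```

An exhaustive sweep is only practical for a handful of parameters; for larger search spaces a heuristic search over the same `simulate` function would take the place of `parameter_variation`.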