36 research outputs found

    Resource Allocation using Virtual Clusters

    In this report we demonstrate the utility of resource allocations that use virtual machine technology for sharing parallel computing resources among competing users. We formalize the resource allocation problem under a number of underlying assumptions, determine its complexity, propose several heuristic algorithms for finding near-optimal solutions, and evaluate these algorithms in simulation. We find that one of our algorithms is both very efficient and produces the best resource allocations. We then describe how our approach can be made more general by relaxing several of the underlying assumptions.
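
    The abstract describes its heuristic algorithms only at a high level. As a purely illustrative sketch (not the authors' algorithm), the Python snippet below shows one simple greedy heuristic in this family: each job requests a CPU and memory share, and jobs are placed, largest first, on the host that keeps the most headroom after placement.

        # Hypothetical greedy placement heuristic; names and capacities are
        # illustrative, not taken from the report.
        from dataclasses import dataclass

        @dataclass
        class Host:
            name: str
            cpu: float   # remaining CPU share (1.0 = one full node)
            mem: float   # remaining memory share

        def place(jobs, hosts):
            """Greedily map jobs (name, cpu, mem) to hosts; unplaceable jobs are skipped."""
            mapping = {}
            for name, cpu, mem in sorted(jobs, key=lambda j: j[1] + j[2], reverse=True):
                candidates = [h for h in hosts if h.cpu >= cpu and h.mem >= mem]
                if not candidates:
                    continue
                # Worst-fit flavour: pick the host left with the most balanced headroom.
                best = max(candidates, key=lambda h: min(h.cpu - cpu, h.mem - mem))
                best.cpu -= cpu
                best.mem -= mem
                mapping[name] = best.name
            return mapping

        hosts = [Host("n1", 1.0, 1.0), Host("n2", 1.0, 1.0)]
        jobs = [("jobA", 0.6, 0.4), ("jobB", 0.5, 0.5), ("jobC", 0.3, 0.2)]
        print(place(jobs, hosts))   # {'jobA': 'n1', 'jobB': 'n2', 'jobC': 'n2'}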

    Virtual machine scheduling in dedicated computing clusters

    Time-critical applications process a continuous stream of input data and have to meet specific timing constraints. A common approach to ensuring that such an application satisfies its constraints is over-provisioning: the application is deployed in a dedicated cluster environment with enough processing power to achieve the target performance for every specified data input rate. This approach has a drawback: at times of decreased data input rates, the cluster resources are not fully utilized. A typical use case is the HLT-Chain application, which processes physics data at runtime of the ALICE experiment at CERN. From a cost and efficiency perspective it is desirable to exploit temporarily unused cluster resources. Existing approaches pursue this goal by running additional applications; these approaches, however, a) lack the flexibility to dynamically grant the time-critical application the resources it needs, b) are insufficient for isolating the time-critical application from harmful side effects introduced by additional applications, or c) are not general because they rely on application-specific interfaces. In this thesis, a software framework is presented that exploits unused resources in a dedicated cluster without harming a time-critical application. Additional applications are hosted in Virtual Machines (VMs), and unused cluster resources are allocated to these VMs at runtime. To avoid resource bottlenecks, the resource usage of the VMs is dynamically adjusted according to the needs of the time-critical application. For this purpose, a combination of methods not previously used together is employed: on the global level, appropriate VM manipulations such as hot migration, suspend/resume and start/stop are determined by an informed search heuristic and applied at runtime; locally on the cluster nodes, a feedback-controlled adaptation of VM resource usage is carried out in a decentralized manner. Employing this framework increases a cluster's usage by running additional applications while preventing negative impact on the time-critical application. This capability of the framework is shown for the HLT-Chain application: in an empirical evaluation, cluster CPU usage is increased from 49% to 79%, additional results are computed, and no negative effect on the HLT-Chain application is observed.
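
    The local, feedback-controlled adaptation mentioned above can be illustrated with a small sketch. The controller below is hypothetical (it is not the thesis's framework, and the gain, bounds and rates are assumed values): it shrinks the CPU cap granted to guest VMs when the time-critical application falls below its target processing rate, and grows the cap again when there is slack.

        # Hypothetical proportional controller for the CPU share given to guest VMs.
        def adjust_vm_cpu_cap(current_cap, target_rate, observed_rate,
                              gain=0.5, min_cap=0.05, max_cap=0.8):
            # Positive error: the time-critical application is too slow,
            # so CPU is taken away from the VMs; negative error gives it back.
            error = (target_rate - observed_rate) / target_rate
            new_cap = current_cap - gain * error * current_cap
            return max(min_cap, min(max_cap, new_cap))

        cap = 0.5
        for observed in [950.0, 900.0, 1000.0, 1100.0]:   # events/s, target is 1000
            cap = adjust_vm_cpu_cap(cap, target_rate=1000.0, observed_rate=observed)
            print(f"observed={observed:.0f}/s -> VM CPU cap={cap:.2f}")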

    Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing

    The availability of many-core computing platforms enables a wide variety of technical solutions for systems across the embedded, high-performance and cloud computing domains. However, large-scale many-core systems are notoriously hard to optimise. Choices regarding resource allocation alone can account for wide variability in timeliness and energy dissipation (up to several orders of magnitude). Dynamic Resource Allocation in Embedded, High-Performance and Cloud Computing covers dynamic resource allocation heuristics for many-core systems, aiming to provide appropriate guarantees on performance and energy efficiency. It addresses different types of systems, aiming to harmonise the approaches to dynamic allocation across the complete spectrum from systems with little flexibility and strict real-time guarantees to highly dynamic systems with soft performance requirements. Technical topics presented in the book include: Load and Resource Models; Admission Control; Feedback-based Allocation and Optimisation; Search-based Allocation Heuristics; Distributed Allocation based on Swarm Intelligence; and Value-Based Allocation. Each of the topics is illustrated with examples based on realistic computational platforms such as Network-on-Chip many-core processors, grids and private cloud environments. Note: EUR 6,000 BPC fee funded by the EC FP7 Post-Grant Open Access Pilot.
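
    As a small illustration of one of the topics listed above, the snippet below sketches a load-based admission-control check: a new task is admitted to a core only if the projected utilisation stays below a fixed bound. The function, threshold and numbers are assumptions for illustration and are not taken from the book.

        # Hypothetical admission-control check for a many-core system.
        def admit(task_load, core_load, utilisation_bound=0.9):
            """Admit the task only if the core's projected utilisation stays within the bound."""
            return core_load + task_load <= utilisation_bound

        cores = [0.4, 0.7, 0.85]    # current utilisation of each core
        task = 0.2                  # utilisation demanded by the new task
        eligible = [i for i, load in enumerate(cores) if admit(task, load)]
        print(eligible)             # cores that could accept the task -> [0, 1]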

    Hybrid, Job-Aware, and Preemptive Datacenter Scheduling

    Scheduling in datacenters is an important yet challenging problem. Datacenters are composed of a large number, typically tens of thousands, of commodity computers running a variety of data-parallel jobs. The role of the scheduler is to assign cluster resources to jobs, which is not trivial given the large scale of the cluster and the high scheduling load (tens of thousands of scheduling decisions per second). In addition to scalability, modern datacenters face increasingly heterogeneous workloads composed of long batch jobs, e.g., data analytics, and latency-sensitive short jobs, e.g., operations of user-facing services. In such workloads, and especially if the cluster is highly utilized, it is challenging to prevent short jobs from getting stuck behind long jobs, i.e., head-of-line blocking. Schedulers have evolved from centralized (one single scheduler for the entire cluster) to distributed (many schedulers that take scheduling decisions in parallel). Although distributed schedulers can handle the large scale of datacenters, they sacrifice scheduling accuracy for lower latency. The complexity of scheduling in datacenters is exacerbated by the data-parallel nature of the jobs: a job is composed of multiple tasks, and the job completes only when all of its tasks complete. A scheduler that takes this fact into account, i.e., a job-aware scheduler, could use this information to make better scheduling decisions. Furthermore, to improve the quality of their scheduling decisions, most datacenter schedulers use job runtime estimates. Obtaining accurate runtime estimates is, however, far from trivial, and erroneous estimates may lead to sub-optimal scheduling decisions. Considering these challenges, in this dissertation we argue the following: (i) a hybrid centralized/distributed design can get the best of both worlds by scheduling long jobs in a centralized way and short jobs in a distributed way; (ii) such a hybrid scheduler can avoid head-of-line blocking and provide job-awareness by dynamically partitioning the cluster between short and long jobs and by executing a job to completion once it has started; (iii) a scheduler can dispense with runtime estimates by sharing the resources of a node with preemption and by load balancing jobs among the nodes.
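
    The hybrid design argued for in point (i) can be sketched as a simple dispatch rule. The code below is a hypothetical illustration (the threshold, job fields and queue layout are assumptions, not the dissertation's implementation): jobs expected to run long are handed to a single centralized scheduler queue, while short latency-sensitive jobs are spread across many distributed scheduler queues.

        # Hypothetical hybrid dispatcher: centralized queue for long jobs,
        # distributed queues for short jobs.
        import random

        LONG_JOB_THRESHOLD_S = 60.0   # assumed cut-off between "short" and "long"

        def dispatch(job, centralized_queue, distributed_queues):
            if job["expected_duration_s"] >= LONG_JOB_THRESHOLD_S:
                centralized_queue.append(job)                   # slower but more accurate decisions
            else:
                random.choice(distributed_queues).append(job)   # fast, parallel decisions

        central, workers = [], [[] for _ in range(4)]
        dispatch({"id": "analytics-1", "expected_duration_s": 3600.0}, central, workers)
        dispatch({"id": "web-req-7", "expected_duration_s": 0.2}, central, workers)
        print(len(central), sum(len(q) for q in workers))       # -> 1 1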

    Cloud Computing for Digital Libraries

    Information management systems (digital libraries/repositories, learning management systems, content management systems) provide key technologies for the storage, preservation and dissemination of knowledge in its various forms, such as research documents, theses and dissertations, cultural heritage documents and audio files. These systems can make use of cloud computing to achieve high levels of scalability while making services accessible to all, on demand and at reasonable infrastructure cost. This research aims to develop techniques for building scalable digital information management systems based on efficient, on-demand use of generic grid-based technologies such as cloud computing. In particular, this study explores the use of existing cloud computing resources offered by popular vendors such as Amazon Web Services: Amazon Simple Storage Service (Amazon S3) to store large and increasing volumes of data, Amazon Elastic Compute Cloud (Amazon EC2) to provide the required computational power, and Amazon SimpleDB for querying and indexing the data stored on Amazon S3. A proof-of-concept application comprising typical digital library services was developed, deployed in the cloud environment and evaluated for scalability as the demand for data and services increases. The results of the evaluation show that it is possible to adopt cloud computing for digital libraries to address massive data handling and large numbers of concurrent requests. Existing digital library systems could be migrated and deployed into the cloud.
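
    The storage layer described above can be sketched in a few lines of boto3 against Amazon S3; the bucket name, key layout and helper functions below are assumptions for illustration, not artifacts of the study (the SimpleDB indexing layer is omitted).

        # Sketch of storing and retrieving repository objects in Amazon S3.
        import boto3

        s3 = boto3.client("s3")
        BUCKET = "example-digital-library"   # hypothetical bucket name

        def store_document(doc_id: str, data: bytes, content_type: str = "application/pdf"):
            """Persist one repository object under a predictable key."""
            s3.put_object(Bucket=BUCKET, Key=f"documents/{doc_id}",
                          Body=data, ContentType=content_type)

        def fetch_document(doc_id: str) -> bytes:
            """Retrieve a repository object for dissemination."""
            response = s3.get_object(Bucket=BUCKET, Key=f"documents/{doc_id}")
            return response["Body"].read()

        # Example: store_document("etd-2024-001", open("thesis.pdf", "rb").read())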