29 research outputs found

    G-LOMARC-TS: Lookahead group matchmaking for time/space sharing on multi-core parallel machines

    Parallel machines with multi-core nodes are becoming increasingly popular. The performance of applications running on these machines improves only gradually because of resource competition within each node. Research has found that coscheduling different applications with complementary resource characteristics on the same set of nodes (semi time sharing) may improve performance. We propose a scheduling algorithm, G-LOMARC-TS, which incorporates both space sharing and semi time sharing and matches groups of jobs for coscheduling where possible. Because matchmaking may select jobs further down the waiting queue and thereby delay the jobs at the front of the queue, the fairness for each individual job is monitored and the delay is kept within a limited bound. Several heuristics are used to solve the NP-complete problem of forming groups. Our experimental results show both a utilization gain and an improvement in average relative response time for G-LOMARC-TS over several other scheduling policies.

    Extending Scojo-PECT by migration based on system-level checkpointing

    In recent years, a significant amount of research has been done on job scheduling in the high-performance computing area. Parallel jobs have different running times and require different numbers of processors, so jobs need to be scheduled and packed to improve system utilization. Scojo-PECT is a job scheduler which provides service guarantees by using coarse-grain time sharing. However, Scojo-PECT does not provide process migration. We extend Scojo-PECT by migrating parallel jobs based on system-level checkpointing. We investigate different cases in the Scojo-PECT scheduling algorithm where migration based on system-level checkpointing can be used to improve resource utilization and reduce job response time. Our experimental results show a reduction in relative response times for medium jobs compared with the original Scojo-PECT scheduler, while long jobs do not suffer any disadvantage.

    Coscheduling under Memory Constraints in a NOW Environment


    Adaptive Resource Relocation in Virtualized Heterogeneous Clusters

    Cluster computing has recently gone through an evolution from single-processor systems to multicore/multi-socket systems. This has lowered the cost/performance ratio of the compute machines. Compute farms that host these machines tend to become heterogeneous over time due to incremental extensions, hardware upgrades and/or nodes being purchased for users with particular needs. This heterogeneity is not surprising given the wide range of processor, memory and network technologies that become available and the relatively small price difference between these various options. Different CPU architectures, memory capacities, and communication and I/O interfaces of the participating compute nodes present many challenges to job scheduling and often result in under- or over-utilization of the compute resources. In general, it is not feasible for application programmers to specifically optimize their programs for such a set of differing compute nodes, due to the difficulty and time-intensiveness of such a task. The trend towards heterogeneous compute farms has coincided with a resurgence in virtualization technology. Virtualization technology is receiving widespread adoption, mainly due to the benefits of server consolidation and isolation, load balancing, security and fault tolerance. Virtualization has also generated considerable interest in the High Performance Computing (HPC) community, due to the resulting high availability, fault tolerance, cluster partitioning and accommodation of conflicting user requirements. However, the HPC community is still wary of the potential overheads associated with virtualization, as it results in slower network communication and disk I/O, which need to be addressed. The live migration feature, available in most virtualization technologies, can be leveraged to improve the throughput of a heterogeneous compute farm (HC) used for HPC applications.
    To this end, we mitigated the slow network communication in Xen, an open-source virtual machine monitor. We present a detailed analysis of the communication framework of Xen and propose communication configurations that give a 50% improvement over the conventional Xen network configuration. From a detailed study of the migration facility in Xen, we propose an improvement to the live migration facility specifically targeting HPC applications. This optimization gives around a 50% improvement over the default migration facility of Xen. In this thesis, we also investigate resource scheduling in heterogeneous compute farms from the perspective of dynamic resource re-mapping. Our approach is to profile each job in the compute farm at runtime and propose a better resource mapping than the initial allocation. We then migrate the job(s) to the best-suited homogeneous sub-cluster to improve the overall throughput of the HC. For this, we develop a novel heterogeneity- and virtualization-aware profiling framework, which is able to predict the CPU and communication characteristics of high-performance scientific applications. The prediction accuracy of our performance estimation model is over 80%. The framework implementation is lightweight, with an overhead of 3%. Our experiments show that we are able to improve the throughput of the compute farm by 25% and that the time saved by the HC with our framework is over 30%. The framework can be readily extended to HCs supporting a cloud computing environment.

    A Distributed Bio-Inspired Method for Multisite Grid Mapping

    Computational grids assemble multisite and multiowner resources and represent the most promising solutions for processing distributed, computationally intensive applications, each composed of a collection of communicating tasks. The execution of an application on a grid presumes three successive steps: the localization of the available resources together with their characteristics and status; the mapping, which selects the resources that best support the execution during the estimated running time; and, finally, the scheduling of the tasks. These operations are very difficult both because the availability and workload of grid resources change dynamically and because, in many cases, multisite mapping must be adopted to exploit all the possible benefits. Since the mapping problem, already known to be NP-complete in parallel systems, becomes even harder in distributed heterogeneous environments such as grids, evolutionary techniques can be adopted to find near-optimal solutions. In this paper an effective and efficient multisite mapping, based on a distributed Differential Evolution algorithm, is proposed. The aim is to minimize the time required to complete the execution of the application, selecting from among all the potential solutions the one which reduces the use of the grid resources. The proposed mapper is tested on different scenarios.
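    The idea of using Differential Evolution to search for a task-to-node mapping can be illustrated with a minimal single-population sketch. This is not the paper's distributed mapper: the cost model (completion time as the largest per-node sum of work divided by speed, with communication ignored), the function names `completion_time`, `decode` and `de_map`, and all parameter values are assumptions made for illustration only.

```python
import random


def completion_time(mapping, work, speed):
    """Assumed cost model: the most loaded node determines the finish time."""
    load = [0.0] * len(speed)
    for task, node in enumerate(mapping):
        load[node] += work[task] / speed[node]
    return max(load)


def decode(vector, num_nodes):
    """Turn a real-valued DE vector into node indices, clamped to the valid range."""
    return [min(num_nodes - 1, max(0, int(x))) for x in vector]


def de_map(work, speed, pop_size=20, generations=200, f=0.5, cr=0.9, seed=0):
    """Classic DE/rand/1/bin over real vectors that are decoded into mappings."""
    rng = random.Random(seed)
    n, m = len(work), len(speed)
    pop = [[rng.uniform(0, m) for _ in range(n)] for _ in range(pop_size)]
    cost = [completion_time(decode(v, m), work, speed) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(n)  # at least one gene comes from the mutant
            trial = [
                pop[a][k] + f * (pop[b][k] - pop[c][k])
                if (rng.random() < cr or k == forced)
                else pop[i][k]
                for k in range(n)
            ]
            trial_cost = completion_time(decode(trial, m), work, speed)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return decode(pop[best], m), cost[best]


if __name__ == "__main__":
    mapping, cost = de_map(work=[4, 3, 2, 1], speed=[1.0, 2.0])
    print(mapping, cost)
```

    The sketch optimizes completion time only; the secondary goal stated in the abstract, reducing the use of grid resources, would need an additional term in the cost function.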

    Distributed and Multiprocessor Scheduling

    This chapter discusses CPU scheduling in parallel and distributed systems. CPU scheduling is part of a broader class of resource allocation problems, and is probably the most carefully studied such problem. The main motivation for multiprocessor scheduling is the desire for increased speed in the execution of a workload. Parts of the workload, called tasks, can be spread across several processors and thus be executed more quickly than on a single processor. In this chapter, we will examine techniques for providing this facility. The scheduling problem for multiprocessor systems can be generally stated as "How can we execute a set of tasks T on a set of processors P subject to some set of optimizing criteria C?" The most common goal of scheduling is to minimize the expected runtime of a task set. Examples of other scheduling criteria include minimizing the cost, minimizing communication delay, giving priority to certain users' processes, or needs for specialized hardware devices. The scheduling policy for a multiprocessor system usually embodies a mixture of several of these criteria. Section 2 outlines general issues in multiprocessor scheduling and gives background material, including issues specific to either parallel or distributed scheduling. Section 3 describes the best practices from prior work in the area, including a broad survey of existing scheduling algorithms and mechanisms. Section 4 outlines research issues and gives a summary. Section 5 lists the terms defined in this chapter, while sections 6 and 7 give references to important research publications in the area.
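    As a concrete instance of the problem statement above (tasks T, processors P, criterion C), the following minimal sketch assumes independent tasks with known running times, identical processors, and total completion time (makespan) as the only criterion. It uses the well-known longest-processing-time-first greedy heuristic, which is not necessarily one of the algorithms surveyed in the chapter; the name `lpt_schedule` is hypothetical.

```python
import heapq


def lpt_schedule(task_times, num_processors):
    """Greedy LPT: give the longest remaining task to the least loaded processor.

    task_times maps a task name to its (assumed known) running time.
    Returns (assignment, makespan).
    """
    if num_processors < 1:
        raise ValueError("need at least one processor")
    # Min-heap of (current load, processor index).
    loads = [(0, p) for p in range(num_processors)]
    heapq.heapify(loads)
    assignment = {p: [] for p in range(num_processors)}
    # Longest tasks first; ties are broken by task name for determinism.
    for task, time in sorted(task_times.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(loads)
        assignment[p].append(task)
        heapq.heappush(loads, (load + time, p))
    makespan = max(load for load, _ in loads)
    return assignment, makespan


if __name__ == "__main__":
    tasks = {"a": 3, "b": 3, "c": 2, "d": 2, "e": 2}
    assignment, makespan = lpt_schedule(tasks, 2)
    print(assignment)  # {0: ['a', 'c', 'e'], 1: ['b', 'd']}
    print(makespan)    # 7
```

    On this input the heuristic yields a makespan of 7, whereas placing a and b together and c, d and e together gives 6. The gap shows why such greedy policies are heuristics rather than exact solutions.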

    A memory-centric approach to enable timing-predictability within embedded many-core accelerators

    There is increasing interest among real-time systems architects in multi- and many-core accelerated platforms. The main obstacle to the adoption of such devices within industrial settings is the difficulty of tightly estimating the multiple interferences that may arise among the parallel components of the system. This particularly concerns concurrent accesses to shared memory and communication resources. Existing worst-case execution time (WCET) analyses are extremely pessimistic, especially when adopted for systems composed of hundreds to thousands of cores. This significantly limits the potential for the adoption of these platforms in real-time systems. In this paper, we study how the predictable execution model (PREM), a memory-aware approach to enable timing-predictability in real-time systems, can be successfully adopted on multi- and many-core heterogeneous platforms. Using a state-of-the-art multi-core platform as a testbed, we validate that it is possible to obtain an order-of-magnitude improvement in the WCET bounds of parallel applications if data movements are adequately orchestrated in accordance with PREM. We identify which system parameters most affect the tremendous performance opportunities offered by this approach, both on average and in the worst case, taking a first step towards predictable many-core systems.

    An efficient virtual network interface in the FUGU scalable workstation, by Kenneth Martin Mackenzie

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 123-129).

    Architectural support for enhancing security in clusters

    Cluster computing has emerged as a common approach for providing more computing and data resources in industry as well as in academia. However, since cluster computer developers have paid more attention to performance and cost efficiency than to security, numerous security loopholes in cluster servers come to the forefront. Clusters usually rely on firewalls for their security, but the firewalls cannot prevent all security attacks; therefore, cluster systems should be designed to be robust to security attacks intrinsically. In this research, we propose architectural supports for enhancing security of cluster systems with marginal performance overhead. This research proceeds in a bottom-up fashion starting from enforcing each cluster component's security to building an integrated secure cluster. First, we propose secure cluster interconnects providing confidentiality, authentication, and availability. Second, a security accelerating network interface card architecture is proposed to enable low performance overhead encryption and authentication. Third, to enhance security in an individual cluster node, we propose a secure design for shared-memory multiprocessors (SMP) architecture, which is deployed in many clusters. The secure SMP architecture will provide confidential communication between processors. This will remove the vulnerability of eavesdropping attacks in a cluster node. Finally, to put all proposed schemes together, we propose a security/performance trade-off model which can precisely predict performance of an integrated secure cluster.