Low-latency message passing over gigabit ethernet clusters
As Ethernet hardware bandwidth increased to Gigabit speeds, it became evident that it was difficult for conventional messaging protocols to deliver this performance to the application layer. Kernel-based protocols such as TCP/IP impose a significant load on the host processor in order to service incoming packets and pass them to the application layer. Under heavy loads this problem can also lead to the host processor being completely used up for processing incoming messages, thus starving host applications of CPU resources. Another problem suffered by inter-process communication using small messages is the latency imposed by memory-to-memory copying in layered protocols, as well as the slow context switching times in kernel-level schedulers required for servicing incoming interrupts. All this has put pressure on messaging software, which led to the development of several lower-latency user-level protocols specifically adapted to high-performance networks (see U-Net[18], EMP[16], VIA[3], QsNET[15], Active Messages[19], GM[13], FM[14]). The aim of this paper is to investigate the issues involved in building high-performance cluster messaging systems. We will also review some of the more prominent work in the area as well as propose a low-overhead, low-latency messaging system to be used by a cluster of commodity platforms running over Gigabit Ethernet. We propose to use the programmable Netgear GA620-T NICs and modify their firmware to design a lightweight, reliable OS-bypass protocol for message passing. We propose the use of zero-copy and polling techniques in order to keep host CPU utilization to a minimum whilst obtaining the maximum bandwidth possible.
A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Broker
Computational Grids, coupling geographically distributed resources such as
PCs, workstations, clusters, and scientific instruments, have emerged as a next
generation computing platform for solving large-scale problems in science,
engineering, and commerce. However, application development, resource
management, and scheduling in these environments continue to be a complex
undertaking. In this article, we discuss our efforts in developing a resource
management system for scheduling computations on resources distributed across
the world with varying quality of service. Our service-oriented grid computing
system called Nimrod-G manages all operations associated with remote execution
including resource discovery, trading, scheduling based on economic principles
and a user defined quality of service requirement. The Nimrod-G resource broker
is implemented by leveraging existing technologies such as Globus, and provides
new services that are essential for constructing industrial-strength Grids. We
discuss results of preliminary experiments on scheduling some parametric
computations using the Nimrod-G resource broker on a world-wide grid testbed
that spans five continents.
Libra: An Economy driven Job Scheduling System for Clusters
Clusters of computers have emerged as mainstream parallel and distributed
platforms for high-performance, high-throughput and high-availability
computing. To enable effective resource management on clusters, numerous
cluster management systems and schedulers have been designed. However, their
focus has essentially been on maximizing CPU performance, but not on improving
the value of utility delivered to the user and quality of services. This paper
presents a new computational economy driven scheduling system called Libra,
which has been designed to support allocation of resources based on the users'
quality of service (QoS) requirements. It is intended to work as an add-on to
the existing queuing and resource management system. The first version has been
implemented as a plugin scheduler to the PBS (Portable Batch System) system.
The scheduler offers market-based economy driven service for managing batch
jobs on clusters by scheduling CPU time according to user utility as determined
by their budget and deadline rather than system performance considerations. The
Libra scheduler ensures that both these constraints are met within an O(n)
run-time. The Libra scheduler has been simulated using the GridSim toolkit to
carry out a detailed performance analysis. Results show that the deadline and
budget based proportional resource allocation strategy improves the utility of
the system and user satisfaction as compared to system-centric scheduling
strategies.
Comment: 13 pages
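The deadline- and budget-based proportional allocation described in the Libra abstract can be illustrated with a small sketch. This is not Libra's published algorithm: the abstract does not give the share formula or pricing model, so the sketch assumes that each job needs a CPU share equal to its estimated runtime divided by its deadline, and that cost is a flat rate per CPU second. The names `Job`, `admit`, and `rate_per_cpu_second` are hypothetical. The admission loop makes a single pass over the jobs, in keeping with the O(n) run-time mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    est_runtime: float  # estimated CPU seconds on a dedicated node (assumed known)
    deadline: float     # seconds after submission by which the job must finish
    budget: float       # maximum amount the user is willing to pay


def admit(jobs, rate_per_cpu_second=1.0):
    """Admit jobs onto one node in a single pass.

    Assumption (not from the abstract): a job meets its deadline if it
    receives a CPU share of est_runtime / deadline, and it costs
    est_runtime * rate_per_cpu_second.
    Returns a dict mapping admitted job names to their CPU shares.
    """
    shares = {}
    used = 0.0  # total CPU share already committed on this node
    for job in jobs:
        share = job.est_runtime / job.deadline
        price = job.est_runtime * rate_per_cpu_second
        within_budget = price <= job.budget
        fits_on_node = used + share <= 1.0
        if within_budget and fits_on_node:
            shares[job.name] = share
            used += share
    return shares


if __name__ == "__main__":
    queue = [
        Job("a", est_runtime=30, deadline=60, budget=100),
        Job("b", est_runtime=30, deadline=100, budget=100),
        Job("c", est_runtime=50, deadline=100, budget=100),  # would exceed capacity
        Job("d", est_runtime=10, deadline=100, budget=5),    # too expensive
    ]
    for name, share in admit(queue).items():
        print(f"{name}: {share:.0%} of the CPU")
```

A system-centric scheduler would instead order jobs by arrival or by expected throughput; the sketch shows only how user-supplied deadline and budget can drive the accept/reject decision and the share each job receives.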