Load balancing techniques for I/O intensive tasks on heterogeneous clusters
Load balancing schemes in a cluster system play a critically important role in developing a high-performance cluster computing platform. Existing load balancing approaches are concerned with the effective use of CPU and memory resources. I/O-intensive tasks running on a heterogeneous cluster, however, need highly effective use of global I/O resources, and previous CPU- or memory-centric load balancing schemes suffer a significant performance drop under I/O-intensive workloads because of I/O load imbalance. To solve this problem, Zhang et al. developed two I/O-aware load balancing schemes, which consider system heterogeneity and migrate I/O-intensive tasks from nodes with high I/O utilization to nodes with low I/O utilization. If the workload is memory-intensive, the new method applies a memory-based load balancing policy to assign the tasks. Likewise, when the workload becomes CPU-intensive, the scheme uses a CPU-based policy to balance the system load. In doing so, the proposed approach maintains the same level of performance as existing schemes when the I/O load is low or well balanced. Results from a trace-driven simulation study show that, when a workload is I/O-intensive, the proposed schemes improve mean slowdown over the existing schemes by up to a factor of 8. In addition, the slowdowns of almost all the policies increase consistently with system heterogeneity.
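The abstract does not give the schemes' exact rules. The sketch below illustrates the general idea under stated assumptions. The dominant demand of the workload selects an I/O-, memory-, or CPU-based policy. Each node's load is divided by its capacity, so heterogeneous nodes are compared fairly. A task is moved from the most to the least utilized node only if the gap exceeds a threshold. The class `Node`, the functions `dominant_resource` and `pick_migration`, and the `threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    name: str
    io_load: float       # queued I/O demand (hypothetical unit, e.g. MB)
    io_bandwidth: float  # disk bandwidth; normalizes I/O load across heterogeneous disks
    mem_load: float
    mem_capacity: float
    cpu_load: float
    cpu_speed: float

    def utilization(self, resource: str) -> float:
        """Load divided by capacity, so faster or larger nodes look less loaded."""
        if resource == "io":
            return self.io_load / self.io_bandwidth
        if resource == "mem":
            return self.mem_load / self.mem_capacity
        return self.cpu_load / self.cpu_speed


def dominant_resource(io: float, mem: float, cpu: float) -> str:
    """Select the policy matching the workload's dominant demand (demands assumed normalized)."""
    demands = {"io": io, "mem": mem, "cpu": cpu}
    return max(demands, key=demands.get)


def pick_migration(nodes: Sequence[Node], resource: str,
                   threshold: float) -> Optional[Tuple[str, str]]:
    """Return (source, target) for one migration, or None if the imbalance is below threshold."""
    ranked = sorted(nodes, key=lambda n: n.utilization(resource))
    low, high = ranked[0], ranked[-1]
    if high.utilization(resource) - low.utilization(resource) < threshold:
        return None
    return high.name, low.name
```

For example, with `nodes = [Node("a", 80, 40, 1, 4, 1, 2), Node("b", 20, 80, 2, 4, 1, 1)]` and an I/O-dominated workload, `pick_migration(nodes, dominant_resource(0.7, 0.2, 0.1), 0.5)` proposes moving a task from `a` to `b`. The same nodes are left alone under the memory policy, because their memory imbalance is below the threshold.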
Libra: An Economy driven Job Scheduling System for Clusters
Clusters of computers have emerged as mainstream parallel and distributed
platforms for high-performance, high-throughput and high-availability
computing. To enable effective resource management on clusters, numerous
cluster management systems and schedulers have been designed. However, their
focus has been on maximizing CPU performance rather than on improving the
utility delivered to users and their quality of service. This paper
presents a new computational economy driven scheduling system called Libra,
which has been designed to allocate resources based on the users'
quality of service (QoS) requirements. It is intended to work as an add-on to
the existing queuing and resource management system. The first version has been
implemented as a plugin scheduler to the PBS (Portable Batch System) system.
The scheduler offers market-based economy driven service for managing batch
jobs on clusters by scheduling CPU time according to user utility as determined
by their budget and deadline rather than system performance considerations. The
Libra scheduler ensures that both these constraints are met within an O(n)
run-time. The Libra scheduler has been simulated using the GridSim toolkit to
carry out a detailed performance analysis. Results show that the deadline and
budget based proportional resource allocation strategy improves the utility of
the system and user satisfaction as compared to system-centric scheduling
strategies.
Comment: 13 pages
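The abstract does not spell out the allocation rule. The sketch below is one plausible reading of "deadline and budget based proportional resource allocation":

- A job with estimated runtime E and deadline D is given a CPU share of E/D.
- It is admitted on the first node whose promised shares stay at or below 1, in a single O(n) pass over the nodes.
- It is rejected if its cost exceeds the user's budget, where the cost comes from a hypothetical flat price per CPU-second.

The names `Job`, `Node`, `schedule` and `PRICE_PER_CPU_SECOND` are illustrative. They are not Libra's or PBS's interface, and Libra's actual pricing function may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PRICE_PER_CPU_SECOND = 1.0  # hypothetical flat price; not Libra's actual cost function


@dataclass
class Job:
    job_id: str
    runtime: float   # estimated CPU seconds on a reference node
    deadline: float  # seconds from submission
    budget: float    # amount the user is willing to pay


@dataclass
class Node:
    name: str
    used_share: float = 0.0  # fraction of CPU already promised to admitted jobs
    jobs: Dict[str, float] = field(default_factory=dict)


def schedule(job: Job, nodes: List[Node]) -> Optional[str]:
    """Admit a job with proportional share runtime/deadline, or return None to reject it."""
    if job.deadline <= 0 or job.runtime > job.deadline:
        return None  # would need more than a whole CPU to meet the deadline
    if PRICE_PER_CPU_SECOND * job.runtime > job.budget:
        return None  # user cannot afford the requested service
    share = job.runtime / job.deadline
    for node in nodes:  # single pass over the nodes: O(n)
        if node.used_share + share <= 1.0:
            node.used_share += share
            node.jobs[job.job_id] = share
            return node.name
    return None
```

Giving each job exactly the share it needs to finish at its deadline leaves the remaining capacity of each node free for later jobs. This is the sense in which the allocation is driven by user utility rather than raw throughput.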
Analytical Modeling of High Performance Reconfigurable Computers: Prediction and Analysis of System Performance.
The use of a network of shared, heterogeneous workstations, each harboring a Reconfigurable Computing (RC) system, offers high-performance users an inexpensive platform for a wide range of computationally demanding problems. However, exploiting the full potential of these systems is difficult without knowledge of their performance characteristics. While some performance models exist for shared, heterogeneous workstations, none so far accounts for the addition of RC systems. This dissertation develops and validates an analytic performance modeling methodology for a class of fork-join algorithms executing on a High Performance Reconfigurable Computing (HPRC) platform. The model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message-passing communication, and processor heterogeneity. Three fork-join applications (a Boolean Satisfiability solver, a matrix-vector multiplication algorithm, and an Advanced Encryption Standard algorithm) are used to validate the model on homogeneous and simulated heterogeneous workstations. A synthetic load is used to validate the model under various loading conditions, including simulated heterogeneity, in which background load makes some workstations appear slower than others. The methodology proves accurate in characterizing the effects of reconfigurable devices, application load imbalance, background user load, and heterogeneity for applications running on shared, homogeneous and heterogeneous HPRC resources. In all cases the model error was below five percent for application runtimes greater than thirty seconds and below fifteen percent for runtimes less than thirty seconds. The methodology thus enables us to characterize applications running on shared HPRC resources.
Cost functions are used to impose system usage policies, and the results of the modeling methodology are used to find the optimal (or near-optimal) set of workstations for a given application. The usage policies investigated include determining the computational costs of the workstations and balancing the priority of the background user load against that of the parallel application. The applications studied fall within the Master-Worker paradigm and are well suited to a grid computing approach. A method is introduced for combining NetSolve, a grid middleware, with the model and cost functions, whereby users can produce optimal workstation sets and schedules for Master-Worker applications running on shared HPRC resources.
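The dissertation's actual equations are not reproduced in the abstract. The sketch below is a simplified stand-in that shows how a fork-join runtime estimate can feed a cost function for choosing a workstation set. It rests on four assumptions:

- Work is split evenly across the chosen workstations.
- A workstation with b background processes gives the application 1/(1+b) of its CPU under fair sharing.
- The RC device divides a worker's compute time by a fixed speedup.
- Communication costs a constant amount per worker.

`Workstation`, `fork_join_runtime`, `best_workstation_set`, `price` and `price_weight` are hypothetical names. The exhaustive search is only practical for small pools.

```python
from itertools import combinations
from typing import NamedTuple, Optional, Sequence, Tuple


class Workstation(NamedTuple):
    name: str
    speed: float       # relative CPU speed (1.0 = reference machine)
    background: float  # average number of competing background processes
    rc_speedup: float  # speedup from the reconfigurable device (1.0 = none)
    price: float       # hypothetical cost per second of use


def fork_join_runtime(total_work: float, subset: Sequence[Workstation],
                      serial_time: float, comm_per_worker: float) -> float:
    """Serial part + slowest worker (the join waits for it) + per-worker communication."""
    share = total_work / len(subset)
    # Fair CPU sharing with b background processes leaves 1/(1+b) of the CPU to the application.
    slowest = max(share * (1.0 + w.background) / (w.speed * w.rc_speedup) for w in subset)
    return serial_time + slowest + comm_per_worker * len(subset)


def best_workstation_set(pool: Sequence[Workstation], total_work: float,
                         serial_time: float, comm_per_worker: float,
                         price_weight: float) -> Optional[Tuple[float, Tuple[str, ...]]]:
    """Exhaustively minimize runtime + price_weight * runtime * total price of the set."""
    best = None
    for k in range(1, len(pool) + 1):
        for subset in combinations(pool, k):
            t = fork_join_runtime(total_work, subset, serial_time, comm_per_worker)
            score = t + price_weight * t * sum(w.price for w in subset)
            if best is None or score < best[0]:
                best = (score, tuple(w.name for w in subset))
    return best
```

With a fast idle machine and a slow loaded one, adding the loaded machine makes it the straggler. The search therefore picks the fast machine alone. This mirrors the abstract's point that load imbalance and background load must be modeled before choosing a workstation set.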
An evaluation of load sharing algorithms for heterogeneous distributed systems
Distributed systems offer the ability to execute a job at nodes other than the originating one. Load sharing algorithms use this ability to distribute work around the system in order to achieve greater efficiency, which is reflected in substantially reduced response times. In most studies, the systems on which load sharing has been evaluated have been homogeneous. This thesis considers load sharing in heterogeneous systems, in which the heterogeneity lies in the processing power of the constituent nodes.
Existing algorithms are evaluated and improved ones proposed. Most of the performance analysis is done through simulation. A model of diskless workstations communicating and transferring jobs by Remote Procedure Call is used. All assumptions about the overheads of inter-node communication are based upon measurements made on the university networks.
The comparison of algorithms identifies those characteristics that offer improved performance in heterogeneous systems. The level of system information required for transfer is investigated and an optimum found. Judicious use of the collected information through algorithm design is shown to account for much of the improvement. However, detailed examination of algorithm behaviour compared with that of an 'optimum' load sharing scenario reveals that there are occasions when using all the available information is not beneficial. The most promising algorithms are investigated further to assess their adaptability, scalability and stability under a variety of conditions. The standard definitions of load balancing and load sharing are shown not to apply to heterogeneous systems.
To validate the assumptions in the simulation model, a load sharing scenario was implemented on a network of Sun workstations at the University. While the scope of the implementation was somewhat limited by lack of resources, it demonstrates the relative ease with which the algorithms can be implemented without altering operating system code or modifying the kernel.
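The thesis's specific algorithms are not given in the abstract. As an illustration of why heterogeneity matters, here is a classic sender-initiated threshold policy that measures load as queue length divided by relative processing speed. On a machine four times faster, four queued jobs mean the same wait as one job on a reference machine. The dictionary layout, the function name `place_job`, and the parameters `threshold` and `probe_limit` are hypothetical. Probes are taken in a fixed order rather than at random so that the example is deterministic.

```python
from typing import Dict, Tuple

# name -> (queue_length, relative_speed); a hypothetical snapshot of system state
SystemState = Dict[str, Tuple[int, float]]


def place_job(origin: str, nodes: SystemState, threshold: float, probe_limit: int) -> str:
    """Sender-initiated threshold policy with speed-normalized load."""

    def normalized_load(name: str, extra: int = 0) -> float:
        queue, speed = nodes[name]
        return (queue + extra) / speed  # expected wait, scaled by processing power

    # Keep the job locally if accepting it leaves the origin at or below the threshold.
    if normalized_load(origin, 1) <= threshold:
        return origin
    # Otherwise probe a limited number of other nodes and transfer to the first that qualifies.
    candidates = [name for name in nodes if name != origin]
    for name in candidates[:probe_limit]:
        if normalized_load(name, 1) <= threshold:
            return name
    return origin  # no suitable receiver found: run locally
```

`probe_limit` is one knob for how much system information the policy collects before deciding. The thesis reports that an intermediate level of information works best.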
Distributed computing methodology for training neural networks in an image-guided diagnostic application
Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks to detect lesions in colonoscopy. Our approach partitions the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values, so that neural networks can be trained with various learning methods. The proposed methodology has large granularity and requires little synchronization, and it has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms yields considerable speedup, especially when large network architectures and training sets are used.
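The approach works because a sum-of-errors objective decomposes over training examples. Each processor can evaluate the error and gradient on its own partition, and the master simply adds the partial results. The sketch below shows this for a linear model with squared error. It uses Python threads in place of PVM processes and omits the neural network itself; both substitutions are for illustration only. `partial_error_and_grad`, `partition` and `distributed_error_and_grad` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]  # (input vector, target)


def partial_error_and_grad(w: Sequence[float], batch: Sequence[Example]) -> Tuple[float, List[float]]:
    """Squared error 0.5 * sum(r^2) and its gradient for a linear model on one partition."""
    err = 0.0
    grad = [0.0] * len(w)
    for x, y in batch:
        r = sum(wi * xi for wi, xi in zip(w, x)) - y  # residual
        err += 0.5 * r * r
        for j, xj in enumerate(x):
            grad[j] += r * xj
    return err, grad


def partition(data: Sequence[Example], k: int) -> List[List[Example]]:
    """Split the training set into k interleaved partitions, one per worker."""
    return [list(data[i::k]) for i in range(k)]


def distributed_error_and_grad(w: Sequence[float], data: Sequence[Example],
                               workers: int) -> Tuple[float, List[float]]:
    parts = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda batch: partial_error_and_grad(w, batch), parts))
    # The master only sums partial results: one synchronization point per evaluation.
    err = sum(e for e, _ in results)
    grad = [sum(g[j] for _, g in results) for j in range(len(w))]
    return err, grad
```

Each worker does a whole partition's worth of computation between synchronizations. This is the coarse granularity and low synchronization the abstract credits for the speedup. Any gradient-based learning method can consume the summed result unchanged.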
Enabling collaboration in virtual reality navigators
In this paper we characterize a feature superset for Collaborative
Virtual Reality Environments (CVRE), and derive a component
framework to transform stand-alone VR navigators into full-fledged
multithreaded collaborative environments. Our approach relies on a
cost-effective and extensible technique for
loading software components into separate POSIX threads for
rendering, user interaction and network communications, and adding a
top layer for managing session collaboration. The framework recasts
a VR navigator under a distributed peer-to-peer topology for scene
and object sharing, using callback hooks for broadcasting remote
events and multicamera perspective sharing with avatar interaction.
We validate the framework by applying it to our own ALICE VR
Navigator. Experimental results show that our approach has good
performance in the collaborative inspection of complex models.
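The abstract mentions callback hooks that broadcast remote events across a peer-to-peer topology. The sketch below shows that pattern in its simplest form: a peer registers callbacks per event name, and emitting an event applies it locally and then forwards it once to every directly connected peer. The POSIX threads and real network transport of the framework are deliberately omitted, and peers are linked in-process. `Session`, `on`, `emit` and the event name `"camera"` are hypothetical and are not the ALICE navigator's API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Hook = Callable[[str, Any], None]  # (origin peer name, payload)


class Session:
    """One peer in a full-mesh session; peers are linked in-process instead of over a network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.hooks: Dict[str, List[Hook]] = defaultdict(list)
        self.peers: List["Session"] = []

    def connect(self, other: "Session") -> None:
        self.peers.append(other)
        other.peers.append(self)

    def on(self, event: str, hook: Hook) -> None:
        """Register a callback hook for a named event (e.g. a camera or object update)."""
        self.hooks[event].append(hook)

    def _dispatch(self, event: str, origin: str, payload: Any) -> None:
        for hook in self.hooks[event]:
            hook(origin, payload)

    def emit(self, event: str, payload: Any) -> None:
        """Apply an event locally, then deliver it once to each directly connected peer."""
        self._dispatch(event, self.name, payload)
        for peer in self.peers:
            peer._dispatch(event, self.name, payload)  # receivers do not re-broadcast
```

Because receivers only dispatch and never re-broadcast, a full mesh delivers each event exactly once per peer with no forwarding loops. A sparser topology would need explicit forwarding with duplicate suppression.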
High performance computing of explicit schemes for electrofusion jointing process based on message-passing paradigm
The research focused on heterogeneous cluster workstations comprising a number of CPUs in single and shared architecture platforms. The problems under consideration involved one-dimensional parabolic equations. The thermal process of electrofusion jointing was also discussed. Explicit numerical schemes such as the AGE, Brian, and Charlies methods were employed, and their parallelization was based on the domain decomposition technique. Parallel performance measurements for these methods were also reported, and the temperature profile of the one-dimensional radial model of the electrofusion process was given.
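The AGE, Brian and Charlies schemes are not reproduced here. The sketch below uses a plain forward-time centered-space (FTCS) explicit scheme for u_t = α u_xx instead, to show how explicit methods parallelize by domain decomposition. With r = αΔt/Δx² (stable for r ≤ 1/2), each interior point is updated as u_i + r(u_{i-1} − 2u_i + u_{i+1}). Each subdomain therefore needs only one "ghost" value from each neighbour per step. Here the halo exchange is simulated serially, whereas a message-passing code would send those boundary values between processes. `ftcs_step` and `decomposed_step` are hypothetical names.

```python
from typing import List


def ftcs_step(u: List[float], r: float) -> List[float]:
    """One explicit step; u[0] and u[-1] are fixed (Dirichlet) boundary values."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def decomposed_step(u: List[float], r: float, parts: int) -> List[float]:
    """Same step, computed independently on `parts` contiguous subdomains."""
    n = len(u)
    new = u[:]
    # Split interior indices 1..n-2 into contiguous chunks [lo, hi).
    bounds = [1 + (n - 2) * p // parts for p in range(parts + 1)]
    for lo, hi in zip(bounds, bounds[1:]):
        local = u[lo - 1:hi + 1]  # owned cells plus one ghost cell on each side (the halo)
        updated = ftcs_step(local, r)
        new[lo:hi] = updated[1:-1]  # keep only the owned cells
    return new
```

For example, starting from `u = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]` with `r = 0.4`, five steps of `decomposed_step(..., 3)` give exactly the same values as five serial `ftcs_step` calls. Each subdomain performs identical arithmetic on identical inputs.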
Managing Uncertainty: A Case for Probabilistic Grid Scheduling
The Grid technology is evolving into a global, service-orientated
architecture, a universal platform for delivering future high demand
computational services. Strong adoption of the Grid and the utility computing
concept is leading to an increasing number of Grid installations running a wide
range of applications of different size and complexity. In this paper we
address the problem of delivering deadline/economy based scheduling in a
heterogeneous application environment using statistical properties of
historical job executions and their associated metadata. This approach is
motivated by a study of the computational load generated over six months by
Grid applications in a multi-purpose Grid cluster serving a community of
twenty e-Science projects. The observed job statistics, resource utilisation
and user behaviour are discussed in the context of the management approaches
and models most suitable for supporting a probabilistic and autonomous
scheduling architecture.
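The paper's own model is not given in the abstract. One simple way to turn historical executions into a probabilistic deadline decision is an empirical distribution of past runtimes for the same application. From it, the probability of finishing within the time left before the deadline can be estimated, and the job is admitted only if that probability meets a confidence level. The function names `completion_probability` and `admit`, and the parameters `expected_wait` and `confidence`, are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def completion_probability(history: Sequence[float], available_time: float) -> float:
    """Empirical P(runtime <= available_time) from past runtimes of the same application."""
    if not history:
        return 0.0  # no evidence: conservatively assume the deadline will not be met
    runtimes = sorted(history)
    return bisect_right(runtimes, available_time) / len(runtimes)


def admit(history: Sequence[float], deadline: float, expected_wait: float,
          confidence: float = 0.9) -> bool:
    """Accept a job only if it meets its deadline with at least `confidence` probability."""
    return completion_probability(history, deadline - expected_wait) >= confidence
```

For instance, with past runtimes `[10, 12, 15, 20, 40]`, a deadline of 30 and an expected queue wait of 5, four of five past runs fit in the remaining 25 seconds. The job is admitted at 80% confidence but rejected at 90%.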
The role of the host in a cooperating mainframe and workstation environment, volumes 1 and 2
In recent years, advances in computer systems have prompted a move from centralized computing, based on timesharing a large mainframe computer, to distributed computing based on a connected set of engineering workstations. A major factor in this shift is the increased performance and lower cost of engineering workstations. The move from centralized to distributed computing has raised challenges concerning where application programs should reside within the system. In a combined system of multiple engineering workstations attached to a mainframe host, the question arises of how a system designer should assign applications between the larger mainframe host and the smaller, yet powerful, workstations. Concepts related to real-time data processing are analyzed, and systems are presented that use a host mainframe and a number of engineering workstations interconnected by a local area network. In most cases, distributed systems can be classified as having a single function or multiple functions and as executing programs in real time or non-real time. In a system of multiple computers, the degree of autonomy of the computers is important: a system with one master control computer generally differs in reliability, performance, and complexity from a system in which all computers share control. This research is concerned with generating general principles for software residency decisions (host or workstation) for a diverse yet coupled group of users (the clustered workstations) who may need a shared resource (the mainframe) to perform their functions.