Space sharing job scheduling policies for parallel computers
The distinguishing characteristic of space sharing parallel job scheduling policies is that applications are allocated non-overlapping processor subsets. Interference among jobs is reduced, synchronization delays and message latencies can be predictable, and distinct processors may be allocated to cooperating processes so as to avoid the overhead of context switches associated with traditional time-multiplexing.
The processor allocation strategy, the job selection criteria, and workload characteristics are fundamental factors that influence system performance under space sharing. Allocation can be static or dynamic. Under static space sharing, the processor subset allocated to an application is fixed; under dynamic space sharing, it can change during execution. Static allocation can produce more predictable run times, permits a wide range of compiler optimizations (e.g., static data distribution and binding), and avoids the processor releases and reallocations associated with dynamic allocation. Its major problem is that it can induce high processor fragmentation.
In this dissertation, alternative static and dynamic space sharing policies that differ in the allocation discipline and the job selection criteria are studied. The results show that significantly better performance can be achieved under static space sharing if applications can be folded (i.e., allocated fewer processors than they requested). Folding typically increases program efficiency and can reduce processor fragmentation. Policies that increase folding with the system load are proposed and compared with schemes that use unconstrained folding, no folding, and fixed maximum folding factors. The adaptive policies produced higher and more stable system utilization, significantly shorter mean response times, and good fairness curves. However, unconstrained folding resulted in considerably more severe processor fragmentation than no folding. Its advantage is that it exploits the efficiency improvement that typically results when an application is allocated fewer processors; consequently, it can produce shorter mean response times than no folding under medium to heavy loads.
Also because of this efficiency improvement, dynamic policies that reduce waiting times by executing a large number of jobs simultaneously are more promising than schemes that limit the number of active jobs. However, limiting the number of active applications can be the superior approach when folding does not improve application efficiency.
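A load-adaptive folding rule can be sketched in a few lines. This is a minimal sketch, assuming the folding factor grows by one for each queued job up to a cap of 4; `fold_factor`, `allocate`, and the cap are hypothetical and are not the policies evaluated in the dissertation. A job requesting `requested` processors is granted ceil(requested / factor) processors, and under static space sharing it starts only if that whole partition is free.

```python
def fold_factor(queue_length: int, max_factor: int = 4) -> int:
    """Hypothetical rule: fold more aggressively as the wait queue grows."""
    # An empty queue means no folding (factor 1); each waiting job adds 1, up to max_factor.
    return min(1 + queue_length, max_factor)


def allocate(requested: int, free: int, queue_length: int) -> int:
    """Return the number of processors to grant, or 0 if the job must keep waiting."""
    factor = fold_factor(queue_length)
    # Fold the request: ceil(requested / factor) processors, but never fewer than 1.
    grant = max(1, -(-requested // factor))
    # Static space sharing: the job starts only if its whole (folded) partition is free.
    return grant if grant <= free else 0


print(allocate(16, 16, 0), allocate(16, 8, 1), allocate(16, 4, 3))  # 16 8 4
```

With an empty queue a 16-processor request receives all 16 processors; with three jobs waiting the same request is folded to 4, trading per-job parallelism for shorter waits.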
Flexible allocation and space management in storage systems
In this dissertation, we examine some of the challenges faced by the emerging
networked storage systems. We focus on two main issues. Current file systems allocate
storage statically at the time of their creation. This results in many suboptimal
scenarios, for example: (a) space on the disk is not allocated well across multiple
file systems, (b) data is not organized well for typical access patterns. We propose
Virtual Allocation for flexible storage allocation. Virtual allocation separates storage
allocation from the file system. It employs an allocate-on-write strategy, which lets
applications consume storage according to actual usage, without regard to the configured
file system size. This improves flexibility by allowing storage space to be shared across
different file systems. We present the design of virtual allocation and an evaluation
of it through benchmarks based on a prototype system on Linux.
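The allocate-on-write idea can be illustrated as a lazily filled logical-to-physical block map over a pool shared by several file systems. The class names, the dictionary-based map, and the zero-filled read of never-written blocks are assumptions made for this sketch; they are not the prototype's actual design.

```python
class SharedPool:
    """Physical blocks shared by all file systems (hypothetical model)."""

    def __init__(self, nblocks: int) -> None:
        # Stack of free physical block numbers; pop() hands out block 0 first.
        self.free = list(range(nblocks - 1, -1, -1))

    def take(self) -> int:
        if not self.free:
            raise OSError("shared pool exhausted")
        return self.free.pop()


class VirtualVolume:
    """A file system volume whose configured size need not be backed up front."""

    def __init__(self, pool: SharedPool, logical_blocks: int) -> None:
        self.pool = pool
        self.logical_blocks = logical_blocks  # configured file system size
        self.map: dict[int, int] = {}         # logical block -> physical block
        self.disk: dict[int, bytes] = {}      # stand-in for on-disk contents

    def write(self, lblock: int, payload: bytes) -> None:
        if not 0 <= lblock < self.logical_blocks:
            raise IndexError(lblock)
        if lblock not in self.map:
            # Allocate-on-write: physical space is bound only on the first write.
            self.map[lblock] = self.pool.take()
        self.disk[self.map[lblock]] = payload

    def read(self, lblock: int) -> bytes:
        # Never-written blocks have no backing and read as zeros.
        if lblock not in self.map:
            return b"\0"
        return self.disk[self.map[lblock]]


pool = SharedPool(4)
a, b = VirtualVolume(pool, 100), VirtualVolume(pool, 100)  # 200 blocks configured, 4 real
a.write(5, b"x")
b.write(7, b"y")
print(len(pool.free))  # 2
```

Two volumes configured at 100 blocks each share a 4-block pool, and only the two blocks actually written consume physical space.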
Next, based on virtual allocation, we consider the problem of balancing locality and load in networked storage systems with multiple storage devices (or bricks).
Data distribution affects locality and load balance across the devices in a networked
storage system. We propose a user-optimal data migration scheme that tries to balance locality and load in such networked storage systems. The presented
approach automatically and transparently manages migration of data blocks among
disks as data access patterns and loads change over time. We built a prototype system on Linux; we present the design of user-optimal migration and an evaluation of
it through realistic experiments.
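One way to picture the locality/load trade-off is a per-block cost that adds the target disk's load to a penalty for co-accessed blocks left on other disks. The cost function, the weights `alpha` and `beta`, and the co-access counts are hypothetical; this sketch does not reproduce the dissertation's user-optimal criterion, and it does not update disk loads as moves are planned, which a fuller version would.

```python
from typing import Dict


def choose_disk(block: str,
                placement: Dict[str, str],
                co_access: Dict[str, Dict[str, int]],
                load: Dict[str, float],
                alpha: float = 1.0,
                beta: float = 1.0) -> str:
    """Pick the disk minimising a hypothetical load + locality cost for `block`."""
    def cost(disk: str) -> float:
        # Locality penalty: co-access weight of neighbours that live on other disks.
        remote = sum(w for other, w in co_access.get(block, {}).items()
                     if placement.get(other) != disk)
        return alpha * load[disk] + beta * remote
    # Break ties by disk name so the result is deterministic.
    return min(load, key=lambda d: (cost(d), d))


def plan_migrations(placement: Dict[str, str],
                    co_access: Dict[str, Dict[str, int]],
                    load: Dict[str, float],
                    alpha: float = 1.0,
                    beta: float = 1.0) -> Dict[str, str]:
    """Return {block: target_disk} for blocks whose best disk differs from the current one."""
    moves = {}
    for block, current in placement.items():
        target = choose_disk(block, placement, co_access, load, alpha, beta)
        if target != current:
            moves[block] = target
    return moves


placement = {"a": "d0", "b": "d1"}
co_access = {"a": {"b": 5}}          # block a is often accessed together with b
load = {"d0": 10.0, "d1": 2.0}
print(plan_migrations(placement, co_access, load))  # {'a': 'd1'}
```

Here block `a` moves to the lightly loaded disk that also holds its frequent partner `b`, improving both load balance and locality.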
Performance and Memory Space Optimizations for Embedded Systems
Embedded systems share three common requirements: real-time performance, low power consumption, and low price (limited hardware). Embedded computers use chip multiprocessors (CMPs) to meet these expectations. However, one of the major problems is the lack of efficient software support for CMPs; in particular, automated code parallelizers are needed.
The aim of this study is to explore various ways to increase performance and to reduce resource usage and energy consumption in embedded systems. We use code restructuring, loop scheduling, data transformation, code and data placement, and scratch-pad memory (SPM) management as our tools in different embedded system scenarios. The majority of our work focuses on loop scheduling. The main contributions of our work are:
We propose a memory saving strategy that exploits the value locality in array data by storing arrays in a compressed form. Based on the compressed forms of the input arrays, our approach automatically determines the compressed forms of the output arrays and also automatically restructures the code.
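To make the idea of deriving an output's compressed form from its inputs' compressed forms concrete, the sketch below uses run-length encoding (RLE) as a stand-in compressed form. RLE and the helper names are assumptions; the abstract does not say which compressed representation the strategy uses. `zip_runs` applies an elementwise operation run by run, so the output runs come directly from the input runs without decompressing either array.

```python
import operator
from typing import Callable, List, Tuple

Run = Tuple[int, int]  # (value, repeat count)


def compress(xs: List[int]) -> List[Run]:
    """Run-length encode an array, exploiting repeated neighbouring values."""
    runs: List[Run] = []
    for x in xs:
        if runs and runs[-1][0] == x:
            runs[-1] = (x, runs[-1][1] + 1)
        else:
            runs.append((x, 1))
    return runs


def decompress(runs: List[Run]) -> List[int]:
    return [v for v, n in runs for _ in range(n)]


def zip_runs(a: List[Run], b: List[Run], op: Callable[[int, int], int]) -> List[Run]:
    """Apply op elementwise to two equal-length RLE arrays without decompressing them."""
    out: List[Run] = []
    i = j = 0
    left_a = a[0][1] if a else 0  # elements remaining in the current run of a
    left_b = b[0][1] if b else 0
    while i < len(a) and j < len(b):
        n = min(left_a, left_b)  # length of the overlap of the two current runs
        v = op(a[i][0], b[j][0])
        if out and out[-1][0] == v:  # merge with the previous output run
            out[-1] = (v, out[-1][1] + n)
        else:
            out.append((v, n))
        left_a -= n
        left_b -= n
        if left_a == 0:
            i += 1
            left_a = a[i][1] if i < len(a) else 0
        if left_b == 0:
            j += 1
            left_b = b[j][1] if j < len(b) else 0
    return out


s = zip_runs(compress([1, 1, 1, 2, 2]), compress([0, 0, 5, 5, 5]), operator.add)
print(s)  # [(1, 2), (6, 1), (7, 2)]
```

The loop does work proportional to the number of runs rather than the number of elements, which is where the savings come from when arrays have strong value locality.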
We propose and evaluate a compiler-directed code scheduling scheme that considers both parallelism and data locality. It analyzes the code using a locality parallelism graph representation and assigns the nodes of this graph to processors. We also introduce an Integer Linear Programming (ILP) based formulation of the scheduling problem.
We propose a compiler-based SPM-conscious loop scheduling strategy for array/loop-based embedded applications. The method distributes loop iterations across parallel processors in an SPM-conscious manner: the compiler identifies potential SPM hits and misses and distributes loop iterations such that the processors have close execution times.
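The balancing step can be sketched by weighting each iteration with an estimated hit or miss cost and cutting the iteration space into contiguous chunks of roughly equal total cost. The per-iteration `hits` list stands in for the compiler's hit/miss analysis, and the costs of 1 and 10 cycles and the contiguous-chunk restriction are hypothetical choices for this sketch.

```python
from typing import List


def partition_iterations(hits: List[bool], nprocs: int,
                         hit_cost: int = 1, miss_cost: int = 10) -> List[range]:
    """Split iterations into nprocs contiguous chunks of roughly equal estimated time."""
    # Estimated cost of each iteration from its predicted SPM hit or miss.
    costs = [hit_cost if h else miss_cost for h in hits]
    total = sum(costs)
    bounds = [0]
    prefix = 0
    k = 1  # index of the next cut to place
    for i, c in enumerate(costs):
        prefix += c
        # Cut after iteration i once the running cost reaches k/nprocs of the total.
        while k < nprocs and prefix >= total * k / nprocs:
            bounds.append(i + 1)
            k += 1
    # Any remaining chunks (including the last) end at the final iteration.
    while len(bounds) < nprocs + 1:
        bounds.append(len(costs))
    return [range(bounds[p], bounds[p + 1]) for p in range(nprocs)]


hits = [False] * 2 + [True] * 20   # two predicted misses, then twenty hits
print(partition_iterations(hits, 2))  # [range(0, 2), range(2, 22)]
```

Both chunks have an estimated cost of 20, whereas splitting the 22 iterations evenly (11 and 11) would give estimated costs of 29 and 11.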
We present an SPM management technique based on a Markov chain model of data accesses.
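A first-order Markov model of block accesses is one simple way such a technique could decide what to load into the SPM next: count observed transitions between data blocks and predict the most frequent successor. The class and method names are hypothetical, and the abstract does not specify the order of the chain or how predictions drive SPM loads.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Optional


class MarkovPrefetcher:
    """First-order Markov model of block accesses (hypothetical SPM helper)."""

    def __init__(self) -> None:
        # transitions[prev][nxt] = how often block nxt followed block prev
        self.transitions: Dict[Hashable, Counter] = defaultdict(Counter)
        self.prev: Optional[Hashable] = None

    def observe(self, block: Hashable) -> None:
        if self.prev is not None:
            self.transitions[self.prev][block] += 1
        self.prev = block

    def predict(self, block: Hashable) -> Optional[Hashable]:
        """Most likely next block after `block`, or None if it has no recorded successor."""
        nxt = self.transitions.get(block)
        if not nxt:
            return None
        return nxt.most_common(1)[0][0]


m = MarkovPrefetcher()
for blk in "ABABAC":
    m.observe(blk)
print(m.predict("A"))  # B
```

An SPM manager built on this model could copy the predicted block into the scratch-pad while the current block is being processed.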
We propose a compiler-directed integrated code and data placement scheme for 2-D mesh based CMP architectures. Using a Code-Data Affinity Graph (CDAG) to represent the relationship between loop iterations and array data, it assigns sets of loop iterations to processing cores and sets of data blocks to on-chip memories.
We present a memory bank aware dynamic loop scheduling scheme for array-intensive applications. The goal is to minimize the number of memory banks needed to execute a group of loop iterations.
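For the CDAG-based placement on a 2-D mesh, the data side of the problem can be sketched as follows: once iteration sets are assigned to cores, place each data block in the on-chip memory that minimizes the access-weighted Manhattan (hop) distance to the cores that use it. The weights stand in for CDAG edge weights; memory capacity limits and the joint code/data optimization are ignored, and the function name is hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

Node = Tuple[int, int]  # (row, column) of a core and its on-chip memory in the mesh


def place_block(accesses: Dict[Node, int], rows: int, cols: int) -> Node:
    """Return the mesh node whose memory minimises weighted hop count to the accessing cores."""
    def cost(node: Node) -> int:
        r, c = node
        # Manhattan distance approximates hop count on a 2-D mesh.
        return sum(w * (abs(r - ar) + abs(c - ac)) for (ar, ac), w in accesses.items())
    # Exhaustive search over all mesh nodes; ties go to the smallest coordinates.
    return min(product(range(rows), range(cols)), key=lambda n: (cost(n), n))


# A block touched 3 times by core (0, 0) and once by core (1, 1) on a 2x2 mesh.
print(place_block({(0, 0): 3, (1, 1): 1}, 2, 2))  # (0, 0)
```

Exhaustive search is adequate for small meshes; for larger ones, the weighted median taken separately in each dimension gives the same minimum without enumerating every node.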
Storage Systems for Non-volatile Memory Devices
This dissertation presents novel approaches to the use of non-volatile memory devices in building storage systems. There are many types of non-volatile memory devices, and they usually perform better than regular magnetic hard disks in terms of throughput and latency. This dissertation focuses on two of them, NAND flash memory and Phase Change Memory (PCM). The work consists of two parts.
The first part was to design a high-performance hybrid storage system employing Solid State Drives (SSDs) built from NAND flash memory together with Hard Disk Drives (HDDs). For this hybrid system, we proposed two policies to improve its performance. One exploits the fact that the performance of SSDs and HDDs is asymmetric; the other exploits concurrency across multiple devices. We implemented prototypes in Linux and evaluated both policies under multiple workloads and multiple configurations. The results showed that the proposed approaches improve performance significantly and adapt to different system configurations under different workloads.
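Both ideas, asymmetric device performance and concurrency across devices, can be illustrated with a dispatcher that sends each request to whichever device is expected to finish it first given its queued work. The service times are hypothetical numbers, the sketch assumes a request can go to either device (for example, a write of new data), and it ignores queues draining over time; it is not the dissertation's policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    name: str
    read_ms: float           # hypothetical per-request service times
    write_ms: float
    pending_ms: float = 0.0  # work already queued on this device

    def service(self, op: str) -> float:
        return self.read_ms if op == "read" else self.write_ms


def dispatch(devices: List[Device], op: str) -> str:
    """Send a request to the device expected to finish it first."""
    best = min(devices, key=lambda d: d.pending_ms + d.service(op))
    best.pending_ms += best.service(op)
    return best.name


ssd = Device("ssd", read_ms=0.1, write_ms=1.0)
hdd = Device("hdd", read_ms=5.0, write_ms=5.0)
print([dispatch([ssd, hdd], "write") for _ in range(6)])
# ['ssd', 'ssd', 'ssd', 'ssd', 'ssd', 'hdd']
```

The faster SSD absorbs requests until its backlog makes the idle HDD the quicker option, at which point both devices work concurrently.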
The second part was to implement a file system on a special class of memory devices, Storage Class Memory (SCM), which is both byte addressable and nonvolatile, e.g., PCM. We argued that neither existing regular file systems nor memory-based file systems are suitable for SCM, and proposed a new file system, called SCMFS, which is implemented on the virtual address space. In SCMFS, we utilized the existing memory management module in the operating system to perform block management. Our design keeps the address space within a file contiguous, which reduces the complexity of the block management software. The simplicity of SCMFS not only makes it easy to implement but also improves performance. We implemented a prototype of SCMFS in Linux and evaluated its performance through multiple benchmarks.
Monitorable network and CPU load statistics and their application to scheduling
Recent trends in high-speed computing have moved towards the use of networks of workstations as a cost-effective approach to parallel computing. One recently proposed solution uses an existing network of workstation-class computers as a single multiprocessor, and much research is ongoing in this area.
This dissertation describes work on process scheduling on networks of workstations, specifically in the area of load analysis. After extensive background in the field is presented, measures of CPU and network load are defined, and a test parallel application is presented, written for a network-multiprocessing software package called PVM. A series of experiments is then detailed, whose goal was to discover the relationship between the run time of the test application and the loads on the participating workstations and networks. The experiments include measurements of CPU and network loading during test application runs, during artificially elevated loads, and during quiet conditions. Results of the experiments are presented, and the application of the results to the problem of task scheduling is examined. It is then claimed that several easily measured load statistics are useful for task scheduling, because they allow run time to be predicted within a margin of error and allow limiting network segments to be detected and avoided.
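Predicting run time "within a margin of error" from a load measure can be sketched with an ordinary least-squares line plus the residual standard error. The single CPU-load predictor, the linear model, and the rough band of plus or minus two standard errors are assumptions for illustration; the dissertation's actual load measures and prediction method are not specified here.

```python
from math import sqrt
from typing import List, Tuple


def fit(loads: List[float], times: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit times ~ a + b*load; also return the residual standard error."""
    n = len(loads)
    mean_x, mean_y = sum(loads) / n, sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in loads)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(loads, times)]
    se = sqrt(sum(r * r for r in residuals) / (n - 2)) if n > 2 else 0.0
    return a, b, se


def predict(model: Tuple[float, float, float], load: float,
            k: float = 2.0) -> Tuple[float, float, float]:
    """Point estimate plus a rough +/- k*se band (simpler than a true prediction interval)."""
    a, b, se = model
    est = a + b * load
    return est, est - k * se, est + k * se


model = fit([0.0, 1.0, 2.0, 3.0], [10.0, 12.0, 14.0, 16.0])
print(predict(model, 4.0))  # (18.0, 18.0, 18.0)
```

With noisy measurements the band widens, and a scheduler could compare bands across candidate workstations rather than point estimates alone.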
Dynamic computation migration in distributed shared memory systems
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995. Vita. Includes bibliographical references (p. 123-131). By Wilson Cheng-Yi Hsieh. Ph.D.
A Market Model for Controlled Resource Allocation in Distributed Operating Systems
This thesis explores the potential for providing processes with control over their resource allocation in a general-purpose distributed system. Rather than give processes blind explicit control or leave the decision to the operating system, a compromise, called process-centric resource allocation, is proposed, whereby processes have informed control of their resource allocation while the operating system ensures fair consumption.
The motivations for this approach to resource allocation and its background are reviewed, culminating in the description of a set of desired attributes for such a system. A three-layered architecture called ERA is then proposed and presented in detail. The lowest layer provides a unified framework that describes the range of available resources and lets processes choose resources and describe their priority. A resource information mechanism, which supports choices among distributed resources, then utilises this framework. Finally, experimental demonstrations of process-centric resource allocation are used to illustrate the third layer.
This design and its algorithms together provide a resource allocation system in which distributed resources are shared fairly amongst competing processes that can choose their resources. The system allows processes to mimic traditional resource allocations and to perform novel and beneficial resource optimisations. Experimental results are presented indicating that this can be achieved with low overhead and in a scalable fashion.
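A market in which processes choose resources while the system bounds what each can consume can be illustrated with proportional-share bidding: every process splits a fixed budget across the resources it wants, and each resource is divided in proportion to the bids it receives. This proportional-share scheme, the budgets, and the function names are a swapped-in illustration, not ERA's actual algorithm.

```python
from collections import defaultdict
from typing import Dict


def proportional_share(capacity: float, bids: Dict[str, float]) -> Dict[str, float]:
    """Divide one resource among bidders in proportion to what they bid."""
    total = sum(bids.values())
    if total == 0:
        return {p: 0.0 for p in bids}
    return {p: capacity * b / total for p, b in bids.items()}


def run_market(capacities: Dict[str, float],
               budgets: Dict[str, float],
               choices: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """choices[p][r] = fraction of p's budget that process p bids on resource r."""
    bids: Dict[str, Dict[str, float]] = defaultdict(dict)
    for p, split in choices.items():
        # Fairness guard: no process may spend more than its own budget.
        if sum(split.values()) > 1.0 + 1e-9:
            raise ValueError(f"{p} bids more than its budget")
        for r, frac in split.items():
            bids[r][p] = budgets[p] * frac
    return {r: proportional_share(cap, bids.get(r, {})) for r, cap in capacities.items()}


alloc = run_market({"cpu1": 100.0, "cpu2": 100.0},
                   {"A": 10.0, "B": 10.0},
                   {"A": {"cpu1": 1.0}, "B": {"cpu1": 0.5, "cpu2": 0.5}})
print(alloc["cpu2"])  # {'B': 100.0}
```

Process B trades a smaller share of the contended `cpu1` for all of the uncontended `cpu2`, an informed choice the process makes itself, while equal budgets keep total consumption fair.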