22 research outputs found

    Isolated VM Storage on Clouds

    No full text

    Cluster Computing Environment Supporting Single System Image

    No full text
    Single system image (SSI) systems have been the mainstay of high-performance computing for many years. SSI requires the integration and aggregation of all types of resources in a cluster to present a single interface to users. In this paper, we describe a cluster computing environment supporting SSI, constructed from three components: single process space (SPS), process migration, and dynamic load balancing. These components attempt to share all available resources in the cluster among all executing processes, so that the cluster operates like a single node with much more computing power. The most important goal is to combine these constructs in innovative ways to build a cluster computing environment with SSI, as well as to take a novel approach in each component to improve performance or functionality. Our implementation of process migration can resolve broken-pipe problems and bind errors during server socket reconstruction. We realize SPS based on block PID allocation. We also designed and implemented a dynamic load balancing scheme that resolves the limitations of our previous work by continuously tracing job resource usage at runtime. The experimental results show that these three constructs for SSI clusters achieve scalability, new functionality, and performance improvement. The cluster computing environment allows the constructs to cooperate implicitly, creating a synergy effect at the SSI cluster system level and successfully providing a single system image to users and administrators.
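
    The abstract states only that SPS is based on block PID allocation; a minimal sketch of that idea follows, in which each node hands out PIDs from its own reserved block so IDs stay unique cluster-wide and encode their home node. Block size, node count, and all names here are illustrative assumptions, not the authors' implementation.

    # Sketch of block PID allocation for a cluster-wide single process space (SPS).
    # BLOCK_SIZE, NUM_NODES, and the helper names are assumed for illustration.
    BLOCK_SIZE = 100_000          # assumed PIDs reserved per node
    NUM_NODES = 16                # assumed cluster size

    def pid_block(node_id):
        """Return the half-open PID range [lo, hi) reserved for a node."""
        lo = 1 + node_id * BLOCK_SIZE
        return lo, lo + BLOCK_SIZE

    def home_node(cluster_pid):
        """Recover the node that allocated a cluster-wide PID."""
        return (cluster_pid - 1) // BLOCK_SIZE

    class NodeAllocator:
        """Per-node allocator handing out PIDs only from its own block,
        so PIDs stay unique cluster-wide without global coordination."""
        def __init__(self, node_id):
            self.lo, self.hi = pid_block(node_id)
            self.next = self.lo
        def allocate(self):
            if self.next >= self.hi:
                raise RuntimeError("PID block exhausted; would request a new block")
            pid, self.next = self.next, self.next + 1
            return pid

    if __name__ == "__main__":
        alloc = NodeAllocator(node_id=3)
        p = alloc.allocate()
        print(p, home_node(p))    # e.g. 300001 3

    Because a PID's block identifies its home node, a migrated process can keep its cluster-wide PID while the system still knows where it originated.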

    [Photograph 2012.201.B1280.0151]

    No full text
    Photograph used for a story in the Daily Oklahoman newspaper. Caption: "The musical entertainment 'Dancin'' is produced by Tom Mallow in association with James Janek."

    Request Reordering to Enhance the Performance of Strict Consistency Models

    No full text

    BEST: Best-effort energy saving techniques for NAND flash-based hybrid storage

    No full text

    EXPLOITING TEMPORAL LOCALITY FOR ENERGY EFFICIENT MEMORY MANAGEMENT

    No full text
    Memory is becoming one of the major power consumers in computing systems; therefore, energy-efficient memory management is essential. Modern memory systems employ sleep states for energy saving. To exploit this feature, existing research has concentrated on increasing spatial locality to deactivate as many memory blocks as possible. However, such work does not account for the unexpected activation of memory blocks caused by cache evictions from deactivated tasks. In this paper, we suggest a software-based power-state management scheme for memory, SW-NAP, which exploits temporal locality to reduce the energy lost to unexpected activations caused by cache evictions. SW-NAP keeps a memory block deactivated during a tick in which the block incurs no cache misses. The evaluation shows that SW-NAP performs 50% better than PAVM, an existing software scheme, and 20% worse than PMU, another approach based on specialized hardware. We also suggest task scheduling policies that increase the effectiveness of SW-NAP and save up to 7% additional energy.
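
    A minimal sketch of the tick policy described above: a memory block stays deactivated for the next tick if no cache miss touched it during the current one. The block size, the miss feed, and the class name are assumptions for illustration; the paper's actual bookkeeping may differ.

    # Sketch of a SW-NAP-style tick policy (names and sizes are assumed).
    BLOCK_SHIFT = 26              # assumed 64 MiB power-manageable memory blocks

    class SwNapSketch:
        def __init__(self, num_blocks):
            self.num_blocks = num_blocks
            self.missed = set()                    # blocks touched by a miss this tick
            self.active = set(range(num_blocks))   # all blocks start active

        def on_cache_miss(self, phys_addr):
            blk = phys_addr >> BLOCK_SHIFT
            self.missed.add(blk)
            self.active.add(blk)                   # a miss forces the block awake

        def on_tick(self):
            # Blocks with no miss in the elapsed tick are (kept) deactivated.
            self.active = set(self.missed)
            self.missed.clear()
            return self.active

    Tracking misses rather than accesses is what lets the scheme tolerate cache evictions: a block whose data still lives in the cache generates no misses and can therefore sleep through the tick.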

    Design and Analysis of Hybrid Flow Control for Hierarchical Ring Network-on-Chip

    No full text
    A cost-efficient network-on-chip is needed in scalable many-core systems. Recent multicore processors have leveraged a ring topology; a hierarchical ring can increase scalability but presents different challenges, including higher hop counts and a global-ring bottleneck. In this work, we describe a hierarchical ring topology that we refer to as a transportation-network-inspired network-on-chip (tNoC), which leverages principles from transportation network systems. In particular, we propose a novel hybrid flow control for the hierarchical ring topology to scale it efficiently. The flow control is hybrid in that channels are allocated at flit granularity while buffers are allocated at packet granularity. This hybrid flow control enables a simplified router microarchitecture that minimizes per-hop latency: router input buffers are minimized and buffers are pushed to the edges, either at the output ports or at the hub routers that interconnect the local rings to the global ring, while virtual channels are still supported to avoid protocol deadlock. We describe a packet-quota system (PQS) and a separate credit network that provide congestion management, support prioritized arbitration in the network, and provide support for multi-flit packets. We also provide alternative designs for the credit network and PQS architectures. A detailed evaluation of a 64-core CMP shows that the tNoC improves performance by up to 21 percent compared with a baseline buffered hierarchical ring topology while reducing NoC energy by 51 percent.
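
    The hybrid idea can be sketched in software terms: a whole packet's worth of edge-buffer space must be reserved before injection, while the ring channel itself is granted one flit per cycle. Quota sizes, class names, and the arbitration order below are assumptions; the paper's PQS and credit network are considerably more elaborate than this sketch.

    # Sketch of packet-granularity buffer reservation plus flit-granularity channel use.
    from collections import deque

    class EdgeBuffer:
        """Packet-granularity buffer: space for a full packet must be
        reserved before any of its flits may enter the ring."""
        def __init__(self, packet_slots):
            self.free_slots = packet_slots
        def reserve_packet(self):
            if self.free_slots == 0:
                return False          # sender must wait for a credit
            self.free_slots -= 1
            return True
        def release_packet(self):
            self.free_slots += 1      # credit returned when the packet drains

    class RingChannel:
        """Flit-granularity channel: each cycle at most one waiting flit advances."""
        def __init__(self):
            self.pending = deque()
        def inject(self, flit):
            self.pending.append(flit)
        def cycle(self):
            return self.pending.popleft() if self.pending else None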

    An Adaptive Partitioning Scheme for DRAM-based Cache in Solid State Drives

    No full text
    Solid state drives (SSDs) have been rapidly adopted in laptops, desktops, and server storage systems because their performance is superior to that of traditional magnetic disks. However, NAND flash memory has some limitations, such as out-of-place updates, bulk erase operations, and a limited number of write operations. To alleviate these unfavorable characteristics, various techniques for improving internal software and hardware components have been devised. In particular, the internal device cache of an SSD has a significant impact on performance. The device cache is used for two main purposes: to absorb frequent read/write requests and to store logical-to-physical address mapping information. We observed that the optimal ratio between the data-buffering space and the address-mapping space in the device cache changes according to workload characteristics. To achieve optimal performance in SSDs, the device cache should therefore be appropriately partitioned between these two purposes. In this paper, we propose an adaptive partitioning scheme, based on a ghost caching mechanism, that tunes the ratio of the buffering and mapping space in the device cache according to the workload characteristics. The simulation results demonstrate that the performance of the proposed scheme approximates the best performance.
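
    A minimal sketch of ghost-cache-guided partitioning, under assumptions: each side of the cache keeps a small "ghost" list of recently evicted entries, and more ghost hits on one side suggest that side would benefit from more space, so the boundary shifts toward it. The step size, total size, rebalance trigger, and names are illustrative, not the paper's mechanism in detail.

    # Sketch of adapting the data-buffer / mapping-cache split from ghost-hit counts.
    class AdaptivePartitionSketch:
        def __init__(self, total_slots, step=1):
            self.buffer_slots = total_slots // 2          # data-buffer share
            self.mapping_slots = total_slots - self.buffer_slots
            self.step = step
            self.buffer_ghost_hits = 0
            self.mapping_ghost_hits = 0

        def record_ghost_hit(self, side):
            # Called when a request misses the real cache but hits a ghost entry.
            if side == "buffer":
                self.buffer_ghost_hits += 1
            else:
                self.mapping_ghost_hits += 1

        def rebalance(self):
            # Grow whichever side its ghost list says is losing useful entries.
            if self.buffer_ghost_hits > self.mapping_ghost_hits and self.mapping_slots > self.step:
                self.buffer_slots += self.step
                self.mapping_slots -= self.step
            elif self.mapping_ghost_hits > self.buffer_ghost_hits and self.buffer_slots > self.step:
                self.mapping_slots += self.step
                self.buffer_slots -= self.step
            self.buffer_ghost_hits = self.mapping_ghost_hits = 0
            return self.buffer_slots, self.mapping_slots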