436 research outputs found

    ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment

    Get PDF
    Applications in science and engineering often require huge computational resources for solving problems within a reasonable time frame. Parallel supercomputers provide the computational infrastructure for solving such problems. A traditional application scheduler running on a parallel cluster only supports static scheduling where the number of processors allocated to an application remains fixed throughout the lifetime of execution of the job. Due to the unpredictability in job arrival times and varying resource requirements, static scheduling can result in idle system resources thereby decreasing the overall system throughput. In this paper we present a prototype framework called ReSHAPE, which supports dynamic resizing of parallel MPI applications executed on distributed memory platforms. The framework includes a scheduler that supports resizing of applications, an API to enable applications to interact with the scheduler, and a library that makes resizing viable. Applications executed using the ReSHAPE scheduler framework can expand to take advantage of additional free processors or can shrink to accommodate a high priority application, without getting suspended. In our research, we have mainly focused on structured applications that have two-dimensional data arrays distributed across a two-dimensional processor grid. The resize library includes algorithms for processor selection and processor mapping. Experimental results show that the ReSHAPE framework can improve individual job turn-around time and overall system throughput.Comment: 15 pages, 10 figures, 5 tables Submitted to International Conference on Parallel Processing (ICPP'07

    Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks

    Get PDF
    Technology scaling accompanied with higher operating frequencies and the ability to integrate more functionality in the same chip has been the driving force behind delivering higher performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as Multiprocessor System-on-Chip). However these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing is increased. In this thesis, we take on these challenges within the context of streaming applications running in network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote memory access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at system-level, in particular by exploiting redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on the lifetime reliability. We propose two recovery schemes based on a checkpoint-and-rollback and a rollforward technique. For the latter, we propose two variants of a monitor-controller- adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that it can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Full text link
    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

    A Survey of Techniques for Architecting TLBs

    Get PDF
    “Translation lookaside buffer” (TLB) caches virtual to physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of TLB is important for improving performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects and system engineers

    ACCELERATING STORAGE APPLICATIONS WITH EMERGING KEY VALUE STORAGE DEVICES

    Get PDF
    With the continuous data explosion in the big data era, traditional software and hardware stack are facing unprecedented challenges on how to operate on such data scale. Thus, designing new architectures and efficient systems for data oriented applications has become increasingly critical. This motivates us to re-think of the conventional storage system design and re-architect both software and hardware to meet the challenges of scale. Besides the fast growth of data volume, the increasing demand on storage applications such as video streaming, data analytics are pushing high performance flash based storage devices to replace the traditional spinning disks. Such all-flash era increase the data reliability concerns due to the endurance problem of flash devices. Key-value stores (KVS) are important storage infrastructure to handle the fast growing unstructured data and have been widely deployed in a variety of scale-out enterprise applications such as online retail, big data analytic, social networks, etc. How to efficiently manage data redundancy for key-value stores to provide data reliability, how to efficiently support range query for key-value stores to accelerate analytic oriented applications under emerging key-value store system architecture become an important research problem. In this research, we focus on how to design new software hardware architectures for the keyvalue store applications to provide reliability and improve query performance. In order to address the different issues identified in this dissertation, we propose to employ a logical key management layer, a thin layer above the KV devices that maps logical keys into phsyical keys on the devices. We show how such a layer can enable multiple solutions to improve the performance and reliability of KVSSD based storage systems. First, we present KVRAID, a high performance, write efficient erasure coding management scheme on emerging key-value SSDs. The core innovation of KVRAID is to propose a logical key management layer that maps logical keys to physical keys to efficiently pack similar size KV objects and dynamically manage the membership of erasure coding groups. Unlike existing schemes which manage erasure codes on the block level, KVRAID manages the erasure codes on the KV object level. In order to achieve better storage efficiency for variable sized objects, KVRAID predefines multiple fixed sizes (slabs) according to the object size distribution for the erasure code. KVRAID uses a logical to physical key conversion to pack the KV objects of similar size into a parity group. KVRAID uses a lazy deletion mechanism with a garbage collector for object updates. Our experiments show that in 100% put case, KVRAID outperforms software block RAID by 18x in case of throughput and reduces 15x write amplification (WAF) with only ~5% CPU utilization. In a mixed update/get workloads, KVRAID achieves ~4x better throughput with ~23% CPU utilization and reduces the storage overhead and WAF by 3.6x and 11.3x in average respectively. Second, we present KVRangeDB, an ordered log structure tree based key index that supports range queries on a hash-based KVSSD. In addition, we propose to pack smaller application records into a larger physical record on the device through the logical key management layer. We compared the performance of KVRangeDB against RocksDB implementation on KVSSD and stateof- art software KV-store Wisckey on block device, on three types of real world applications of cloud-serving workloads, TABLEFS filesystem and time-series databases. For cloud serving applications, KVRangeDB achieves 8.3x and 1.7x better 99.9% write tail latency respectively compared to RocksDB implementation on KV-SSD and Wisckey on block SSD. On the query side, KVrangeDB only performs worse for those very long scans, but provides fast point queries and closed range queries. The experiments on TABLEFS demonstrate that using KVRangeDB for metadata indexing can boost the performance by a factor of ~6.3x in average and reduce ~3.9x CPU cost for four metadata-intensive workloads compared to RocksDB implementation on KVSSD. Compared toWisckey, KVRangeDB improves performance by ~2.6x in average and reduces ~1.7x CPU usage. Third, we propose a generic FPGA accelerator for emerging Minimum Storage Regenerating (MSR) codes encoding/decoding which maximizes the computation parallelism and minimizes the data movement between off-chip DRAM and the on-chip SRAM buffers. To demonstrate the efficiency of our proposed accelerator, we implemented the encoding/decoding algorithms for a specific MSR code called Zigzag code on Xilinx VCU1525 acceleration card. Our evaluation shows our proposed accelerator can achieve ~2.4-3.1x better throughput and ~4.2-5.7x better power efficiency compared to the state-of-art multi-core CPU implementation and ~2.8-3.3x better throughput and ~4.2-5.3x better power efficiency compared to a modern GPU accelerato

    TOWARD HIGHLY SECURE AND AUTONOMIC COMPUTING SYSTEMS: A HIERARCHICAL APPROACH

    Full text link
    corecore